
# Feature design doc: Choose your own adventure

This is a semi-technical design document detailing some enhancements to the _Choose Your Own Adventure_ functionality of the Language Learning App.

## Purpose

Introduce structured linguistic data for the entries in a "choose your own adventure". The aim is to create a structured pathway from the user reading or listening to an entry, to adding its words to their vocab bank / word bank, and possibly to creating flashcards from them, to help them learn those words.

## Feature Description

The app already has the concept of an Adventure (i.e. a single story), which is made up of many Entries. Each Entry has Possible Choices (4, for now); the user selects one, and the story continues to be generated.

Entries are generated by an LLM (Claude), have a translation generated by the DeepL translator, and are converted to audio by another LLM (Google's Gemini).

You are to change the functionality to:

1. Use the SpaCy natural language processing library to break down the generated (i.e. foreign-language) text for an entry into sentences, and to tag the parts of speech of its tokens.
2. Translate these sentences one at a time, then pass each translation through the same SpaCy pipeline.
3. End up with a data structure of `paragraphs`, each of which has 1..n `sentences`, where the tokens (words) in each sentence have been through part-of-speech tagging and lemmatisation (both are already configured in how SpaCy is used elsewhere).
4. Store this structured data alongside the full text as it is currently generated in the API, i.e. we need both the structured linguistic data and the original body text.
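Steps 1 to 4 could be wired together roughly as follows. This is a minimal sketch under stated assumptions: `build_linguistic_data`, `analyse` and `translate` are hypothetical names, not the real `SpacyClient` or DeepL client signatures; `analyse` is assumed to return each sentence's text together with its tagged tokens from a single SpaCy pass; and a paragraph's `source_text` is assumed to be its sentence translations joined with spaces.

```python
from typing import Any, Callable

Token = dict[str, Any]
# Hypothetical stand-in for the SpaCy step: returns (sentence_text, tokens) for each
# sentence the pipeline finds in `text`, using the model for `language`.
Analyse = Callable[[str, str], list[tuple[str, list[Token]]]]
# Hypothetical stand-in for a single DeepL call: target-language sentence -> source language.
Translate = Callable[[str], str]


def build_linguistic_data(
    story_text: str,
    source_language: str,
    target_language: str,
    analyse: Analyse,
    translate: Translate,
) -> dict[str, Any]:
    paragraphs: list[dict[str, Any]] = []
    # SpaCy does not detect paragraphs, so split on blank lines first.
    raw_paragraphs = [p.strip() for p in story_text.split("\n\n") if p.strip()]
    for p_index, paragraph in enumerate(raw_paragraphs):
        sentences: list[dict[str, Any]] = []
        # Step 1: one pass over the generated paragraph yields sentences and tagged tokens.
        for s_index, (target_text, target_tokens) in enumerate(
            analyse(paragraph, target_language)
        ):
            # Step 2: translate each sentence on its own...
            source_text = translate(target_text)
            # ...then run the translation through the same pipeline. The translation may be
            # segmented into several sentences, so its tokens are flattened into one list.
            source_tokens = [
                token
                for _, tokens in analyse(source_text, source_language)
                for token in tokens
            ]
            sentences.append(
                {
                    "index": s_index,
                    "source_text": source_text,
                    "target_text": target_text,
                    "target_tokens": target_tokens,
                    "source_tokens": source_tokens,
                }
            )
        # Step 3: paragraphs -> sentences -> tokens, matching `story_text_linguistic_data`.
        paragraphs.append(
            {
                "index": p_index,
                # Assumption: paragraph translation = sentence translations joined by spaces.
                "source_text": " ".join(s["source_text"] for s in sentences),
                "target_text": paragraph,
                "sentences": sentences,
            }
        )
    return {
        "source_language": source_language,
        "target_language": target_language,
        "paragraphs": paragraphs,
    }
```

Passing the NLP and translation steps in as plain callables keeps the sketch testable with fakes; in the service they would wrap the injected clients.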

## Technical components

The `AdventureService` (`/app/domain/service/adventure_service.py`) contains a method called `run_entry_pipeline`. This is the highly asynchronous orchestrator of calls to various external parties (e.g. LLMs, translators, TTS), and we should use this existing entrypoint to run the new code.
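SpaCy's processing is synchronous and CPU-bound, so calling it directly inside an async orchestrator such as `run_entry_pipeline` would stall the event loop for the duration of each call. One option is `asyncio.to_thread` (Python 3.9+). The sketch below also assumes, as a possibility rather than a requirement, that TTS only needs the generated text and could therefore overlap with the NLP work; `analyse_and_synthesise`, `analyse` and `synthesise` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def analyse_and_synthesise(
    story_text: str,
    target_language: str,
    analyse: Callable[[str, str], Any],  # hypothetical synchronous SpaCy wrapper
    synthesise: Callable[[str], Awaitable[bytes]],  # hypothetical async TTS call
) -> tuple[Any, bytes]:
    # Hand the blocking SpaCy call to a worker thread so the event loop keeps
    # servicing other in-flight I/O (LLM, DeepL, TTS requests).
    nlp_job = asyncio.to_thread(analyse, story_text, target_language)
    # Assumption: TTS only depends on the generated text, so both can run together.
    tts_job = synthesise(story_text)
    nlp_result, audio = await asyncio.gather(nlp_job, tts_job)
    return nlp_result, audio
```

Because of the GIL, a thread does not give true parallelism for Python-level CPU work; if NLP time turns out to dominate, running SpaCy in a process pool (e.g. `loop.run_in_executor` with a `concurrent.futures.ProcessPoolExecutor`) would be the next thing to measure.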

We will need to inject a `SpacyClient` (`app/outbound/spacy/spacy_client`) into the `AdventureService`.
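Injecting the client through the constructor, typed against a small `Protocol` rather than the concrete class, would let tests pass in a fake and avoid loading SpaCy models. A sketch only: the constructor below is a cut-down, hypothetical version of `AdventureService`, and the `get_parts_of_speech` signature is assumed rather than taken from the real `SpacyClient`.

```python
from typing import Any, Protocol


class PartsOfSpeechTagger(Protocol):
    # Assumed signature; align with the real SpacyClient.get_parts_of_speech.
    def get_parts_of_speech(self, text: str) -> list[dict[str, Any]]: ...


class AdventureService:
    # Hypothetical, cut-down constructor: the existing Anthropic, DeepL and Gemini
    # dependencies are omitted to keep the sketch short.
    def __init__(self, spacy_client: PartsOfSpeechTagger) -> None:
        self._spacy_client = spacy_client

    def tag(self, text: str) -> list[dict[str, Any]]:
        # Hypothetical helper that delegates to whichever client was injected.
        return self._spacy_client.get_parts_of_speech(text)
```

Any object with a matching `get_parts_of_speech` method satisfies the protocol structurally, so the real `SpacyClient` would not need to inherit from it.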

After the text has been generated (through the call to `anthropic_client.complete` in that method), we should run the SpaCy pipeline over it at a relevant point.

SpaCy's pipeline does not identify paragraphs, so we may need to split the incoming raw text on the `\n\n` separator and then run the pipeline on each paragraph in turn.
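A slightly more forgiving split than a literal `\n\n` may be worth considering, since LLM output can contain Windows line endings, whitespace-only lines, or more than one blank line between paragraphs. A sketch; the helper name `split_paragraphs` is hypothetical:

```python
import re

# A line break followed by one or more further line breaks, allowing spaces/tabs on
# the blank lines in between, marks a paragraph boundary. A single newline does not.
_PARAGRAPH_BREAK = re.compile(r"\r?\n[ \t]*(?:\r?\n[ \t]*)+")


def split_paragraphs(raw_text: str) -> list[str]:
    # Strip each paragraph and drop empty ones (e.g. from leading/trailing breaks).
    return [p.strip() for p in _PARAGRAPH_BREAK.split(raw_text) if p.strip()]
```

Single newlines inside a paragraph (for example, in dialogue) are preserved and left for SpaCy to handle.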

We will therefore need a new JSON field on the `AdventureEntryEntity`, which I think we should call `story_text_linguistic_data`. It should look like the following:

```jsonc
{
    "source_language": "en",
    "target_language": "fr",
    "paragraphs": [
        {
            "index": 0,
            "source_text": "\"Since forever, no? It's normal. Everyone is together here.\"",
            "target_text": "« Depuis toujours, non ? C'est normal. Tout le monde est ensemble ici. »",
            "sentences": [
                {
                    "index": 0,
                    "source_text": "\"Since forever, no?",
                    "target_text": "« Depuis toujours, non ?",
                    "target_tokens": [/* ... */],
                    "source_tokens": [/* ... */]
                },
                {
                    "index": 1,
                    "source_text": "It's normal.",
                    "target_text": "C'est normal.",
                    "target_tokens": [/* ... */],
                    "source_tokens": [/* ... */]
                },
                {
                    "index": 2,
                    "source_text": "Everyone is together here.\"",
                    "target_text": "Tout le monde est ensemble ici. »",
                    "target_tokens": [/* ... */],
                    "source_tokens": [/* ... */]
                }
            ]
        }
    ]
}
```

Here, the `target_tokens` and `source_tokens` fields use the same data structure as the tags specified in the `get_parts_of_speech` method in the `SpacyClient`.
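On the Python side, the same shape could be modelled with dataclasses, so the pipeline builds typed objects and only serialises at the persistence boundary. A sketch; the class names are hypothetical, and `Token` is left as a plain dict because its exact structure comes from `get_parts_of_speech`:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any

Token = dict[str, Any]  # assumption: shape defined by SpacyClient.get_parts_of_speech


@dataclass
class SentenceData:
    index: int
    source_text: str
    target_text: str
    target_tokens: list[Token] = field(default_factory=list)
    source_tokens: list[Token] = field(default_factory=list)


@dataclass
class ParagraphData:
    index: int
    source_text: str
    target_text: str
    sentences: list[SentenceData] = field(default_factory=list)


@dataclass
class LinguisticData:
    source_language: str
    target_language: str
    paragraphs: list[ParagraphData] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists; ensure_ascii=False keeps
        # « », accents, etc. readable in the stored column.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "LinguisticData":
        # Rebuild the typed structure from the decoded JSON column.
        return cls(
            source_language=raw["source_language"],
            target_language=raw["target_language"],
            paragraphs=[
                ParagraphData(
                    index=p["index"],
                    source_text=p["source_text"],
                    target_text=p["target_text"],
                    sentences=[SentenceData(**s) for s in p["sentences"]],
                )
                for p in raw["paragraphs"]
            ],
        )
```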

We will then need to feed this data through to the front end, which will use it to build a more structured UI and a better "translate" experience. For example:

- clicking on a single word in the target language to go to the relevant word(s) in the source language;
- adding words from that translation via a more automated pathway, with the option for manual intervention;
- linking words with their dictionary entries (which we already have).
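For the "more automated pathway" for adding words, one option is for the API to derive suggested word-bank entries from the stored data, which the user then accepts or edits (the manual intervention). A minimal sketch, assuming each token dict has `lemma` and `pos` keys (the real `get_parts_of_speech` structure is not shown in this doc); the function name and the excluded POS set are illustrative choices, not existing code:

```python
from typing import Any, Iterable

# Assumption: each token dict carries "lemma" and "pos" keys, with "pos" holding a
# Universal POS tag as produced by spaCy's `token.pos_`. Adjust to the real output of
# SpacyClient.get_parts_of_speech.
DEFAULT_EXCLUDED_POS = frozenset({"PUNCT", "SPACE", "NUM", "SYM", "X"})


def word_bank_candidates(
    data: dict[str, Any],
    excluded_pos: Iterable[str] = DEFAULT_EXCLUDED_POS,
) -> list[dict[str, str]]:
    """Suggest unique (lemma, POS) pairs from the target text for the user to confirm."""
    excluded = set(excluded_pos)
    seen: set[tuple[str, str]] = set()
    candidates: list[dict[str, str]] = []
    for paragraph in data["paragraphs"]:
        for sentence in paragraph["sentences"]:
            for token in sentence["target_tokens"]:
                key = (token["lemma"], token["pos"])
                if token["pos"] in excluded or key in seen:
                    continue
                seen.add(key)
                candidates.append(
                    {
                        "lemma": token["lemma"],
                        "pos": token["pos"],
                        # The aligned sentence pair gives the learner context for the word.
                        "example_target": sentence["target_text"],
                        "example_source": sentence["source_text"],
                    }
                )
    return candidates
```

Note that the structure as specified aligns source and target at sentence level, not word level, so "go to the relevant word(s)" can at best highlight the aligned source sentence unless some form of word alignment is added later.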

This may have an impact on performance, so we should introduce a simple tracing mechanism into the `run_entry_pipeline` method to give visibility of how long (in seconds) each individual step takes. We should store this as JSON on the `AdventureEntryEntity`, so we'll need to create a migration for those fields. I imagine the data looking something like:

```jsonc
{
  "durations": {
    "text_generation": 10, // for the text itself
    "translations_total": 5, // for all calls to DeepL combined
    "nlp_total": 7, // for all runs of SpaCy
    "tts": 15, // call to generate the audio file
    "file_uploading": 1 // to upload the .wav
  }
}
```
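A minimal sketch of such a tracing mechanism, assuming a small helper class (`PipelineTrace` is a hypothetical name) that times each step with `time.perf_counter` and accumulates repeated steps, so the per-sentence DeepL and SpaCy calls roll up into `translations_total` and `nlp_total`:

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class PipelineTrace:
    def __init__(self) -> None:
        self._durations: defaultdict[str, float] = defaultdict(float)

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate (rather than overwrite) so repeated steps sum into a total;
            # the `finally` records the time even if the step raises.
            self._durations[name] += time.perf_counter() - start

    def to_json_dict(self) -> dict[str, dict[str, float]]:
        # Rounded to milliseconds; shaped like the `durations` JSON above.
        return {"durations": {name: round(s, 3) for name, s in self._durations.items()}}
```

In `run_entry_pipeline`, each awaited call would be wrapped, e.g. `with trace.step("tts"): audio = await ...`, and `trace.to_json_dict()` stored on the entity. One caveat: if steps run concurrently (for instance via `asyncio.gather`), each measures its own wall-clock time, so the durations can add up to more than the pipeline's total run time.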

## IGNORE: Monetisation and payment strategy

The following text is present as a reminder to me to consider how adventure generation fits into the monetisation strategy.

See the [pricing.md](./design-doc-pricing.md) doc for more info.

The use of LLMs creates a cost for the Language Learning App for every entry that is generated (initial generation, translation, text-to-speech). This will likely be as high as 50-60p per adventure, which could add up to a lot of money per user.

Users on the subscription model will get a certain number of Adventure entries per subscription period. We should round this up to the nearest whole adventure (you don't want to be waiting for your next renewal to finish an adventure).

Users on metered billing will pay for a whole adventure up-front, i.e. approx. $1.20/adventure.

For this reason, it's very important that the system tracks the costs (in money, and in tokens) taken to generate the content for an adventure, so these figures can be adjusted to reflect reality.