# Feature design doc: Choose your own adventure
This is a semi-technical design document detailing some enhancements to the _Choose Your Own Adventure_ functionality of the Language Learning App.
## Purpose
Introduce structured linguistic data for the entries in a "choose your own adventure". The aim is to create a structured pathway from the user reading/listening to an entry, to adding words to their vocab bank / word bank, and possibly then creating flashcards from them, to help them learn those words.
## Feature Description
The app already has the concept of an Adventure (i.e. a single story), which has many Entries. Each Entry has Possible Choices (4, for now); the user selects one and the story continues to be generated.
Entries are generated by an LLM (Claude), have a translation generated by the DeepL translator, and are converted to audio by another LLM (Google's Gemini).
You are to change the functionality as follows:
1. Use SpaCy natural language processing to break down the generated (i.e. foreign-language) text for an entry into sentences, and tag each token with its part of speech.
2. Translate these sentences one at a time, then pass the resulting translations through the same SpaCy pipeline.
3. End up with a data structure of `paragraphs`, each of which has 1..n `sentences`, where the tokens (words) in each sentence have gone through part-of-speech tagging as well as lemmatisation (both are already configured in how SpaCy is used elsewhere).
4. Store this structured data alongside the full text as it is currently generated in the API, i.e. we need both the structured linguistic data and the original body text.
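Steps 1 to 3 could be wired together roughly as follows. This is a minimal sketch, not the actual `AdventureService` code: the segmenter, taggers and translator are passed in as plain callables (hypothetical stand-ins for `SpacyClient` and the DeepL client), so the shape of the pipeline can be shown and tested without either dependency. Two taggers are assumed because spaCy pipelines are language-specific (e.g. one French and one English pipeline). The naming follows the `story_text_linguistic_data` example, where `target_*` is the generated foreign-language text and `source_*` is its translation. With spaCy, a segmenter could be `lambda text: [s.text for s in nlp(text).sents]`.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical callable signatures standing in for SpacyClient / DeepL calls.
Segmenter = Callable[[str], list[str]]          # text -> sentences
Tagger = Callable[[str], list[dict[str, Any]]]  # text -> token tags
Translator = Callable[[str], Awaitable[str]]    # sentence -> translation


async def build_sentences(
    paragraph: str,
    segment: Segmenter,
    tag_target: Tagger,
    tag_source: Tagger,
    translate: Translator,
) -> list[dict[str, Any]]:
    """Split one generated paragraph into sentences, translate each one,
    and tag both the original and the translated sentence."""
    target_sentences = segment(paragraph)
    # One translation call per sentence; gather() runs them concurrently
    # and returns the results in the same order as the inputs.
    source_sentences = await asyncio.gather(
        *(translate(s) for s in target_sentences)
    )
    return [
        {
            "index": i,
            "source_text": source,
            "target_text": target,
            "target_tokens": tag_target(target),
            "source_tokens": tag_source(source),
        }
        for i, (target, source) in enumerate(zip(target_sentences, source_sentences))
    ]
```

Because `asyncio.gather` preserves input order, sentence indices stay aligned between the two languages even though the translation calls run concurrently. If DeepL rate limits become a problem, the calls could be awaited one after another instead.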
## Technical components
The `AdventureService` (`/app/domain/service/adventure_service.py`) contains a method called `run_entry_pipeline`. This is the highly asynchronous orchestrator of calls to various external services (e.g. LLMs, translators, TTS), and we should use this existing entrypoint to run the new code.
We will need to inject a `SpacyClient` (`app/outbound/spacy/spacy_client`) into the `AdventureService`.
After the text is generated (through the call to `anthropic_client.complete` in that method), we should, at a relevant point, run the new sentence-splitting, translation and NLP steps described above.
Running the NLP pipeline in SpaCy won't give us the paragraphs, so we will likely need to split the incoming raw text on the `\n\n` separator and then run the pipeline on each paragraph in turn.
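A minimal sketch of the paragraph split, assuming paragraphs in the generated text are separated by blank lines. `split_paragraphs` is a hypothetical helper name. It also tolerates Windows line endings and runs of extra blank lines, which a plain `text.split("\n\n")` would turn into empty or newline-prefixed paragraphs.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split raw LLM output into paragraphs on blank lines.

    Single newlines inside a paragraph are kept; whitespace-only
    fragments are dropped.
    """
    normalised = text.replace("\r\n", "\n")
    # A blank line (optionally containing spaces) marks a paragraph break.
    parts = re.split(r"\n\s*\n", normalised)
    return [p.strip() for p in parts if p.strip()]
```

Each returned paragraph would then go through the sentence, translation and NLP steps, with its position in the list used as the paragraph `index`.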
We will therefore need a new JSON field on the `AdventureEntryEntity`, which I think we should call `story_text_linguistic_data`, and which should look like the following:
```json
{
  "source_language": "en",
  "target_language": "fr",
  "paragraphs": [
    {
      "index": 0,
      "source_text": "\"Since forever, no? It's normal. Everyone is together here.\"",
      "target_text": "« Depuis toujours, non ? C'est normal. Tout le monde est ensemble ici. »",
      "sentences": [
        {
          "index": 0,
          "source_text": "\"Since forever, no?",
          "target_text": "« Depuis toujours, non ?",
          "target_tokens": [..],
          "source_tokens": [..]
        },
        {
          "index": 1,
          "source_text": "It's normal.",
          "target_text": "C'est normal.",
          "target_tokens": [..],
          "source_tokens": [..]
        },
        {
          "index": 2,
          "source_text": "Everyone is together here.\"",
          "target_text": "Tout le monde est ensemble ici. »",
          "target_tokens": [..],
          "source_tokens": [..]
        }
      ]
    }
  ]
}
```
Here the `target_tokens` and `source_tokens` fields use the same data structure as the tags specified in the `get_parts_of_speech` method in the `SpacyClient`.
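One way to keep the stored JSON consistent is to build it from typed models rather than ad-hoc dicts. The sketch below mirrors the `story_text_linguistic_data` example with Python dataclasses; the class names are hypothetical, and tokens are left as opaque dicts because their actual shape comes from `get_parts_of_speech`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any

# Opaque token dict: whatever SpacyClient.get_parts_of_speech produces.
Token = dict[str, Any]


@dataclass
class Sentence:
    index: int
    source_text: str
    target_text: str
    target_tokens: list[Token] = field(default_factory=list)
    source_tokens: list[Token] = field(default_factory=list)


@dataclass
class Paragraph:
    index: int
    source_text: str
    target_text: str
    sentences: list[Sentence] = field(default_factory=list)


@dataclass
class StoryTextLinguisticData:
    source_language: str
    target_language: str
    paragraphs: list[Paragraph] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses and lists;
        # ensure_ascii=False keeps « », accents etc. readable when stored.
        return json.dumps(asdict(self), ensure_ascii=False)
```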
We will then need to feed this data through to the front end, which will use it to present a more structured view in the UI. This will enable a better "translate" experience: clicking a single word in the target language to go to the relevant word(s) in the source language; adding words from that translation via a more automated pathway, with the option of manual intervention; and linking words to their dictionary entries (which we already have).
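Note that this structure aligns the two languages at sentence level only, so a click on a target-language word can reliably resolve to its whole source sentence, but not to specific source word(s); word-level alignment would need extra data. A sketch of that sentence-level lookup over the stored JSON, as a hypothetical helper (the `lemma` key on tokens is an assumption about what `get_parts_of_speech` returns):

```python
from typing import Any


def source_context_for_token(
    data: dict[str, Any],
    paragraph_index: int,
    sentence_index: int,
    token_index: int,
) -> dict[str, Any]:
    """Resolve a clicked target-language token to its sentence-level
    source context in story_text_linguistic_data."""
    sentence = data["paragraphs"][paragraph_index]["sentences"][sentence_index]
    token = sentence["target_tokens"][token_index]
    return {
        "clicked_token": token,
        # Assumed key: use whatever field get_parts_of_speech actually emits
        # for the lemma when linking to dictionary entries.
        "dictionary_lookup_key": token.get("lemma"),
        # Only sentence-level alignment is available in the stored data.
        "source_text": sentence["source_text"],
        "source_tokens": sentence["source_tokens"],
    }
```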
The extra processing may have an impact on performance, so can we introduce a simple tracing mechanism into the `run_entry_pipeline` method to show how long (in seconds) each individual step takes? We should store this as JSON on the `AdventureEntryEntity` too, so we'll need to create a migration for those fields. I imagine data that looks like:
```jsonc
{
  "durations": {
    "text_generation": 10, // generating the story text itself
    "translations_total": 5, // all calls to DeepL combined
    "nlp_total": 7, // all runs of SpaCy combined
    "tts": 15, // call to generate the audio file
    "file_uploading": 1 // uploading the .wav file
  }
}
```
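A minimal sketch of such a tracer, assuming one timer object is created per `run_entry_pipeline` call. `StepTimer` and its methods are hypothetical; the step names are the ones in the durations example. Repeated steps are summed, which yields the `*_total` figures.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class StepTimer:
    """Accumulates wall-clock seconds per named pipeline step."""

    def __init__(self) -> None:
        self._durations: dict[str, float] = defaultdict(float)

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the step raises, so failed runs still show timings.
            self._durations[name] += time.perf_counter() - start

    def to_dict(self) -> dict[str, dict[str, float]]:
        # Seconds rounded to milliseconds, ready to store as JSON.
        return {"durations": {k: round(v, 3) for k, v in self._durations.items()}}
```

Inside the async pipeline, a step could be timed with `with timer.step("text_generation"):` around the existing `anthropic_client.complete` call (a plain `with` block can contain `await`), and `timer.to_dict()` stored on the entity. If the per-sentence translations run concurrently, wrapping the whole batch in a single `step("translations_total")` gives wall-clock time; timing each call separately would sum overlapping intervals and overstate it. The values are fractional seconds rather than the whole seconds shown in the example.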
## IGNORE: Monetisation and payment strategy
The following text is here as a reminder to me to consider how adventure generation fits into monetisation.
See the [pricing.md](./design-doc-pricing.md) doc for more info.
The use of LLMs incurs a cost for the Language Learning App for every entry that is generated (initial generation, translation, text-to-speech). This will likely be as high as 50-60p per adventure; per user, this could add up to a lot of money.
Users on the subscription model will get a certain number of Adventure entries per subscription period. We should round this up to the nearest whole adventure (you don't want to be waiting for your next renewal to finish an adventure).
Users on metered billing will pay for a whole adventure up-front, i.e. approx. $1.20/adventure.
For this reason, it's very important that the system tracks the costs (in money, and in tokens) taken to generate the content for an adventure, so these figures can be adjusted to reflect reality.