# Technical Design Doc: Articles
> You may wish to review documentation about [architecture](./architecture.md) and the [domain](./domain.md) of this application to help make sense of this document.
An Article represents a single piece of content that a learner can read and/or listen to. It might be, for example, a 300-word fictional piece about a baker in Lyon, or it could be a 500-word summary of recent news events.
Articles will be accompanied by (AI-generated) text-to-speech audio.
Because this is a language-learning app, Articles will be authored in one language (e.g. French) and there will be a parallel set of content in another language (e.g. English).
Not every learner will have access to every Article at the same time. For example, learners who are studying French won't access Italian-language Articles, and intermediate French learners won't access advanced or basic French-language Articles.
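As a rough illustration of this access rule, the sketch below filters Articles by the language a learner is studying and by their level. The `level` field and the names `ArticleSummary`, `Learner`, and `accessible_articles` are assumptions made for illustration; this document does not yet define where proficiency level is stored, and the sketch assumes the language being studied is the Article's `source_language`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleSummary:
    id: str
    source_language: str  # assumed to be the language being studied, e.g. "fr"
    level: str            # hypothetical field: "basic", "intermediate", "advanced"


@dataclass(frozen=True)
class Learner:
    studying: str  # e.g. "fr"
    level: str     # e.g. "intermediate"


def accessible_articles(learner, articles):
    """Return only the articles matching the learner's language and level."""
    return [
        a for a in articles
        if a.source_language == learner.studying and a.level == learner.level
    ]


articles = [
    ArticleSummary("a1", "fr", "intermediate"),
    ArticleSummary("a2", "fr", "advanced"),
    ArticleSummary("a3", "it", "intermediate"),
]
learner = Learner(studying="fr", level="intermediate")
visible = accessible_articles(learner, articles)
```

In a real system this filter would more likely be a database query than an in-memory list comprehension, but the matching rule is the same.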
Because Articles can be available as audio, they will also form the basis of a podcast-style RSS feed for each learner, allowing them to _just_ listen.
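To make the podcast-style feed concrete, here is a minimal sketch that renders a learner's audio Articles as an RSS 2.0 document using Python's standard library. The function name `build_feed`, the episode dictionary keys, the placeholder URLs, and the `audio/mpeg` media type are all assumptions for illustration, not decisions made by this document.

```python
import xml.etree.ElementTree as ET


def build_feed(learner_name, episodes):
    """Build a minimal RSS 2.0 document with one <item> per audio Article."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Articles for {learner_name}"
    ET.SubElement(channel, "link").text = "https://example.invalid/feed"  # placeholder
    ET.SubElement(channel, "description").text = "Audio articles"
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["id"]
        # Podcast clients read the audio file from the <enclosure> element.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["audio_url"],
            length=str(ep["audio_bytes"]),
            type="audio/mpeg",  # assumed audio format
        )
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "learner-123",
    [
        {
            "id": "some-uuid",
            "title": "Un boulanger à Lyon",
            "audio_url": "https://example.invalid/audio/a1.mp3",
            "audio_bytes": 1024,
        }
    ],
)
```

Because the feed is per learner, the list of episodes passed in would already be restricted to the Articles that learner can access.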
The Article is therefore the content primitive, but not the means by which a learner accesses or receives content. There will need to be a separate piece of architecture which makes Articles available to the learner, e.g. through a daily or weekly "edition" of content from the website (similar to a newspaper).
A separate role of Users will author, edit, and publish Articles using a traditional CMS-like interface. Articles can therefore be in a _draft_ state before they are published. Articles are also versioned entities: if I wish to make a change to an article, as an author, I would log in, make that change, and then click "update" or "publish", which would kick off an async process to replace the previous article version with the new one. The process is asynchronous primarily because each new version has to pass through the audio-generation pipeline.
## Foundational data model
In the interest of delivering value incrementally, as opposed to "all at once", let's create the following entities:
The Article entity is the header record: it describes the article itself, while the content lives in separate records that reference it:
```json
{
  "id": "article_id",
  "source_language": "fr",
  "target_language": "en",
  "title": "La boulangerie",
  "subtitle": null, // nullable string
  "subject_tags": ["fiction", "france"],
  "length_descriptor": "short" // one of: short, medium, long
}
```
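The header above could be mirrored in application code roughly as follows. This is a sketch under assumptions: the document does not say which language or framework the API uses, and the class name `Article`, the constant `LENGTH_DESCRIPTORS`, and the validation in `__post_init__` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values taken from the length_descriptor comment in the JSON example.
LENGTH_DESCRIPTORS = ("short", "medium", "long")


@dataclass
class Article:
    id: str
    source_language: str
    target_language: str
    title: str
    subtitle: Optional[str] = None
    subject_tags: List[str] = field(default_factory=list)
    length_descriptor: str = "short"

    def __post_init__(self):
        # Reject values outside the documented set early.
        if self.length_descriptor not in LENGTH_DESCRIPTORS:
            raise ValueError(
                f"length_descriptor must be one of {LENGTH_DESCRIPTORS}, "
                f"got {self.length_descriptor!r}"
            )


article = Article(
    id="article_id",
    source_language="fr",
    target_language="en",
    title="Un boulanger à Lyon",
    subject_tags=["fiction", "france"],
    length_descriptor="short",
)
```

Note that the header carries no body text or audio; those belong to the versioned record.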
And this is the "record" row of the Article, which we could call the ArticleVersion:
```json
{
  "id": "some-uuid",
  "article_id": "article_id",
  "created_at": "2026-04-22T19:00Z",
  "published_at": "2026-04-24T09:00Z", // nullable; null until published
  "deleted_at": null,
  "source_language_markdown_text": "Voilà la langue française",
  "target_language_markdown_text": "This is where the article is", // nullable if not generated
  "source_language_natural_language_data": {..}, // nullable, output from spaCy tokenization
  "target_language_natural_language_data": {..}, // nullable, output from spaCy tokenization
  "source_language_audio_url": "http://" // nullable if not generated
}
```
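One plausible reading of these fields is that the version a learner sees is the most recently published one that has not been deleted, while rows with a null `published_at` are drafts. The sketch below encodes that rule. The names `ArticleVersion` (as a Python class) and `current_version` are assumptions, only the timestamp fields are modelled, and the selection rule itself is an interpretation rather than something this document specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ArticleVersion:
    id: str
    article_id: str
    created_at: datetime
    published_at: Optional[datetime] = None  # None while the version is a draft
    deleted_at: Optional[datetime] = None


def current_version(versions, now):
    """Return the latest published, non-deleted version, or None if there is none."""
    live = [
        v for v in versions
        if v.deleted_at is None
        and v.published_at is not None
        and v.published_at <= now
    ]
    return max(live, key=lambda v: v.published_at, default=None)


published = ArticleVersion(
    id="v1",
    article_id="article_id",
    created_at=datetime(2026, 4, 22, 19, 0, tzinfo=timezone.utc),
    published_at=datetime(2026, 4, 24, 9, 0, tzinfo=timezone.utc),
)
draft = ArticleVersion(
    id="v2",
    article_id="article_id",
    created_at=datetime(2026, 4, 25, 8, 0, tzinfo=timezone.utc),
)
now = datetime(2026, 4, 26, 0, 0, tzinfo=timezone.utc)
live_version = current_version([published, draft], now)
```

Under this rule an author can keep editing a draft without affecting what learners see; the published version only changes once the new row receives a `published_at` timestamp, for example after audio generation completes.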