467 lines
19 KiB
Markdown
467 lines
19 KiB
Markdown
|
|
# Technical Design: Persistent Job Queue with Procrastinate
|
|||
|
|
|
|||
|
|
**Status:** Draft — drafted by LLM, reviewed by human developer.
|
|||
|
|
**Scope:** Migration of adventure entry pipeline from in-process `asyncio.Queue` to Procrastinate (PostgreSQL-backed), plus groundwork for future scheduled jobs.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Problem
|
|||
|
|
|
|||
|
|
`app/worker.py` is a plain `asyncio.Queue` running inside the API process. Its limitations:
|
|||
|
|
|
|||
|
|
- **No persistence.** Any enqueued jobs are silently lost if the API process restarts or is redeployed.
|
|||
|
|
- **No retry.** A transient failure (network blip calling Anthropic/DeepL/Gemini) permanently sets the entry status to `'error'`.
|
|||
|
|
- **No scheduling.** We want to run nightly jobs (e.g. news digest generation) on a cron trigger.
|
|||
|
|
- **Contention.** Long-running LLM + TTS + NLP pipelines share the same process and event loop as the HTTP API.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Solution: Procrastinate + separate worker container
|
|||
|
|
|
|||
|
|
[Procrastinate](https://procrastinate.readthedocs.io) is a Python asyncio task queue backed by PostgreSQL. It uses `LISTEN/NOTIFY` for fast job dispatch with a polling fallback. No new infrastructure is needed — the existing PostgreSQL instance is the queue backing store.
|
|||
|
|
|
|||
|
|
A dedicated `worker` Docker container is added to every compose file. It shares the same Docker image as `api` (same `./api` build context) but runs a different command. Both containers connect to the same PostgreSQL instance.
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
┌─────────────────────┐ defer_async() ┌──────────────────────┐
|
|||
|
|
│ api (FastAPI) │ ────────────────→ │ PostgreSQL │
|
|||
|
|
│ port 8000 │ │ procrastinate_jobs │
|
|||
|
|
└─────────────────────┘ │ procrastinate_events │
|
|||
|
|
└──────────────────────┘
|
|||
|
|
┌─────────────────────┐ LISTEN/NOTIFY + │
|
|||
|
|
│ worker │ ←──────────────── │
|
|||
|
|
│ (Procrastinate) │ polling fallback │
|
|||
|
|
└─────────────────────┘ └
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Files Changed
|
|||
|
|
|
|||
|
|
### New: `api/app/tasks.py`
|
|||
|
|
|
|||
|
|
Single source of truth for all task definitions. Both the API (to _defer_ tasks) and the worker (to _execute_ them) import this module.
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
import uuid
|
|||
|
|
import logging
|
|||
|
|
import procrastinate
|
|||
|
|
from procrastinate.contrib.sqlalchemy import SQLAlchemyAsyncConnector
|
|||
|
|
|
|||
|
|
from .config import settings
|
|||
|
|
from .outbound.postgres.database import engine, AsyncSessionLocal
|
|||
|
|
from .outbound.anthropic.anthropic_client import AnthropicClient
|
|||
|
|
from .outbound.deepl.deepl_client import DeepLClient
|
|||
|
|
from .outbound.gemini.gemini_client import GeminiClient
|
|||
|
|
from .outbound.spacy.spacy_client import SpacyClient
|
|||
|
|
from .outbound.postgres.repositories.adventure_repository import (
|
|||
|
|
PostgresAdventureRepository,
|
|||
|
|
PostgresAdventureEntryRepository,
|
|||
|
|
PostgresAdventureEntryChoiceRepository,
|
|||
|
|
PostgresAdventureEntryDecisionRepository,
|
|||
|
|
PostgresAdventureEntryTranslationRepository,
|
|||
|
|
PostgresAdventureEntryAudioRepository,
|
|||
|
|
)
|
|||
|
|
from .domain.services.adventure_service import AdventureService
|
|||
|
|
|
|||
|
|
logger = logging.getLogger(__name__)
|
|||
|
|
|
|||
|
|
procrastinate_app = procrastinate.App(
|
|||
|
|
connector=SQLAlchemyAsyncConnector(engine)
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
|
|||
|
|
def _make_adventure_service(db) -> AdventureService:
|
|||
|
|
"""
|
|||
|
|
Moved here from adventures.py so the worker can construct the service
|
|||
|
|
without importing the router module.
|
|||
|
|
"""
|
|||
|
|
if settings.stub_generation:
|
|||
|
|
from .routers.api.adventures import ( # avoid circular import at module level
|
|||
|
|
_StubAnthropicClient, _StubDeepLClient, _StubGeminiClient, _StubSpacyClient
|
|||
|
|
)
|
|||
|
|
anthropic = _StubAnthropicClient()
|
|||
|
|
deepl = _StubDeepLClient()
|
|||
|
|
gemini = _StubGeminiClient()
|
|||
|
|
spacy = _StubSpacyClient()
|
|||
|
|
else:
|
|||
|
|
anthropic = AnthropicClient.new(settings.anthropic_api_key)
|
|||
|
|
deepl = DeepLClient(settings.deepl_api_key)
|
|||
|
|
gemini = GeminiClient(settings.gemini_api_key)
|
|||
|
|
spacy = SpacyClient()
|
|||
|
|
|
|||
|
|
return AdventureService(
|
|||
|
|
adventure_repo=PostgresAdventureRepository(db),
|
|||
|
|
entry_repo=PostgresAdventureEntryRepository(db),
|
|||
|
|
choice_repo=PostgresAdventureEntryChoiceRepository(db),
|
|||
|
|
decision_repo=PostgresAdventureEntryDecisionRepository(db),
|
|||
|
|
translation_repo=PostgresAdventureEntryTranslationRepository(db),
|
|||
|
|
audio_repo=PostgresAdventureEntryAudioRepository(db),
|
|||
|
|
anthropic_client=anthropic,
|
|||
|
|
deepl_client=deepl,
|
|||
|
|
gemini_client=gemini,
|
|||
|
|
spacy_client=spacy,
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
|
|||
|
|
@procrastinate_app.task(queue="adventure_pipeline")
|
|||
|
|
async def generate_adventure_entry(
|
|||
|
|
adventure_id: str, entry_id: str, user_id: str
|
|||
|
|
) -> None:
|
|||
|
|
async with AsyncSessionLocal() as db:
|
|||
|
|
service = _make_adventure_service(db)
|
|||
|
|
await service.run_entry_pipeline(
|
|||
|
|
uuid.UUID(adventure_id),
|
|||
|
|
uuid.UUID(entry_id),
|
|||
|
|
uuid.UUID(user_id),
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**EDITOR'S NOTE** There's no good reason why the stubs should live in `routers.api.adventures`, and therefore risk circular dependencies. They should be moved to the `outbound.SERVICE_NAME` (e.g. `outbound.bunny.stub_bunny_client`). This will involve updating the dependencies in the API router.
|
|||
|
|
|
|||
|
|
**Notes on retry strategy:**
|
|||
|
|
`run_entry_pipeline` currently catches all exceptions internally and writes `status='error'` to the DB — it never raises. From Procrastinate's point of view the job always succeeds, so retry is not wired up in this first pass. This preserves the existing behaviour exactly.
|
|||
|
|
|
|||
|
|
A follow-up improvement (out of scope here) would be to remove the internal catch-all, let exceptions propagate, and configure:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
@procrastinate_app.task(
|
|||
|
|
queue="adventure_pipeline",
|
|||
|
|
retry=procrastinate.RetryStrategy(
|
|||
|
|
max_attempts=3,
|
|||
|
|
wait_minimum=30,
|
|||
|
|
wait_multiplier=2,
|
|||
|
|
wait_jitter=30,
|
|||
|
|
),
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
with an `on_abort` hook responsible for writing the `'error'` status after all attempts are exhausted.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### New: `api/app/worker_main.py`
|
|||
|
|
|
|||
|
|
The worker process entrypoint. The Docker `command` points here.
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
import asyncio
|
|||
|
|
import logging
|
|||
|
|
from . import tasks # side-effect: registers all task definitions
|
|||
|
|
|
|||
|
|
logger = logging.getLogger(__name__)
|
|||
|
|
|
|||
|
|
|
|||
|
|
async def _run() -> None:
|
|||
|
|
async with tasks.procrastinate_app.open_async():
|
|||
|
|
logger.info("Procrastinate worker started, queue=adventure_pipeline")
|
|||
|
|
await tasks.procrastinate_app.run_worker_async(
|
|||
|
|
queues=["adventure_pipeline"],
|
|||
|
|
concurrency=4,
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
|
|||
|
|
if __name__ == "__main__":
|
|||
|
|
logging.basicConfig(level=logging.INFO)
|
|||
|
|
asyncio.run(_run())
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Run command (in docker-compose): `python -m app.worker_main`
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### New: Alembic migration `YYYYMMDD_0019_add_procrastinate_schema.py`
|
|||
|
|
|
|||
|
|
Procrastinate manages its own schema independently, but we embed it in Alembic to keep all DB changes in one place and ensure it runs as part of `alembic upgrade head`.
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
from alembic import op
|
|||
|
|
import procrastinate.contrib.sqlalchemy
|
|||
|
|
|
|||
|
|
def upgrade() -> None:
|
|||
|
|
# Procrastinate provides its DDL via the schema manager.
|
|||
|
|
# Run: `procrastinate schema --app=app.tasks.procrastinate_app print-sql`
|
|||
|
|
# to get the SQL and paste it here, or use:
|
|||
|
|
op.execute(procrastinate.contrib.sqlalchemy.SQLAlchemyAsyncConnector.get_schema_manager().get_schema())
|
|||
|
|
|
|||
|
|
def downgrade() -> None:
|
|||
|
|
op.execute("DROP SCHEMA procrastinate CASCADE;")
|
|||
|
|
# or the equivalent table-by-table drops if procrastinate uses public schema
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Alternative (simpler):** Add `await tasks.procrastinate_app.schema.apply_schema_async()` in `worker_main.py` before starting the worker. This runs Procrastinate's own migration tool, which is idempotent. It's less consistent with the project's Alembic-only convention but simpler to maintain as Procrastinate is upgraded.
|
|||
|
|
|
|||
|
|
**REVIEW NOTE** - yes, let's stick to Alembic, two solutions for migration management would add complexity.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Modified: `api/app/main.py`
|
|||
|
|
|
|||
|
|
Remove the `worker_loop` asyncio task; open the Procrastinate connector in lifespan so that `defer_async` calls from API routes work.
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# Before
|
|||
|
|
from . import worker
|
|||
|
|
|
|||
|
|
@asynccontextmanager
|
|||
|
|
async def lifespan(app: FastAPI):
|
|||
|
|
init_storage()
|
|||
|
|
setup_observability(app)
|
|||
|
|
worker_task = asyncio.create_task(worker.worker_loop())
|
|||
|
|
yield
|
|||
|
|
worker_task.cancel()
|
|||
|
|
try:
|
|||
|
|
await worker_task
|
|||
|
|
except asyncio.CancelledError:
|
|||
|
|
pass
|
|||
|
|
|
|||
|
|
# After
|
|||
|
|
from . import tasks
|
|||
|
|
|
|||
|
|
@asynccontextmanager
|
|||
|
|
async def lifespan(app: FastAPI):
|
|||
|
|
async with tasks.procrastinate_app.open_async():
|
|||
|
|
init_storage()
|
|||
|
|
setup_observability(app)
|
|||
|
|
yield
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The `import asyncio` line can be removed from `main.py` if it has no other uses after this change.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Modified: `api/app/routers/api/adventures.py`
|
|||
|
|
|
|||
|
|
Two changes:
|
|||
|
|
|
|||
|
|
1. **Remove** the `_make_service`, `_run_entry_pipeline_task`, and stub client definitions (they move to `tasks.py`). Keep `_make_service` as a thin shim that delegates to `tasks._make_adventure_service` if any remaining synchronous use in the router still needs it (e.g. the `can_translate_to` check in `create_adventure` — this needs a `DeepLClient` instance, which is currently built inline anyway, so no change needed there).
|
|||
|
|
|
|||
|
|
2. **Replace `worker.enqueue(...)` with `defer_async`** in the two endpoints that trigger pipeline work:
|
|||
|
|
|
|||
|
|
**`POST /adventures` (create_adventure)**
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# Before
|
|||
|
|
await worker.enqueue(
|
|||
|
|
partial(
|
|||
|
|
_run_entry_pipeline_task, uuid.UUID(adventure.id), uuid.UUID(first_entry.id), user_id
|
|||
|
|
)
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
# After
|
|||
|
|
await tasks.generate_adventure_entry.defer_async(
|
|||
|
|
adventure_id=str(adventure.id),
|
|||
|
|
entry_id=str(first_entry.id),
|
|||
|
|
user_id=str(user_id),
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**`POST /adventures/{adventure_id}/decisions` (record_decision)**
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
# Before
|
|||
|
|
await worker.enqueue(
|
|||
|
|
partial(
|
|||
|
|
_run_entry_pipeline_task,
|
|||
|
|
uuid.UUID(next_entry.adventure_id),
|
|||
|
|
uuid.UUID(next_entry.id),
|
|||
|
|
user_id,
|
|||
|
|
)
|
|||
|
|
)
|
|||
|
|
|
|||
|
|
# After
|
|||
|
|
await tasks.generate_adventure_entry.defer_async(
|
|||
|
|
adventure_id=str(next_entry.adventure_id),
|
|||
|
|
entry_id=str(next_entry.id),
|
|||
|
|
user_id=str(user_id),
|
|||
|
|
)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
Procrastinate task arguments must be JSON-serialisable. `uuid.UUID` objects are converted to `str` at the call site; the task function converts them back with `uuid.UUID(...)`.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Deleted: `api/app/worker.py`
|
|||
|
|
|
|||
|
|
Removed entirely once the migration is complete. No other callers exist outside `adventures.py` and `main.py`.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Docker Compose Changes
|
|||
|
|
|
|||
|
|
### `docker-compose.yml` (base / local dev)
|
|||
|
|
|
|||
|
|
Add a `worker` service after `api`:
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
worker:
|
|||
|
|
build: ./api
|
|||
|
|
volumes:
|
|||
|
|
- ./api:/app:z # hot-reload on code change (same as api)
|
|||
|
|
command: python -m worker.main
|
|||
|
|
environment:
|
|||
|
|
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-langlearn}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB:-langlearn}
|
|||
|
|
JWT_SECRET: ${JWT_SECRET}
|
|||
|
|
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
|
|||
|
|
DEEPL_API_KEY: ${DEEPL_API_KEY}
|
|||
|
|
DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
|
|||
|
|
GEMINI_API_KEY: ${GEMINI_API_KEY}
|
|||
|
|
PYTHONPATH: /app
|
|||
|
|
STORAGE_ENDPOINT_URL: http://storage:9000
|
|||
|
|
STORAGE_ACCESS_KEY: ${STORAGE_ACCESS_KEY:-langlearn}
|
|||
|
|
STORAGE_SECRET_KEY: ${STORAGE_SECRET_KEY}
|
|||
|
|
STORAGE_BUCKET: ${STORAGE_BUCKET:-langlearn}
|
|||
|
|
OTEL_SERVICE_NAME: ${OTEL_SERVICE_NAME:-language-learning-worker}
|
|||
|
|
depends_on:
|
|||
|
|
db:
|
|||
|
|
condition: service_healthy
|
|||
|
|
storage:
|
|||
|
|
condition: service_healthy
|
|||
|
|
restart: unless-stopped
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The worker does not need `ports`, the Prometheus exporter config, or `API_BASE_URL`. OTEL service name is changed so traces are distinguishable in Grafana.
|
|||
|
|
|
|||
|
|
### `docker-compose-dev.yml`
|
|||
|
|
|
|||
|
|
Same addition as above. The `volumes: - ./api:/app:z` mount means worker code reloads on save — but note that `python -m worker.main` does **not** hot-reload automatically (unlike uvicorn). For local dev, just restart the worker container after code changes: `docker compose restart worker`.
|
|||
|
|
|
|||
|
|
If hot-reload matters during development, the command can be wrapped with `watchfiles`:
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
command: watchfiles --filter python "python -m worker.main" /app
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
(This requires `watchfiles` in the Python dependencies.)
|
|||
|
|
|
|||
|
|
### `docker-compose-prod.yml`
|
|||
|
|
|
|||
|
|
Add a `worker` service. The production command does _not_ run `alembic upgrade head` because migrations are already applied by the `api` container's startup command. The worker just starts:
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
worker:
|
|||
|
|
build: ./api
|
|||
|
|
command: python -m worker.main
|
|||
|
|
environment:
|
|||
|
|
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-langlearn}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB:-langlearn}
|
|||
|
|
JWT_SECRET: ${JWT_SECRET}
|
|||
|
|
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
|
|||
|
|
DEEPL_API_KEY: ${DEEPL_API_KEY}
|
|||
|
|
DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
|
|||
|
|
GEMINI_API_KEY: ${GEMINI_API_KEY}
|
|||
|
|
PYTHONPATH: /app
|
|||
|
|
STORAGE_PROVIDER: bunny
|
|||
|
|
BUNNY_ZONE: ${BUNNY_ZONE}
|
|||
|
|
BUNNY_API_KEY: ${BUNNY_API_KEY}
|
|||
|
|
BUNNY_CDN_BASE_URL: ${BUNNY_CDN_BASE_URL}
|
|||
|
|
BUNNY_TOKEN_AUTH_KEY: ${BUNNY_TOKEN_AUTH_KEY}
|
|||
|
|
BUNNY_STORAGE_ENDPOINT: ${BUNNY_STORAGE_ENDPOINT}
|
|||
|
|
OTEL_SERVICE_NAME: ${OTEL_SERVICE_NAME:-language-learning-worker}
|
|||
|
|
depends_on:
|
|||
|
|
db:
|
|||
|
|
condition: service_healthy
|
|||
|
|
restart: unless-stopped
|
|||
|
|
deploy:
|
|||
|
|
resources:
|
|||
|
|
limits:
|
|||
|
|
cpus: "1"
|
|||
|
|
memory: 1G
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The worker depends on `db` only (no `storage` healthcheck needed since storage is Bunny CDN in prod, not a local container).
|
|||
|
|
|
|||
|
|
### `docker-compose.test.yml`
|
|||
|
|
|
|||
|
|
Add a `worker` service. Critically, it must receive `STUB_GENERATION: "true"` so it uses stub clients, matching what the API does in tests.
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
worker:
|
|||
|
|
build: ./api
|
|||
|
|
command: python -m worker.main
|
|||
|
|
environment:
|
|||
|
|
DATABASE_URL: postgresql+asyncpg://langlearn_test:testpassword@db:5432/langlearn_test
|
|||
|
|
JWT_SECRET: test-jwt-secret-not-for-production
|
|||
|
|
ANTHROPIC_API_KEY: test-key
|
|||
|
|
DEEPL_API_KEY: test-key
|
|||
|
|
DEEPGRAM_API_KEY: test-key
|
|||
|
|
GEMINI_API_KEY: test-key
|
|||
|
|
STORAGE_ENDPOINT_URL: http://storage:9000
|
|||
|
|
STORAGE_ACCESS_KEY: langlearn_test
|
|||
|
|
STORAGE_SECRET_KEY: testpassword123
|
|||
|
|
STORAGE_BUCKET: langlearn-test
|
|||
|
|
PYTHONPATH: /app
|
|||
|
|
STUB_GENERATION: "true"
|
|||
|
|
depends_on:
|
|||
|
|
db:
|
|||
|
|
condition: service_healthy
|
|||
|
|
storage:
|
|||
|
|
condition: service_healthy
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
No healthcheck needed — the worker has no HTTP endpoint, and if it starts late, pending jobs simply wait in the queue until it picks them up.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Testing
|
|||
|
|
|
|||
|
|
### Integration tests — no changes required
|
|||
|
|
|
|||
|
|
`tests/test_adventures.py` already polls with `_wait_for_adventure_status(client, id, "active", timeout=30)`. This pattern is compatible with the worker being a separate process: the test enqueues a job, the worker processes it asynchronously, and the test polls until the adventure status reflects completion.
|
|||
|
|
|
|||
|
|
With stub generation, the pipeline completes in milliseconds. The 30-second timeout is more than sufficient even accounting for worker container startup time.
|
|||
|
|
|
|||
|
|
The one risk is a **startup race**: if the first test creates an adventure before the worker container has opened its Procrastinate connection, the job sits in the queue unprocessed until the worker is ready. Since `docker compose up --wait` waits for containers with healthchecks to pass (i.e. `api` is healthy before tests run), and the worker starts immediately after `db` is healthy (a prerequisite already met before `api` is healthy), the worker will typically be ready before the first test fires. No action needed, but if this proves flaky in CI, adding a short `pg_isready`-style healthcheck to the worker is the fix.
|
|||
|
|
|
|||
|
|
### What the tests implicitly verify after migration
|
|||
|
|
|
|||
|
|
- `POST /adventures` returns `201` and adventure status is `'awaiting_first_entry'` ✓
|
|||
|
|
- Worker picks up the job and calls `run_entry_pipeline` ✓
|
|||
|
|
- Adventure transitions to `'active'`, entry to `'complete'` (polled via `_wait_for_adventure_status`) ✓
|
|||
|
|
- Decision endpoint triggers a second pipeline job ✓
|
|||
|
|
|
|||
|
|
The existing tests cover all of this already. No new tests are required for the migration itself, though a test that verifies behaviour on worker restart (jobs not lost) would be a nice addition.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Implementation Order
|
|||
|
|
|
|||
|
|
1. DONE: Add `procrastinate` (and `procrastinate[sqlalchemy]`) to `api/pyproject.toml` / `requirements`.
|
|||
|
|
2. Write Alembic migration for Procrastinate schema. Run it locally.
|
|||
|
|
3. Move stub services into their `app/outbound` directories
|
|||
|
|
4. Create `app/tasks.py` with the `procrastinate_app` instance and the `generate_adventure_entry` task.
|
|||
|
|
5. Create `app/worker_main.py`.
|
|||
|
|
6. Modify `app/main.py`: remove `worker_loop`, add `procrastinate_app.open_async()` to lifespan.
|
|||
|
|
7. Modify `app/routers/api/adventures.py`: replace `worker.enqueue(...)` with `tasks.generate_adventure_entry.defer_async(...)`.
|
|||
|
|
8. Add `worker` service to all four compose files.
|
|||
|
|
9. Run the test suite: `docker compose -f docker-compose.test.yml up --build --wait -d && pytest`.
|
|||
|
|
10. Delete `app/worker.py`.
|
|||
|
|
|
|||
|
|
Steps 1–4 can be done before touching the API, so the migration can be tested end-to-end before cutting over.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Open Questions
|
|||
|
|
|
|||
|
|
1. [ANSWERED] As mentioned above, so this small refactor, it makes sense. **Stub client placement.** The stub classes inside `adventures.py` need to be reachable from `tasks.py`. The proposal above lazy-imports them; the cleaner fix is to extract them to `app/outbound/stubs.py`. Doing this in the same PR keeps scope small but is worth doing if it avoids the circular-import smell.
|
|||
|
|
|
|||
|
|
2. **Worker concurrency.** `concurrency=4` is a placeholder. Adventure pipeline jobs are I/O-heavy (network calls), not CPU-heavy, so higher concurrency is fine. Tune based on Anthropic/DeepL API rate limits.
|
|||
|
|
|
|||
|
|
3. **Procrastinate schema management.** Procrastinate has its own versioned migration system (separate from Alembic). When upgrading Procrastinate in future, run `procrastinate schema --app=app.tasks.procrastinate_app migrate` (or wrap it in an Alembic migration). Don't forget this step.
|
|||
|
|
|
|||
|
|
4. **Observability.** Procrastinate emits structured log lines per job. These will appear in Loki automatically. A future improvement would be to add the `job_id` to the OpenTelemetry trace context inside `generate_adventure_entry`.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Future: Scheduled Jobs (News Digest)
|
|||
|
|
|
|||
|
|
With Procrastinate in place, cron-style jobs are first-class citizens. Once `tasks.py` exists, adding a nightly job is:
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
@procrastinate_app.periodic(cron="0 2 * * *") # 2am daily UTC
|
|||
|
|
async def generate_nightly_news_digest() -> None:
|
|||
|
|
async with AsyncSessionLocal() as db:
|
|||
|
|
await NewsDigestService(db, ...).run()
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
The worker process runs periodic tasks automatically; no additional scheduler container is needed. Procrastinate tracks the last fire time in the `procrastinate_periodic_defers` table, so missed runs (e.g. worker was down) fire once on the next startup.
|