# Object Storage
This document explains how object storage works today, and how to control it across environments.
## TL;DR
- The app has one storage interface (`StorageClient`) and two implementations:
  - `MinioClient` for local/dev (S3-compatible MinIO)
  - `BunnyClient` for deployed environments (Bunny Storage + Bunny CDN)
- Provider selection is controlled by `STORAGE_PROVIDER`:
  - `local` -> MinIO
  - `bunny` -> Bunny
- The client is initialised once at API startup and stored as a process-level singleton.
## Runtime Lifecycle
1. API startup runs `init_storage()` from `app.outbound.storage_factory`.
2. `init_storage()` reads config from `app.config.settings`.
3. It creates either `MinioClient` or `BunnyClient`.
4. The client instance is set via `_set_storage_client(...)`.
5. App code calls `get_storage_client()` anywhere it needs object URLs or file operations.

If storage is used before startup initialisation has run, `get_storage_client()` fails with an `AssertionError`.
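
The lifecycle above can be sketched as a module-level singleton. This is a minimal illustration, not the contents of `app.outbound.storage_factory`: `init_storage_sketch` and its `factories` argument are hypothetical stand-ins for the real `init_storage()`, which reads `app.config.settings` instead of taking parameters.

```python
from typing import Callable, Dict, Optional

# Process-level singleton; stays None until startup initialisation has run.
_storage_client: Optional[object] = None


def _set_storage_client(client: object) -> None:
    """Store the client for the lifetime of this process."""
    global _storage_client
    _storage_client = client


def get_storage_client() -> object:
    """Return the initialised client, or fail loudly if startup has not run."""
    assert _storage_client is not None, "storage client used before init_storage()"
    return _storage_client


def init_storage_sketch(provider: str, factories: Dict[str, Callable[[], object]]) -> None:
    """Hypothetical stand-in for init_storage(): pick a client by provider name.

    `factories` maps "local" / "bunny" to zero-argument constructors; the real
    factory is assumed to build MinioClient or BunnyClient from settings.
    """
    if provider not in factories:
        raise ValueError(f"unknown STORAGE_PROVIDER: {provider!r}")
    _set_storage_client(factories[provider]())
```

Because the check is a plain `assert`, it would be skipped if Python runs with `-O`; whether that matters depends on how the API is launched.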
## Interface Contract
`StorageClient` currently exposes:
- `upload(path, data) -> bool`
- `get_url(path) -> str`
- `get_public_url(path) -> str`
- `delete(path) -> bool`
- `download(path) -> (bytes, content_type)`

Important behavior differences:
- MinIO supports `download(...)` for API media proxy routes.
- Bunny does not support direct download in this adapter and raises `NotImplementedError`; callers should use signed CDN URLs from `get_url(...)`.
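
To make the contract and the `download(...)` difference concrete, here is a rough sketch. The method names come from the list above; the argument types (`str` paths, `bytes` payloads) are assumptions, and `InMemoryClient`, `UrlOnlyClient`, and `resolve_media` are hypothetical names used only for illustration, not the real adapters.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class StorageClient(ABC):
    """The five operations listed above; argument types are assumed."""

    @abstractmethod
    def upload(self, path: str, data: bytes) -> bool: ...

    @abstractmethod
    def get_url(self, path: str) -> str: ...

    @abstractmethod
    def get_public_url(self, path: str) -> str: ...

    @abstractmethod
    def delete(self, path: str) -> bool: ...

    @abstractmethod
    def download(self, path: str) -> Tuple[bytes, str]: ...


class InMemoryClient(StorageClient):
    """Hypothetical stand-in that, like MinioClient, supports download()."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> bool:
        self._objects[path] = data
        return True

    def get_url(self, path: str) -> str:
        return f"/media/{path}"

    def get_public_url(self, path: str) -> str:
        # Placeholder: the real semantics of get_public_url are not shown here.
        return f"/media/{path}"

    def delete(self, path: str) -> bool:
        return self._objects.pop(path, None) is not None

    def download(self, path: str) -> Tuple[bytes, str]:
        return self._objects[path], "audio/wav"


class UrlOnlyClient(InMemoryClient):
    """Hypothetical stand-in that, like BunnyClient, refuses download()."""

    def get_url(self, path: str) -> str:
        return f"https://cdn.example.invalid/{path}?token=placeholder"

    def download(self, path: str) -> Tuple[bytes, str]:
        raise NotImplementedError("use get_url() instead")


def resolve_media(client: StorageClient, path: str):
    """Return ('bytes', data, content_type) or ('url', url), depending on the adapter."""
    try:
        data, content_type = client.download(path)
        return ("bytes", data, content_type)
    except NotImplementedError:
        return ("url", client.get_url(path))
```

A caller written this way works against either adapter, though in practice the provider is fixed per environment and the fallback branch may never be exercised.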
## URL Behavior
### Local/MinIO mode (`STORAGE_PROVIDER=local`)
- `get_url(path)` returns API-proxied URLs under `/media/...`.
- Browser requests go through the API media router.
- The media router checks in the DB that the object exists and that the requester owns it, then streams the bytes from storage.
### Bunny mode (`STORAGE_PROVIDER=bunny`)
- `get_url(path)` returns a signed Bunny CDN URL.
- Signature uses token auth key + path + expiry (currently 1 hour).
- Browser requests go directly to Bunny CDN (no API proxy hop).
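
As an illustration of signing with "key + path + expiry", the sketch below hashes the three values with SHA-256 and appends the result as query parameters. The hash function, the base64url encoding, the `token`/`expires` parameter names, and the `sign_cdn_url` function itself are all assumptions; the adapter's actual signing code is not shown in this document and should be checked against Bunny's token authentication documentation.

```python
import base64
import hashlib
import time
from typing import Optional

_SIGNED_URL_EXPIRY_SECONDS = 3600  # 1 hour, matching the expiry stated above


def sign_cdn_url(
    cdn_base_url: str,
    path: str,
    token_auth_key: str,
    now: Optional[int] = None,
) -> str:
    """Build a time-limited URL (assumed scheme: SHA-256 over key + path + expiry)."""
    if not path.startswith("/"):
        path = "/" + path
    issued_at = int(time.time()) if now is None else now
    expires = issued_at + _SIGNED_URL_EXPIRY_SECONDS
    # Hash the secret, the path, and the expiry timestamp together.
    digest = hashlib.sha256(f"{token_auth_key}{path}{expires}".encode("utf-8")).digest()
    # URL-safe base64 without padding so the token fits in a query string.
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{cdn_base_url.rstrip('/')}{path}?token={token}&expires={expires}"
```

Passing `now` explicitly makes the output deterministic, which is convenient in tests; in normal use it defaults to the current time.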
## Configuration
### Local/MinIO settings
- `STORAGE_PROVIDER=local`
- `STORAGE_ENDPOINT_URL` (for Docker dev: `http://storage:9000`)
- `STORAGE_ACCESS_KEY`
- `STORAGE_SECRET_KEY`
- `STORAGE_BUCKET`
- `API_BASE_URL` (used to build `/media/...` URLs)

On startup, `MinioClient.ensure_bucket_exists()` is called.
### Bunny settings
- `STORAGE_PROVIDER=bunny`
- `BUNNY_ZONE`
- `BUNNY_API_KEY`
- `BUNNY_CDN_BASE_URL`
- `BUNNY_TOKEN_AUTH_KEY`
- `BUNNY_STORAGE_ENDPOINT`

On startup, the Bunny client runs `list_directory("")` as a connection test.
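
A small pre-flight check along these lines can catch a missing variable before the startup connection test fails. It is only a sketch: `missing_storage_settings` is a hypothetical helper, and treating every setting listed above as required is an assumption, since the real `app.config.settings` may define defaults.

```python
from typing import Dict, List, Mapping

# Settings listed in this section, grouped by provider.
# Assumption: all of them are required (the real settings may have defaults).
_SETTINGS_BY_PROVIDER: Dict[str, List[str]] = {
    "local": [
        "STORAGE_ENDPOINT_URL",
        "STORAGE_ACCESS_KEY",
        "STORAGE_SECRET_KEY",
        "STORAGE_BUCKET",
        "API_BASE_URL",
    ],
    "bunny": [
        "BUNNY_ZONE",
        "BUNNY_API_KEY",
        "BUNNY_CDN_BASE_URL",
        "BUNNY_TOKEN_AUTH_KEY",
        "BUNNY_STORAGE_ENDPOINT",
    ],
}


def missing_storage_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of unset or empty settings for the selected provider."""
    provider = env.get("STORAGE_PROVIDER", "")
    if provider not in _SETTINGS_BY_PROVIDER:
        raise ValueError(f"STORAGE_PROVIDER must be 'local' or 'bunny', got {provider!r}")
    return [name for name in _SETTINGS_BY_PROVIDER[provider] if not env.get(name)]
```

It can be called with `os.environ` at boot, or with a plain dict in tests.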
## Where Storage Is Used
- BFF routers call `get_storage_client().get_url(...)` to expose audio URLs.
- Media router calls `get_storage_client().download(...)` to stream files for `/media/...` routes.

In practice:
- In local mode, `/media/...` endpoints are expected and functional.
- In Bunny mode, clients should consume returned CDN URLs directly.
## Operational Notes
- Upload content type is currently fixed to `audio/wav` in both adapters.
- Bunny signed URL expiry is `_SIGNED_URL_EXPIRY_SECONDS = 3600`.
- The storage client is per-process; each API process initialises its own instance at boot.