# Design Document: Object Storage with Bunny CDN

This is a technical design document for implementing object (e.g. audio file) storage with Bunny CDN. This directory (`api/docs`) contains other similar files, notably `architecture.md` and `domain.md`. When you have worked through the change described here, please update `architecture.md`.

## The problem

Language Learning App has audio as a core component, which requires files to be delivered to the end user. When developing locally, these files have been stored in a MinIO service, mimicking an S3-like storage bucket.

Using this approach on a deployed instance (e.g. on a VPS using Docker) would result in high bandwidth usage and therefore a high cost. Using a dedicated, EU-based service like Bunny allows us to offload the delivery of content to a third party at reduced cost (great!).

## The current implementation

Object storage was one of the first features built into this software, back in its MVP state; as such, it does not fit within the current architecture.

Right now, `api/app/storage.py` contains some helper functions, notably the `upload_audio` and `download_audio` functions.

Users (through the web client) retrieve the media through two URLs (detailed in `api/app/routers/media.py`):

- `GET /media/adventure-audio/{filename:path}`, used for the choose-your-own-adventure file names
- `GET /media/{filename:path}`, used for the summary transcriptions

## The solution

We are going to use Bunny (bunny.net) as the CDN for all objects in deployed environments (right now, just production; in the future, preprod or staging may exist).

Locally, for development purposes, we retain the use of MinIO. To decide which backend to use, we introduce an environment variable `STORAGE_PROVIDER` with a default value of `local` and an accepted alternative of `bunny`.

When `STORAGE_PROVIDER` is `local`, audio URLs (e.g. those constructed in `api/app/routers/bff/articles.py` and `api/app/routers/bff/adventure.py`) point at the existing `/media/..` proxy endpoints. When it is `bunny`, the Bunny CDN URL is returned directly, so the request is never proxied through our service.

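
The provider switch described above can be sketched roughly as follows. This is a minimal illustration, not the intended implementation: `build_audio_url`, `LOCAL_MEDIA_PREFIX` and the `cdn_url_for` callable are hypothetical names, and the sketch assumes the object path maps directly onto the existing `/media/` routes.

```python
from typing import Callable

# Assumption: the local proxy routes all live under this prefix.
LOCAL_MEDIA_PREFIX = "/media/"


def build_audio_url(
    provider: str,
    path: str,
    cdn_url_for: Callable[[str], str],
) -> str:
    """Return the URL handed to the web client for a stored object.

    `cdn_url_for` is a hypothetical callable that turns an object path
    into a Bunny CDN URL (for example the storage client's `get_url`).
    """
    if provider == "local":
        # Served through our own proxy endpoints.
        return LOCAL_MEDIA_PREFIX + path.lstrip("/")
    if provider == "bunny":
        # Served directly by the CDN; the request never reaches our API.
        return cdn_url_for(path)
    raise ValueError(f"Unsupported STORAGE_PROVIDER: {provider!r}")
```

In practice this branching may end up inside each client's `get_url` rather than in a free function; the sketch only shows the two outcomes side by side.
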
### Client interface

We will create a `BunnyClient` in `api/app/outbound/bunny/bunny_client.py` and extract the current MinIO logic into a `MinioClient` in `api/app/outbound/minio/minio_client.py`. Both implement a shared `StorageClient` protocol.

The interface is **generic**: the clients are storage adapters and must not encode domain concepts. Path construction (which directory, which filename) is the responsibility of the caller (the service layer), not the client.

```python
from typing import Protocol


class StorageClient(Protocol):
    def upload(self, path: str, data: bytes) -> bool: ...
    def get_url(self, path: str) -> str: ...
    def delete(self, path: str) -> bool: ...
```

Services construct paths using hardcoded directory prefixes (e.g. `"adventure-audio/"`, `"audio/"`). These are constants, not environment variables: they are not environment-specific and do not belong in config.

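
To illustrate the split of responsibilities, here is a small sketch in which the service layer builds the path and the client only stores bytes. `InMemoryStorageClient` and `save_adventure_audio` are hypothetical names used for illustration (the former is a test double, not one of the planned clients), and the `memory://` URL scheme is made up.

```python
from typing import Protocol

# Directory prefixes are plain constants owned by the service layer.
ADVENTURE_AUDIO_PREFIX = "adventure-audio/"
AUDIO_PREFIX = "audio/"


class StorageClient(Protocol):
    def upload(self, path: str, data: bytes) -> bool: ...
    def get_url(self, path: str) -> str: ...
    def delete(self, path: str) -> bool: ...


class InMemoryStorageClient:
    """Hypothetical test double that satisfies the protocol structurally."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> bool:
        self.objects[path] = data
        return True

    def get_url(self, path: str) -> str:
        return f"memory://{path}"

    def delete(self, path: str) -> bool:
        # True only if something was actually removed.
        return self.objects.pop(path, None) is not None


def save_adventure_audio(client: StorageClient, filename: str, data: bytes) -> str:
    """Service-layer helper: it decides the path, the client never does."""
    path = ADVENTURE_AUDIO_PREFIX + filename
    if not client.upload(path, data):
        raise RuntimeError(f"Upload failed for {path}")
    return client.get_url(path)
```

Because `StorageClient` is a `Protocol`, the clients do not need to inherit from it; matching method signatures are enough for a type checker.
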
### Factory and instantiation

A factory function reads `STORAGE_PROVIDER` and returns the appropriate `StorageClient` implementation. The client is instantiated **once at app startup** (e.g. in `main.py`) as a module-level singleton, not per-request. This is consistent with how other outbound clients (`AnthropicClient`, `GeminiClient`, etc.) are handled.

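
A factory along these lines might look like the sketch below. The function name `create_storage_client` and the optional `env` parameter are assumptions made for testability, and the two client classes are empty stand-ins for the real ones described earlier.

```python
import os
from typing import Mapping, Optional


class MinioClient:
    """Stand-in for the real client in api/app/outbound/minio/minio_client.py."""


class BunnyClient:
    """Stand-in for the real client in api/app/outbound/bunny/bunny_client.py."""


def create_storage_client(env: Optional[Mapping[str, str]] = None):
    """Pick a storage client based on STORAGE_PROVIDER (default: local)."""
    env = os.environ if env is None else env
    provider = env.get("STORAGE_PROVIDER", "local")
    if provider == "local":
        return MinioClient()
    if provider == "bunny":
        return BunnyClient()
    # Fail fast at startup instead of silently falling back to a default.
    raise ValueError(f"Unsupported STORAGE_PROVIDER: {provider!r}")


# At app startup (e.g. in main.py), called once and reused everywhere:
# storage_client = create_storage_client()
```

Raising on an unknown value is a design choice of this sketch; it surfaces a misconfigured deployment at startup rather than on the first upload.
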
### Bunny configuration

Bunny requires the following environment variables:

- `BUNNY_ZONE`: the storage zone name (the zone `languagelearningapp` has been created in the Bunny UI). No "DEFAULT" suffix; there is one zone.
- `BUNNY_API_KEY`: the Bunny API key for upload/delete operations.
- `BUNNY_CDN_BASE_URL`: the public CDN hostname used to construct delivery URLs.

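
One way to load and validate these variables at startup is sketched below. `BunnySettings` and `from_env` are hypothetical names; only the three environment variable names come from this document.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED_BUNNY_VARS = ("BUNNY_ZONE", "BUNNY_API_KEY", "BUNNY_CDN_BASE_URL")


@dataclass(frozen=True)
class BunnySettings:
    zone: str
    api_key: str
    cdn_base_url: str

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "BunnySettings":
        """Build settings from a mapping such as os.environ."""
        # Treat unset and empty values the same way: both are missing.
        missing = [name for name in REQUIRED_BUNNY_VARS if not env.get(name)]
        if missing:
            raise RuntimeError("Missing Bunny configuration: " + ", ".join(missing))
        return cls(
            zone=env["BUNNY_ZONE"],
            api_key=env["BUNNY_API_KEY"],
            # Normalise so URL construction can always append "/<path>".
            cdn_base_url=env["BUNNY_CDN_BASE_URL"].rstrip("/"),
        )
```

Validation of this kind only needs to run when `STORAGE_PROVIDER` is `bunny`, so local development does not require any Bunny credentials.
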
### Signed vs. public URLs

Because audio files are user-specific (i.e. one user should not be able to use another user's audio URL), Bunny signed URLs are required. Public CDN URLs, by contrast, are usable by anyone who has the link.

Bunny's own documentation recommends the `token.py` package:

```py
# Note: the module name `token` clashes with Python's standard-library
# `token` module, so the file may need vendoring under a different name.
from token import sign_url

url = sign_url(
    "https://myzone.b-cdn.net/videos/stream1/playlist.m3u8",
    "your-security-key",
    expiration_time=3600,
    is_directory=True,
    path_allowed="/videos/stream1/",
    countries_allowed="GB",
)
```

`get_url(path)` on the `BunnyClient` must generate a time-limited (pick a sensible default for audio content here) signed URL using the Bunny Token Authentication feature. The MinIO implementation would use pre-signed S3 URLs for consistency.

Create a sibling method, `get_public_url`, that explicitly creates public URLs for any future public content.

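
The relationship between `get_url` and `get_public_url` could be sketched as follows. This is an illustration under stated assumptions: `BunnyUrlBuilder` is a hypothetical helper, the signer is injected as a callable with the same call shape as the `sign_url` example above (URL, security key, `expiration_time`), the one-hour default is a placeholder rather than a recommendation, and the security key is passed in explicitly because this document does not name an environment variable for it.

```python
from typing import Callable

# Placeholder default; the real value is still to be decided.
DEFAULT_EXPIRY_SECONDS = 3600


class BunnyUrlBuilder:
    """Hypothetical helper showing signed vs. public URL construction."""

    def __init__(
        self,
        cdn_base_url: str,
        security_key: str,
        signer: Callable[..., str],
        expiry_seconds: int = DEFAULT_EXPIRY_SECONDS,
    ) -> None:
        self._base = cdn_base_url.rstrip("/")
        self._security_key = security_key
        self._signer = signer
        self._expiry_seconds = expiry_seconds

    def get_public_url(self, path: str) -> str:
        """Unsigned URL: anyone holding the link can fetch the object."""
        return f"{self._base}/{path.lstrip('/')}"

    def get_url(self, path: str) -> str:
        """Time-limited signed URL, delegating the token maths to the signer."""
        return self._signer(
            self.get_public_url(path),
            self._security_key,
            expiration_time=self._expiry_seconds,
        )
```

Injecting the signer keeps the token algorithm in Bunny's own code and lets tests substitute a fake signer without network access or real keys.
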
### Misc

`pcm_to_wav()` currently lives in `api/app/storage.py` but is a Gemini output concern. Move it to the Gemini client module (`api/app/outbound/gemini/`) when carrying out this refactor.

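
The existing `pcm_to_wav()` is not reproduced in this document. For orientation only, a function of this kind can be written with the standard-library `wave` module, as in the sketch below. The signature and the default audio parameters (16-bit mono at 24 kHz) are assumptions and should be checked against the current implementation before the move.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 24000,  # assumed default, verify against existing code
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # bytes per sample, i.e. 16-bit audio
) -> bytes:
    """Wrap raw PCM frames in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # The wave module leaves caller-supplied file objects open on close.
    return buffer.getvalue()
```
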