Design Document: Object Storage with Bunny CDN
This is a technical design document for implementing object (e.g. audio file) storage with Bunny CDN. This directory (api/docs) contains other similar files, notably architecture.md and domain.md. Once you have worked through the change described here, please update architecture.md.
The problem
Language Learning App has audio as a core component, which requires files to be delivered to the end user. In local development, these files are stored in a MinIO service, which mimics an S3-like storage bucket.
Serving files the same way from a deployed instance (e.g. on a VPS using Docker) would result in high bandwidth usage and therefore high cost. Using a dedicated, EU-based service like Bunny allows us to offload the delivery of content to a third party at reduced cost (great!).
The current implementation
Object storage was one of the first features built into this software, back in its MVP state; as such, it does not fit within the current architecture.
Right now api/app/storage.py contains some helper functions, notably the upload_audio and download_audio functions.
Users (through the web client) retrieve the media through two URLs (detailed in api/app/routers/media.py):
GET /media/adventure-audio/{filename:path}, for the choose-your-own-adventure file names
GET /media/{filename:path}, used for the summary transcriptions
The solution
We are going to use Bunny (bunny.net) as the CDN for all objects in deployed environments (right now, just production — in the future preprod or staging may exist).
Locally, for development purposes, we retain the use of MinIO. To decide which backend to use, we introduce an environment variable STORAGE_PROVIDER with a default value of local and an accepted alternative of bunny.
In situations where we use local, the existing /media/.. proxy endpoints are returned when constructing audio URLs (e.g. in api/app/routers/bff/articles.py and api/app/routers/bff/adventure.py). When we use bunny, the Bunny CDN URL is returned directly so the request is never proxied through our service.
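The branching described above could be sketched roughly as follows. This is only an illustration: build_audio_url, LOCAL_MEDIA_PREFIX and the cdn_url_for callback are hypothetical names, and the sketch assumes the storage path maps directly onto the existing /media/ routes.

```python
from typing import Callable

# Assumption: the local proxy routes in api/app/routers/media.py all live under /media/.
LOCAL_MEDIA_PREFIX = "/media/"


def build_audio_url(path: str, provider: str, cdn_url_for: Callable[[str], str]) -> str:
    """Return the URL a web client should use to fetch the object at `path`."""
    if provider == "bunny":
        # Delivered straight from the CDN, so the request never reaches our API.
        return cdn_url_for(path)
    if provider == "local":
        # Served by the existing proxy endpoints, backed by MinIO.
        return LOCAL_MEDIA_PREFIX + path.lstrip("/")
    raise ValueError(f"Unknown STORAGE_PROVIDER value: {provider!r}")
```

In the BFF routers, cdn_url_for would most likely be the get_url method of whichever storage client is active.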
Client interface
We will create a BunnyClient in api/app/outbound/bunny/bunny_client.py and extract the current MinIO logic into a MinioClient in api/app/outbound/minio/minio_client.py. Both implement a shared StorageClient protocol.
The interface is generic — the clients are storage adapters and must not encode domain concepts. Path construction (which directory, which filename) is the responsibility of the caller (the service layer), not the client.
from typing import Protocol

class StorageClient(Protocol):
    def upload(self, path: str, data: bytes) -> bool: ...
    def get_url(self, path: str) -> str: ...
    def delete(self, path: str) -> bool: ...
Services construct paths using hardcoded directory prefixes (e.g. "adventure-audio/", "audio/"). These are constants, not environment variables — they are not environment-specific and do not belong in config.
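To illustrate the split of responsibilities, here is a minimal sketch in which the service layer owns the prefixes and the client only sees opaque paths. The helper names (adventure_audio_path, save_adventure_audio) and the InMemoryStorageClient test double are hypothetical and exist only for this example.

```python
from typing import Dict, Protocol

# Directory prefixes are plain constants owned by the service layer.
ADVENTURE_AUDIO_PREFIX = "adventure-audio/"
AUDIO_PREFIX = "audio/"


class StorageClient(Protocol):
    def upload(self, path: str, data: bytes) -> bool: ...
    def get_url(self, path: str) -> str: ...
    def delete(self, path: str) -> bool: ...


class InMemoryStorageClient:
    """Hypothetical test double; it satisfies StorageClient structurally."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> bool:
        self._objects[path] = data
        return True

    def get_url(self, path: str) -> str:
        return f"memory://{path}"

    def delete(self, path: str) -> bool:
        # True only if the object existed and was removed.
        return self._objects.pop(path, None) is not None


def adventure_audio_path(filename: str) -> str:
    """The service decides where the object lives; the client never does."""
    return f"{ADVENTURE_AUDIO_PREFIX}{filename}"


def save_adventure_audio(client: StorageClient, filename: str, data: bytes) -> str:
    """Upload the audio and return the storage path the caller should persist."""
    path = adventure_audio_path(filename)
    if not client.upload(path, data):
        raise RuntimeError(f"Upload failed for {path}")
    return path
```

Because the client has no knowledge of adventures or summaries, the same test double works for any service that needs storage.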
Factory and instantiation
A factory function reads STORAGE_PROVIDER and returns the appropriate StorageClient implementation. The client is instantiated once at app startup (e.g. in main.py) as a module-level singleton — not per-request. This is consistent with how other outbound clients (AnthropicClient, GeminiClient, etc.) are handled.
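A factory along these lines might look like the sketch below. The function name create_storage_client is an assumption, and the two client classes are empty placeholders standing in for the real implementations in api/app/outbound/.

```python
import os
from typing import Optional, Protocol


class StorageClient(Protocol):
    def upload(self, path: str, data: bytes) -> bool: ...
    def get_url(self, path: str) -> str: ...
    def delete(self, path: str) -> bool: ...


class MinioClient:
    """Placeholder for api/app/outbound/minio/minio_client.py."""

    def upload(self, path: str, data: bytes) -> bool:
        raise NotImplementedError

    def get_url(self, path: str) -> str:
        raise NotImplementedError

    def delete(self, path: str) -> bool:
        raise NotImplementedError


class BunnyClient(MinioClient):
    """Placeholder for api/app/outbound/bunny/bunny_client.py.

    Subclassing is only a shortcut to reuse the stub methods in this sketch.
    """


def create_storage_client(provider: Optional[str] = None) -> StorageClient:
    """Pick the storage backend from STORAGE_PROVIDER (default: local)."""
    name = (provider or os.environ.get("STORAGE_PROVIDER", "local")).strip().lower()
    if name == "local":
        return MinioClient()
    if name == "bunny":
        return BunnyClient()
    # Fail fast at startup rather than on the first upload.
    raise ValueError(f"Unsupported STORAGE_PROVIDER: {name!r}")
```

At startup, main.py would then call create_storage_client() once and keep the result at module level, so that an invalid value stops the app before it serves any request.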
Bunny configuration
Bunny requires the following environment variables:
BUNNY_ZONE: the storage zone name (the zone languagelearningapp has been created in the Bunny UI). No "DEFAULT" suffix; there is one zone.
BUNNY_API_KEY: the Bunny API key for upload/delete operations.
BUNNY_CDN_BASE_URL: the public CDN hostname used to construct delivery URLs.
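One possible way to load and validate these variables is sketched below. BunnyConfig and from_env are hypothetical names; the sketch only assumes that all three variables are mandatory when the bunny provider is selected.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("BUNNY_ZONE", "BUNNY_API_KEY", "BUNNY_CDN_BASE_URL")


@dataclass(frozen=True)
class BunnyConfig:
    zone: str
    api_key: str
    cdn_base_url: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "BunnyConfig":
        """Build the config, reporting every missing variable at once."""
        missing = [name for name in REQUIRED_VARS if not env.get(name)]
        if missing:
            raise RuntimeError("Missing Bunny settings: " + ", ".join(missing))
        return cls(
            zone=env["BUNNY_ZONE"],
            api_key=env["BUNNY_API_KEY"],
            # Normalise so URL building can always append "/<path>".
            cdn_base_url=env["BUNNY_CDN_BASE_URL"].rstrip("/"),
        )
```

Accepting a mapping instead of reading os.environ directly keeps the loader easy to exercise in unit tests.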
Signed vs. public URLs
Because audio files are user-specific (i.e. one user should not be able to use another user's audio URL), Bunny signed URLs are required. Public CDN URLs, by contrast, can be used by anyone who has the link.
Bunny's own documentation recommends the token.py package:
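# Note: Python's standard library also ships a module named `token`,
# so the vendored file may need a different name to import cleanly.
from token import sign_url

url = sign_url(
    "https://myzone.b-cdn.net/videos/stream1/playlist.m3u8",
    "your-security-key",
    expiration_time=3600,
    is_directory=True,
    path_allowed="/videos/stream1/",
    countries_allowed="GB",
)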
get_url(path) on the BunnyClient must generate a time-limited (pick a sensible default for audio content here) signed URL using the Bunny Token Authentication feature. The MinIO implementation would use pre-signed S3 URLs for consistency.
Create a sibling method, get_public_url, that explicitly creates public URLs for any future public content.
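The relationship between the two methods could be sketched as below. This is not Bunny's API: BunnyUrlBuilder is a hypothetical helper, the one-hour default is an assumption, and the signer is injected as a callable that is expected to behave like the sign_url(url, key, expiration_time=...) call in the sample above. Where the signing key comes from is left open here, since the configuration section does not name a variable for it.

```python
from typing import Callable
from urllib.parse import quote

# Assumption: one hour comfortably covers a listening session; tune as needed.
DEFAULT_SIGNED_URL_TTL_SECONDS = 3600


class BunnyUrlBuilder:
    """Hypothetical helper showing how get_url and get_public_url relate."""

    def __init__(
        self,
        cdn_base_url: str,
        security_key: str,
        signer: Callable[..., str],
        ttl_seconds: int = DEFAULT_SIGNED_URL_TTL_SECONDS,
    ) -> None:
        self._base = cdn_base_url.rstrip("/")
        self._security_key = security_key
        self._signer = signer  # e.g. the sign_url function from Bunny's helper
        self._ttl_seconds = ttl_seconds

    def get_public_url(self, path: str) -> str:
        """Unsigned URL; usable by anyone who has the link."""
        # quote() leaves "/" intact but escapes spaces and similar characters.
        return f"{self._base}/{quote(path.lstrip('/'))}"

    def get_url(self, path: str) -> str:
        """Time-limited signed URL for user-specific content."""
        return self._signer(
            self.get_public_url(path),
            self._security_key,
            expiration_time=self._ttl_seconds,
        )
```

Injecting the signer keeps the URL logic testable without network access or real keys; a fake signer is enough to check that the TTL and path are passed through correctly.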
Misc
pcm_to_wav() currently lives in api/app/storage.py but is a Gemini output concern. Move it to the Gemini client module (api/app/outbound/gemini/) when carrying out this refactor.