Embedding Worker Deployment
Docker build, runtime configuration, and production deployment guide for the embedding worker.
Docker Build
The Dockerfile uses a multi-stage build to minimize the final image size:
Stage 1: Builder
- Installs uv from the official image
- Syncs dependencies with --frozen --no-dev (no dev deps, locked versions)
- Downloads the LaBSE model and converts it to ONNX format
- Strips PyTorch weights (.bin, .safetensors) to save ~1.5 GB; only ONNX artifacts are kept
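The weight-stripping step can be sketched in Python as follows. This is a minimal illustration, not the Dockerfile's actual command: the function name and the assumption that the model sits in a plain directory of regular files are hypothetical.

```python
from pathlib import Path

# Suffixes of the PyTorch weight files named in the builder stage.
PYTORCH_SUFFIXES = {".bin", ".safetensors"}


def strip_pytorch_weights(model_dir: Path) -> int:
    """Delete PyTorch weight files under model_dir and return the bytes freed.

    Hypothetical helper: assumes model_dir holds regular files, with the
    ONNX artifacts stored alongside the original weights.
    """
    freed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for path in sorted(model_dir.rglob("*")):
        if path.is_file() and path.suffix in PYTORCH_SUFFIXES:
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Files with other suffixes, such as .onnx and .json, are left in place, so the runtime stage still finds the converted model and its configuration.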
Stage 2: Runtime
- Copies the virtual environment, source code, and cached model from the builder
- Sets PATH to use the venv's Python
- Runs uvicorn on port 5201
# Build
docker build -t embedding-worker .
# Run
docker run -p 5201:5201 embedding-worker
The model is baked into the image at build time, so the container starts without needing network access to HuggingFace.
Docker Compose
For local development with the full stack:
docker compose up
For production-like deployment:
docker compose -f docker-compose.prod.yml up
Environment Variables
Override defaults by passing environment variables to the container:
docker run -p 5201:5201 \
-e LOG_LEVEL=DEBUG \
-e OPENAPI_MODE=false \
embedding-worker
See Architecture > Configuration for the full list.
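For illustration, a worker might read these two variables at startup roughly as shown here. This is a sketch under stated assumptions: the Settings class, the load_settings helper, the default values (INFO, true), and the accepted boolean spellings are all hypothetical and are not taken from the project's configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    log_level: str
    openapi_mode: bool


# Assumed spellings that count as "true"; the real worker may differ.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to assumed defaults."""
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    openapi_mode = env.get("OPENAPI_MODE", "true").strip().lower() in _TRUE_VALUES
    return Settings(log_level=log_level, openapi_mode=openapi_mode)
```

With the docker run command above, load_settings() inside the container would see LOG_LEVEL=DEBUG and OPENAPI_MODE=false, because values passed with -e land in the process environment.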
Local Development
# Install dependencies
uv sync
# Run with hot reload
uv run uvicorn src.main:app --reload --port 5201
# Run tests
uv run pytest
# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/
Health Monitoring
The /health endpoint reports model readiness:
- 200 OK: Model loaded, ready to serve
- 503 Service Unavailable: Model still loading or failed to load
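The readiness contract above can be sketched as a bare ASGI callable, which uvicorn can serve directly. This is an illustration only: the worker's actual web framework is not stated here, and the state dictionary, its model_ready flag, and the JSON body shape are assumptions.

```python
import json

# Hypothetical readiness flag; a real worker would set it after the model loads.
state = {"model_ready": False}


async def app(scope, receive, send):
    """Minimal ASGI app: 200 or 503 on /health, 404 for any other path."""
    if scope["type"] != "http":
        return  # lifespan and websocket scopes are ignored in this sketch
    if scope["path"] == "/health":
        ready = state["model_ready"]
        status = 200 if ready else 503
        body = json.dumps({"status": "ready" if ready else "loading"}).encode()
    else:
        status, body = 404, b'{"detail": "not found"}'
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})
```

Returning 503 rather than an error page lets probes distinguish "not ready yet" from "endpoint missing", which is what the status codes listed above rely on.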
Use this for container health checks and load balancer probes:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:5201/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s
The start_period should account for model loading time (~30-45s on first cold start).
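Outside of Docker, for example in a deploy script or smoke test, the same probe can be approximated in Python. The sketch below is hypothetical: http_status and wait_until_ready are illustrative names, and the 60-second deadline simply mirrors the start_period above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # non-2xx responses such as 503
        return exc.code
    except (urllib.error.URLError, OSError):  # refused, unreachable, timed out
        return 0


def wait_until_ready(
    probe: Callable[[], int],
    deadline_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe until it returns 200 or deadline_s elapses."""
    start = clock()
    while True:
        if probe() == 200:
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

A call such as wait_until_ready(lambda: http_status("http://localhost:5201/health")) blocks until the model has loaded or the deadline passes. The probe is injected as a callable so the polling loop can be tested without a running container.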