Embedding Worker Deployment

Docker build, runtime configuration, and production deployment guide for the embedding worker.

Docker Build

The Dockerfile uses a multi-stage build to minimize the final image size:

Stage 1: Builder

Installs uv from the official image
Syncs dependencies with uv sync --frozen --no-dev (uses the lockfile as-is, skips dev dependencies)
Downloads the LaBSE model and converts it to ONNX format
Strips PyTorch weights (.bin, .safetensors) to save ~1.5 GB — only ONNX artifacts are kept
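
The weight-stripping step in the builder stage could look roughly like the sketch below. It is not the project's actual build script: the function name strip_pytorch_weights is hypothetical, and it assumes the exported model lives in a plain directory of regular files rather than a symlinked hub cache.

```python
from pathlib import Path

# Suffixes of PyTorch weight files; ONNX artifacts (.onnx) are left untouched.
PYTORCH_SUFFIXES = {".bin", ".safetensors"}


def strip_pytorch_weights(model_dir: Path) -> int:
    """Delete PyTorch weight files under model_dir and return the bytes freed."""
    freed = 0
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(model_dir.rglob("*")):
        if path.is_file() and path.suffix in PYTORCH_SUFFIXES:
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Running this as the last command of the model-export layer keeps the deleted files out of the layer that the runtime stage copies from.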

Stage 2: Runtime

Copies the virtual environment, source code, and cached model from the builder
Sets PATH to use the venv's Python
Runs uvicorn on port 5201

# Build
docker build -t embedding-worker .
 
# Run
docker run -p 5201:5201 embedding-worker

The model is baked into the image at build time, so the container starts without network access to Hugging Face.

Docker Compose

For local development with the full stack:

docker compose up

For production-like deployment:

docker compose -f docker-compose.prod.yml up

Environment Variables

Override defaults by passing environment variables to the container:

docker run -p 5201:5201 \
  -e LOG_LEVEL=DEBUG \
  -e OPENAPI_MODE=false \
  embedding-worker

See Architecture > Configuration for the full list.
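
As an illustration of how the two variables above might be consumed on the Python side, here is a minimal sketch. The Settings class, the load_settings function, and both default values are assumptions made for the example; the worker's real configuration code and defaults are described in Architecture > Configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    log_level: str
    openapi_mode: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read LOG_LEVEL and OPENAPI_MODE; the defaults here are assumptions."""
    raw_openapi = env.get("OPENAPI_MODE", "true")  # assumed default
    return Settings(
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # assumed default
        # Treat common truthy spellings as True, everything else as False.
        openapi_mode=raw_openapi.strip().lower() in {"1", "true", "yes", "on"},
    )
```

Passing a plain dict instead of os.environ makes the parsing easy to exercise in tests, for example load_settings({"LOG_LEVEL": "debug", "OPENAPI_MODE": "false"}).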

Local Development

# Install dependencies
uv sync
 
# Run with hot reload
uv run uvicorn src.main:app --reload --port 5201
 
# Run tests
uv run pytest
 
# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

Health Monitoring

The /health endpoint reports model readiness:

200 OK — Model loaded, ready to serve
503 Service Unavailable — Model still loading or failed to load

Use this for container health checks and load balancer probes:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:5201/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s

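Set start_period to exceed the model loading time (~30-45s on a cold start): failed probes during this window do not count toward retries. The curl-based test also requires curl to be present in the runtime image.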
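
For deployment scripts or smoke tests that need to block until the worker is ready, the same 200/503 contract can be polled from Python using only the standard library. This is a sketch under assumptions: is_ready and wait_until_ready are hypothetical helper names, and the default deadline and interval are illustrative values.

```python
import time
import urllib.error
import urllib.request


def is_ready(url: str, timeout: float = 10.0) -> bool:
    """Return True only when the endpoint answers 200 OK."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers 503 responses (HTTPError), refused connections, and timeouts.
        return False


def wait_until_ready(url: str, deadline_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until it reports ready or deadline_s seconds have elapsed."""
    stop_at = time.monotonic() + deadline_s
    while time.monotonic() < stop_at:
        if is_ready(url):
            return True
        time.sleep(interval_s)
    return False
```

A typical call would be wait_until_ready("http://localhost:5201/health"), which returns False if the model has not finished loading within the deadline.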