Faculytics Docs

Deployment

Docker image build, RunPod serverless configuration, and production deployment.

The worker is deployed as a RunPod serverless endpoint running on GPU instances.

Docker Image

The Dockerfile uses RunPod's PyTorch base image with CUDA support:

FROM runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
 
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
 
WORKDIR /app
 
COPY pyproject.toml .python-version ./
# The uv.loc[k] glob is deliberate: it keeps uv.lock optional at build time
COPY uv.loc[k] ./
 
RUN uv sync --frozen --no-dev --no-install-project || uv sync --no-dev --no-install-project
 
# Bake LaBSE into image (~1.8 GB) to avoid cold-start download
RUN uv run python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/LaBSE')"
 
COPY src/ src/
 
CMD ["uv", "run", "python", "-m", "src.handler"]

Key Build Decisions

  • LaBSE baked in: the model (~1.8 GB) is downloaded during the build, so fresh containers do not pay a model download on cold start.
  • uv for dependency management: faster than pip, with lockfile support. The build falls back to a non-frozen install if the frozen sync fails, for example when no lockfile exists.
  • No dev dependencies: --no-dev keeps the image lean (no pytest, ruff).
  • Source copied last: thanks to Docker layer caching, dependency installation reruns only when pyproject.toml, .python-version, or uv.lock changes.
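To confirm that the baked-in model actually made it into the image, a check along the following lines could run as a smoke test inside the container. It is a minimal sketch that assumes the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/`) and the `HF_HUB_CACHE` / `HF_HOME` environment variables; the helper names are hypothetical and not part of the worker.

```python
import os
from pathlib import Path
from typing import Optional


def hf_hub_cache_dir() -> Path:
    """Resolve the Hugging Face hub cache directory.

    Assumption: only HF_HUB_CACHE and HF_HOME are consulted; other
    overrides (such as XDG_CACHE_HOME) are ignored in this sketch.
    """
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def model_is_cached(repo_id: str, cache_dir: Optional[Path] = None) -> bool:
    """Return True if at least one snapshot of repo_id exists in the cache."""
    folder = "models--" + repo_id.replace("/", "--")
    snapshots = (cache_dir or hf_hub_cache_dir()) / folder / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

Calling `model_is_cached("sentence-transformers/LaBSE")` inside the built image is one way to verify the bake step before pushing.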

Building

docker build -t topic-worker .

The image is roughly 8-10 GB, most of it the CUDA runtime, PyTorch, and the LaBSE model.

Pushing to Registry

docker tag topic-worker <registry>/topic-worker:latest
docker push <registry>/topic-worker:latest

RunPod Configuration

Serverless Endpoint Setup

  1. Create a serverless endpoint on RunPod
  2. Point it to the Docker image in your registry
  3. Configure GPU type (any CUDA-capable GPU works; 16GB+ VRAM recommended)
  4. Set the endpoint URL in the API's .env:
TOPIC_MODEL_WORKER_URL=https://api.runpod.ai/v2/<endpoint-id>/runsync
RUNPOD_API_KEY=<your-key>
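For illustration, the two variables above are enough to build an authenticated runsync call. The sketch below uses only the Python standard library; the API itself may well be written in another language, and the function names `build_runsync_request` and `run_topic_model` are hypothetical.

```python
import json
import os
import urllib.request


def build_runsync_request(items, params, env=os.environ):
    """Build (but do not send) a POST request to the runsync endpoint."""
    url = env["TOPIC_MODEL_WORKER_URL"]
    api_key = env["RUNPOD_API_KEY"]
    body = json.dumps({"input": {"items": items, "params": params}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_topic_model(items, params, timeout_s=300):
    """Send the request and return the decoded JSON response.

    Assumption: 300 s mirrors the execution timeout recommended
    elsewhere on this page.
    """
    request = build_runsync_request(items, params)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload shape easy to unit-test without network access.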

Request Flow

API → POST /v2/<endpoint-id>/runsync
     Body: { input: { items: [...], params: {...} } }
     Headers: { Authorization: Bearer <RUNPOD_API_KEY> }

RunPod → Starts container (or uses warm instance)
       → Calls handler({ input: { items: [...], params: {...} } })

Worker → Returns result dict

RunPod → Wraps in { id, status: "COMPLETED", output: <result> }
       → Returns to API
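The flow above can be sketched from both ends: a handler that receives the event and a helper that unwraps RunPod's response envelope on the API side. This is a minimal sketch, not the worker's actual code: the result fields (`item_count`, `worker_version`) and the helper `unwrap_runsync` are hypothetical, and the real pipeline is replaced by a placeholder. Only `runpod.serverless.start({"handler": ...})` is the RunPod SDK's entry point.

```python
WORKER_VERSION = "1.0.0"


def handler(event: dict) -> dict:
    """Receive { input: { items, params } } and return a result dict."""
    payload = event.get("input") or {}
    items = payload.get("items", [])
    params = payload.get("params", {})
    # Placeholder: the real worker would run the topic-modeling pipeline here.
    return {
        "item_count": len(items),          # hypothetical field
        "params": params,                  # echo of the received params
        "worker_version": WORKER_VERSION,  # hypothetical field name
    }


def unwrap_runsync(response: dict) -> dict:
    """API side: extract the worker result from RunPod's envelope."""
    status = response.get("status")
    if status != "COMPLETED":
        raise RuntimeError(f"job {response.get('id')} ended with status {status!r}")
    return response["output"]


def main() -> None:
    """Entry point a module such as src.handler could call at startup."""
    import runpod  # imported lazily so the handler stays testable without the SDK

    runpod.serverless.start({"handler": handler})
```

Because the handler is a plain function, it can be exercised locally with a hand-built event before any image is deployed.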

Scaling

Setting            Recommended
Min workers        0 (scale to zero when idle)
Max workers        1-2 (topic modeling is a batch operation, not high-throughput)
Idle timeout       30s (keep warm for short periods between pipeline stages)
Execution timeout  300s (matches BULLMQ_TOPIC_MODEL_HTTP_TIMEOUT_MS)
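Since the execution timeout is meant to match the API's HTTP timeout, a small consistency check can catch drift between the two settings. The sketch below is illustrative only: `ScalingSettings` and `timeouts_match` are hypothetical names, and the defaults simply restate the recommended values (with the upper bound of 2 for max workers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingSettings:
    """Recommended endpoint settings, restated as data."""
    min_workers: int = 0
    max_workers: int = 2
    idle_timeout_s: int = 30
    execution_timeout_s: int = 300


def timeouts_match(settings: ScalingSettings, http_timeout_ms: int) -> bool:
    """True if the API's HTTP timeout (ms) equals the execution timeout (s)."""
    return http_timeout_ms == settings.execution_timeout_s * 1000
```

Here `http_timeout_ms` stands for the value of BULLMQ_TOPIC_MODEL_HTTP_TIMEOUT_MS on the API side.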

Configuration

The worker has no environment variables — all configuration is in src/config.py:

Config          Value                        Purpose
LABSE_MODEL     sentence-transformers/LaBSE  Embedding model for KeyBERTInspired
DEVICE          cuda or cpu (auto-detected)  PyTorch device
WORKER_VERSION  1.0.0                        Returned in responses, stored on TopicModelRun.workerVersion
DEFAULT_PARAMS  RUN 012 values               Hyperparameter defaults
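A config module with these four values might look roughly like the sketch below. It is an assumption about the shape of src/config.py, not a copy of it: the `detect_device` helper is hypothetical, the torch import is guarded so the module also loads where PyTorch is absent, and `DEFAULT_PARAMS` is left empty because the RUN 012 values are not listed on this page.

```python
try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None

LABSE_MODEL = "sentence-transformers/LaBSE"
WORKER_VERSION = "1.0.0"


def detect_device() -> str:
    """Pick "cuda" when a GPU is visible to PyTorch, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


DEVICE = detect_device()

# Placeholder: the real defaults are the RUN 012 hyperparameter values.
DEFAULT_PARAMS: dict = {}
```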

Dependencies

Core runtime dependencies (pyproject.toml):

Package                Version    Purpose
runpod                 ≥ 1.7.0    RunPod serverless handler framework
pydantic               ≥ 2.0      Request/response validation
sentence-transformers  ≥ 3.0      LaBSE model loading
bertopic               ≥ 0.16.0   Topic modeling pipeline
umap-learn             ≥ 0.5.6    Dimensionality reduction
hdbscan                ≥ 0.8.33   Density-based clustering
scikit-learn           ≥ 1.4.0    Silhouette score, CountVectorizer
gensim                 ≥ 4.3.0    NPMI coherence computation
numpy                  ≥ 1.26.0   Array operations
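When debugging a container, it can help to compare installed versions against these minimums. The following sketch does so with `importlib.metadata` from the standard library. The `version_tuple` and `find_problems` helpers are hypothetical, and the version comparison is deliberately simplified: it looks only at the leading numeric release segment and ignores pre-release and local version tags.

```python
import re
from importlib import metadata

# Minimum versions as listed in the table above.
MINIMUMS = {
    "runpod": "1.7.0",
    "pydantic": "2.0",
    "sentence-transformers": "3.0",
    "bertopic": "0.16.0",
    "umap-learn": "0.5.6",
    "hdbscan": "0.8.33",
    "scikit-learn": "1.4.0",
    "gensim": "4.3.0",
    "numpy": "1.26.0",
}


def version_tuple(version: str) -> tuple:
    """Turn "2.4.0+cu124" into (2, 4, 0); pads short versions with zeros."""
    release = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in release.group(0).split(".")] if release else []
    return tuple((parts + [0, 0, 0])[:3])


def find_problems(minimums=MINIMUMS, get_version=metadata.version) -> list:
    """Return human-readable problems; an empty list means all minimums are met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems
```

Running `find_problems()` inside the image (for example via `uv run python`) would list any package that is missing or older than its minimum.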