API Contract
Request and response schemas for the topic modeling worker — field definitions, types, and examples.
Source of truth: api.faculytics/src/modules/analysis/dto/topic-model-worker.dto.ts (Zod schemas)
Worker schemas: src/models.py (Pydantic, must stay in sync with Zod)
Endpoint
POST {TOPIC_MODEL_WORKER_URL}
When deployed on RunPod, the actual endpoint is:
POST https://api.runpod.ai/v2/<endpoint-id>/runsync
Headers: { Authorization: Bearer <RUNPOD_API_KEY> }
Body: { input: <request payload> }
The RunPod envelope (input wrapper, output unwrapping) is handled by the API's RunPodBatchProcessor.
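As a rough illustration of the envelope described above, the following Python sketch builds the `runsync` request and unwraps the result. It is not the `RunPodBatchProcessor` implementation (which lives in the API). The helper names `build_runsync_request` and `unwrap_output` are hypothetical, and the `Content-Type` header and the presence of a top-level `output` field in the RunPod reply are assumptions.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Wrap the worker payload in the RunPod {"input": ...} envelope (hypothetical helper)."""
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: JSON content type; the contract only lists Authorization.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def unwrap_output(envelope: dict) -> dict:
    """Return the worker response from the RunPod reply (assumes an "output" field)."""
    if "output" not in envelope:
        raise ValueError("RunPod envelope has no 'output' field")
    return envelope["output"]
```

The request object is only constructed here, not sent; passing it to `urllib.request.urlopen` would perform the call.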
Request
{
  "items": [
    {
      "submissionId": "uuid-string",
      "text": "The pace was too fast, couldn't follow along.",
      "embedding": [0.123, -0.456, 0.789, "... (768 floats)"]
    }
  ],
  "params": {
    "min_topic_size": 15,
    "nr_topics": 20,
    "umap_n_neighbors": 20,
    "umap_n_components": 10
  }
}
Request Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| items | array | Yes | — | Submissions that passed the sentiment gate |
| items[].submissionId | string | Yes | — | Unique submission identifier |
| items[].text | string | Yes | — | Pre-cleaned qualitative comment (cleanedComment) |
| items[].embedding | number[768] | Yes | — | Pre-computed LaBSE 768-dim embedding |
| params | object | No | RUN 012 defaults | BERTopic hyperparameters |
| params.min_topic_size | int | No | 15 | Minimum documents per topic cluster |
| params.nr_topics | int | No | 20 | Target topic count (merges until reached) |
| params.umap_n_neighbors | int | No | 20 | UMAP local neighborhood size |
| params.umap_n_components | int | No | 10 | UMAP output dimensions |
The worker uses ConfigDict(extra="ignore") on all Pydantic models, so additional envelope fields sent by the API (jobId, version, type, metadata, publishedAt) are silently ignored during validation.
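The validation rules in the table can be approximated with the standard library alone. The sketch below is not the Pydantic code in src/models.py; `Params`, `parse_params`, and `parse_request` are hypothetical names. It applies the documented defaults, enforces the 768-dim embedding length, and drops unknown keys in the same spirit as `extra="ignore"`.

```python
from dataclasses import dataclass, fields

EMBEDDING_DIM = 768  # LaBSE embedding size from the contract


@dataclass(frozen=True)
class Params:
    """Documented hyperparameter defaults (hypothetical stand-in for the Pydantic model)."""
    min_topic_size: int = 15
    nr_topics: int = 20
    umap_n_neighbors: int = 20
    umap_n_components: int = 10


def parse_params(raw):
    """Keep only known keys, so unknown ones are ignored rather than rejected."""
    known = {f.name for f in fields(Params)}
    return Params(**{k: v for k, v in (raw or {}).items() if k in known})


def parse_request(payload: dict):
    """Check required fields and return (items, params); envelope extras are ignored."""
    items = payload.get("items")
    if not isinstance(items, list):
        raise ValueError("items must be an array")
    for i, item in enumerate(items):
        for key in ("submissionId", "text", "embedding"):
            if key not in item:
                raise ValueError(f"items[{i}].{key} is required")
        if len(item["embedding"]) != EMBEDDING_DIM:
            raise ValueError(f"items[{i}].embedding must have {EMBEDDING_DIM} floats")
    return items, parse_params(payload.get("params"))
```

Omitting `params` entirely yields the four defaults, and fields such as `jobId` or `publishedAt` pass through without error because they are never inspected.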
Response — Success
{
  "version": "1.0.0",
  "status": "completed",
  "topics": [
    {
      "topicIndex": 0,
      "rawLabel": "0_fast_rushed_pace",
      "keywords": ["fast", "rushed", "pace", "speed", "hurry", "quick", "follow", "slow", "behind", "catch"],
      "docCount": 45
    }
  ],
  "assignments": [
    {
      "submissionId": "uuid-string",
      "topicIndex": 0,
      "probability": 0.7234
    }
  ],
  "metrics": {
    "npmi_coherence": 0.1523,
    "topic_diversity": 0.8200,
    "outlier_ratio": 0.1150,
    "silhouette_score": 0.2341,
    "embedding_coherence": 0.6102
  },
  "outlierCount": 12,
  "completedAt": "2026-03-21T10:35:00.000Z"
}
Response — Failure
{
  "version": "1.0.0",
  "status": "failed",
  "error": "Received 8 items, need at least 15 (min_topic_size) for topic modeling",
  "completedAt": "2026-03-21T10:35:00.000Z"
}
Response Fields
| Field | Type | Present | Description |
|---|---|---|---|
| version | string | Always | Worker version (from config.WORKER_VERSION) |
| status | "completed" \| "failed" | Always | Outcome status |
| topics | array | On success | Discovered topic clusters |
| topics[].topicIndex | int | — | BERTopic topic ID (0, 1, 2, ...) |
| topics[].rawLabel | string | — | Auto-generated label (e.g., "0_fast_rushed_pace") |
| topics[].keywords | string[] | — | Top 10 keywords from KeyBERTInspired |
| topics[].docCount | int | — | Documents in this cluster |
| assignments | array | On success | Per-document topic assignments |
| assignments[].submissionId | string | — | Matches input submissionId |
| assignments[].topicIndex | int | — | Assigned topic index |
| assignments[].probability | number (0-1) | — | Assignment confidence (4 decimal places) |
| metrics | object | On success | Model quality metrics (see Metrics) |
| outlierCount | int | On success | Documents assigned to topic -1 |
| error | string | On failure | Human-readable error message |
| completedAt | ISO datetime | Always | Processing completion timestamp |
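Because the "Present" column depends on `status`, a consumer usually branches on that field before touching anything else. The following is a minimal sketch of that check, assuming a hypothetical `TopicModelWorkerError` exception and `check_response` helper. The authoritative validation remains the Zod and Pydantic schemas named at the top of this page.

```python
class TopicModelWorkerError(RuntimeError):
    """Hypothetical exception raised for a response with status "failed"."""


ALWAYS_FIELDS = ("version", "status", "completedAt")
SUCCESS_FIELDS = ("topics", "assignments", "metrics", "outlierCount")


def check_response(response: dict) -> dict:
    """Return the response if it is a well-formed success, otherwise raise."""
    for key in ALWAYS_FIELDS:
        if key not in response:
            raise ValueError(f"missing required field: {key}")
    status = response["status"]
    if status == "failed":
        # "error" is documented as present on failure; fall back defensively.
        raise TopicModelWorkerError(response.get("error", "unknown error"))
    if status != "completed":
        raise ValueError(f"unexpected status: {status!r}")
    missing = [key for key in SUCCESS_FIELDS if key not in response]
    if missing:
        raise ValueError(f"completed response is missing: {', '.join(missing)}")
    return response
```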
API-Side Processing
After receiving the response, the TopicModelProcessor in the API:
- Validates the response against topicModelWorkerResponseSchema (Zod)
- Creates Topic entities for each topic (with rawLabel, keywords, docCount)
- Creates TopicAssignment entities, filtering out assignments with probability at or below 0.01
- Marks the highest-probability assignment per submission as isDominant
- Persists metrics on the TopicModelRun entity
- Calls the orchestrator to advance the pipeline to topic labeling
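The filtering and dominance steps above can be sketched as follows. The real TopicModelProcessor is part of the API codebase and is not reproduced here; `build_assignments` is a hypothetical helper, written in Python for illustration, and its tie-breaking rule (the first row seen wins) is an assumption.

```python
def build_assignments(assignments, min_probability=0.01):
    """Drop assignments at or below the threshold and flag one dominant row per submission."""
    # Copy each surviving row so the worker response is left unmodified.
    kept = [
        dict(row, isDominant=False)
        for row in assignments
        if row["probability"] > min_probability
    ]
    best = {}
    for row in kept:
        current = best.get(row["submissionId"])
        # Assumption: on equal probability, the earlier row stays dominant.
        if current is None or row["probability"] > current["probability"]:
            best[row["submissionId"]] = row
    for row in best.values():
        row["isDominant"] = True
    return kept
```

A submission whose assignments all fall at or below 0.01 ends up with no rows at all under this sketch, and therefore no dominant topic.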
Notes
- Outlier documents (topic -1) are not included in the assignments array
- The rawLabel is later enriched with a human-readable label by the topic labeling stage (LLM)
- Embeddings must be 768-dim LaBSE vectors, the same model used by the embedding worker
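Since outliers are omitted from the assignments array, a caller that needs to know which submissions were outliers has to derive them from the request. The sketch below shows one way to do that; `unassigned_submission_ids` is a hypothetical helper, and it assumes every non-outlier document receives at least one assignment, which this contract does not state explicitly.

```python
def unassigned_submission_ids(items, response):
    """Return input submissionIds that have no entry in response["assignments"]."""
    assigned = {row["submissionId"] for row in response["assignments"]}
    return [item["submissionId"] for item in items if item["submissionId"] not in assigned]
```

Under that assumption, the length of the returned list would be expected to match `outlierCount`; a mismatch may indicate that the assumption does not hold for a given run.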