Faculytics Docs

Topic Modeling Worker

BERTopic-based topic discovery contract with embeddings, hyperparameters, and quality metrics.

Source of Truth: src/modules/analysis/dto/topic-model-worker.dto.ts

Endpoint

POST {TOPIC_MODEL_WORKER_URL}
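
A minimal sketch of calling this endpoint is given below. It assumes the worker accepts a JSON body with a `Content-Type: application/json` header and signals transport-level problems with a non-2xx status; neither is confirmed by the source DTO. The names `FetchLike`, `buildRequestInit`, and `callTopicModelWorker` are hypothetical and used only for illustration.

```typescript
// Hypothetical minimal shape of a fetch-compatible function, declared locally
// so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the POST options for a request payload (JSON encoding is an assumption).
function buildRequestInit(payload: unknown) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// POST the payload to the worker and return the parsed JSON body.
// workerUrl is the resolved value of TOPIC_MODEL_WORKER_URL.
async function callTopicModelWorker(
  workerUrl: string,
  payload: unknown,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl(workerUrl, buildRequestInit(payload));
  // Assumption: a non-2xx status means the call itself failed. A run that
  // fails inside the worker may still be reported via "status": "failed".
  if (!res.ok) {
    throw new Error(`Topic model worker responded with HTTP ${res.status}`);
  }
  return res.json();
}
```

In a runtime with a global `fetch`, it can be passed directly as `fetchImpl`; injecting it keeps the function easy to exercise with a stub.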

Request

{
  "items": [
    {
      "submissionId": "uuid-string",
      "text": "The pace was too fast, couldn't follow along.",
      "embedding": [0.123, -0.456, 0.789, "... (768 floats)"]
    }
  ],
  "params": {
    "min_topic_size": 5,
    "nr_topics": null,
    "umap_n_neighbors": 15,
    "umap_n_components": 5
  },
  "metadata": {
    "pipelineId": "uuid-string",
    "runId": "uuid-string"
  }
}

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| items | array | Yes | Filtered submissions (post sentiment gate) |
| items[].submissionId | string | Yes | Unique submission identifier |
| items[].text | string (min 1) | Yes | Qualitative comment text |
| items[].embedding | number[768] | Yes | LaBSE 768-dim embedding vector |
| params | object | No | BERTopic hyperparameters |
| params.min_topic_size | int | No | Minimum cluster size |
| params.nr_topics | int | No | Target topic count (auto if omitted) |
| params.umap_n_neighbors | int | No | UMAP neighbor count |
| params.umap_n_components | int | No | UMAP output dimensions |
| metadata.pipelineId | string | Yes | Parent pipeline ID |
| metadata.runId | string | Yes | Current topic model run ID |
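
The request fields above can be mirrored as TypeScript types with a small pre-flight check. This is a sketch, not a copy of `topic-model-worker.dto.ts`: the interface names, the `validateRequest` helper, and the choice to treat `metadata` as a required object are assumptions made for illustration.

```typescript
// Types mirroring the documented request fields (names are illustrative).
interface TopicModelItem {
  submissionId: string;
  text: string;        // documented as "string (min 1)"
  embedding: number[]; // LaBSE vector, 768 dimensions
}

interface TopicModelParams {
  min_topic_size?: number;
  nr_topics?: number | null; // null appears in the example request
  umap_n_neighbors?: number;
  umap_n_components?: number;
}

interface TopicModelRequest {
  items: TopicModelItem[];
  params?: TopicModelParams;
  metadata: { pipelineId: string; runId: string };
}

const EMBEDDING_DIM = 768;

// Return a list of human-readable problems; an empty list means the request
// satisfies the constraints stated in the fields table.
function validateRequest(req: TopicModelRequest): string[] {
  const problems: string[] = [];
  req.items.forEach((item, i) => {
    if (item.text.length < 1) {
      problems.push(`items[${i}].text is empty`);
    }
    if (item.embedding.length !== EMBEDDING_DIM) {
      problems.push(
        `items[${i}].embedding has ${item.embedding.length} values, expected ${EMBEDDING_DIM}`,
      );
    }
    if (!item.embedding.every((v) => Number.isFinite(v))) {
      problems.push(`items[${i}].embedding contains a non-finite value`);
    }
  });
  if (!req.metadata.pipelineId) problems.push("metadata.pipelineId is missing");
  if (!req.metadata.runId) problems.push("metadata.runId is missing");
  return problems;
}
```

Running the check before the POST catches a wrong embedding dimension locally instead of leaving it to the worker.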

Response

{
  "version": "1.0",
  "status": "completed",
  "topics": [
    {
      "topicIndex": 0,
      "rawLabel": "0_fast_rushed_pace",
      "keywords": [
        "fast",
        "rushed",
        "pace",
        "speed",
        "hurry",
        "quick",
        "follow",
        "slow",
        "behind",
        "catch"
      ],
      "docCount": 45
    }
  ],
  "assignments": [
    {
      "submissionId": "uuid-string",
      "topicIndex": 0,
      "probability": 0.72
    },
    {
      "submissionId": "uuid-string",
      "topicIndex": 1,
      "probability": 0.15
    }
  ],
  "metrics": {
    "npmi_coherence": 0.42,
    "topic_diversity": 0.85,
    "outlier_ratio": 0.08,
    "silhouette_score": 0.31,
    "embedding_coherence": 0.67
  },
  "outlierCount": 12,
  "completedAt": "2026-03-13T10:35:00.000Z"
}

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| version | string | Yes | Worker/model version identifier |
| status | enum | Yes | completed or failed |
| topics | array | On success | Discovered topic clusters |
| topics[].topicIndex | int | Yes | BERTopic topic_id (0, 1, 2...) |
| topics[].rawLabel | string | Yes | BERTopic auto-generated label |
| topics[].keywords | string[] | Yes | Top keywords from c-TF-IDF |
| topics[].docCount | int | Yes | Number of documents in cluster |
| assignments | array | On success | Soft topic assignments per submission |
| assignments[].submissionId | string | Yes | Matches input submissionId |
| assignments[].topicIndex | int | Yes | Assigned topic index |
| assignments[].probability | number (0-1) | Yes | Assignment probability |
| metrics | object | On success | Model quality metrics |
| outlierCount | int | On success | Count of outlier documents (topic -1) |
| error | string | On failure | Error message |
| completedAt | ISO datetime | Yes | Processing completion timestamp |
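
Because some fields are present only on success and `error` only on failure, the response can be modeled as a union discriminated on `status`. The sketch below assumes exactly that split; the type names and the `summarize` helper are hypothetical, and `metrics` is typed loosely because only the example response lists its keys.

```typescript
interface Topic {
  topicIndex: number;
  rawLabel: string;
  keywords: string[];
  docCount: number;
}

interface Assignment {
  submissionId: string;
  topicIndex: number;
  probability: number; // 0 to 1
}

// Fields documented as "On success".
interface CompletedResponse {
  version: string;
  status: "completed";
  topics: Topic[];
  assignments: Assignment[];
  metrics: Record<string, number>; // e.g. npmi_coherence, topic_diversity
  outlierCount: number;
  completedAt: string; // ISO datetime
}

// Fields documented as "On failure".
interface FailedResponse {
  version: string;
  status: "failed";
  error: string;
  completedAt: string;
}

type TopicModelResponse = CompletedResponse | FailedResponse;

// Checking `status` narrows the union, so success-only fields are
// accessible only in the branch where they are present.
function summarize(res: TopicModelResponse): string {
  if (res.status === "failed") {
    return `run failed: ${res.error}`;
  }
  return `${res.topics.length} topics, ${res.assignments.length} assignments, ${res.outlierCount} outliers`;
}
```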

Notes

  • Based on CtrlAltElite-Devs/topic-modeling.faculytics (BERTopic + LaBSE + UMAP + HDBSCAN)
  • Embeddings must be LaBSE 768-dim (same model used for similarity)
  • Assignments with a probability of 0.01 or below are filtered out on persistence
  • isDominant is computed server-side (highest probability per submission)
  • Topic index -1 represents outliers (not persisted as topics)
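
The persistence rules in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the server's actual code: the function names are hypothetical, dominance is computed after the 0.01 filter, and ties go to the first assignment encountered. The source does not specify the ordering or the tie-breaking.

```typescript
interface Assignment {
  submissionId: string;
  topicIndex: number;
  probability: number;
}

interface PersistedAssignment extends Assignment {
  isDominant: boolean;
}

const MIN_PROBABILITY = 0.01;   // assignments at or below this are dropped
const OUTLIER_TOPIC_INDEX = -1; // BERTopic's outlier bucket

// Drop low-probability assignments, then flag the highest-probability
// assignment of each submission as dominant.
function prepareAssignments(assignments: Assignment[]): PersistedAssignment[] {
  const kept = assignments.filter((a) => a.probability > MIN_PROBABILITY);

  // Best assignment seen so far per submission (first one wins ties).
  const best = new Map<string, Assignment>();
  for (const a of kept) {
    const current = best.get(a.submissionId);
    if (current === undefined || a.probability > current.probability) {
      best.set(a.submissionId, a);
    }
  }

  return kept.map((a) => ({ ...a, isDominant: best.get(a.submissionId) === a }));
}

// Exclude the outlier bucket from the topics that get persisted.
function persistableTopics<T extends { topicIndex: number }>(topics: T[]): T[] {
  return topics.filter((t) => t.topicIndex !== OUTLIER_TOPIC_INDEX);
}
```

Tracking the best assignment object rather than only its probability guarantees at most one dominant assignment per submission, even when two topics share the same top probability.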

Versioning

The version field tracks the BERTopic pipeline version and is stored on TopicModelRun.workerVersion.