Quality Metrics
Five evaluation metrics computed on every topic model run — NPMI coherence, diversity, outlier ratio, silhouette score, and embedding coherence.
Every successful topic model run computes five quality metrics via compute_metrics() in src/evaluate.py. These metrics are returned in the response, persisted on the TopicModelRun entity, and surfaced in the analysis dashboard.
Metrics Overview
| Metric | Range | Target | What It Measures |
|---|---|---|---|
| NPMI Coherence | -1 to 1 | > 0.1 | Do topic keywords co-occur in the same documents? |
| Topic Diversity | 0 to 1 | > 0.7 | Are topics distinct from each other? |
| Outlier Ratio | 0 to 1 | < 0.2 | What fraction of documents couldn't be clustered? |
| Silhouette Score | -1 to 1 | > 0.0 | Are clusters well-separated in embedding space? |
| Embedding Coherence | 0 to 1 | > 0.5 | Are topic keywords semantically related? |
Metric Details
NPMI Coherence
Normalized Pointwise Mutual Information measures whether the top keywords for each topic actually co-occur in the input documents.
NPMI(w1, w2) = log( P(w1,w2) / (P(w1) * P(w2)) ) / -log P(w1,w2)
- Uses a sliding window of 10 tokens for co-occurrence counting
- Computed via Gensim's CoherenceModel with coherence="c_npmi"
- Text is tokenized with a custom tokenizer that strips punctuation and keeps only alphabetic tokens longer than one character (handles Latin-script Cebuano/Tagalog)
- Averaged across all non-outlier topics
Interpretation:
- > 0.1: keywords meaningfully co-occur (good)
- ~0.0: keywords are independent (random)
- < 0.0: keywords anti-correlate (bad)
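The formula and its three interpretation bands can be checked on a toy corpus. The sketch below is a pure-Python approximation of sliding-window NPMI for a single keyword pair; it is not Gensim's code, and the names `tokenize`, `boolean_windows`, and `npmi` are hypothetical. Returning -1.0 for pairs that never co-occur and 1.0 for pairs present in every window are assumptions made to avoid log(0) and 0/0.

```python
import math
import re

def tokenize(text):
    # Hypothetical stand-in for the custom tokenizer: keep alphabetic
    # tokens longer than one character, dropping digits and punctuation.
    return [t for t in re.findall(r"[^\W\d_]+", text) if len(t) > 1]

def boolean_windows(tokens, size=10):
    # Each sliding window becomes the set of tokens it contains; a document
    # shorter than the window contributes a single window.
    if len(tokens) <= size:
        return [set(tokens)]
    return [set(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

def npmi(w1, w2, docs, size=10):
    # docs is a list of token lists; probabilities are window frequencies.
    wins = [win for doc in docs for win in boolean_windows(doc, size)]
    n = len(wins)
    if n == 0:
        return 0.0
    p1 = sum(w1 in win for win in wins) / n
    p2 = sum(w2 in win for win in wins) / n
    p12 = sum(w1 in win and w2 in win for win in wins) / n
    if p12 == 0.0:
        return -1.0  # never co-occur (assumed limiting value)
    if p12 == 1.0:
        return 1.0   # present together in every window (assumed, avoids 0/0)
    return math.log(p12 / (p1 * p2)) / -math.log(p12)

docs = [["rice", "farm"], ["rice", "farm"], ["bus", "road"], ["bus", "road"]]
print(npmi("rice", "farm", docs))  # pair that always appears together
print(npmi("rice", "bus", docs))   # pair that never appears together
```

A topic's score would then be the mean over all pairs of its top keywords, and the run-level metric the mean over non-outlier topics.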
Topic Diversity
Ratio of unique keywords across all topics. Measures whether topics are discovering distinct themes or recycling the same words.
diversity = |unique keywords| / |total keywords|
- Uses top 10 keywords per topic
- A value of 1.0 means every keyword is unique to one topic
- Low diversity suggests topics are too similar and could be merged
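As a minimal sketch of the ratio above, assuming each topic is given as a ranked list of keywords (the function name `topic_diversity` and the empty-input fallback are illustrative, not taken from src/evaluate.py):

```python
def topic_diversity(topics, top_n=10):
    # Pool the top-N keywords of every topic, then compare the number of
    # distinct keywords with the total number of keyword slots.
    keywords = [word for words in topics for word in words[:top_n]]
    if not keywords:
        return 0.0  # assumed fallback when there are no topics
    return len(set(keywords)) / len(keywords)

topics = [
    ["rice", "farm", "harvest"],
    ["bus", "road", "farm"],  # "farm" is shared with the first topic
]
print(round(topic_diversity(topics), 4))  # → 0.8333
```

Here five distinct keywords fill six slots, so one shared word is enough to pull the score below 1.0.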
Outlier Ratio
Fraction of input documents assigned to BERTopic's outlier topic (-1). These are documents that HDBSCAN couldn't confidently assign to any cluster.
outlier_ratio = count(topic == -1) / total_documents
- High outlier ratios (above 0.3) suggest min_topic_size may be too large or the data lacks clear clusters
- The API filters out outlier assignments during persistence, so they are not surfaced to users
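The ratio and the persistence filter can be sketched as follows, assuming topic assignments arrive as a list of integer topic ids with -1 marking outliers. The helper names `outlier_ratio` and `drop_outliers` are hypothetical.

```python
def outlier_ratio(topic_ids):
    # Share of documents assigned to the outlier topic (-1).
    if not topic_ids:
        return 0.0  # assumed fallback for an empty run
    return sum(1 for t in topic_ids if t == -1) / len(topic_ids)

def drop_outliers(documents, topic_ids):
    # Keep only (document, topic) pairs that belong to a real cluster,
    # mirroring the filtering described for persistence.
    return [(doc, t) for doc, t in zip(documents, topic_ids) if t != -1]

topic_ids = [0, 1, -1, 0, -1, 2, 1, 0, 0, 1]
print(outlier_ratio(topic_ids))  # → 0.2
```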
Silhouette Score
Measures how well-separated the clusters are in the original 768-dim embedding space (not the UMAP-reduced space).
silhouette = mean((b - a) / max(a, b)) for each non-outlier document
Here a is the mean distance from a document to the other documents in its own cluster, and b is the mean distance from that document to the documents in the nearest other cluster.
- Uses cosine distance (matching the embedding space)
- Outlier documents (topic -1) are excluded
- Returns 0.0 if fewer than 2 topics or fewer than 10 non-outlier documents
- Range: -1 (wrong clusters) to 1 (dense, well-separated clusters)
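The rules above can be sketched in pure Python for small inputs. This is an illustration of the definition, not the project's implementation: the names `cosine_distance` and `silhouette` are hypothetical, embeddings are assumed to be non-zero vectors, and scoring singleton clusters as 0 is an assumed convention.

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def silhouette(embeddings, labels, min_docs=10):
    # Drop outlier documents (topic -1) and group the rest by topic.
    kept = [(e, t) for e, t in zip(embeddings, labels) if t != -1]
    clusters = {}
    for e, t in kept:
        clusters.setdefault(t, []).append(e)
    if len(clusters) < 2 or len(kept) < min_docs:
        return 0.0
    scores = []
    for e, t in kept:
        own = clusters[t]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster (assumed convention)
            continue
        # The document's distance to itself is ~0, so divide by len(own) - 1.
        a = sum(cosine_distance(e, o) for o in own) / (len(own) - 1)
        # Mean distance to each other cluster; keep the nearest one.
        b = min(
            sum(cosine_distance(e, o) for o in other) / len(other)
            for label, other in clusters.items()
            if label != t
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

This version is quadratic in the number of documents, so it only suits toy inputs; a library routine that accepts a cosine metric would be the practical choice on real runs.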
Embedding Coherence
Measures whether the top keywords for each topic are semantically related, using the LaBSE model to embed keywords and compute pairwise cosine similarity.
For each topic:
1. Encode top 10 keywords with LaBSE (normalized)
2. Compute all pairwise cosine similarities
3. Average the similarities
embedding_coherence = mean(per_topic_averages)
- Requires the LaBSE embed_model (passed from the handler)
- Returns 0.0 if no model is available
- More robust than NPMI for multilingual text, since it captures semantic rather than surface co-occurrence
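The three steps can be sketched without loading a model by passing in any keyword-to-vector function. In the sketch below, `encode` is a hypothetical stand-in for the LaBSE embed_model, and the toy vectors exist only to make the arithmetic visible.

```python
import math
from itertools import combinations

def _unit(vec):
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embedding_coherence(topics, encode=None, top_n=10):
    # Without an encoder there is nothing to compare, so return 0.0.
    if encode is None:
        return 0.0
    per_topic = []
    for words in topics:
        vecs = [_unit(encode(w)) for w in words[:top_n]]
        sims = [
            sum(a * b for a, b in zip(u, v))
            for u, v in combinations(vecs, 2)
        ]
        if sims:  # topics with fewer than two keywords are skipped
            per_topic.append(sum(sims) / len(sims))
    return sum(per_topic) / len(per_topic) if per_topic else 0.0

# Toy 2-d "embeddings" used only for illustration.
toy = {"rice": [2.0, 0.0], "farm": [1.0, 0.0], "bus": [0.0, 3.0]}
print(embedding_coherence([["rice", "farm"], ["rice", "bus"]], toy.__getitem__))  # → 0.5
```

The first toy topic scores 1.0 (parallel vectors), the second 0.0 (orthogonal vectors), and the metric is their mean.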
NaN Handling
All metric values pass through _safe_float(), which replaces NaN and Inf with 0.0. This prevents JSON serialization errors when a metric can't be computed (e.g., too few documents for silhouette, division by zero in diversity).
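A minimal sketch of such a guard, assuming it only needs to handle numeric inputs. The name `safe_float` is a stand-in; the real _safe_float() may differ in detail.

```python
import json
import math

def safe_float(value):
    # Convert to a plain float and replace NaN, +Inf, and -Inf with 0.0.
    x = float(value)
    return x if math.isfinite(x) else 0.0

raw = {"silhouette_score": float("nan"), "topic_diversity": 0.82}
clean = {name: safe_float(v) for name, v in raw.items()}

# With allow_nan=False, json.dumps raises ValueError on NaN or Inf,
# so a successful call shows the cleaned payload is strict JSON.
print(json.dumps(clean, allow_nan=False))
```

Python's json module emits the non-standard token NaN by default, which strict JSON parsers reject, so sanitizing before serialization protects downstream clients as well.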
Example Response
{
  "metrics": {
    "npmi_coherence": 0.1523,
    "topic_diversity": 0.8200,
    "outlier_ratio": 0.1150,
    "silhouette_score": 0.2341,
    "embedding_coherence": 0.6102
  }
}
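A consumer of this response might compare each value with the targets from the Metrics Overview table. The sketch below assumes the response has already been parsed into a dict; `TARGETS` and `check_targets` are hypothetical names, not part of the API.

```python
import operator

# Targets copied from the Metrics Overview table: (comparison, threshold).
TARGETS = {
    "npmi_coherence": (operator.gt, 0.1),
    "topic_diversity": (operator.gt, 0.7),
    "outlier_ratio": (operator.lt, 0.2),
    "silhouette_score": (operator.gt, 0.0),
    "embedding_coherence": (operator.gt, 0.5),
}

def check_targets(metrics):
    # Map each known metric name to True when its target is met.
    return {
        name: compare(metrics[name], threshold)
        for name, (compare, threshold) in TARGETS.items()
        if name in metrics
    }

response = {
    "metrics": {
        "npmi_coherence": 0.1523,
        "topic_diversity": 0.8200,
        "outlier_ratio": 0.1150,
        "silhouette_score": 0.2341,
        "embedding_coherence": 0.6102,
    }
}
results = check_targets(response["metrics"])
print([name for name, ok in results.items() if not ok])  # → []
```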