Faculytics Docs

Quality Metrics

Five evaluation metrics computed on every topic model run — NPMI coherence, diversity, outlier ratio, silhouette score, and embedding coherence.

Every successful topic model run computes five quality metrics via compute_metrics() in src/evaluate.py. These metrics are returned in the response, persisted on the TopicModelRun entity, and surfaced in the analysis dashboard.

Metrics Overview

| Metric | Range | Target | What It Measures |
|---|---|---|---|
| NPMI Coherence | -1 to 1 | > 0.1 | Do topic keywords co-occur in the same documents? |
| Topic Diversity | 0 to 1 | > 0.7 | Are topics distinct from each other? |
| Outlier Ratio | 0 to 1 | < 0.2 | What fraction of documents couldn't be clustered? |
| Silhouette Score | -1 to 1 | > 0.0 | Are clusters well-separated in embedding space? |
| Embedding Coherence | 0 to 1 | > 0.5 | Are topic keywords semantically related? |

Metric Details

NPMI Coherence

Normalized Pointwise Mutual Information measures whether the top keywords for each topic actually co-occur in the input documents.

NPMI(w1, w2) = log( P(w1,w2) / (P(w1) * P(w2)) ) / -log P(w1,w2)
  • Uses a sliding window of 10 tokens for co-occurrence counting
  • Computed via Gensim's CoherenceModel with coherence="c_npmi"
  • Text is tokenized with a custom tokenizer that strips punctuation and keeps only alphabetic tokens longer than one character, which works for Latin-script Cebuano and Tagalog text
  • Averaged across all non-outlier topics
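To make the formula and the sliding-window counting concrete, here is a minimal, dependency-free sketch of NPMI over boolean windows. It is not Gensim's implementation: `tokenize`, `sliding_windows`, `npmi_pair`, and `npmi_coherence` are hypothetical names, the regex only approximates the custom tokenizer described above, and a pair that never co-occurs is scored -1 by convention (Gensim applies its own smoothing to that edge case). The caller is assumed to pass only non-outlier topics.

```python
import math
import re
from itertools import combinations

# Approximation of the tokenizer described above (hypothetical): runs of
# Unicode letters at least 2 characters long, lowercased. Digits and
# punctuation act as separators.
_WORD_RE = re.compile(r"[^\W\d_]{2,}")


def tokenize(text: str) -> list[str]:
    return [tok.lower() for tok in _WORD_RE.findall(text)]


def sliding_windows(docs: list[str], window_size: int = 10) -> list[set[str]]:
    """Turn each document into boolean windows (sets of tokens)."""
    windows = []
    for doc in docs:
        tokens = tokenize(doc)
        if not tokens:
            continue
        if len(tokens) <= window_size:
            windows.append(set(tokens))  # a short document is a single window
            continue
        for start in range(len(tokens) - window_size + 1):
            windows.append(set(tokens[start:start + window_size]))
    return windows


def npmi_pair(windows: list[set[str]], w1: str, w2: str) -> float:
    n = len(windows)
    p1 = sum(w1 in win for win in windows) / n
    p2 = sum(w2 in win for win in windows) / n
    p12 = sum(w1 in win and w2 in win for win in windows) / n
    if p12 == 0.0:
        return -1.0  # never co-occur: NPMI's lower bound
    if p12 == 1.0:
        return 1.0  # co-occur in every window: NPMI's upper bound
    return math.log(p12 / (p1 * p2)) / -math.log(p12)


def npmi_coherence(docs: list[str], topics: list[list[str]], window_size: int = 10) -> float:
    """Mean over topics of the mean NPMI of each topic's keyword pairs."""
    windows = sliding_windows(docs, window_size)
    if not windows:
        return 0.0
    per_topic = []
    for words in topics:
        pairs = list(combinations(words, 2))
        if pairs:
            per_topic.append(sum(npmi_pair(windows, a, b) for a, b in pairs) / len(pairs))
    return sum(per_topic) / len(per_topic) if per_topic else 0.0
```

Treating each window as a set (boolean counting) means a word repeated inside one window still counts once, which keeps the probabilities in the 0 to 1 range that NPMI's normalization expects.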

Interpretation:

  • > 0.1 — keywords meaningfully co-occur (good)
  • ~0.0 — keywords are independent (random)
  • < 0.0 — keywords anti-correlate (bad)

Topic Diversity

Ratio of unique keywords to total keywords, pooled across all topics. Measures whether topics capture distinct themes or recycle the same words.

diversity = |unique keywords| / |total keywords|
  • Uses top 10 keywords per topic
  • A value of 1.0 means every keyword is unique to one topic
  • Low diversity suggests topics are too similar and could be merged
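The diversity formula reduces to a set-size ratio. A minimal sketch, assuming topics arrive as ranked keyword lists (`topic_diversity` here is a hypothetical stand-in, not necessarily the code in src/evaluate.py):

```python
def topic_diversity(topics: list[list[str]], top_n: int = 10) -> float:
    """Share of the top-n keywords (pooled over all topics) that are unique."""
    keywords = [word for words in topics for word in words[:top_n]]
    if not keywords:
        return 0.0  # no topics: avoid dividing by zero
    return len(set(keywords)) / len(keywords)


# Two topics sharing "class": 3 unique keywords out of 4 total.
print(topic_diversity([["teacher", "class"], ["class", "exam"]]))  # 0.75
```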

Outlier Ratio

Fraction of input documents assigned to BERTopic's outlier topic (-1). These are documents that HDBSCAN couldn't confidently assign to any cluster.

outlier_ratio = count(topic == -1) / total_documents
  • High outlier ratios (above 0.3) suggest min_topic_size may be too large or the data lacks clear clusters
  • The API filters out outlier assignments during persistence — they're not surfaced to users
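A minimal sketch of the ratio and of the kind of filtering done before persistence, assuming topic assignments are a plain list of integers as BERTopic returns them. Both function names are hypothetical; the actual persistence code is not shown in this document.

```python
def outlier_ratio(assignments: list[int]) -> float:
    """Fraction of documents assigned to the outlier topic -1."""
    if not assignments:
        return 0.0
    return sum(1 for topic in assignments if topic == -1) / len(assignments)


def drop_outliers(doc_ids: list[str], assignments: list[int]) -> list[tuple[str, int]]:
    """Hypothetical persistence filter: keep only (doc_id, topic) pairs with topic != -1."""
    return [(d, t) for d, t in zip(doc_ids, assignments) if t != -1]


topics = [0, -1, 1, 0, -1, 2, 0, 1]
print(outlier_ratio(topics))  # 0.25
```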

Silhouette Score

Measures how well-separated the clusters are in the original 768-dimensional embedding space (not the UMAP-reduced space).

silhouette = mean over non-outlier documents i of (b(i) - a(i)) / max(a(i), b(i))

Where a(i) = mean distance from document i to the other documents in its own cluster, and b(i) = mean distance from document i to the documents of the nearest other cluster.

  • Uses cosine distance (matching the embedding space)
  • Outlier documents (topic -1) are excluded
  • Returns 0.0 if fewer than 2 topics or fewer than 10 non-outlier documents
  • Range: -1 (wrong clusters) to 1 (dense, well-separated clusters)
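The definition and guards above can be written out in pure Python. This O(n²) sketch exists only to show the arithmetic; the document does not say which library src/evaluate.py uses, and in practice scikit-learn's `silhouette_score(X, labels, metric="cosine")` computes the same quantity. The function names and the `min_docs` parameter are hypothetical; singleton clusters score 0, following scikit-learn's convention.

```python
import math
from collections import defaultdict


def cosine_distance(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def silhouette(embeddings: list[list[float]], topics: list[int], min_docs: int = 10) -> float:
    # Exclude outliers (topic -1) before anything else.
    points = [(e, t) for e, t in zip(embeddings, topics) if t != -1]
    clusters: dict[int, list[int]] = defaultdict(list)
    for i, (_, t) in enumerate(points):
        clusters[t].append(i)
    if len(clusters) < 2 or len(points) < min_docs:
        return 0.0  # guard described above

    def mean_dist(i: int, members: list[int]) -> float:
        return sum(cosine_distance(points[i][0], points[j][0]) for j in members) / len(members)

    scores = []
    for i, (_, t) in enumerate(points):
        same = [j for j in clusters[t] if j != i]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = mean_dist(i, same)
        b = min(mean_dist(i, idx) for label, idx in clusters.items() if label != t)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```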

Embedding Coherence

Measures whether the top keywords for each topic are semantically related, using the LaBSE model to embed keywords and compute pairwise cosine similarity.

For each topic:
  1. Encode top 10 keywords with LaBSE (normalized)
  2. Compute all pairwise cosine similarities
  3. Average the similarities

embedding_coherence = mean(per_topic_averages)
  • Requires the LaBSE embed_model (passed from the handler)
  • Returns 0.0 if no model is available
  • More robust than NPMI for multilingual text, since it captures semantic similarity rather than surface co-occurrence
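A minimal sketch of the three steps, written against a generic `encode` callable so it runs without downloading a model. That callable is an assumption: with LaBSE loaded through sentence-transformers, something like `lambda words: model.encode(words, normalize_embeddings=True)` would fill that role. The function names are hypothetical, and topics with fewer than two keywords are skipped here because they have no pairs.

```python
import math
from itertools import combinations
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def _unit(v: Vector) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0 for _ in v]


def topic_embedding_coherence(vectors: Sequence[Vector]) -> Optional[float]:
    """Mean cosine similarity over all distinct keyword pairs; None if < 2 keywords."""
    units = [_unit(v) for v in vectors]  # step 1: normalized embeddings
    sims = [sum(a * b for a, b in zip(u, w)) for u, w in combinations(units, 2)]  # step 2
    return sum(sims) / len(sims) if sims else None  # step 3


def embedding_coherence(
    topics: list[list[str]],
    encode: Optional[Callable[[list[str]], Sequence[Vector]]],
    top_n: int = 10,
) -> float:
    if encode is None:
        return 0.0  # no embedding model available
    scores = []
    for words in topics:
        score = topic_embedding_coherence(encode(words[:top_n]))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2-d "embeddings" standing in for LaBSE vectors.
toy = {"cat": [1.0, 0.0], "dog": [2.0, 0.0], "car": [0.0, 3.0]}
print(embedding_coherence([["cat", "dog"], ["cat", "car"]], lambda ws: [toy[w] for w in ws]))  # 0.5
```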

NaN Handling

All metric values pass through _safe_float() which replaces NaN and Inf with 0.0. This prevents JSON serialization errors when metrics can't be computed (e.g., too few documents for silhouette, division by zero in diversity).

Example Response

{
  "metrics": {
    "npmi_coherence": 0.1523,
    "topic_diversity": 0.8200,
    "outlier_ratio": 0.1150,
    "silhouette_score": 0.2341,
    "embedding_coherence": 0.6102
  }
}
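A consumer of this response could compare each value against the Target column of the Metrics Overview table. The sketch below does that for the example response; `TARGETS` and `check_targets` are hypothetical names, not part of the documented API.

```python
import json

# Targets copied from the Metrics Overview table: (comparison, threshold).
TARGETS = {
    "npmi_coherence": (">", 0.1),
    "topic_diversity": (">", 0.7),
    "outlier_ratio": ("<", 0.2),
    "silhouette_score": (">", 0.0),
    "embedding_coherence": (">", 0.5),
}


def check_targets(metrics: dict[str, float]) -> dict[str, bool]:
    """Return True for each metric that meets its target."""
    result = {}
    for name, (op, threshold) in TARGETS.items():
        value = metrics[name]
        result[name] = value > threshold if op == ">" else value < threshold
    return result


response = json.loads("""
{"metrics": {"npmi_coherence": 0.1523, "topic_diversity": 0.8200,
             "outlier_ratio": 0.1150, "silhouette_score": 0.2341,
             "embedding_coherence": 0.6102}}
""")
print(all(check_targets(response["metrics"]).values()))
```

Keeping the thresholds in one mapping makes the targets easy to adjust if a dataset (for example, short multilingual comments) routinely scores lower on NPMI.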