# Analysis Pipeline
This document describes the end-to-end lifecycle of an analysis pipeline — from creation through completion.
## Overview
An analysis pipeline processes all qualitative feedback for a given scope (semester + optional filters) through four sequential AI stages, producing actionable recommendations.
## 1. Create Pipeline
Endpoint: `POST /analysis/pipelines`
The caller provides a scope:
| Parameter | Required | Description |
|---|---|---|
| `semesterId` | Yes | Target semester |
| `facultyId` | No | Filter to a specific faculty member |
| `questionnaireVersionId` | No | Filter to a specific version |
| `departmentId` | No | Filter to a department |
| `programId` | No | Filter to a program |
| `campusId` | No | Filter to a campus |
| `courseId` | No | Filter to a course |
The orchestrator:
- Deduplicates — If an active (non-terminal) pipeline with the same scope exists, returns it instead of creating a new one.
- Computes coverage stats — Counts submissions, comments, and enrollments within scope. Calculates response rate.
- Generates warnings — Flags low response rate (< 25%), insufficient submissions (< 30), insufficient comments (< 10), or stale enrollment data (> 24h since last sync).
- Returns the pipeline in `AWAITING_CONFIRMATION` status.
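As a sketch, the warning thresholds above can be expressed as a pure function. The type and field names here are illustrative, not the actual entities:

```typescript
// Illustrative sketch of the coverage-warning rules; names are assumptions.
interface CoverageStats {
  totalEnrolled: number;
  submissionCount: number;
  commentCount: number;
  lastEnrollmentSyncAt: Date;
}

function computeWarnings(stats: CoverageStats, now: Date = new Date()): string[] {
  const warnings: string[] = [];
  const responseRate =
    stats.totalEnrolled > 0 ? stats.submissionCount / stats.totalEnrolled : 0;

  if (responseRate < 0.25) warnings.push("Low response rate (< 25%)");
  if (stats.submissionCount < 30) warnings.push("Insufficient submissions (< 30)");
  if (stats.commentCount < 10) warnings.push("Insufficient comments (< 10)");

  // Stale enrollment data: more than 24h since the last sync.
  const hoursSinceSync =
    (now.getTime() - stats.lastEnrollmentSyncAt.getTime()) / 3_600_000;
  if (hoursSinceSync > 24) warnings.push("Stale enrollment data (> 24h since last sync)");

  return warnings;
}
```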
## 2. Confirm Pipeline
Endpoint: `POST /analysis/pipelines/:id/confirm`
The orchestrator:
- Validates the pipeline is in `AWAITING_CONFIRMATION` status.
- Checks that `SENTIMENT_WORKER_URL` is configured.
- Embedding backfill (best-effort): If `EMBEDDINGS_WORKER_URL` is configured and some submissions with `cleanedComment` lack embeddings, enqueues individual embedding jobs using the cleaned text. These run alongside sentiment analysis.
- Creates a `SentimentRun` entity and dispatches a batch job to the sentiment queue.
- Advances the pipeline to `SENTIMENT_ANALYSIS`.
## 3. Sentiment Analysis
The SentimentProcessor:
- Sends all `cleanedComment` texts as a batch HTTP POST to the sentiment worker.
- Validates each result item against `sentimentResultItemSchema`.
- Determines the dominant label (positive/neutral/negative) from scores.
- Creates `SentimentResult` entities.
- Marks the `SentimentRun` as `COMPLETED`.
- Calls `OnSentimentComplete()` to advance the pipeline.
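The dominant-label step amounts to an argmax over the three class scores; a minimal sketch:

```typescript
// Pick the highest-scoring sentiment class. Shapes are illustrative.
type SentimentLabel = "positive" | "neutral" | "negative";

function dominantLabel(scores: Record<SentimentLabel, number>): SentimentLabel {
  return (Object.entries(scores) as [SentimentLabel, number][])
    .reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}
```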
## 4. Sentiment Gate
The orchestrator applies an in-memory filter:
- Always include: Negative and neutral comments (most actionable).
- Conditionally include: Positive comments with ≥ 10 words.
- Exclude: Short positive comments (noise for topic modeling).
Gate results are persisted via a bulk `nativeUpdate` on `SentimentResult.passedTopicGate`. Statistics are stored on the pipeline (`sentimentGateIncluded`, `sentimentGateExcluded`).
If the post-gate corpus is < 30 submissions, a warning is appended.
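The gate rules above can be sketched as a pure predicate (field names are assumptions):

```typescript
// Sketch of the sentiment gate: negative/neutral always pass;
// positive comments pass only when long enough to be useful for topic modeling.
interface GateInput {
  label: "positive" | "neutral" | "negative";
  cleanedComment: string;
}

const MIN_POSITIVE_WORDS = 10;

function passesTopicGate(item: GateInput): boolean {
  if (item.label !== "positive") return true; // always include negative/neutral
  const wordCount = item.cleanedComment.trim().split(/\s+/).filter(Boolean).length;
  return wordCount >= MIN_POSITIVE_WORDS; // exclude short positive comments
}
```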
## 5. Topic Modeling
The orchestrator:
- Fetches gate-passing submissions with their embeddings from `SubmissionEmbedding`.
- Skips submissions without embeddings (logs a warning if some are missing).
- Creates a `TopicModelRun` and dispatches a batch job with text + embedding vectors.
The TopicModelProcessor:
- Validates the response against `topicModelWorkerResponseSchema`.
- Creates `Topic` entities for each discovered cluster.
- Filters assignments by probability > 0.01.
- Computes `isDominant` per submission (highest-probability assignment).
- Persists `TopicAssignment` entities in chunks of 500.
- Updates run metadata (topic count, outlier count, quality metrics).
- Calls `OnTopicModelComplete()`.
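The assignment filtering and `isDominant` computation can be sketched as follows; the shapes are illustrative, not the actual entities:

```typescript
// Drop near-zero assignments, then flag each submission's
// highest-probability topic as its dominant assignment.
interface Assignment {
  submissionId: string;
  topicId: number;
  probability: number;
  isDominant?: boolean;
}

function prepareAssignments(raw: Assignment[]): Assignment[] {
  const kept = raw.filter((a) => a.probability > 0.01);

  // Track the best assignment per submission.
  const bestBySubmission = new Map<string, Assignment>();
  for (const a of kept) {
    const best = bestBySubmission.get(a.submissionId);
    if (!best || a.probability > best.probability) bestBySubmission.set(a.submissionId, a);
  }

  return kept.map((a) => ({ ...a, isDominant: bestBySubmission.get(a.submissionId) === a }));
}
```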
## 6. Topic Labeling
After topic modeling completes and before recommendations are dispatched, the orchestrator runs an inline enrichment step:
- Fetches the latest `TopicModelRun` and all its `Topic` entities.
- Calls `TopicLabelService.generateLabels(topics)`, which sends topics (raw labels + keywords) to OpenAI `gpt-4o-mini`.
- The LLM returns short, human-readable labels (2-4 words, title case) via Zod-validated structured output.
- Labels are written to `Topic.label` and flushed to the database.
Fallback: If the LLM call fails, topics keep their BERTopic-generated `rawLabel`. This step is non-blocking.
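The fallback behavior can be sketched as a pure merge step. The `Map` input and shapes here are assumptions for illustration, not the service's actual signature:

```typescript
// Apply LLM-generated labels where present; otherwise keep the
// BERTopic-generated rawLabel so the step never blocks the pipeline.
interface Topic {
  id: number;
  rawLabel: string;
  label?: string;
}

function applyLabels(topics: Topic[], llmLabels: Map<number, string> | null): Topic[] {
  return topics.map((t) => ({
    ...t,
    label: llmLabels?.get(t.id) ?? t.rawLabel, // non-blocking fallback
  }));
}
```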
## 7. Recommendations
The orchestrator creates a `RecommendationRun` and dispatches a lightweight job to the recommendations queue (containing only pipeline and run IDs).
The RecommendationsProcessor calls `RecommendationGenerationService.Generate(pipelineId)`, which:
- Loads the pipeline with all scope relations.
- Aggregates dimension scores via SQL (`AVG(numeric_value) GROUP BY dimension_code`).
- Loads the top 10 topics and computes per-topic sentiment breakdowns by cross-referencing topic assignments with sentiment results.
- Selects sample quotes from dominant topic assignments (sorted by sentiment strength).
- Selects up to 20 sample comments proportionally across sentiment labels.
- Constructs a system + user prompt and calls OpenAI with `zodResponseFormat` for structured output.
- The LLM returns 3-7 recommendations split between STRENGTH (positive patterns) and IMPROVEMENT (areas to work on).
- Each recommendation is enriched with supporting evidence:
- Topic-level sources (label, comment count, sentiment breakdown, sample quotes)
- Dimension score sources (dimension code + average score pairs)
- Computed confidence level (HIGH/MEDIUM/LOW based on comment count and sentiment agreement)
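The document does not specify the exact confidence thresholds; purely as an illustration, a computation over comment count and sentiment agreement might look like this (the cutoffs are invented for the sketch):

```typescript
// Hypothetical confidence rule: more supporting comments and stronger
// sentiment agreement yield higher confidence. Thresholds are assumptions.
type Confidence = "HIGH" | "MEDIUM" | "LOW";

function confidenceLevel(commentCount: number, agreement: number): Confidence {
  // agreement = share of supporting comments carrying the recommendation's sentiment (0..1)
  if (commentCount >= 30 && agreement >= 0.7) return "HIGH";
  if (commentCount >= 10 && agreement >= 0.5) return "MEDIUM";
  return "LOW";
}
```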
The processor then:
- Creates `RecommendedAction` entities with category, headline, description, actionPlan, priority, and supportingEvidence (JSONB).
- Marks the `RecommendationRun` as `COMPLETED`.
- Calls `OnRecommendationsComplete()`.
### Retrieving Recommendations
Endpoint: `GET /analysis/pipelines/:id/recommendations`

Returns the latest `RecommendationRun` for the pipeline with all actions. If the run is still processing, returns an empty actions array with the current run status.
## 8. Completion
Pipeline status moves to `COMPLETED` with a `completedAt` timestamp.
## Error Handling
- Stage failure: Any processor can call `OnStageFailed()`, which sets pipeline status to `FAILED` with an error message identifying the stage.
- Exhausted retries: After all BullMQ retry attempts are exhausted, the processor's `onFailed` handler calls `OnStageFailed()`.
- Missing worker URL: Pipeline fails immediately with a descriptive error.
- Empty corpus: If no submissions have comments or no submissions pass the sentiment gate, the pipeline fails gracefully.
## Cancellation
Endpoint: `POST /analysis/pipelines/:id/cancel`
Sets the pipeline to `CANCELLED`. Only works on non-terminal pipelines. In-flight BullMQ jobs will still complete, but their callbacks detect the terminal status and no-op.
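The no-op behavior relies on a terminal-status check, which could be as simple as the following sketch (not the actual implementation):

```typescript
// Terminal statuses mentioned in this document; callbacks bail out
// when the pipeline has already reached one of them.
const TERMINAL_STATUSES = new Set(["COMPLETED", "FAILED", "CANCELLED"]);

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}
```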
## Status Inspection
Endpoint: `GET /analysis/pipelines/:id/status`
Returns a structured response with:
- Pipeline status and scope
- Coverage stats (`totalEnrolled`, `submissionCount`, `commentCount`, `responseRate`)
- Per-stage status (pending/processing/completed/failed/skipped)
- Sentiment gate statistics
- Warnings and error messages
- Timestamps (created, confirmed, completed)
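A hypothetical response might look like the following; the field names and nesting are illustrative only and do not define the actual API contract:

```json
{
  "status": "SENTIMENT_ANALYSIS",
  "scope": { "semesterId": "…" },
  "coverage": {
    "totalEnrolled": 120,
    "submissionCount": 45,
    "commentCount": 38,
    "responseRate": 0.375
  },
  "stages": {
    "sentiment": "processing",
    "topicModel": "pending",
    "recommendations": "pending"
  },
  "sentimentGate": { "included": 0, "excluded": 0 },
  "warnings": ["Low response rate (< 25%)"],
  "error": null,
  "timestamps": { "createdAt": "…", "confirmedAt": "…", "completedAt": null }
}
```

Note that `responseRate` here is 45 / 120 = 0.375, which would trip the low-response-rate warning described in the creation step (< 25% does not apply at 37.5%, so treat the warning shown as illustrative only).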