RAG Query Reference
This page is a reference for Unfault’s RAG query endpoint and the routing logic used by unfault ask.
It focuses on what the system does and what shapes it returns. For usage examples, see the CLI reference.
Endpoint
POST /rag/query
Returns a context pack suitable for:
- direct CLI display (structured context)
- optional LLM synthesis (the CLI sends the context to your configured provider)
Request (RAGQueryRequest)
| Field | Type | Default | Notes |
|---|---|---|---|
| query | string | - | Required. 3 to 1000 characters. |
| workspace_id | string \| null | null | Optional scope. Most queries are answered within a single workspace. |
| max_sessions | int | 5 | 1 to 20. Session summaries are project-level context. |
| max_findings | int | 10 | 1 to 50. Findings are issue-level context. |
| similarity_threshold | float | 0.5 | 0.0 to 1.0. Applies to vector retrieval. |
| graph_data | object \| null | null | Optional ClientGraphData from the CLI. Used for local graph-aware answers (flow, enumerate, SLOs) without relying on stored graph fidelity. |
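To make the constraints in the table concrete, here is a minimal sketch of a client-side helper that builds a request body and rejects values outside the documented ranges. The function name `build_rag_query` is hypothetical; only the field names, defaults, and ranges come from the table above.

```python
from typing import Any, Optional


def build_rag_query(
    query: str,
    workspace_id: Optional[str] = None,
    max_sessions: int = 5,
    max_findings: int = 10,
    similarity_threshold: float = 0.5,
    graph_data: Optional[dict] = None,
) -> dict[str, Any]:
    """Build a RAGQueryRequest body (hypothetical helper, not part of the CLI)."""
    # Ranges mirror the request table.
    if not 3 <= len(query) <= 1000:
        raise ValueError("query must be 3 to 1000 characters")
    if not 1 <= max_sessions <= 20:
        raise ValueError("max_sessions must be 1 to 20")
    if not 1 <= max_findings <= 50:
        raise ValueError("max_findings must be 1 to 50")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be 0.0 to 1.0")
    return {
        "query": query,
        "workspace_id": workspace_id,
        "max_sessions": max_sessions,
        "max_findings": max_findings,
        "similarity_threshold": similarity_threshold,
        "graph_data": graph_data,
    }
```

The resulting dictionary can be serialized as JSON and sent as the body of POST /rag/query.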
Response (RAGQueryResponse)
The response is intentionally structured. It is not “the final answer”. The CLI can:
- print it as-is
- render it into a narrative with --llm
Common fields
| Field | Type | Notes |
|---|---|---|
| query | string | The original query. |
| context_summary | string | A short summary of retrieved context. |
| topic_label | string \| null | Coarse label like “Workspace” or “Integration” when applicable. |
| sessions | array | Session-level contexts (similar sessions). |
| findings | array | Finding-level contexts (similar findings and enrichments). |
| sources | array | Attribution for retrieved items. Used to explain provenance. |
| routing_confidence | float | Confidence for the selected intent (best-effort). |
| hint | string \| null | Human hint when data is missing or the query is too vague. |
| disambiguation | object \| null | Structured follow-up guidance when a target cannot be resolved. |
Optional contexts
These are populated based on the routed intent and available data.
| Field | Type | When it appears |
|---|---|---|
| workspace_context | object \| null | Structural overview for workspace description queries (prefer client graph_data). |
| graph_context | object \| null | Usage/impact/dependencies/centrality slices from stored graph (and some client fallbacks). |
| flow_context | object \| null | Call paths for flow questions (prefer client graph_data, fall back to stored graph). |
| slo_context | object \| null | SLO coverage and unmonitored routes (prefer client SLO nodes). |
| enumerate_context | object \| null | Lists/counts (routes, workspaces, coverage summaries). |
| graph_stats | object \| null | Basic graph stats when graph_data is present. |
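Because every optional context may be null, a consumer typically needs to check which ones were populated before rendering. The sketch below shows one way to do that over a decoded response dictionary. The helper name `populated_contexts` is an assumption; the field names come from the table above.

```python
OPTIONAL_CONTEXTS = (
    "workspace_context",
    "graph_context",
    "flow_context",
    "slo_context",
    "enumerate_context",
    "graph_stats",
)


def populated_contexts(response: dict) -> list[str]:
    """Return the optional context fields that are present and not null."""
    return [name for name in OPTIONAL_CONTEXTS if response.get(name) is not None]
```

A renderer could loop over the returned names and print one section per populated context, skipping the rest.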
Routing and intents
Routing produces an execution plan with an intent (RouteIntent). Unfault uses:
- a small ML classifier when available
- regex-based fallback when ML is unavailable or low-confidence
Two thresholds are relevant:
- ML_CONFIDENCE_THRESHOLD: 0.7 (below this, fall back to regex)
- ROUTING_CONFIDENCE_THRESHOLD: 0.35 (below this, routing is treated as weak)
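The interaction between the two thresholds can be sketched as follows. This is an illustration under assumptions, not the actual router: the function `choose_route`, the `(intent, confidence)` tuple shape, and the returned dictionary are hypothetical. Only the two threshold values and their "below this" semantics come from the text above.

```python
from typing import Optional

ML_CONFIDENCE_THRESHOLD = 0.7
ROUTING_CONFIDENCE_THRESHOLD = 0.35


def choose_route(
    ml_result: Optional[tuple[str, float]],
    regex_result: tuple[str, float],
) -> dict:
    """Pick the ML route when it is available and confident, else the regex route."""
    if ml_result is not None and ml_result[1] >= ML_CONFIDENCE_THRESHOLD:
        intent, confidence, source = ml_result[0], ml_result[1], "ml"
    else:
        # ML unavailable (None) or below the ML threshold: fall back to regex.
        intent, confidence, source = regex_result[0], regex_result[1], "regex"
    return {
        "intent": intent,
        "routing_confidence": confidence,
        "source": source,
        # Below the routing threshold the route is only flagged as weak here;
        # what the system does with a weak route is not specified on this page.
        "weak": confidence < ROUTING_CONFIDENCE_THRESHOLD,
    }
```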
Intent catalog
Each intent selects which contexts to attempt.
| Intent | Primary context(s) | What it is for |
|---|---|---|
| overview | workspace_context | “Describe this workspace”. Structural overview. |
| coverage | enumerate_context | Cross-workspace endpoint coverage, best-effort. |
| relationship | enumerate_context | Cross-workspace dependency direction, best-effort. |
| flow | flow_context | Call-path tracing: “how does X work”. |
| usage | graph_context | “Who calls this” and “where is this used”. |
| impact | graph_context | Change blast radius: “what breaks if”. |
| dependencies | graph_context | Imports and external dependency view. |
| centrality | graph_context | Hotspots and most-connected files/functions. |
| observability | slo_context | SLO coverage and unmonitored routes. |
| enumerate | enumerate_context | “List all routes” / “how many endpoints”. |
| semantic | sessions, findings | General semantic search over findings and session summaries. |
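The catalog can be treated as a lookup table. The sketch below encodes it as a dictionary and reports which primary contexts for a given intent came back empty, which is one plausible way to decide when to show the hint field. The helper `missing_primary_contexts` is hypothetical; the mapping itself is copied from the table.

```python
INTENT_CONTEXTS: dict[str, tuple[str, ...]] = {
    "overview": ("workspace_context",),
    "coverage": ("enumerate_context",),
    "relationship": ("enumerate_context",),
    "flow": ("flow_context",),
    "usage": ("graph_context",),
    "impact": ("graph_context",),
    "dependencies": ("graph_context",),
    "centrality": ("graph_context",),
    "observability": ("slo_context",),
    "enumerate": ("enumerate_context",),
    "semantic": ("sessions", "findings"),
}


def missing_primary_contexts(intent: str, response: dict) -> list[str]:
    """List the primary contexts for `intent` that are null or empty in `response`."""
    return [name for name in INTENT_CONTEXTS[intent] if not response.get(name)]
```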
Retrieval behavior
Lazy embeddings
On a query, the API generates embeddings for a small number of recently completed sessions that are missing them (bounded to avoid a slow first query).
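A bounded backfill like the one described above could look like the sketch below. The session field names (`status`, `completed_at`, `embedding`) and the default limit of 3 are assumptions made for illustration; this page does not state the actual bound.

```python
def sessions_to_embed(sessions: list[dict], limit: int = 3) -> list[dict]:
    """Select the most recently completed sessions that still lack an embedding.

    `limit` caps the work done on a single query so the first query stays fast.
    """
    pending = [
        s for s in sessions
        if s["status"] == "completed" and s["embedding"] is None
    ]
    # ISO 8601 timestamps sort chronologically as plain strings.
    pending.sort(key=lambda s: s["completed_at"], reverse=True)
    return pending[:limit]
```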
Semantic retrieval
Vector search returns:
- similar sessions (sessions)
- similar findings (findings)
Filters can include:
- workspace scope
- language/framework hints parsed from the query
- rule id patterns derived from concept extraction
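As a rough sketch of how the request parameters shape retrieval, the function below filters scored hits by similarity_threshold and workspace scope, then caps the result count. The hit fields `score` and `workspace_id`, the function name, and the choice to keep hits at or above the threshold are assumptions; the real filtering presumably happens inside the vector store query.

```python
from typing import Optional


def filter_hits(
    hits: list[dict],
    similarity_threshold: float = 0.5,
    workspace_id: Optional[str] = None,
    limit: int = 10,
) -> list[dict]:
    """Keep hits that meet the threshold and scope, best score first."""
    kept = [
        h for h in hits
        if h["score"] >= similarity_threshold
        and (workspace_id is None or h["workspace_id"] == workspace_id)
    ]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:limit]
```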
Concept filtering
If the query matches a known concept (for example “error handling” or “timeout”), the API applies rule id pattern filters so that results stay on-topic.
When a concept-targeted query returns no matching findings, the API emits an “all clear” hint and clears unrelated session context to avoid confusing output.
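The concept filter and the “all clear” behavior can be sketched as below. Everything specific here is invented for illustration: the concept-to-pattern table, the glob patterns, the substring-based concept detection, and the hint wording. The filter is also applied as a post-filter for simplicity, whereas the API may apply it during retrieval.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical concept -> rule id glob patterns.
CONCEPT_RULE_PATTERNS = {
    "timeout": ["*timeout*"],
    "error handling": ["*error*", "*exception*"],
}


def detect_concept(query: str) -> Optional[str]:
    """Return the first known concept mentioned in the query, if any."""
    lowered = query.lower()
    for concept in CONCEPT_RULE_PATTERNS:
        if concept in lowered:
            return concept
    return None


def apply_concept_filter(query: str, findings: list[dict], sessions: list[dict]):
    """Return (findings, sessions, hint) after concept filtering."""
    concept = detect_concept(query)
    if concept is None:
        return findings, sessions, None
    patterns = CONCEPT_RULE_PATTERNS[concept]
    kept = [
        f for f in findings
        if any(fnmatchcase(f["rule_id"], p) for p in patterns)
    ]
    if not kept:
        # "All clear": drop unrelated session context and explain why.
        return [], [], f"No findings related to '{concept}'."
    return kept, sessions, None
```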
Diversification
For broad semantic queries (intent semantic without a targeted concept), the API may diversify results by rule type (for example max 2 findings per rule). Targeted concept queries do not diversify.
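A per-rule cap of this kind can be sketched in a few lines. The sketch assumes findings arrive sorted by relevance and each carries a `rule_id` field; both are assumptions, as is the function name.

```python
from collections import Counter


def diversify(findings: list[dict], max_per_rule: int = 2) -> list[dict]:
    """Keep at most `max_per_rule` findings per rule id, preserving input order."""
    seen: Counter = Counter()
    result = []
    for finding in findings:
        rule_id = finding["rule_id"]
        if seen[rule_id] < max_per_rule:
            result.append(finding)
            seen[rule_id] += 1
    return result
```

Because the input order is preserved, the highest-ranked findings for each rule survive the cap.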
File-scoped fallback
If the query explicitly references a file and uses words like “risks” or “issues”, the API may override semantic results and return findings for that file token from the latest session.
This is a UX safeguard: users usually mean “filter by this file”.
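The trigger condition for this fallback might be detected roughly as follows. The regular expression for file tokens and the exact word list are assumptions; the page only says the query must reference a file and use words like “risks” or “issues”.

```python
import re
from typing import Optional

# Hypothetical file token: a path-like word ending in a short extension.
FILE_TOKEN = re.compile(r"\b[\w./-]+\.[A-Za-z]{1,5}\b")
RISK_WORDS = {"risk", "risks", "issue", "issues"}


def file_scoped_target(query: str) -> Optional[str]:
    """Return the file token if the query asks about risks/issues in a file."""
    match = FILE_TOKEN.search(query)
    if match is None:
        return None
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words.isdisjoint(RISK_WORDS):
        return None
    return match.group(0)
```

When the function returns a token, a caller would replace the semantic results with findings filtered to that file; when it returns None, the normal semantic path applies.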
Graph enrichment
When an impact/usage slice returns dependent files, the API may fetch findings for those dependent files and attach them with source metadata marking the origin as dependent-file enrichment.
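The enrichment step can be pictured as a join between dependent files and known findings, with provenance attached. The `source` field layout and the `"dependent_file"` origin label below are placeholders; the actual metadata shape is not documented on this page.

```python
def enrich_with_dependents(
    dependent_files: list[str],
    findings_by_file: dict[str, list[dict]],
) -> list[dict]:
    """Collect findings for dependent files, tagging each with its origin."""
    enriched = []
    for path in dependent_files:
        for finding in findings_by_file.get(path, []):
            # Copy the finding so the stored record is not mutated.
            enriched.append(
                {**finding, "source": {"origin": "dependent_file", "file": path}}
            )
    return enriched
```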
Disambiguation
When the plan needs a concrete target and cannot resolve it unambiguously, the response includes disambiguation.
It provides:
- a reason (why the system could not resolve the target)
- safe tokens to paste into a follow-up query
- candidate matches
This is the expected response shape for queries like “what breaks if I change auth” when there are multiple plausible auth targets.
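A client might turn the disambiguation object into a follow-up prompt along these lines. The key names `reason`, `tokens`, and `candidates` are guesses that mirror the three items listed above; the actual field names are not given on this page.

```python
from typing import Optional


def render_disambiguation(response: dict) -> Optional[str]:
    """Format follow-up guidance, or return None when the target was resolved."""
    info = response.get("disambiguation")
    if info is None:
        return None
    lines = [f"Could not resolve target: {info['reason']}"]
    for candidate in info.get("candidates", []):
        lines.append(f"  candidate: {candidate}")
    for token in info.get("tokens", []):
        lines.append(f"  try: {token}")
    return "\n".join(lines)
```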
Notes on fidelity
Some intents prefer graph_data from the CLI and only fall back to stored graph data.
This matters when:
- the stored graph is stale (review not re-run after a refactor)
- dynamic languages produce multiple plausible resolutions
- flow/SLO data exists locally but was not persisted
If an answer seems inconsistent with the current working tree, re-running unfault review usually resolves the mismatch.