RAG Query Reference

This page is a reference for Unfault’s RAG query endpoint and the routing logic used by `unfault ask`.

It focuses on what the system does and what shapes it returns. For usage examples, see the CLI reference.

Endpoint

POST /rag/query

Returns a context pack suitable for:

direct CLI display (structured context)
optional LLM synthesis (the CLI sends the context to your configured provider)

Request (`RAGQueryRequest`)

Field	Type	Default	Notes
`query`	string	-	Required. 3 to 1000 characters.
`workspace_id`	string | null	null	Optional scope. Most queries are answered within a single workspace.
`max_sessions`	int	5	1 to 20. Session summaries are project-level context.
`max_findings`	int	10	1 to 50. Findings are issue-level context.
`similarity_threshold`	float	0.5	0.0 to 1.0. Applies to vector retrieval.
`graph_data`	object | null	null	Optional `ClientGraphData` from the CLI. Used for local graph-aware answers (flow, enumerate, SLOs) without relying on stored graph fidelity.
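
The constraints in the table can be checked on the client before sending a request. The following is a minimal sketch, assuming a hand-written Python mirror of the request shape; the dataclass and its validation are illustrative and are not the server's actual schema. Field names, defaults, and ranges are copied from the table above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class RAGQueryRequest:
    """Illustrative client-side mirror of the request table (not the server schema)."""

    query: str
    workspace_id: Optional[str] = None
    max_sessions: int = 5
    max_findings: int = 10
    similarity_threshold: float = 0.5
    graph_data: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Ranges come from the Notes column of the request table.
        if not 3 <= len(self.query) <= 1000:
            raise ValueError("query must be 3 to 1000 characters")
        if not 1 <= self.max_sessions <= 20:
            raise ValueError("max_sessions must be 1 to 20")
        if not 1 <= self.max_findings <= 50:
            raise ValueError("max_findings must be 1 to 50")
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be 0.0 to 1.0")


# JSON-ready body for POST /rag/query; unset fields keep their documented defaults.
payload = asdict(RAGQueryRequest(query="what breaks if I change auth", max_findings=20))
```

Validating locally gives a clearer error message than a rejected request, but the server remains the authority on what it accepts.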

Response (`RAGQueryResponse`)

The response is intentionally structured. It is not “the final answer”. The CLI can:

print it as-is
render it into a narrative with `--llm`

Common fields

Field	Type	Notes
`query`	string	The original query.
`context_summary`	string	A short summary of retrieved context.
`topic_label`	string | null	Coarse label like “Workspace” or “Integration” when applicable.
`sessions`	array	Session-level contexts (similar sessions).
`findings`	array	Finding-level contexts (similar findings and enrichments).
`sources`	array	Attribution for retrieved items. Used to explain provenance.
`routing_confidence`	float	Confidence for the selected intent (best-effort).
`hint`	string | null	Human hint when data is missing or the query is too vague.
`disambiguation`	object | null	Structured follow-up guidance when a target cannot be resolved.

Optional contexts

These are populated based on the routed intent and available data.

Field	Type	When it appears
`workspace_context`	object | null	Structural overview for workspace description queries (prefer client `graph_data`).
`graph_context`	object | null	Usage/impact/dependencies/centrality slices from stored graph (and some client fallbacks).
`flow_context`	object | null	Call paths for flow questions (prefer client `graph_data`, fall back to stored graph).
`slo_context`	object | null	SLO coverage and unmonitored routes (prefer client SLO nodes).
`enumerate_context`	object | null	Lists/counts (routes, workspaces, coverage summaries).
`graph_stats`	object | null	Basic graph stats when `graph_data` is present.
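
Because every optional context is nullable, a client typically checks which ones are present before rendering. A minimal sketch, assuming the response has already been parsed into a Python dict; the helper name and the inner shape of `flow_context` in the example are hypothetical.

```python
from typing import Any, Dict, List

# Field names taken from the optional contexts table.
OPTIONAL_CONTEXT_FIELDS = (
    "workspace_context",
    "graph_context",
    "flow_context",
    "slo_context",
    "enumerate_context",
    "graph_stats",
)


def populated_contexts(response: Dict[str, Any]) -> List[str]:
    """Return the optional context fields that are present and not null."""
    return [name for name in OPTIONAL_CONTEXT_FIELDS if response.get(name) is not None]


# Example response fragment; the contents of flow_context are made up.
example = {
    "query": "how does checkout work",
    "flow_context": {"paths": []},
    "graph_context": None,
}
present = populated_contexts(example)
```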

Routing and intents

Routing produces an execution plan with an intent (`RouteIntent`). Unfault uses:

a small ML classifier when available
regex-based fallback when ML is unavailable or low-confidence

Two thresholds are relevant:

`ML_CONFIDENCE_THRESHOLD`: 0.7 (below this, fall back to regex)
`ROUTING_CONFIDENCE_THRESHOLD`: 0.35 (below this, routing is treated as weak)
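
The interaction between the two thresholds can be sketched as follows. This is an illustration under stated assumptions, not Unfault's implementation: the `route` function, the classifier and regex callables, and the `source` and `weak` keys are hypothetical, and the sketch assumes the regex router also reports a confidence. Only the two threshold values come from this page.

```python
from typing import Callable, Dict, Optional, Tuple

ML_CONFIDENCE_THRESHOLD = 0.7
ROUTING_CONFIDENCE_THRESHOLD = 0.35

Prediction = Tuple[str, float]  # (intent, confidence)


def _plan(intent: str, confidence: float, source: str) -> Dict[str, object]:
    return {
        "intent": intent,
        "routing_confidence": confidence,
        "source": source,
        # Below the routing threshold the result is flagged as weak.
        "weak": confidence < ROUTING_CONFIDENCE_THRESHOLD,
    }


def route(
    query: str,
    ml_classifier: Optional[Callable[[str], Prediction]],
    regex_router: Callable[[str], Prediction],
) -> Dict[str, object]:
    """Use the ML prediction when it is confident enough, otherwise use regex."""
    if ml_classifier is not None:
        intent, confidence = ml_classifier(query)
        if confidence >= ML_CONFIDENCE_THRESHOLD:
            return _plan(intent, confidence, "ml")
    # ML unavailable or low-confidence: fall back to the regex router.
    intent, confidence = regex_router(query)
    return _plan(intent, confidence, "regex")
```

A weak result does not fail the request in this sketch; it only marks the plan so that later stages can decide whether to add a hint.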

Intent catalog

Each intent selects which contexts to attempt.

Intent	Primary context(s)	What it is for
`overview`	`workspace_context`	“Describe this workspace”. Structural overview.
`coverage`	`enumerate_context`	Cross-workspace endpoint coverage, best-effort.
`relationship`	`enumerate_context`	Cross-workspace dependency direction, best-effort.
`flow`	`flow_context`	Call-path tracing: “how does X work”.
`usage`	`graph_context`	“Who calls this” and “where is this used”.
`impact`	`graph_context`	Change blast radius: “what breaks if”.
`dependencies`	`graph_context`	Imports and external dependency view.
`centrality`	`graph_context`	Hotspots and most-connected files/functions.
`observability`	`slo_context`	SLO coverage and unmonitored routes.
`enumerate`	`enumerate_context`	“List all routes” / “how many endpoints”.
`semantic`	`sessions`, `findings`	General semantic search over findings and session summaries.
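
For client code that needs to know which response field to read for a given intent, the catalog can be transcribed into a lookup table. This is a direct copy of the table above into Python; the constant and function names are hypothetical.

```python
from typing import Dict, Tuple

# Intent -> primary response field(s), transcribed from the intent catalog.
INTENT_CONTEXTS: Dict[str, Tuple[str, ...]] = {
    "overview": ("workspace_context",),
    "coverage": ("enumerate_context",),
    "relationship": ("enumerate_context",),
    "flow": ("flow_context",),
    "usage": ("graph_context",),
    "impact": ("graph_context",),
    "dependencies": ("graph_context",),
    "centrality": ("graph_context",),
    "observability": ("slo_context",),
    "enumerate": ("enumerate_context",),
    "semantic": ("sessions", "findings"),
}


def contexts_for(intent: str) -> Tuple[str, ...]:
    """Return the primary context fields for an intent; raises KeyError if unknown."""
    return INTENT_CONTEXTS[intent]
```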

Retrieval behavior

Lazy embeddings

On a query, the API generates embeddings for a small number of recently completed sessions that are missing them (bounded to avoid a slow first query).
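
A bounded backfill of this kind might look like the sketch below. The session record keys (`status`, `completed_at`, `summary`, `embedding`), the `embed` callable, and the default limit of 3 are all assumptions made for illustration; the page only states that the number of sessions is small and bounded.

```python
from typing import Any, Callable, Dict, List


def backfill_embeddings(
    sessions: List[Dict[str, Any]],
    embed: Callable[[str], List[float]],
    limit: int = 3,  # hypothetical bound; the real value is not documented here
) -> int:
    """Embed up to `limit` of the most recently completed sessions that lack embeddings."""
    missing = [
        s for s in sessions
        if s["status"] == "completed" and s.get("embedding") is None
    ]
    # Most recent first, so new work becomes searchable before old work.
    missing.sort(key=lambda s: s["completed_at"], reverse=True)
    batch = missing[:limit]
    for session in batch:
        session["embedding"] = embed(session["summary"])
    return len(batch)
```

Capping the batch keeps the cost of the first query predictable; sessions beyond the cap are picked up by later queries.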

Semantic retrieval

Vector search returns:

similar sessions (`sessions`)
similar findings (`findings`)

Filters can include:

workspace scope
language/framework hints parsed from the query
rule id patterns derived from concept extraction

Concept filtering

If the query matches a known concept (for example “error handling” or “timeout”), the API applies rule id pattern filters so that results stay on-topic.

When a concept-targeted query returns no matching findings, the API emits an “all clear” hint and clears unrelated session context to avoid confusing output.
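
The concept filter and the “all clear” behavior can be sketched as below. The concept table, the glob patterns, the rule ids used in the example, and the hint wording are hypothetical; only the overall behavior (filter by rule id pattern, then clear session context and emit a hint when nothing matches) is taken from this section.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# Hypothetical concept -> rule id glob patterns.
CONCEPT_RULE_PATTERNS: Dict[str, List[str]] = {
    "error handling": ["*error_handling*", "*bare_except*"],
    "timeout": ["*timeout*"],
}


def apply_concept_filter(
    query: str,
    findings: List[Dict[str, Any]],
    sessions: List[Any],
) -> Dict[str, Any]:
    """Keep only on-topic findings when the query names a known concept."""
    lowered = query.lower()
    concept = next((c for c in CONCEPT_RULE_PATTERNS if c in lowered), None)
    if concept is None:
        # No known concept: leave the retrieved context untouched.
        return {"findings": findings, "sessions": sessions, "hint": None}

    patterns = CONCEPT_RULE_PATTERNS[concept]
    matched = [
        f for f in findings
        if any(fnmatchcase(f["rule_id"], p) for p in patterns)
    ]
    if not matched:
        # "All clear": drop unrelated session context and explain why.
        hint = f"No findings related to {concept} were found."
        return {"findings": [], "sessions": [], "hint": hint}
    return {"findings": matched, "sessions": sessions, "hint": None}
```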

Diversification

For broad semantic queries (intent `semantic` without a targeted concept), the API may diversify results by rule type (for example, at most 2 findings per rule). Targeted concept queries are not diversified.
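
A per-rule cap of this kind is straightforward to express. The sketch below assumes findings arrive sorted by descending similarity and carry a `rule_id` key; both are assumptions, and the default cap of 2 mirrors the example in the text rather than a confirmed setting.

```python
from collections import Counter
from typing import Any, Dict, List


def diversify(findings: List[Dict[str, Any]], max_per_rule: int = 2) -> List[Dict[str, Any]]:
    """Keep at most `max_per_rule` findings per rule, preserving input order."""
    seen: Counter = Counter()
    kept = []
    for finding in findings:
        rule_id = finding["rule_id"]
        if seen[rule_id] < max_per_rule:
            kept.append(finding)
            seen[rule_id] += 1
    return kept
```

Because input order is preserved, the highest-scoring findings for each rule are the ones that survive the cap.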

File-scoped fallback

If the query explicitly references a file and uses words like “risks” or “issues”, the API may override semantic results and return findings for that file token from the latest session.

This is a UX safeguard: users usually mean “filter by this file”.
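
One way to detect such queries is a pair of regular expressions, as in the sketch below. The patterns and the function name are hypothetical and deliberately simple; the actual detection logic is not documented on this page and may differ.

```python
import re
from typing import Optional

# Hypothetical patterns: a path-like token with a short extension, plus trigger words.
FILE_TOKEN = re.compile(r"\b[\w./-]+\.[A-Za-z]{1,5}\b")
RISK_WORDS = re.compile(r"\b(risks?|issues?)\b", re.IGNORECASE)


def file_scope_target(query: str) -> Optional[str]:
    """Return the file token when the query asks about risks/issues in a file."""
    match = FILE_TOKEN.search(query)
    if match and RISK_WORDS.search(query):
        return match.group(0)
    return None
```

When a target is returned, the caller would replace semantic results with findings for that file from the latest session; otherwise semantic retrieval proceeds as usual.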

Graph enrichment

When an impact/usage slice returns dependent files, the API may fetch findings for those dependent files and attach them with source metadata marking the origin as dependent-file enrichment.
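
The enrichment step can be sketched as a lookup over the dependent files. The `source` metadata keys and the `dependent_file` origin label are hypothetical stand-ins for whatever the API actually emits.

```python
from typing import Any, Dict, List


def enrich_with_dependents(
    dependent_files: List[str],
    findings_by_file: Dict[str, List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Collect findings for dependent files and tag each with its origin."""
    enriched = []
    for path in dependent_files:
        for finding in findings_by_file.get(path, []):
            # Copy the finding so the stored record is not mutated.
            enriched.append({
                **finding,
                "source": {"origin": "dependent_file", "file": path},
            })
    return enriched
```

Tagging the origin lets the CLI distinguish findings that matched the query directly from findings attached because a file depends on the target.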

Disambiguation

When the plan needs a concrete target and cannot resolve it unambiguously, the response includes `disambiguation`.

It provides:

a reason (why the system could not resolve the target)
safe tokens to paste into a follow-up query
candidate matches

This is the expected response shape for queries like “what breaks if I change auth” when there are multiple plausible auth targets.
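
A client can turn this object into a follow-up prompt. The sketch below assumes the three parts listed above are exposed under the keys `reason`, `tokens`, and `candidates`; those key names are guesses for illustration and are not confirmed by this page.

```python
from typing import Any, Dict, Optional


def follow_up_prompt(response: Dict[str, Any]) -> Optional[str]:
    """Render disambiguation guidance as text, or None when the target was resolved."""
    disambiguation = response.get("disambiguation")
    if disambiguation is None:
        return None
    lines = [disambiguation["reason"], "Try again with one of:"]
    # Tokens are meant to be pasted verbatim into a follow-up query.
    lines.extend(f"  {token}" for token in disambiguation["tokens"])
    return "\n".join(lines)
```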

Notes on fidelity

Some intents prefer `graph_data` from the CLI and only fall back to stored graph data.

This matters when:

the stored graph is stale (review not re-run after a refactor)
dynamic languages produce multiple plausible resolutions
flow/SLO data exists locally but was not persisted

If an answer seems inconsistent with the current working tree, re-running `unfault review` usually resolves the mismatch.
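
The preference order described in this section amounts to a simple selection rule. A minimal sketch, with a hypothetical function name and return labels:

```python
from typing import Any, Dict, Optional, Tuple

Graph = Dict[str, Any]


def pick_graph(
    client_graph: Optional[Graph],
    stored_graph: Optional[Graph],
) -> Tuple[Optional[Graph], str]:
    """Prefer the graph sent by the CLI; fall back to the stored graph."""
    if client_graph is not None:
        return client_graph, "client"
    if stored_graph is not None:
        return stored_graph, "stored"
    return None, "none"
```

Returning the origin alongside the graph makes it possible to tell the user when an answer was built from stored data that may be stale.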