RAG Query Reference

This page is a reference for Unfault’s RAG query endpoint and the routing logic used by unfault ask.

It focuses on what the system does and what shapes it returns. For usage examples, see the CLI reference.

POST /rag/query

Returns a context pack suitable for:

  • direct CLI display (structured context)
  • optional LLM synthesis (the CLI sends the context to your configured provider)

Field                 Type           Default  Notes
query                 string         -        Required. 3 to 1000 characters.
workspace_id          string | null  null     Optional scope. Most queries are answered within a single workspace.
max_sessions          int            5        1 to 20. Session summaries are project-level context.
max_findings          int            10       1 to 50. Findings are issue-level context.
similarity_threshold  float          0.5      0.0 to 1.0. Applies to vector retrieval.
graph_data            object | null  null     Optional ClientGraphData from the CLI. Used for local graph-aware answers (flow, enumerate, SLOs) without relying on stored graph fidelity.
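
The field limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of Unfault: `build_rag_query_payload` is a hypothetical helper, and it only assumes the request body is a JSON object keyed by the field names in the table. How the server itself reports out-of-range values is not covered here.

```python
from typing import Any, Optional


def build_rag_query_payload(
    query: str,
    workspace_id: Optional[str] = None,
    max_sessions: int = 5,
    max_findings: int = 10,
    similarity_threshold: float = 0.5,
    graph_data: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a JSON-serializable body for POST /rag/query (hypothetical helper)."""
    # Enforce the documented limits locally before sending anything.
    if not 3 <= len(query) <= 1000:
        raise ValueError("query must be 3 to 1000 characters")
    if not 1 <= max_sessions <= 20:
        raise ValueError("max_sessions must be 1 to 20")
    if not 1 <= max_findings <= 50:
        raise ValueError("max_findings must be 1 to 50")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be 0.0 to 1.0")
    return {
        "query": query,
        "workspace_id": workspace_id,
        "max_sessions": max_sessions,
        "max_findings": max_findings,
        "similarity_threshold": similarity_threshold,
        "graph_data": graph_data,
    }
```

Calling the helper with only a query yields the documented defaults for every other field.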

The response is intentionally structured. It is not “the final answer”. The CLI can:

  • print it as-is
  • render it into a narrative with --llm

Field               Type           Notes
query               string         The original query.
context_summary     string         A short summary of retrieved context.
topic_label         string | null  Coarse label like “Workspace” or “Integration” when applicable.
sessions            array          Session-level contexts (similar sessions).
findings            array          Finding-level contexts (similar findings and enrichments).
sources             array          Attribution for retrieved items. Used to explain provenance.
routing_confidence  float          Confidence for the selected intent (best-effort).
hint                string | null  Human hint when data is missing or the query is too vague.
disambiguation      object | null  Structured follow-up guidance when a target cannot be resolved.

The optional context fields below are populated based on the routed intent and available data.

Field              Type           When it appears
workspace_context  object | null  Structural overview for workspace description queries (prefer client graph_data).
graph_context      object | null  Usage/impact/dependencies/centrality slices from stored graph (and some client fallbacks).
flow_context       object | null  Call paths for flow questions (prefer client graph_data, fall back to stored graph).
slo_context        object | null  SLO coverage and unmonitored routes (prefer client SLO nodes).
enumerate_context  object | null  Lists/counts (routes, workspaces, coverage summaries).
graph_stats        object | null  Basic graph stats when graph_data is present.
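
Because every one of these fields may be null, a client usually checks which ones are present before rendering. A minimal sketch, assuming the response has already been decoded into a Python dict; `populated_contexts` is a hypothetical helper name.

```python
from typing import Any

# Optional context fields, as listed in the table above.
OPTIONAL_CONTEXT_FIELDS = (
    "workspace_context",
    "graph_context",
    "flow_context",
    "slo_context",
    "enumerate_context",
    "graph_stats",
)


def populated_contexts(response: dict[str, Any]) -> list[str]:
    """Return the optional context fields that are present and not null."""
    return [name for name in OPTIONAL_CONTEXT_FIELDS if response.get(name) is not None]
```

A missing key and an explicit null are treated the same way, so the helper works whether or not the server omits empty fields.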

Routing produces an execution plan with an intent (RouteIntent). Unfault uses:

  • a small ML classifier when available
  • regex-based fallback when ML is unavailable or low-confidence

Two thresholds are relevant:

  • ML_CONFIDENCE_THRESHOLD: 0.7 (below this, fall back to regex)
  • ROUTING_CONFIDENCE_THRESHOLD: 0.35 (below this, routing is treated as weak)
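
The interaction between the classifier, the regex fallback, and the two thresholds can be sketched as follows. This is an illustration under assumptions, not Unfault's implementation: `Route`, `choose_route`, and the idea that the regex fallback returns an `(intent, confidence)` pair are all hypothetical. Only the two threshold values come from this page.

```python
from typing import Callable, NamedTuple, Optional

ML_CONFIDENCE_THRESHOLD = 0.7        # below this, fall back to regex
ROUTING_CONFIDENCE_THRESHOLD = 0.35  # below this, routing is treated as weak


class Route(NamedTuple):
    intent: str
    confidence: float
    source: str  # "ml" or "regex"
    weak: bool


def choose_route(
    ml_prediction: Optional[tuple[str, float]],
    regex_route: Callable[[], tuple[str, float]],
) -> Route:
    """Pick the ML prediction when it is available and confident, else use regex."""
    if ml_prediction is not None and ml_prediction[1] >= ML_CONFIDENCE_THRESHOLD:
        intent, confidence = ml_prediction
        source = "ml"
    else:
        # Classifier unavailable (None) or low-confidence: use the regex fallback.
        intent, confidence = regex_route()
        source = "regex"
    weak = confidence < ROUTING_CONFIDENCE_THRESHOLD
    return Route(intent, confidence, source, weak)
```

Passing the regex router as a callable means it only runs when the classifier result is rejected.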

Each intent selects which contexts to attempt.

Intent         Primary context(s)  What it is for
overview       workspace_context   “Describe this workspace”. Structural overview.
coverage       enumerate_context   Cross-workspace endpoint coverage, best-effort.
relationship   enumerate_context   Cross-workspace dependency direction, best-effort.
flow           flow_context        Call-path tracing: “how does X work”.
usage          graph_context       “Who calls this” and “where is this used”.
impact         graph_context       Change blast radius: “what breaks if”.
dependencies   graph_context       Imports and external dependency view.
centrality     graph_context       Hotspots and most-connected files/functions.
observability  slo_context         SLO coverage and unmonitored routes.
enumerate      enumerate_context   “List all routes” / “how many endpoints”.
semantic       sessions, findings  General semantic search over findings and session summaries.
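
On the client side, the same mapping can be kept as a lookup table, for example to decide which response fields to render for a given intent. A minimal sketch: the intent and context names come from the table above, while `INTENT_CONTEXTS` and `intents_using` are hypothetical names.

```python
# Intent -> primary context field(s), transcribed from the table above.
INTENT_CONTEXTS: dict[str, tuple[str, ...]] = {
    "overview": ("workspace_context",),
    "coverage": ("enumerate_context",),
    "relationship": ("enumerate_context",),
    "flow": ("flow_context",),
    "usage": ("graph_context",),
    "impact": ("graph_context",),
    "dependencies": ("graph_context",),
    "centrality": ("graph_context",),
    "observability": ("slo_context",),
    "enumerate": ("enumerate_context",),
    "semantic": ("sessions", "findings"),
}


def intents_using(context_field: str) -> list[str]:
    """List the intents whose primary contexts include the given field."""
    return [intent for intent, fields in INTENT_CONTEXTS.items() if context_field in fields]
```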

On a query, the API generates embeddings for a small number of recently completed sessions that are missing them (bounded to avoid a slow first query).
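
A bounded backfill of this kind might look like the sketch below. The record keys (`completed_at`, `embedding`) and the default limit of 3 are assumptions made for illustration; this page does not state the actual bound or storage schema.

```python
from typing import Any


def sessions_to_embed(sessions: list[dict[str, Any]], limit: int = 3) -> list[dict[str, Any]]:
    """Pick at most `limit` completed sessions that lack an embedding, newest first."""
    missing = [
        s for s in sessions
        if s.get("completed_at") is not None and s.get("embedding") is None
    ]
    # Newest completions first, so recent work becomes searchable soonest.
    missing.sort(key=lambda s: s["completed_at"], reverse=True)
    return missing[:limit]
```

Capping the batch keeps the cost of a query predictable even when many sessions are waiting for embeddings.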

Vector search returns:

  • similar sessions (sessions)
  • similar findings (findings)

Filters can include:

  • workspace scope
  • language/framework hints parsed from the query
  • rule id patterns derived from concept extraction

If the query matches a known concept (for example “error handling” or “timeout”), the API applies rule id pattern filters so that results stay on-topic.

When a concept-targeted query returns no matching findings, the API emits an “all clear” hint and clears unrelated session context to avoid confusing output.
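
The concept filter and the "all clear" behaviour can be sketched together. Everything concrete here is hypothetical: the concept table, the rule id patterns, the glob-style matching, and the hint wording are placeholders, since this page does not list the real concepts or patterns. The sketch also filters after retrieval for simplicity, whereas the API applies the patterns as retrieval filters.

```python
from fnmatch import fnmatchcase
from typing import Any, Optional

# Hypothetical concept table; the real concepts and patterns are not documented here.
CONCEPT_RULE_PATTERNS: dict[str, list[str]] = {
    "error handling": ["*.error_handling.*"],
    "timeout": ["*.timeout*"],
}


def match_concept(query: str) -> Optional[str]:
    """Return the first known concept mentioned in the query, if any."""
    lowered = query.lower()
    for concept in CONCEPT_RULE_PATTERNS:
        if concept in lowered:
            return concept
    return None


def apply_concept_filter(
    query: str, sessions: list[dict[str, Any]], findings: list[dict[str, Any]]
) -> dict[str, Any]:
    concept = match_concept(query)
    if concept is None:
        # No targeted concept: leave results untouched.
        return {"sessions": sessions, "findings": findings, "hint": None}
    patterns = CONCEPT_RULE_PATTERNS[concept]
    kept = [f for f in findings if any(fnmatchcase(f["rule_id"], p) for p in patterns)]
    if not kept:
        # Nothing on-topic: emit an "all clear" hint and drop unrelated sessions.
        hint = f"All clear: no findings related to {concept}."
        return {"sessions": [], "findings": [], "hint": hint}
    return {"sessions": sessions, "findings": kept, "hint": None}
```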

For broad semantic queries (intent semantic without a targeted concept), the API may diversify results by rule type (for example max 2 findings per rule). Targeted concept queries do not diversify.
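
Capping results per rule is a simple single pass over the ranked list. A minimal sketch, assuming findings arrive sorted by similarity and each carries a `rule_id` key (a hypothetical field name); the cap of 2 mirrors the example above.

```python
from collections import defaultdict
from typing import Any


def diversify_by_rule(findings: list[dict[str, Any]], max_per_rule: int = 2) -> list[dict[str, Any]]:
    """Keep at most `max_per_rule` findings per rule id, preserving input order."""
    seen: dict[str, int] = defaultdict(int)
    kept = []
    for finding in findings:
        rule_id = finding["rule_id"]
        if seen[rule_id] < max_per_rule:
            kept.append(finding)
            seen[rule_id] += 1
    return kept
```

Because the input order is preserved, the highest-ranked findings for each rule are the ones that survive.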

If the query explicitly references a file and uses words like “risks” or “issues”, the API may override semantic results and return findings for that file token from the latest session.

This is a UX safeguard: users usually mean “filter by this file”.
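
A detection rule for this override could be as small as two regular expressions. The patterns below are illustrative guesses, not the ones Unfault uses: `FILE_TOKEN` treats any path-like token ending in a short extension as a file reference, and `RISK_WORDS` covers only the two words named above.

```python
import re
from typing import Optional

# Hypothetical patterns, chosen for illustration only.
FILE_TOKEN = re.compile(r"[\w./-]+\.[A-Za-z]{1,5}\b")
RISK_WORDS = re.compile(r"\b(risks?|issues?)\b", re.IGNORECASE)


def file_override_target(query: str) -> Optional[str]:
    """Return the file token to filter by, or None if the override does not apply."""
    if not RISK_WORDS.search(query):
        return None
    match = FILE_TOKEN.search(query)
    return match.group(0) if match else None
```

Both conditions must hold: a file token without a risk word, or a risk word without a file token, leaves the semantic results in place.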

When an impact/usage slice returns dependent files, the API may fetch findings for those dependent files and attach them with source metadata marking the origin as dependent-file enrichment.
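
The enrichment step can be sketched as a loop over dependent files that appends both the findings and a matching source entry. The source entry keys (`finding_id`, `file`, `origin`) and the origin value are assumptions for illustration, as is the `fetch_findings` callback that stands in for whatever storage lookup the API performs.

```python
from typing import Any, Callable

Finding = dict[str, Any]


def enrich_with_dependents(
    findings: list[Finding],
    sources: list[dict[str, Any]],
    dependent_files: list[str],
    fetch_findings: Callable[[str], list[Finding]],
) -> tuple[list[Finding], list[dict[str, Any]]]:
    """Return copies of findings and sources extended with dependent-file findings."""
    enriched = list(findings)
    attributed = list(sources)
    for path in dependent_files:
        for finding in fetch_findings(path):
            enriched.append(finding)
            # Record provenance so readers can tell this was not a direct match.
            attributed.append(
                {"finding_id": finding["id"], "file": path, "origin": "dependent_file_enrichment"}
            )
    return enriched, attributed
```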

When the plan needs a concrete target and cannot resolve it unambiguously, the response includes disambiguation.

It provides:

  • a reason (why the system could not resolve the target)
  • safe tokens to paste into a follow-up query
  • candidate matches

This is the expected response shape for queries like “what breaks if I change auth” when there are multiple plausible auth targets.
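
A client can turn the disambiguation object into a short follow-up prompt. In this sketch the key names `reason` and `tokens` are assumptions based on the list above; this page does not give the exact schema of the object.

```python
from typing import Any, Optional


def follow_up_suggestions(response: dict[str, Any]) -> Optional[str]:
    """Format disambiguation guidance as text, or return None when there is none."""
    disambiguation = response.get("disambiguation")
    if disambiguation is None:
        return None
    reason = disambiguation.get("reason", "unknown reason")
    lines = [f"Could not resolve target: {reason}"]
    # Each token is meant to be pasted into a follow-up query as-is.
    for token in disambiguation.get("tokens", []):
        lines.append(f"  try: {token}")
    return "\n".join(lines)
```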

Some intents prefer graph_data from the CLI and use stored graph data only as a fallback.

This matters when:

  • the stored graph is stale (review not re-run after a refactor)
  • dynamic languages produce multiple plausible resolutions
  • flow/SLO data exists locally but was not persisted

If an answer seems inconsistent with the current working tree, re-running unfault review usually resolves the mismatch.
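
The preference order described in this section reduces to a small selection function. A minimal sketch, assuming the fallback triggers only when the client sent no graph_data at all; the real conditions may be finer-grained, and `select_graph` is a hypothetical name.

```python
from typing import Any, Optional


def select_graph(
    client_graph: Optional[dict[str, Any]],
    stored_graph: Optional[dict[str, Any]],
) -> tuple[Optional[dict[str, Any]], str]:
    """Prefer the client's graph_data, then the stored graph, and report which was used."""
    if client_graph is not None:
        return client_graph, "client"
    if stored_graph is not None:
        return stored_graph, "stored"
    return None, "none"
```

Returning the origin alongside the graph makes it possible to flag answers that were built from potentially stale stored data.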