
World Model

When unfault flags a missing timeout, the finding itself is the least interesting part. The interesting part is what happens downstream: which services are affected, which SLO is at risk, and how confident we are in that assessment.

This page explains the machinery behind that: the code graph, the propagation model, and how runtime data from SLOs and distributed traces augments what static analysis can see on its own.


A rule fires on a file. The finding names a line. That’s useful, but it answers the wrong question. The developer already knows the line exists; what they don’t know is whether it matters enough to fix before shipping.

Traditional static analysis can’t answer that because it has no model of the system. It sees the code but not the architecture. It sees the function but not which HTTP handler calls it, which SLO covers that handler, or what happens to a downstream service if a retry storm starts here at 3am.

unfault builds a graph that lets it answer those questions.


The foundation is a directed graph built by the Rust/Tree-sitter parser:

  • File nodes: one per source file, with language and path
  • Function nodes: functions and methods, with HTTP handler metadata when applicable
  • ExternalModule nodes: third-party libraries, categorised (HttpClient, Database, etc.)
  • Edges: Imports, ImportsFrom, Calls, Contains, UsesLibrary

This is a static graph. It captures the structural shape of the codebase from source alone, without executing anything. It’s fast, deterministic, and complete within the repository boundary.

The graph is the starting point. On its own it enables blast radius queries (“what imports this file?”) and centrality analysis (“which file is most depended upon?”). What it can’t do alone is connect a code-level finding to a business objective.
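The two queries above can be sketched against a toy graph. This is a minimal illustration, not the real Rust implementation; file names and the adjacency representation are hypothetical:

```python
from collections import defaultdict

# Hypothetical static graph: edges point from dependent to dependency
# (auth.py imports db.py => edge auth.py -> db.py).
edges = [
    ("auth.py", "db.py", "Imports"),
    ("main.py", "auth.py", "Imports"),
    ("main.py", "db.py", "Imports"),
]

# Reverse adjacency: for each file, who imports it.
importers = defaultdict(set)
for src, dst, kind in edges:
    importers[dst].add(src)

def blast_radius(target):
    """Everything that transitively imports `target`."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for dep in importers.get(node, set()) - seen:
            seen.add(dep)
            stack.append(dep)
    return seen

# Centrality: the most depended-upon file has the most direct importers.
most_central = max(importers, key=lambda f: len(importers[f]))
```

Both queries only need the reverse adjacency, which is why the static graph alone is enough for them.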


Two optional enrichment passes extend the graph beyond the repository boundary.

When GCP Cloud Monitoring, Datadog, or Dynatrace credentials are present, unfault fetches SLO definitions and matches them to HTTP route handlers in the graph using path patterns. Each matched handler gets a MonitoredBy edge pointing to a GraphNode::Slo:

Function(POST /checkout) --[MonitoredBy]--> Slo("Checkout API 99.9%")

SLO nodes are the top tier of the hierarchy: they represent what “success” means for a user journey. When the propagation model reaches an SLO node, it has a concrete answer to “what breaks”: not an inferred entrypoint, but a declared availability target.

For service-level SLOs (those without a specific path pattern), unfault matches the GCP service slug embedded in the SLO resource name against the local workspace directory name. This prevents sibling services in the same GCP project from being incorrectly linked to the wrong codebase.
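A sketch of the two matching paths described above. The SLO record fields here are illustrative stand-ins, not the real provider payload shape:

```python
import re

# Hypothetical SLO records as a provider might return them.
slos = [
    {"name": "projects/p/services/checkout-api/slos/s1",
     "display_name": "Checkout API 99.9%", "path_pattern": "/checkout"},
    {"name": "projects/p/services/inventory-service/slos/s2",
     "display_name": "Inventory 99.5%", "path_pattern": None},
]

handlers = [("POST", "/checkout"), ("GET", "/health")]
workspace = "checkout-api"  # local workspace directory name

def service_slug(resource_name):
    # The GCP service slug is embedded in the SLO resource name.
    m = re.search(r"/services/([^/]+)/", resource_name)
    return m.group(1) if m else None

monitored_by = []
for slo in slos:
    if slo["path_pattern"]:
        # Route-level SLO: match against HTTP handler paths.
        for method, path in handlers:
            if path == slo["path_pattern"]:
                monitored_by.append((f"{method} {path}", slo["display_name"]))
    elif service_slug(slo["name"]) == workspace:
        # Service-level SLO: link only when the slug matches this workspace,
        # so sibling services in the same project aren't linked wrongly.
        monitored_by.append(("<service>", slo["display_name"]))
```

Here the inventory SLO is dropped because its slug doesn't match the workspace, which is exactly the sibling-service case the guard exists for.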

When GCP Cloud Trace credentials are present, unfault fetches recent spans from the Cloud Trace v1 API and extracts cross-service call patterns. Each distinct remote service observed in RPC_CLIENT spans (or outbound HTTP spans, since Cloud Run’s OTEL exporter omits the kind field) becomes a GraphNode::RemoteService, linked to the local file that makes the call:

File(payments/client.py) --[RemoteCall]--> RemoteService("inventory-service")

Service name extraction works in layers: peer.service label first, then /http/host, then span name heuristics (Sent.<Service>, gRPC patterns), then URL host. Kubernetes FQDNs are stripped to the service name component; public internet hostnames (.googleapis.com, .github.com, etc.) are kept intact.
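The layered fallback can be sketched as follows. The span label keys and the public-suffix list are illustrative; the real extractor lives in Rust:

```python
import re
from urllib.parse import urlparse

PUBLIC_SUFFIXES = (".googleapis.com", ".github.com")  # kept intact

def extract_service_name(span):
    """Layered extraction in the order described above.
    `span` is a hypothetical dict of span attributes."""
    # 1. An explicit peer.service label wins.
    if name := span.get("peer.service"):
        return name
    # 2. Fall back to the /http/host label.
    host = span.get("/http/host")
    # 3. Then span-name heuristics like "Sent.<Service>".
    if not host and (m := re.match(r"Sent\.([\w-]+)", span.get("name", ""))):
        return m.group(1)
    # 4. Finally the URL host.
    if not host and (url := span.get("/http/url")):
        host = urlparse(url).hostname
    if not host:
        return None
    # Public hostnames stay whole; k8s FQDNs are stripped to the service.
    if host.endswith(PUBLIC_SUFFIXES):
        return host
    return host.split(".")[0]  # inventory.default.svc.cluster.local -> inventory
```

The ordering matters: an explicit label always beats a heuristic, so instrumented services are never misnamed by a URL guess.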

The value of RemoteCall edges is that they extend the propagation model across service boundaries. A finding in a file that calls an external service now has a propagation path that crosses the repository boundary, which is categorically different from a local failure, because there’s no local recovery path.


Given a finding at a file, the model asks: if this file breaks, what is the furthest meaningful thing that breaks with it?

The answer is computed by a weighted BFS that traverses the graph in two directions simultaneously.

Calls and Imports edges point from dependent to dependency. To find everything affected when a file breaks, we walk against these edges, collecting everything that imports or calls the failing file:

File(db.py) <--[Imports]-- File(auth.py) <--[Imports]-- File(main.py)

This is the blast radius direction. It answers “who depends on me.”

MonitoredBy and RemoteCall edges point forward toward consequences. We follow these in the normal direction to reach anchors:

Function(handler) --[MonitoredBy]--> Slo("Checkout API")
File(payments.py) --[RemoteCall]--> RemoteService("inventory-service")

Contains edges (File → Function) are also traversed forward with zero weight, so the model can reach MonitoredBy edges on function nodes from findings that land on the parent file.

Each edge type carries a propagation weight representing the conditional probability that a failure at the source materialises at the target:

  Edge                   Weight   Rationale
  Calls                  0.80     Direct invocation; caller blocks on callee
  Imports / ImportsFrom  0.50     Structural dependency; indirect but real
  Contains               0.00     Traversal only, no additional risk
  RemoteCall             0.90     Cross-service; no local circuit breaker assumed
  MonitoredBy            1.00     Reaching the SLO confirms macro-goal impact

The aggregate risk is the complement probability product across hops:

risk = 1 - ∏(1 - weight_i)

This is the “at least one failure propagates” probability under the independence assumption, expressed as a percentage. A two-hop path through Imports (0.5) and RemoteCall (0.9) gives 1 - (0.5 × 0.1) = 95%.
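The formula is a few lines of code, and the worked example above can be checked directly:

```python
def aggregate_risk(weights):
    """risk = 1 - product(1 - w) over the hops of a propagation path."""
    p_no_failure = 1.0
    for w in weights:
        p_no_failure *= 1.0 - w
    return 1.0 - p_no_failure

# Two hops: Imports (0.5) then RemoteCall (0.9).
risk = aggregate_risk([0.5, 0.9])  # 1 - (0.5 * 0.1) = 0.95
```

An empty path gives zero risk, and any hop with weight 1.0 saturates the score, which is the intended behaviour for the independence model.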

The BFS selects the best anchor found in priority order:

  1. SLO node: highest confidence. The finding is tied to a declared availability target with a specific percentage and timeframe.
  2. RemoteService node: present when trace data is available. Signals a cross-service boundary, which matters because there is no local recovery path.
  3. Inferred entrypoint: fallback. The nearest file with no importers (a root of the import tree) is used as a proxy for the request entry point.

When no anchor is reachable (isolated file, no SLOs configured, no traces), the risk score is zero and the system view line is omitted from the output.
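The traversal and anchor selection collapse into a short sketch. Edge weights come from the table above; the graph shape, function signature, and priority encoding are hypothetical:

```python
from collections import deque

WEIGHTS = {"Calls": 0.8, "Imports": 0.5, "Contains": 0.0,
           "RemoteCall": 0.9, "MonitoredBy": 1.0}
ANCHOR_PRIORITY = {"Slo": 1, "RemoteService": 2, "Entrypoint": 3}

def propagate(finding, reverse_deps, forward, node_kind):
    """BFS from a finding: walk against Calls/Imports (blast radius) and
    along Contains/RemoteCall/MonitoredBy (consequences), keeping the
    best anchor first by priority, then by aggregate risk."""
    best = None  # (priority, p_no_failure, node): lexicographic min wins
    queue = deque([(finding, [])])
    seen = {finding}
    while queue:
        node, weights = queue.popleft()
        kind = node_kind.get(node, "File")
        if kind in ANCHOR_PRIORITY:
            no_fail = 1.0
            for w in weights:
                no_fail *= 1.0 - w
            cand = (ANCHOR_PRIORITY[kind], no_fail, node)
            if best is None or cand < best:
                best = cand
        for nxt, edge in reverse_deps.get(node, []) + forward.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, weights + [WEIGHTS[edge]]))
    if best is None:
        return None, 0.0  # no anchor reachable: risk score is zero
    return best[2], 1.0 - best[1]

# Finding at db.py; an importer of db.py makes a remote call.
reverse_deps = {"db.py": [("payments/client.py", "Imports")]}
forward = {"payments/client.py": [("inventory-service", "RemoteCall")]}
node_kind = {"inventory-service": "RemoteService"}
anchor, risk = propagate("db.py", reverse_deps, forward, node_kind)
# Imports (0.5) then RemoteCall (0.9): risk = 1 - 0.5 * 0.1 = 0.95
```

The tuple comparison encodes the priority order: an SLO anchor beats a RemoteService anchor regardless of risk, and among anchors of the same kind the riskier path wins.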


The result of the propagation model is attached to every SystemHazard as a PropagationPath:

hops: [payments/client.py, checkout_handler.py, SLO: Checkout API]
aggregate_risk: 95.0
macro_goal: "Checkout API 99.9%"
anchored_to_slo: true

This drives the ↳ puts line in the review output:

🟡 payments/client.py:48 · The Retry Storm
HTTP call via httpx.AsyncClient has no retry policy
↳ puts Checkout API (99.9%) at risk (95%)
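Rendering that line from the PropagationPath is a small formatting step. This is a sketch; the field names follow the example above, but the real struct lives in the Rust codebase:

```python
from dataclasses import dataclass

@dataclass
class PropagationPath:
    hops: list
    aggregate_risk: float  # percentage, e.g. 95.0
    macro_goal: str
    anchored_to_slo: bool

def system_view_line(path):
    """Render the '↳ puts' line; None means the line is omitted."""
    if path is None or path.aggregate_risk == 0.0:
        return None
    return f"↳ puts {path.macro_goal} at risk ({path.aggregate_risk:.0f}%)"

line = system_view_line(PropagationPath(
    hops=["payments/client.py", "checkout_handler.py", "SLO: Checkout API"],
    aggregate_risk=95.0,
    macro_goal="Checkout API (99.9%)",
    anchored_to_slo=True,
))
```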

SLO and trace fetches are cached at .unfault/cache/enrichment/ with a 5-minute TTL, keyed on (project_id, workspace_name). The review footer distinguishes cache hits (cached, green) from live fetches (fetch Xms, yellow), so the source of latency is always visible.
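A sketch of the TTL check. The directory, cache key, and 5-minute TTL come from the docs; the on-disk filename scheme here is a guess:

```python
import json
import time
from pathlib import Path

CACHE_DIR = Path(".unfault/cache/enrichment")
TTL_SECONDS = 300  # 5-minute TTL

def cached_fetch(project_id, workspace_name, fetch, refresh=False):
    """Return (data, source) where source is "cached" or "fetched".
    Keyed on (project_id, workspace_name) as described above."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{project_id}__{workspace_name}.json"
    if not refresh and path.exists():
        age = time.time() - path.stat().st_mtime
        if age < TTL_SECONDS:
            return json.loads(path.read_text()), "cached"
    data = fetch()  # live fetch from the provider
    path.write_text(json.dumps(data))
    return data, "fetched"
```

The `refresh` flag corresponds to `--refresh-cache` below, and the returned source string is what lets the footer distinguish cache hits from live fetches.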

unfault review # uses cache if fresh
unfault review --refresh-cache # bust cache, re-fetch from providers
unfault review --offline # skip enrichment entirely

The three-tier structure (primitives, sub-goals, macro-goals) was shaped in part by reading two papers from early 2026:

Dupoux, LeCun, Malik et al. (arXiv:2603.15381) proposes a cognitive architecture with three learning modes, including a meta-controller (System M) that switches between passive observation and active exploration based on internal signals. The framing of a system that reasons at multiple levels of abstraction, rather than applying flat rules, is what we were reaching for. The analogy isn’t precise: unfault’s “meta-controller” is just the propagation model deciding which anchor is most relevant, not a learned policy. But the vocabulary was useful for thinking about the problem.

Zhang et al. (arXiv:2604.03208) presents hierarchical planning with latent world models for robotic manipulation. The key result is that planning at multiple temporal scales (a high-level planner generating sub-goal waypoints, a low-level planner executing them) dramatically outperforms single-level planning on compositional tasks. The structural parallel to code analysis is real: a finding at a line (primitive) is only interpretable in the context of the call chain (sub-goal) it belongs to, which is only meaningful against the business objective (macro-goal) it serves. We’re not building a latent world model or doing any learning; the code graph is our world model, and it’s deterministic.

The honest summary: these papers articulated a way of framing the problem that we found useful. The implementation is a weighted BFS over a directed graph, with SLO and trace data bolted on as optional enrichment. Nothing exotic.


The independence assumption is wrong. The complement probability product assumes each hop fails independently. In practice, failures are correlated; a database outage hits every service that uses it simultaneously. The risk scores are a relative ranking, not calibrated probabilities.

Static graphs miss dynamic dispatch. If a file calls a function through an interface, the graph may not capture the concrete implementation. The propagation model is conservative (it uses what it can see) but it can miss paths.

Trace coverage is partial. Cloud Trace only captures what was instrumented and exercised recently. A code path that hasn’t been hit in the last hour won’t appear as a RemoteCall edge. Enabling OTEL instrumentation on all services and ensuring regular traffic will improve coverage.

Service matching is heuristic. SLOs are matched to the local workspace by comparing the GCP service slug or SLO display name against the directory name. This works in conventional layouts but will produce incorrect results in monorepos where the directory name doesn’t match the deployed service name. The path pattern mechanism (setting /path labels on SLOs) is more reliable when it’s available.