Inside the Engine: How the CLI Works
Why we chose a heavy client architecture for distributed analysis.
The CLI is the primary “sensor” of the Unfault platform. It is designed around the “Heavy Client” principle: we move the compute-intensive task of parsing and graph construction to the user’s machine to ensure privacy and distribute the computational load.
The Analysis Pipeline
The unfault review command executes a strictly ordered pipeline, designed to fail fast and prioritize local computation.
1. Discovery
We use the ignore crate to traverse the filesystem. This ensures we inherently respect .gitignore and .ignore files, preventing the accidental ingestion of artifacts or vendored dependencies. We filter strictly for supported extensions (.py, .go, .rs, .ts, .tsx).
2. Local Parsing (Parallel)
This is the CPU-intensive phase. We use rayon to parallelize tree-sitter parsing across all available cores.
- Library: `unfault-core` handles the raw parsing.
- Extraction: We extract `Semantics` (signatures, imports, complexity metrics) into memory.
- Constraint: We explicitly do not read full function bodies into the IR. We only extract what is necessary for architectural inference (calls, type usage).
3. Graph Construction
Once semantics are extracted, we build a local petgraph.
- Nodes: Functions, Classes, Modules.
- Edges: `Calls`, `Imports`, `Inherits`.
- Resolution: We perform local symbol resolution to connect call sites to definitions. This is “best-effort” static analysis: we favor recall over precision to handle dynamic languages like Python.
4. Enrichment
If observability discovery is active, the SloEnricher module executes.
- It authenticates against providers (GCP, Datadog) using local user credentials.
- It retrieves Service Level Objectives (SLOs).
- It mutates the local graph, adding `MonitoredBy` edges to the relevant route nodes.
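The edge-adding step can be sketched as a pure matching function. Everything here is an assumption for illustration: the `Slo` shape, the `monitored_by` name, and especially the prefix-based matching rule (the real enricher resolves coverage from provider metadata, not a string prefix).

```rust
/// Hypothetical SLO shape; real SLOs come from the GCP/Datadog APIs.
pub struct Slo {
    pub service: String,
    pub target: f64,
}

/// Return (route, slo_index) pairs: the MonitoredBy edges to add to the
/// graph. Matching by service-name prefix is a stand-in heuristic.
pub fn monitored_by(routes: &[String], slos: &[Slo]) -> Vec<(String, usize)> {
    routes
        .iter()
        .flat_map(|route| {
            slos.iter()
                .enumerate()
                .filter(move |(_, slo)| route.starts_with(&slo.service))
                .map(move |(i, _)| (route.clone(), i))
        })
        .collect()
}
```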
5. Ingestion & Streaming
To handle large monorepos without timing out, we decouple structure from data.
- Phase 1: Skeleton Ingestion. We serialize the `Graph` topology (nodes and edges) without the heavy semantic payloads. This lightweight structure is POSTed to `/api/v1/graph/ingest`. The API allocates the session and builds the empty graph in memory.
- Phase 2: Semantic Streaming. We stream the detailed `Semantics` objects in chunks to `/api/v1/graph/analyze/chunk`.
  - Protocol: Framed MessagePack.
  - Compression: Zstandard (zstd).
  - Adaptive Chunking: The `SemanticsChunker` adjusts batch sizes dynamically based on the compression ratio of previous chunks to maintain optimal throughput.
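The adaptive sizing rule can be sketched as a small feedback loop. The thresholds, bounds, and doubling/halving policy below are assumptions chosen for illustration; only the input (the previous chunk's compression ratio) and the goal (steady throughput) come from the description above.

```rust
/// Hypothetical sketch of the SemanticsChunker sizing policy.
pub struct SemanticsChunker {
    pub batch_size: usize,
}

impl SemanticsChunker {
    pub fn new() -> Self {
        Self { batch_size: 64 }
    }

    /// `ratio` = compressed_bytes / uncompressed_bytes of the previous
    /// chunk. Highly compressible data gets larger batches to amortize
    /// framing overhead; poorly compressible data gets smaller batches
    /// to bound on-the-wire chunk size.
    pub fn adapt(&mut self, ratio: f64) {
        if ratio < 0.2 {
            self.batch_size = (self.batch_size * 2).min(4096);
        } else if ratio > 0.5 {
            self.batch_size = (self.batch_size / 2).max(8);
        }
        // Ratios in between leave the batch size unchanged.
    }
}
```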
Data Privacy
The architecture enforces privacy by design. The source code is never serialized. The API receives only the Intermediate Representation (IR), which contains structural metadata, signatures, and derived metrics, but not the implementation logic.