Staleguard

The architecture

Three layers, escalating only when needed

Each layer is cheaper and higher-signal than the one after it, so most drift is caught before any model runs. Layer 1 is instant and needs no model; Layers 2–3 run local ONNX models, so code never leaves the machine.

I · Deterministic
Paths exist? Commands real? Config keys present? Architecture rules parsed from prose, checked against the real import graph. No ML; tuned for zero false positives.
instant · no model · ~1.2s on 330k LOC

II · Retrieval
For each surviving claim, local embeddings fetch the most-relevant code chunks, symbol-aligned via tree-sitter, with an optional reranker.
local embeddings · jina · on-device

III · Verification
A code-aware NLI cross-encoder judges (evidence, claim) → supported · contradicted · unverifiable, each with a confidence.
UniXcoder fine-tune · ~0.14s / claim

Underneath sits a drift ledger that makes runs incremental, scores alignment, and gates CI on regressions.

What it catches

Six classes of documentation drift

// REFERENCES

Broken references

file/dir paths quoted in docs that don't exist
commands (npm run, make) with no matching script or target
env vars & flags documented but never read
qualified code refs that resolve to no symbol
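
The first check in the list above can be sketched as follows. This is a minimal illustration, not Staleguard's implementation: it assumes paths are quoted in backticks, and the regex heuristic and the name `missing_paths` are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical heuristic: a backticked token counts as a path reference if it
# contains a slash-separated segment or ends in a file extension.
PATH_RE = re.compile(r"`([\w./-]+(?:/[\w.-]+|\.\w+))`")


def missing_paths(doc_text: str, repo_root: Path) -> List[str]:
    """Return quoted paths from doc_text that do not exist under repo_root."""
    quoted = PATH_RE.findall(doc_text)
    return [p for p in quoted if not (repo_root / p).exists()]
```

Because the check is a plain filesystem lookup, it needs no model and gives the same answer on every run.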

// ARCHITECTURE

Architecture violations

forbidden imports — "controllers must not import db"
layering — "domain depends on nothing"
independence — "core is independent of infra"
forbidden symbols outside their allowed module
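
A forbidden-import rule such as "controllers must not import db" could be checked against an import graph roughly like this. The sketch assumes the rule has already been parsed from prose and that the graph is a mapping from module name to imported modules; `ForbiddenImport` and `violations` are illustrative names, not Staleguard's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ForbiddenImport:
    source: str  # module prefix the rule constrains, e.g. "controllers"
    target: str  # module prefix it must not import, e.g. "db"


def _within(module: str, prefix: str) -> bool:
    """True if module is the prefix itself or a dotted submodule of it."""
    return module == prefix or module.startswith(prefix + ".")


def violations(rule: ForbiddenImport,
               import_graph: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (importer, imported) pairs that break the rule, in sorted order."""
    hits = []
    for module in sorted(import_graph):
        if not _within(module, rule.source):
            continue
        for dep in sorted(import_graph[module]):
            if _within(dep, rule.target):
                hits.append((module, dep))
    return hits
```

For example, with a graph in which `controllers.user` imports `db.session`, the rule `ForbiddenImport("controllers", "db")` yields that single edge as a violation.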

// BEHAVIOR

Behavioral contradictions

a local NLI cross-encoder judges prose the rules can't
verdicts: supported / contradicted / unverifiable
claims ground to symbols — a verdict re-opens when that code changes
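
One way to picture a verdict that re-opens when its grounding code changes is to store a fingerprint of the symbol's source next to the verdict. The sketch below assumes a SHA-256 content hash; the record layout and names are hypothetical and may differ from what the drift ledger actually stores.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(source: str) -> str:
    """Content hash of a symbol's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class GroundedVerdict:
    claim: str             # the sentence from the docs
    symbol: str            # the code symbol the claim is grounded to
    label: str             # "supported", "contradicted", or "unverifiable"
    code_fingerprint: str  # fingerprint of the symbol when it was judged


def needs_recheck(verdict: GroundedVerdict, current_source: str) -> bool:
    """A verdict re-opens when the grounding code no longer matches."""
    return fingerprint(current_source) != verdict.code_fingerprint
```

Unchanged symbols keep their earlier verdict, so only claims whose code moved need to go back through the judge.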

// COVERAGE

Coverage gaps

public code surface that no doc describes
risk-ranked by fan-in, churn, and complexity
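
Risk ranking by fan-in, churn, and complexity could look like the sketch below. The equal weighting and max-normalization are assumptions made for illustration; the source does not say how Staleguard combines the three signals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Symbol:
    name: str
    fan_in: int      # how many call sites or importers depend on it
    churn: int       # how often it changed recently
    complexity: int  # any complexity measure, e.g. cyclomatic


def rank_by_risk(symbols: List[Symbol]) -> List[Symbol]:
    """Order undocumented symbols from highest to lowest risk."""
    def norm(value: int, peak: int) -> float:
        return value / peak if peak else 0.0

    max_fan = max((s.fan_in for s in symbols), default=0)
    max_churn = max((s.churn for s in symbols), default=0)
    max_cx = max((s.complexity for s in symbols), default=0)

    def score(s: Symbol) -> float:
        # Assumed equal weights; each signal is scaled to [0, 1] first.
        return (norm(s.fan_in, max_fan)
                + norm(s.churn, max_churn)
                + norm(s.complexity, max_cx))

    return sorted(symbols, key=lambda s: (-score(s), s.name))
```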

// DIAGRAMS

Diagram coherence

Mermaid / PlantUML / Graphviz diffed against the real graph
phantom edges, stale boxes, missing arrows
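
Diffing a diagram against the real graph reduces to a set comparison once both sides are edge sets. The sketch below handles only the simplest Mermaid form (`a --> b` with bare identifiers) and assumes the real dependency edges are supplied by the caller; the function names are illustrative.

```python
import re
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

# Matches only plain "a --> b" lines; labels, subgraphs, and other arrow
# styles are out of scope for this sketch.
EDGE_RE = re.compile(r"^\s*(\w+)\s*-->\s*(\w+)\s*$")


def mermaid_edges(diagram: str) -> Set[Edge]:
    """Extract directed edges from a minimal Mermaid flowchart."""
    edges = set()
    for line in diagram.splitlines():
        match = EDGE_RE.match(line)
        if match:
            edges.add((match.group(1), match.group(2)))
    return edges


def diff_diagram(diagram: str, real_edges: Set[Edge]) -> Dict[str, List[Edge]]:
    """Phantom edges are drawn but not real; missing edges are real but not drawn."""
    drawn = mermaid_edges(diagram)
    return {
        "phantom": sorted(drawn - real_edges),
        "missing": sorted(real_edges - drawn),
    }
```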

// DRIFT

Drift over time

--diff <ref> re-checks only what changed
per-module & repo-wide alignment score
CI regression gate; fingerprint staleness
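
As a rough picture of per-module scoring and the regression gate, assume the alignment score is the fraction of checked claims that passed. That definition, and the function names below, are assumptions made for this sketch; the source does not define the score.

```python
from typing import Dict, List, Tuple

# module name -> (claims that passed, claims checked)
Counts = Dict[str, Tuple[int, int]]


def alignment_score(passed: int, total: int) -> float:
    """Assumed definition: share of checked claims that passed."""
    return 1.0 if total == 0 else passed / total


def repo_score(modules: Counts) -> float:
    """Repo-wide score pooled over all modules."""
    passed = sum(p for p, _ in modules.values())
    total = sum(t for _, t in modules.values())
    return alignment_score(passed, total)


def regressed_modules(baseline: Counts, current: Counts) -> List[str]:
    """Modules whose score dropped relative to the recorded baseline."""
    return sorted(
        name for name in current
        if name in baseline
        and alignment_score(*current[name]) < alignment_score(*baseline[name])
    )
```

A gate built this way fails a run only when `regressed_modules` is non-empty or the repo-wide score falls, which leaves pre-existing drift from blocking unrelated changes.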

The Layer 3 judge

A code-aware NLI model, evaluated against real targets

The default judge is Arthur920/staleguard, a fine-tune of microsoft/unixcoder-base. Because the base model is code-aware, real code stays in-distribution as the NLI premise. Contradictions are the alert class, so contradiction performance is what we optimize for.

Contradiction precision: 87.6% (low alert fatigue; flags you can trust)

Contradiction recall: 89.9% (catches ~9 in 10 contradictions)

Model	Metric	Score
fine-tuned (staleguard)	contradiction F1	0.887
fine-tuned (staleguard)	neutral F1	0.843
fine-tuned (staleguard)	entailment F1	0.690
fine-tuned (staleguard)	macro F1	0.807
roberta-large-mnli baseline	macro F1	0.386
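
The reported figures are consistent with one another, which the standard definitions make easy to confirm: F1 is the harmonic mean of precision and recall, and macro F1 is the unweighted mean of the per-class F1 scores.

```python
from typing import Sequence


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class: Sequence[float]) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class) / len(per_class)


# Contradiction precision 87.6% and recall 89.9% give the contradiction F1.
print(round(f1(0.876, 0.899), 3))               # 0.887
# Contradiction, neutral, and entailment F1 average to the macro F1.
print(round(macro_f1([0.887, 0.843, 0.690]), 3))  # 0.807
```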

Get started

Install, run, gate CI, wire into agents

The default build gives you Layer 1 — the deterministic, zero-false-positive core that needs no models. Add the ml feature for Layers 2–3.

Homebrew, an install script, or from source — all give you the deterministic core. Then run check on the full repo.

shell

# Homebrew (macOS / Linux)
$ brew install Arthur920/tap/staleguard

# or from source
$ cargo install --git https://github.com/Arthur920/Staleguard

$ staleguard check                 # full repo, Layer 1

Layers 2–3 run local ONNX models behind the ml feature (prebuilt binaries omit it — the deps are large). Both routes compile from source, then fetch models at runtime.

shell

# Homebrew — compiles with the ml feature
$ brew install Arthur920/tap/staleguard-ml

# or with cargo
$ cargo install --git https://github.com/Arthur920/Staleguard --features ml

$ staleguard setup                 # fetch + load every model, offline thereafter
$ staleguard check --layer 3       # all three layers

staleguard check exits non-zero on any finding or a score regression — a drop-in for any pipeline. Commit a baseline on main, then gate PRs against it.

.github/workflows

# once, on the base branch — records the alignment baseline
$ staleguard check --write-ledger

# in CI on each PR — fail only if alignment regressed
$ staleguard check --fail-on-regression --format json

Staleguard speaks --format json, so any coding agent can run it and read findings back — directly via shell, or exposed as an MCP check_doc_drift tool.

agent

# let the agent call it directly
$ staleguard check --format json --diff main

# a good standing instruction in CLAUDE.md:
#   "After editing code or docs, run staleguard check
#    --format json and fix any reported drift."
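
An agent harness that shells out to the command above might wrap it like this. The sketch uses only the flags shown in this section and treats the JSON report as opaque, since its layout is not documented here; `run_drift_check` is a hypothetical helper name.

```python
import json
import subprocess
from typing import Any, Callable, Optional, Tuple

# Flags copied from the command shown above; nothing else is assumed.
CMD = ["staleguard", "check", "--format", "json", "--diff", "main"]


def run_drift_check(
    run: Callable[..., Any] = subprocess.run,
) -> Tuple[bool, Optional[Any]]:
    """Run the check and return (clean, report).

    clean is True when the process exits with status 0. The report is the
    parsed JSON from stdout, or None if stdout was empty.
    """
    proc = run(CMD, capture_output=True, text=True)
    report = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode == 0, report
```

The `run` parameter exists only so the wrapper can be exercised without the binary installed; in normal use the default `subprocess.run` is what executes the command.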

Environment overrides

Variable	Effect
STALEGUARD_NLI_REPO	NLI judge model repo (default `Arthur920/staleguard`)
STALEGUARD_NLI_THRESHOLD, STALEGUARD_NLI_MARGIN	decision thresholds: how far contradiction must out-score entailment
STALEGUARD_NLI_MAX_CLAIMS	per-run claim budget (default 300; `0` = no cap)
STALEGUARD_EMBED_REPO	Layer 2 embedding model
STALEGUARD_RERANK_REPO	optional reranker
STALEGUARD_ORT_THREADS	ONNX intra-op threads (default: all cores)
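
To make two of these overrides concrete, here is a hedged sketch. The claim budget follows the table (default 300, `0` means no cap). The decision rule is only one plausible reading of "how far contradiction must out-score entailment": the exact semantics and default values of the threshold and margin are not given here, so both are passed in explicitly.

```python
import os
from typing import Mapping, Optional


def claim_budget(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Per-run claim budget: default 300, and 0 disables the cap (None)."""
    value = int(env.get("STALEGUARD_NLI_MAX_CLAIMS", "300"))
    return None if value == 0 else value


def is_contradiction(p_contra: float, p_entail: float,
                     threshold: float, margin: float) -> bool:
    """Assumed rule: contradiction must clear an absolute threshold and
    out-score entailment by at least the margin."""
    return p_contra >= threshold and (p_contra - p_entail) >= margin
```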