Cost-aware agent gates: rules first, models last.

7 min read · For teams trying to make agent governance fast enough to stay on by default

TL;DR: The expensive part of agent governance should not run on every action. ThumbGate should route checks through deterministic rules, semantic cache, local text classifiers, and local semantic recall before using an LLM judge. High-risk private ambiguity should stop for human review instead of calling a cloud model.

The pattern across the latest agent infrastructure work

The same lesson keeps showing up in different forms. Semantic caching cuts repeated LLM calls. Traditional text classifiers beat LLMs on speed and cost when labels are clear. Breadth-first query execution batches similar work instead of walking one branch at a time. Structured live dataset agents only become trustworthy when every row has source provenance. Streaming output removes dead air. Dynamic harnesses work best when critic, tournament, loop, and fan-out patterns are selected deliberately.

For ThumbGate, these are not separate product bets. They collapse into one control-plane rule: choose the cheapest reliable gate before the action runs.

The routing ladder

Lane Use when Why it is high ROI
Deterministic Secrets, force-push, destructive SQL, protected files, known repeated commands. Near-zero latency, no tokens, no provider call. This is the default for exact policy risk.
Semantic cache A prompt or action is semantically equivalent to a prior rejected or approved pattern. Returns the cached decision without rerunning the judge. This is the AISG-style buyer message applied to pre-action checks.
Rubric gate A critic/rubric loop failed a criterion, hit its cap, or lacks done evidence. Turns LangChain-style rubric iteration into an enforcement event: block completion claims until the missing proof exists.
Local classical classifier High-volume labels with enough examples and low ambiguity. Fast and cheap for routine feedback triage, import classification, and known error families.
Local semantic recall Few examples, fuzzy near-misses, or cross-session recurrence. Keeps private context local while catching cases regex and keyword routing miss.
LLM judge High-risk semantic ambiguity with explicit cloud permission and a budget cap. Useful for critic/rubric review, multi-document evidence review, and structured provenance checks, but not for every action.
Human review Private, regulated, payment, credential, customer-data, or unbounded external-posting risk. Prevents automation from laundering a risky decision through a model call.

What changed in ThumbGate

ThumbGate now has a small, testable routing primitive that makes this policy explicit:

node scripts/classifier-routing.js --risk=high --ambiguity=0.82 --allow-cloud --latency-ms=5000

That command returns an evidence-requiring LLM judge lane. Add --semantic-cache-hit, and it reuses the prior decision without a provider call. Add --rubric-failed or --structured-dataset --missing-provenance, and it blocks completion through the rubric gate. Change the same high-risk ambiguous input to --privacy-sensitive without --allow-cloud, and it routes to human review instead.

How the newer signals map to product work

Buyer proof: show the same risky action going through three routes: exact repeat blocked instantly, fuzzy repeat caught locally, and genuinely ambiguous production change paused for evidence or human review.

Implementation checklist

  1. Put exact denials and approval boundaries in deterministic checks.
  2. Cache semantically equivalent gate decisions with provenance and expiry.
  3. Use local text classification for routine high-volume feedback labels.
  4. Use local semantic recall for sparse, fuzzy, or cross-session lessons.
  5. Treat failed rubrics and missing source provenance as gate failures, not just evaluation notes.
  6. Reserve LLM judges for ambiguous high-value decisions with evidence requirements.
  7. Stream progress for long reviews and record every routed decision in the audit trail.

Try the routing primitive

Check the gate lane before spending tokens on a risky decision.

$ node scripts/classifier-routing.js --hard-rule --risk=critical
Install: npx thumbgate init GitHub →