6 min read · For engineering leaders, attorneys, and risk teams deciding how to govern AI agents in their environment
Most "AI agent governance" conversations are stuck in one of two camps:
| Layer | What it does | Why it isn't enough alone |
|---|---|---|
| Decision-layer | Prompt rules embedded in the system prompt. AI judge models that score proposed actions. Human-in-the-loop workflow principles. RLHF training that adjusts model weights toward safe output. | The model can forget the rule under context compression, reason around it as a new edge case, or produce output the human approver misses. Decision-layer governance is what Sullivan & Cromwell had when they filed AI-hallucinated citations with a federal judge. |
| Action-layer | A runtime gate that inspects the proposed tool call (bash, SQL, file write, MCP call, outbound LLM call) at PreToolUse and returns allow / warn / block / route-to-human deterministically. | A static rule set is generic. It doesn't learn from your team's incidents. It doesn't get sharper after each blocked action. Every team gets the same starter pack. The piece that makes the gate yours is missing. |
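To make the action-layer row concrete, here is a minimal Python sketch of a deterministic gate. Everything in it is hypothetical: the `Rule` and `Decision` shapes, the severity ordering used when several rules match, matching regexes against the JSON-serialized tool input, and hashing the call to get an audit ID are illustrative assumptions. They are not ThumbGate's rule schema or any agent runtime's PreToolUse protocol. The property it shows is the one the table names: the same tool call against the same rule set always yields the same decision and the same audit ID.

```python
import hashlib
import json
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical severity order: when several rules match, the most severe wins.
SEVERITY = {"allow": 0, "warn": 1, "route_to_human": 2, "block": 3}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: int
    tool: str      # tool the rule applies to, e.g. "bash" or "sql"
    pattern: str   # regex searched in the JSON-serialized tool input
    action: str    # one of the SEVERITY keys


@dataclass(frozen=True)
class Decision:
    action: str
    rule_id: Optional[str]
    version: Optional[int]
    matched: Optional[str]
    audit_id: str


def evaluate(tool: str, tool_input: dict, rules: List[Rule]) -> Decision:
    """Deterministically decide on a proposed tool call; allow if no rule matches."""
    payload = json.dumps(tool_input, sort_keys=True)
    best: Optional[Tuple[Rule, str]] = None
    for rule in rules:
        if rule.tool != tool:
            continue
        match = re.search(rule.pattern, payload)
        # Strictly greater: on a severity tie, the earlier rule in the list wins.
        if match and (best is None or SEVERITY[rule.action] > SEVERITY[best[0].action]):
            best = (rule, match.group(0))

    # Audit ID hashes the call plus the deciding rule (hypothetical scheme),
    # so replaying the same call against the same rules reproduces the same ID.
    rule_key = [best[0].rule_id, best[0].version] if best else None
    digest = hashlib.sha256(json.dumps([tool, payload, rule_key]).encode())
    audit_id = digest.hexdigest()[:16]

    if best is None:
        return Decision("allow", None, None, None, audit_id)
    rule, matched = best
    return Decision(rule.action, rule.rule_id, rule.version, matched, audit_id)


rules = [
    Rule("no-push-main", 1, "bash", r"git push \S+ main\b", "block"),
    Rule("warn-rm-rf", 1, "bash", r"rm -rf", "warn"),
]
decision = evaluate("bash", {"command": "git push origin main"}, rules)
print(decision.action, decision.rule_id, decision.matched)
# block no-push-main git push origin main
```

This is exactly the "static rule set" the table describes: correct and repeatable, but only as good as the rules someone wrote into `rules`.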
ThumbGate is the loop, not the hook. Its four stages all run in your environment, and together they close the gap between "the user noticed something was wrong" and "the next agent action that matches the same pattern is gated automatically."
Generic governance rules ("don't push to main") are table stakes. The rules that actually matter for your firm or your team are the ones derived from your own incidents: the specific unauthorized-practice-of-law (UPL) phrasing your associates flag, the matter-number conventions your privilege review relies on, the deploy commands your platform team blocks. The loop encodes those automatically.
A rule in the prompt can stop working the moment the next Claude or Gemini release interprets the context differently. A rule promoted from your lesson DB lives in your infrastructure, separate from any model vendor. Model upgrades can partially reset training-baked safety; they don't touch the lesson DB.
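The memory stage can be as plain as a table in a database you already run. A minimal sketch, assuming a hypothetical `lessons` table in SQLite where each 👍 or 👎 is stored against a tool name and a reviewer-chosen pattern string; the table, column names, and helper functions are illustrative, not ThumbGate's actual lesson DB schema. Nothing in it references a model, which is the point: switching vendors leaves every row untouched.

```python
import sqlite3
from typing import Optional, Tuple


def open_lesson_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical lesson DB; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS lessons (
               id      INTEGER PRIMARY KEY,
               tool    TEXT NOT NULL,
               pattern TEXT NOT NULL,
               verdict TEXT NOT NULL CHECK (verdict IN ('up', 'down')),
               note    TEXT
           )"""
    )
    return conn


def record_feedback(conn: sqlite3.Connection, tool: str, pattern: str,
                    verdict: str, note: Optional[str] = None) -> None:
    """Store one thumbs-up ('up') or thumbs-down ('down') for a tool/pattern pair."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO lessons (tool, pattern, verdict, note) VALUES (?, ?, ?, ?)",
            (tool, pattern, verdict, note),
        )


def tally(conn: sqlite3.Connection, tool: str, pattern: str) -> Tuple[int, int]:
    """Return (downs, ups) recorded for a tool/pattern pair."""
    downs, ups = conn.execute(
        "SELECT SUM(verdict = 'down'), SUM(verdict = 'up') FROM lessons "
        "WHERE tool = ? AND pattern = ?",
        (tool, pattern),
    ).fetchone()
    return (downs or 0, ups or 0)  # SUM over zero rows is NULL


db = open_lesson_db()
push_main = r"git push \S+ main\b"
record_feedback(db, "bash", push_main, "down", "agent pushed straight to main")
record_feedback(db, "bash", push_main, "down")
print(tally(db, "bash", push_main))  # (2, 0)
```

Passing a file path instead of `":memory:"` keeps the lessons across sessions and across model upgrades.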
Sullivan & Cromwell's hallucination was a one-time embarrassment for them. Had their lesson DB captured the pattern and promoted it to a rule, every subsequent intake across every S&C office would have had the gate fire the moment a similar advice-shaped output was proposed. The first 👎 becomes the rule that catches the next ten cases.
Anthropic's Claude for Legal, Harvey's BigLaw assistants, Cursor's agent mode, OpenAI's Codex CLI — all of these are decision-layer products. The loop runs underneath them, agnostic to which vendor generated the proposed action. You don't pick a vendor and live with their generic safety; you pick the vendor for capability and own the safety yourself.
RLHF (Reinforcement Learning from Human Feedback) is decision-layer governance — it adjusts the model's weights so the model is statistically less likely to produce bad output. ThumbGate's loop is the opposite category. It doesn't touch the model. The lesson DB and the promoted rules live outside the model context.
| Property | RLHF / model fine-tuning | ThumbGate feedback loop |
|---|---|---|
| Where the change lives | Inside the model weights | Outside the model, in your lesson DB |
| Who controls it | Model vendor | Your team |
| How many examples to shift behavior | Millions | A small number of consistent 👎 (the default Thompson Sampling threshold) |
| What happens at model upgrade | Training-baked safety partially resets | Rules survive; loop continues |
| Auditability | Opaque weights; explainability is best-effort | Rule ID + version + matched pattern + audit ID, deterministic |
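The "small number of consistent 👎" row can be made concrete with a Beta-Bernoulli model. This sketch assumes a uniform Beta(1, 1) prior, treats each 👎 as evidence that a pattern is bad and each 👍 as evidence that it is fine, and promotes when a single posterior draw exceeds 0.8. The prior, the 0.8 threshold, and the function names are illustrative assumptions, not ThumbGate's documented defaults. Classic Thompson Sampling picks among bandit arms by drawing from each arm's posterior; here the same draw-and-compare idea is applied to one promote-or-wait decision.

```python
import random
from typing import Optional


def should_promote(downs: int, ups: int, threshold: float = 0.8,
                   rng: Optional[random.Random] = None) -> bool:
    """Thompson-style promotion check (hypothetical defaults).

    Beta(1 + downs, 1 + ups) is the posterior over "this pattern is bad",
    starting from a uniform Beta(1, 1) prior. One draw from it is compared
    against the threshold, so early evidence promotes sometimes, not never.
    """
    rng = rng if rng is not None else random.Random()
    return rng.betavariate(1 + downs, 1 + ups) > threshold


def promotion_chance_no_upvotes(downs: int, threshold: float = 0.8) -> float:
    """Exact P(draw > threshold) when ups == 0: Beta(a, 1) has CDF x**a."""
    return 1 - threshold ** (1 + downs)


for downs in (3, 5, 10):
    print(downs, round(promotion_chance_no_upvotes(downs), 3))
# 3 0.59
# 5 0.738
# 10 0.914
```

With no 👍 on record, the chance that a single draw clears 0.8 is 1 - 0.8^(1 + downs): about 59% after three 👎, 74% after five, and 91% after ten. Drawing from the posterior rather than waiting for a fixed count means a pattern can be promoted early, and each additional consistent 👎 makes promotion more likely.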
The loop is the product. The hook is the moment of enforcement. The DB is the memory. The promotion is the learning. Without all four, "agent governance" is either a prompt rule or a static gate — both useful, neither sufficient.
The PreToolUse hook is the endpoint. The loop is what makes it learn. Two minutes to install.