AI Agent Governance

6 min read · For teams choosing where to enforce agent behavior

TL;DR: AI agent governance has four layers — prompt rules, decorator wrappers, pre-action hooks, and sandbox isolation. Each catches a different class of failure. Prompt rules alone fail because they live inside the agent's context and get evicted under pressure. The layer you actually need depends on whether your stakes are per-call (decorator), per-tool (hook), or per-process (sandbox).

What "governance" actually means here

The dominant failure mode of agent frameworks isn't that the agent picked the wrong next step in isolation — it's that the agent took an action you can't undo. Dropped a production table. Force-pushed over a colleague's commit. Wrote credentials to a public log file. Installed a package the team had already rejected.

Governance is the layer of code that decides whether an action is allowed before the action runs. The post-hoc audit log is forensics; governance is prevention.

The four layers

1. Prompt-level rules

Lives in CLAUDE.md, .cursorrules, system prompt, AGENTS.md.

Tell the agent what not to do. Cheap, fast, expressive.

Catches: stylistic and intent-level mistakes the agent can recognize.

Misses: anything the agent forgets when context pressure compresses the early instructions out. Misses every action the agent doesn't categorize as "rule-relevant" at decision time.

2. Decorator wrappers

In-process wrapper around the agent's tool functions. Examples: Rein (Python), function decorators in LangChain-style frameworks.

Wrap the function. The wrapper checks a policy before forwarding the call to the underlying tool.

Catches: actions through the wrapped function path. Per-call policy enforcement. Audit-trail capture.

Misses: actions that bypass the wrapped function — direct subprocess calls, raw HTTP requests, plugins that skip the framework's tool registry.

3. Pre-action hooks

Out-of-process intercept at the agent runtime's tool-call boundary. Examples: MCP PreToolUse hooks, Claude Code hooks, ThumbGate.

The hook runs in a separate process from the agent. The agent emits a tool-call intent; the hook accepts, rejects, or transforms it before the tool itself sees it.

Catches: every tool call the agent runtime mediates. Works across agent CLIs (Claude Code, Cursor, Codex, Gemini, Amp, Cline, OpenCode). Survives context-eviction because the hook is not in the agent's context.

Misses: actions taken by tools the runtime doesn't know about (e.g., a sub-process opening its own network socket after the initial allowed call).

4. Sandbox isolation

OS- or VM-level isolation. Examples: Docker / microVMs / seccomp / AppArmor / macOS sandbox profiles.

The agent runs inside a sandboxed environment that physically cannot reach production data, the host filesystem outside the project, or arbitrary network endpoints.

Catches: every action the agent might try, including ones the agent runtime doesn't even know about. Last-line defense.

Misses: nothing at the boundary, but operationally heavy. Requires sandbox-friendly tools, controlled mounts, and a usability cost — the agent can't reach things it legitimately needs without explicit grants.

Why prompt rules alone fail

Prompt rules live inside the agent's context. Context is finite. As the conversation grows, the early-system instructions are the first thing to lose attention weight when the model decides what to compress. By the time the agent is on hour two of a session, the CLAUDE.md rule that said "never force-push to main" may not even be in the active reasoning frame.

Decorator wrappers, pre-action hooks, and sandbox isolation all live outside the agent's context. They cannot be reasoned around by a model under context pressure — they're not part of the model's input at all.

Mental model: Prompt rules are speed limit signs. Decorators are speed bumps. Pre-action hooks are physical barriers at the intersection. Sandbox isolation is a chain-link fence around the whole street.

Where ThumbGate fits

ThumbGate runs at layer 3 — pre-action hooks. The hook process intercepts tool calls from Claude Code, Cursor, Codex, Gemini, Amp, Cline, and OpenCode before they fire. Three things make this layer worth the engineering minutes:

Context-eviction immunity. The hook runs in a different process. The agent can compress, forget, or hallucinate away the rule, and the rule still fires.
Cross-agent portability. One installation, every agent runtime the team uses. The rule "no force-push to main" applies whether today's session is Claude Code or Cursor.
Learning loop. When you give the agent a thumbs-down on a session-time mistake, the feedback becomes a structured prevention rule that auto-fires the next time the same pattern shows up. No policy-authoring step — the operator's correction is the policy.

Picking the layer that matches your stack

If you're writing prompt rules and they "work most of the time": add a hook layer. Most of the failures you've already absorbed are this category.
If you're a Python production app in a regulated domain: a decorator-level governance layer (e.g. Rein) is the right tradeoff. Per-call stakes are high enough to justify the integration cost.
If you're running AI coding agents (Claude Code, Cursor, Codex, Gemini, Amp, Cline, OpenCode): hook-level governance is the right layer. Your tool-call volume is high; per-call stakes are mixed; learning from feedback dominates.
If you're running agents on untrusted code or data: stack a sandbox layer on top. Hook + sandbox is the strongest combination.

These layers are not mutually exclusive. A real production setup typically combines them: prompt rules at the top for intent, a hook layer for tool-call enforcement, a sandbox at the bottom for blast radius.

The honest constraint

No governance layer is correct if the rules you encode in it are wrong. Layers 2, 3, and 4 enforce policy; they don't generate it. The hard part of governance has never been the enforcement boundary — it's knowing what the agent will get wrong before the agent gets it wrong. That's why ThumbGate emphasizes the feedback loop: most teams can't write the rule until they've watched the agent fail in their codebase. The thumbs-down is the rule's origin.

Pick the layer. Install it once. Move on.

For AI coding agents, the hook layer is where the leverage is. ThumbGate ships as npx thumbgate init and is MIT-licensed at the core.

$ npx thumbgate init

AI Agent Governance

What "governance" actually means here

The four layers

1. Prompt-level rules

2. Decorator wrappers

3. Pre-action hooks

4. Sandbox isolation

Why prompt rules alone fail

Where ThumbGate fits

Picking the layer that matches your stack

The honest constraint

Pick the layer. Install it once. Move on.

Related guides