Agent Swarms: One Gate Layer, Every Model

6 min read · For teams running multi-agent systems across multiple LLMs

TL;DR: A 5-agent swarm without shared memory pays 5× the tokens on every repeated mistake. The fix is a single pre-action gate layer that every agent in the swarm consults before tool use — not 5 sets of prompt rules.

The swarm token-cost problem

Agent swarms route different parts of a task to different models — one model for front-end work, another for backend, a third for visual reasoning, a fourth for efficient bulk operations. The pitch is real: the right model for the right slice of work, in parallel.

The hidden cost is repetition. Each agent in the swarm has its own context window. When one agent makes a mistake — force-pushes to main, deletes a production file, paginates wrong against a rate-limited API — the other agents in the swarm have no idea. They are statistically likely to make the same mistake, because they were trained on the same internet and given the same prompt scaffolding.

The math is unkind: with N agents and a recurring mistake class, you pay the token cost of that mistake N times per task instead of once. The fix is not better prompting. It is shared state below the agents.

Why prompt rules cannot solve this

A natural reflex is to copy the same "do not force-push" rule into every agent's system prompt. This fails for three reasons:

Drift. Each agent's prompt evolves independently. After a few iterations, the rules diverge across the swarm without anyone noticing.
Reasoning around. A long-context agent can rationalize an exception to a prompt rule. There is no enforcement layer to say no.
No learning loop. When agent A makes a new mistake, agents B/C/D never find out. Each agent has to discover the same failure independently.

The shared-gate architecture

The pattern that actually works is one gate layer that every agent consults before any tool call. The gate layer holds a single lesson database, a single set of prevention rules, and a single PreToolUse hook surface.

Concern	Per-agent prompt rules	Shared gate layer
New mistake caught once	Other agents still vulnerable	All agents protected immediately
Rule consistency	Drifts across agents over time	Single source of truth on disk
Enforcement	Best-effort — agent can override	Hard block at the tool boundary
Token cost of repeated failures	N× per task (one per agent)	1× per task (gate refuses)

How ThumbGate implements this

ThumbGate is a single MCP server with a PreToolUse hook. Every agent in the swarm — whether it speaks to Claude, GPT, Gemini, or any other model — issues tool calls through the same MCP layer. The hook fires once per tool call, regardless of which agent issued it.

Three concrete properties make this work in a swarm:

Single feedback directory. Point every agent at the same .thumbgate/ on disk (or set the THUMBGATE_FEEDBACK_DIR environment variable). All lessons, gates, and feedback land in one place.
Model-agnostic hook. The PreToolUse hook does not care which model produced the tool call. It pattern-matches against the call itself.
Append-only feedback log. Concurrent thumbs-up/down captures from different agents are JSONL appends — no contention, no lock-out, no lost lessons.

The swarm becomes one learner. Agent A makes a mistake, you thumbs-down it, the gate is promoted. The next tool call from agent B, C, or D matching that pattern is blocked. You paid once for the lesson; every agent benefits.

What this does not solve

Honesty matters here. A shared gate layer is not a swarm orchestrator. It does not route work between agents, decide which model is best for a subtask, or balance load. Those are the swarm framework's job.

What the gate layer does is make the swarm cheaper to operate by removing the most expensive failure mode: paying the same mistake-tax N times. If your swarm framework gives you concurrency and the gate layer gives you shared safety memory, the combination is what makes multi-agent systems economically viable for production work.

Try it on your swarm

If your swarm runs locally, point every agent at the same project directory:

cd /path/to/your-project
npx thumbgate init

If your agents run as separate processes or containers, share the feedback dir via env var:

export THUMBGATE_FEEDBACK_DIR=/shared/thumbgate
# every agent in the swarm reads from and writes to this directory

That is the entire integration. The gate layer is now in front of every tool call in the swarm, learning from feedback captured anywhere in the swarm.

One gate layer. Every model in your swarm.

Works with Claude Code, Cursor, Codex, Gemini, Amp, OpenCode, and any MCP-compatible agent.

$ npx thumbgate init

Agent Swarms: One Gate Layer, Every Model

The swarm token-cost problem

Why prompt rules cannot solve this

The shared-gate architecture

How ThumbGate implements this

What this does not solve

Try it on your swarm

One gate layer. Every model in your swarm.

Related articles