AGENTS.md is the operating manual, not a vibe

The premise most agent posts skip

Most writing about AI coding agents stops at “I asked it to build a feature and it did.” The interesting regime is the one after that: multiple agents, running in parallel, against a codebase too big to hold in one context window, where the failure mode isn’t a bad function — it’s two agents clobbering each other’s uncommitted work, or one confidently re-inventing a convention the codebase already settled six months ago.

RoleReady is built that way day to day. I run agents in isolated git worktrees — Conductor spins each one up under ~/repo/.conductor/<name>/ with its own filesystem copy — so several can work at once without sharing a working tree. That setup only works because the repo treats agent behavior as something you specify, not something you hope for. The specification is AGENTS.md: 29 non-negotiable rules and a map of where to look. It’s the most valuable file in the project, and it contains zero application code.

Rules, not suggestions

The file opens by naming the stack and then lays down the law. A few that earn their place:

Bun only (bun, bun x) — never npm/yarn/pnpm.

Never use git stash — multiple agents share this worktree; stashing clobbers their uncommitted work. Use git diff / targeted tests instead.

If launched from a Conductor worktree (~/repo/.conductor/<name>/), stay in it — never read the root repo.

That git stash rule looks petty until you’ve watched an agent helpfully stash to “clean up,” wiping another agent’s half-finished change that was never committed. The rules aren’t style preferences; each one is a scar. The worktree-isolation rule is the same: an agent that wanders into the root repo reads stale conventions and “fixes” things in a checkout no one asked it to touch.
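Rules like these are mechanical enough to check before a command ever runs. What follows is a minimal sketch, not the project's actual tooling: `checkCommand` and `isInsideWorktree` are hypothetical helpers, and the only rules encoded are the three quoted above.

```typescript
// Hypothetical pre-execution guard. It inspects a single command string and
// does not try to parse shell chains such as `cd x && git stash`.
type Verdict = { allowed: true } | { allowed: false; reason: string };

const FORBIDDEN_COMMANDS: Array<{ pattern: RegExp; reason: string }> = [
  { pattern: /^(npm|yarn|pnpm)(\s|$)/, reason: "Bun only: use bun / bun x" },
  { pattern: /^git\s+stash(\s|$)/, reason: "never git stash: use git diff or targeted tests" },
];

function checkCommand(command: string): Verdict {
  const trimmed = command.trim();
  for (const rule of FORBIDDEN_COMMANDS) {
    if (rule.pattern.test(trimmed)) {
      return { allowed: false, reason: rule.reason };
    }
  }
  return { allowed: true };
}

// True when `target` is the worktree root or a path beneath it.
// Assumes both arguments are absolute, already-normalized paths (no `..`).
function isInsideWorktree(worktreeRoot: string, target: string): boolean {
  const root = worktreeRoot.replace(/\/+$/, "");
  return target === root || target.startsWith(root + "/");
}
```

The trailing-slash comparison matters: without it, a sibling worktree whose name merely starts with the same characters would count as "inside".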

The database rule is the sharpest, because it’s where an agent can do real damage:

Schema changes: edit lib/database/schema.ts only, then ask the maintainer to run bun x drizzle-kit push — never run push yourself, never use drizzle-kit generate/migrate, never add migration SQL files.

A push against the wrong database can drop columns. So agents are allowed to propose a schema change but are explicitly forbidden to apply it. The dangerous verb requires a human.
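The propose-versus-apply split can be written down as a small classifier. This is a sketch under stated assumptions: the function names and verdict labels are hypothetical, treating any `.sql` file in a changeset as a migration file is a simplification, and subcommands the rule does not mention are simply passed through.

```typescript
type SchemaVerdict = "ok" | "needs-maintainer" | "forbidden";

// Classifies a command according to the schema rule quoted above.
function classifyDrizzleCommand(command: string): SchemaVerdict {
  const match = command.match(/\bdrizzle-kit\s+(\S+)/);
  if (!match) return "ok"; // not a drizzle-kit invocation
  const subcommand = match[1];
  if (subcommand === "push") return "needs-maintainer"; // a human runs this
  if (subcommand === "generate" || subcommand === "migrate") return "forbidden";
  return "ok"; // the rule is silent on other subcommands
}

// Returns changed files that look like migration SQL.
// Assumption: any .sql file in the changeset counts as one.
function findMigrationFiles(changedFiles: string[]): string[] {
  return changedFiles.filter((file) => file.endsWith(".sql"));
}
```

An edit to lib/database/schema.ts passes both checks; only the act of applying it is escalated.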

Read the doc before you touch the area

The rule I’d port to every codebase is this one:

Everything else is in docs/ — read the matching doc before working in an area; do not guess conventions. Before you start writing code, state which doc(s) you read (or that none applied) so the maintainer can verify you didn’t skip them.

Two things make this work. First, the documentation is lazy-loaded: there’s an index, and an agent pulls only the doc for the subsystem it’s about to touch, instead of trying to swallow the whole architecture into context. Second, the agent has to declare what it read. That single requirement targets the most common agent failure mode, confidently guessing a convention. The declaration comes before any code gets written, so a skipped doc is visible up front, and a maintainer reviewing the PR later can see “read none” and know exactly why the patterns are wrong.
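The index-plus-declaration idea reduces to a lookup and a set difference. The sketch below assumes an index keyed by path prefix; the prefixes and doc filenames are invented for illustration, and the real index format is not described here.

```typescript
type DocIndex = Array<{ prefix: string; doc: string }>;

// Hypothetical index entries, for illustration only.
const DOC_INDEX: DocIndex = [
  { prefix: "lib/database/", doc: "docs/database.md" },
  { prefix: "lib/queue/", doc: "docs/queue.md" },
];

// Docs that apply to the paths an agent is about to touch.
function docsFor(paths: string[], index: DocIndex): string[] {
  const docs = new Set<string>();
  for (const path of paths) {
    for (const entry of index) {
      if (path.startsWith(entry.prefix)) docs.add(entry.doc);
    }
  }
  return Array.from(docs).sort();
}

// Docs that applied but were not in the agent's "I read these" declaration.
function missingDocs(paths: string[], declared: string[], index: DocIndex): string[] {
  const declaredSet = new Set(declared);
  return docsFor(paths, index).filter((doc) => !declaredSet.has(doc));
}
```

A non-empty result from `missingDocs` is the mechanical version of spotting "read none" on a change that touched a documented area.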

Three layers of enforcement

Writing rules down isn’t enough; agents (and tired humans) skip prose. So the conventions are enforced at three levels, and the levels back each other up.

Declarative — config the tools read directly. opencode.json pins AGENTS.md and CLAUDE.md as always-loaded instructions. The Claude Code settings pre-approve a Bash allowlist (bun, docker, git, doppler, psql, gcloud, terraform) so agents don’t stall on permission prompts for safe commands, while anything destructive still stops.
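As a rough picture of what that declarative layer contains, the sketch below builds the two config objects in TypeScript and renders them as JSON. The key names (`instructions` for opencode, `permissions.allow` with `Bash(<command>:*)` entries for Claude Code) are my understanding of those tools' formats and should be checked against their documentation. The project's real allowlist is presumably narrower than a blanket per-tool wildcard, since destructive commands still prompt.

```typescript
// Values taken from the description above; everything else is an assumption.
const AGENT_DOCS = ["AGENTS.md", "CLAUDE.md"];
const SAFE_COMMANDS = ["bun", "docker", "git", "doppler", "psql", "gcloud", "terraform"];

// Assumed shape of opencode.json: files listed here are always loaded.
const opencodeConfig = {
  instructions: AGENT_DOCS,
};

// Assumed shape of the Claude Code settings file: pre-approved Bash prefixes.
// A wildcard per tool is a simplification of whatever the real file lists.
const claudeSettings = {
  permissions: {
    allow: SAFE_COMMANDS.map((command) => `Bash(${command}:*)`),
  },
};

// Renders a config object as the JSON that would be written to disk.
function render(config: unknown): string {
  return JSON.stringify(config, null, 2);
}
```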

Procedural — the human-readable rules above, plus a fixed verification order that mirrors CI exactly:

baml:generate → lint → typecheck → typecheck:tests → test:unit → build. Run targeted tests first, then broaden.

An agent that follows this in order fails fast and cheap, instead of discovering at build what typecheck would have caught in seconds.
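The fail-fast behavior is just an ordered loop that stops at the first failing step. A minimal sketch, assuming each step name is a package.json script; `verify` and `StepRunner` are hypothetical names, and the runner is passed in so the ordering logic can be exercised without spawning processes.

```typescript
// Step order copied from the sequence quoted above.
const VERIFY_STEPS = [
  "baml:generate",
  "lint",
  "typecheck",
  "typecheck:tests",
  "test:unit",
  "build",
] as const;

type Step = (typeof VERIFY_STEPS)[number];
type StepRunner = (step: Step) => boolean; // true means the step passed

// Runs steps in order and stops at the first failure, so later and more
// expensive steps never run when an earlier, cheaper one fails.
function verify(run: StepRunner): { passed: Step[]; failed: Step | null } {
  const passed: Step[] = [];
  for (const step of VERIFY_STEPS) {
    if (!run(step)) return { passed, failed: step };
    passed.push(step);
  }
  return { passed, failed: null };
}
```

In a real script, `run` would shell out to something like `bun run <step>` and report whether the exit code was zero.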

Automated — gates that don’t care what the agent intended. A pre-push git hook diffs the changed .ts/.tsx files against the base branch and runs knip — an unused-export detector — on just that changeset, blocking the push if an agent left dead code behind. It runs on the diff, not the whole repo, so it’s fast enough to never be the reason someone reaches for --no-verify. CI then re-runs the full sequence and, for integration tests, spins up Postgres and pushes the schema to two databases before the suite runs.
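The changeset-only decision in that hook can be sketched as two filters. Assumptions: the changed files come from something like `git diff --name-only` against the base branch, and the unused-export findings have already been parsed into the hypothetical `UnusedExport` shape below. How the real hook restricts knip to the changeset is not described here, so this sketch filters findings after the fact.

```typescript
// Hypothetical shape for one unused-export finding.
type UnusedExport = { file: string; name: string };

// Keeps only .ts/.tsx paths from newline-separated `git diff --name-only` output.
function changedSourceFiles(diffNameOnly: string): string[] {
  return diffNameOnly
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => /\.tsx?$/.test(line));
}

// Findings that belong to files in the changeset; anything else is ignored.
function blockingFindings(changed: string[], findings: UnusedExport[]): UnusedExport[] {
  const changedSet = new Set(changed);
  return findings.filter((finding) => changedSet.has(finding.file));
}

// The push is blocked only when the changeset itself contains dead exports.
function shouldBlockPush(diffNameOnly: string, findings: UnusedExport[]): boolean {
  return blockingFindings(changedSourceFiles(diffNameOnly), findings).length > 0;
}
```

Dead code that already existed elsewhere in the repo does not block the push, which keeps the hook's verdict tied to the change under review.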

The point of three layers is that no single one has to be perfect. An agent can ignore prose, but the hook still catches dead code; a hook can be bypassed, but CI still fails the PR.

What changed in how I work

Before this, reviewing agent output meant re-deriving the codebase’s conventions in my head for every diff and checking the change against them manually. Now most of that check is mechanical: did it cite the doc, did it follow the enqueue contract, did the hooks pass, did it leave the schema alone. My attention goes to the part that actually needs judgment — is this the right change — instead of “did the agent respect rule 14 again.”

The mental shift is that AGENTS.md isn’t documentation about the system. It’s part of the system’s control plane. A new agent — or a new contributor, or future-me — gets handed the same manual and the same gates. Low surprise by construction. That’s the whole game when more than one mind, human or not, is editing the same code at the same time.