A supervisor, a status file, and a worktree per agent

The problem isn’t the agent, it’s the fleet

One AI coding agent in a terminal is a solved, boring thing. The interesting failure modes start at n > 1: two agents editing the same working tree and clobbering each other’s uncommitted changes, an agent finishing while you’re not looking and then sitting idle, or you — the human — becoming the message bus, copy-pasting “are you done yet” into five panes. The agents are fine. The coordination is the hard part.

I run a small orchestration layer for exactly this. There’s a supervisor process and a fleet of worker agents (“crewmates”), and the whole thing is built on three primitives that are almost aggressively simple: a git worktree per agent, an append-only status file per task, and a watcher that stays asleep until something changes. This is how those pieces fit. (Separately, a project’s AGENTS.md is the rulebook each agent must follow inside its worktree — that’s a different post; this one is about the machinery around the agents.)

A disposable worktree per agent

The first rule is that no two agents ever share a working tree. Each task gets its own git worktree, taken from a pool and checked out as a detached HEAD at the tip of a clean default branch. The very first thing every agent does is prove it’s actually isolated before it touches anything:

pwd -P
git rev-parse --show-toplevel
# both must resolve to the disposable worktree, NOT the primary checkout

This check is load-bearing, and it’s the top-level path that’s authoritative, not git rev-parse --git-dir: in a linked worktree that resolves to a directory under the primary repo’s .git/worktrees/, so it always points into the shared repo. If the top-level path is the primary checkout, the agent refuses to branch and reports blocked instead. The nightmare this prevents is an agent running git checkout -b in the directory a human is actively working in. Worktrees make that structurally impossible: each agent’s working directory is its own, the branches are namespaced (fm/<task-id>), and when the task is done the worktree is returned to the pool and the window is killed. The supervisor itself never branches the primary checkout; the only sanctioned write it makes there is to put it back on the default branch if it ever drifts.
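The isolation check can be wrapped in a small guard. This is a minimal sketch, not the supervisor's actual code: the function names verify_isolation and canon are hypothetical, and it assumes the primary checkout is the first entry printed by git worktree list --porcelain, which is how git orders that output.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original tooling).

# Resolve a directory to its physical path so symlinked paths compare equal.
canon() { (cd "$1" 2>/dev/null && pwd -P); }

# Succeed only when the current directory belongs to a linked worktree,
# never to the primary checkout.
verify_isolation() {
  top=$(git rev-parse --show-toplevel 2>/dev/null) || {
    echo "blocked: not inside a git working tree" >&2
    return 1
  }
  # `git worktree list --porcelain` prints the main worktree first.
  primary=$(git worktree list --porcelain | sed -n '1s/^worktree //p')
  if [ "$(canon "$top")" = "$(canon "$primary")" ]; then
    echo "blocked: running in the primary checkout ($top)" >&2
    return 1
  fi
  return 0
}
```

An agent would call verify_isolation from its starting directory and append a blocked: line to its status file instead of branching if the call fails.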

Worktrees beat full clones because they share the object store. Spinning one up is cheap, there’s no re-fetch, and a dozen of them cost a dozen working directories, not a dozen copies of the repo’s history.
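For a concrete picture, here is roughly what provisioning one worktree and confirming the shared store could look like. It is a sketch under assumptions: the helper names and the idea of passing a base commit are illustrative, and the real pool logic is not shown in this post. The git commands themselves (git worktree add --detach, git checkout -b, git rev-parse --git-common-dir) are standard.

```shell
#!/bin/sh
# Hypothetical helpers; the real pool management is not shown in this post.

# usage: create_worktree <primary-checkout> <new-path> [<base-commit>]
# Adds a linked worktree on a detached HEAD (base defaults to HEAD here).
create_worktree() {
  git -C "$1" worktree add --detach "$2" "${3:-HEAD}" >/dev/null 2>&1
}

# usage: start_task_branch <worktree> <task-id>
# Creates the namespaced task branch inside the worktree only.
start_task_branch() {
  git -C "$1" checkout -q -b "fm/$2"
}

# usage: common_dir <checkout>
# Prints the physical path of the shared .git directory for any checkout.
common_dir() {
  (cd "$1" && cd "$(git rev-parse --git-common-dir)" && pwd -P)
}
```

If common_dir prints the same path for the primary checkout and for every linked worktree, they are all reading and writing one object store.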

The status file is the whole protocol

Agents don’t report progress through a chat channel the supervisor has to parse. Each task has one append-only file, and the agent writes a single line at each meaningful phase change:

echo "working: isolation verified, branch created, baseline green" >> state/<id>.status
echo "done: PR https://github.com/…/pull/1361 (all local checks green)" >> state/<id>.status

The grammar is five status words and nothing else:

  • working: — proceeding normally
  • needs-decision: — a human has to choose something (product call, ambiguous scope)
  • blocked: — stuck on something external (a tool’s missing, CI is down)
  • done: — finished and ready for review
  • failed: — work failed irrecoverably

That constraint is the design. Because the vocabulary is tiny and terminal states are explicit, the supervisor — and even a plain shell loop — can reason about the whole fleet by reading the last line of each file. There’s no natural-language ambiguity to interpret. done: means a PR exists; needs-decision: means stop and escalate to the human; blocked: twice on the same obstacle means stop trying. The agents are also told to report sparingly — only phase changes a supervisor would act on — because every append is a wake, and a chatty agent is a supervisor that never sleeps.
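Reading the fleet from the last line of each file might look like the sketch below. The function names are hypothetical, and treating "the same obstacle" as two identical consecutive blocked: lines is an assumption made for the example, not a rule stated by the protocol.

```shell
#!/bin/sh
# Hypothetical helpers for reading append-only status files.

# usage: last_state <status-file>
# Prints the status word of the final line, or "unknown" if it is not one of the five.
last_state() {
  line=$(tail -n 1 "$1" 2>/dev/null)
  case "$line" in
    working:*|needs-decision:*|blocked:*|done:*|failed:*)
      printf '%s\n' "${line%%:*}" ;;
    *)
      printf 'unknown\n'; return 1 ;;
  esac
}

# usage: stuck_twice <status-file>
# True when the last two lines are both blocked: and textually identical
# (assumption: identical text stands in for "same obstacle").
stuck_twice() {
  [ "$(tail -n 2 "$1" | grep -c '^blocked:')" -eq 2 ] || return 1
  [ "$(tail -n 2 "$1" | head -n 1)" = "$(tail -n 1 "$1")" ]
}

# usage: fleet_summary <state-dir>
# Prints one "<task-id> <state>" line per status file.
fleet_summary() {
  for f in "$1"/*.status; do
    [ -e "$f" ] || continue          # unmatched glob stays literal; skip it
    id=${f##*/}; id=${id%.status}
    printf '%s %s\n' "$id" "$(last_state "$f")"
  done
}
```

Because the match requires the colon, a free-text line such as "almost done" is reported as unknown instead of being mistaken for a terminal state.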

A watcher that sleeps until there’s news

Having the supervisor poll five panes in a loop is how you burn tokens doing nothing. Instead the supervisor arms a watcher as a background process and goes to sleep. The watcher checks the cheap signals (file sizes and modification times on the *.status files and a turn-end marker) every fifteen seconds, and wakes the supervisor only when a signature actually changes:

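for f in "$STATE"/*.status "$STATE"/*.turn-ended; do
  [ -e "$f" ] || continue              # an unmatched glob stays literal; skip it
  seen="$STATE/.seen-${f##*/}"         # marker keyed by file name, not the full path
  sig=$(stat_sig "$f")                 # size:mtime, cheap
  if [ "$sig" != "$(cat "$seen" 2>/dev/null)" ]; then
    emit_signal "$f"                   # something changed: queue the wake first
    printf '%s\n' "$sig" > "$seen"     # then record this signature as seen
  fi
done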

A few details earn their keep. Signals are coalesced with a short grace window, so two agents finishing within seconds of each other wake the supervisor once with both results, not twice. Wakes are written to a durable queue before the “seen” markers update, so if the watcher dies mid-scan the signal isn’t lost — it’s drained and de-duplicated on restart. And the heartbeat backs off exponentially when the fleet is quiet, from minutes toward hours, so an idle supervisor isn’t a busy one. The watcher also detects a third thing the status file can’t: a stale pane — output unchanged with no “busy” indicator across two scans — which is how you catch an agent that wedged without reporting.
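The loop above leans on stat_sig and emit_signal without showing them, so here is one way they and the durable queue could be written. Treat this as a sketch under assumptions: the queue file name, the drain_queue function, and the rename-then-sort approach to de-duplication are illustrative choices, not the watcher's actual implementation.

```shell
#!/bin/sh
# Hypothetical implementations; file names and layout are assumptions.
STATE=${STATE:-./state}
QUEUE=${QUEUE:-$STATE/wake.queue}

# Print "size:mtime" for a file. Tries GNU stat first, then BSD/macOS stat.
stat_sig() {
  stat -c '%s:%Y' "$1" 2>/dev/null || stat -f '%z:%m' "$1" 2>/dev/null
}

# Append the changed path to the on-disk queue before any "seen" marker moves,
# so a crash between the two steps cannot lose the wake.
emit_signal() {
  printf '%s\n' "$1" >> "$QUEUE"
}

# Print each queued path once, then clear the batch.
# A leftover batch from a crashed drain is replayed before new signals.
drain_queue() {
  batch="$QUEUE.draining"
  if [ ! -e "$batch" ]; then
    [ -s "$QUEUE" ] || return 0
    mv "$QUEUE" "$batch" || return 1   # rename is atomic on one filesystem
  fi
  sort -u "$batch" && rm -f "$batch"
}
```

Coalescing then falls out of the ordering: after the first signal the watcher can sleep for the grace window and call drain_queue once, which hands the supervisor every path that changed in that window as a single de-duplicated batch. Since status files are append-only, their size grows with every report, so size:mtime changes even when two appends land within the same second.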

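From the supervisor’s side the result is event-driven, not poll-driven. The fifteen-second polling happens in a plain background process that spends no tokens, while the supervisor itself is asleep almost all the time and wakes precisely when an agent has crossed a phase boundary.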

Bounding parallelism by overlap, not by count

There’s no fixed concurrency cap. The bound is semantic: tasks in the same repo touching the same subsystem get serialized; everything else runs in parallel. Two agents adding unrelated features to the same app run at once and rebase before merge. Two agents both rewriting the auth layer do not — that’s just a merge conflict you scheduled on purpose. Deciding this at dispatch time, instead of letting everything run and sorting out conflicts after, is cheaper than it sounds, because the expensive resource isn’t CPU — it’s the human attention each conflict eventually demands.
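The dispatch-time decision can be sketched as a lookup against a list of active tasks. Everything here is an assumption made for illustration: the active.tsv file, the function names, and the premise that each task arrives already labeled with a repo and a subsystem. How that label gets assigned is a judgment call this sketch does not try to automate.

```shell
#!/bin/sh
# Hypothetical dispatch gate; file name and columns are assumptions.
# ACTIVE holds one tab-separated line per running task: id, repo, subsystem.
ACTIVE=${ACTIVE:-./state/active.tsv}

# usage: overlaps <repo> <subsystem>
# True when a running task already owns that repo and subsystem.
overlaps() {
  [ -f "$ACTIVE" ] || return 1
  awk -F '\t' -v r="$1" -v s="$2" \
    '$2 == r && $3 == s { found = 1 } END { exit !found }' "$ACTIVE"
}

# usage: dispatch_or_queue <task-id> <repo> <subsystem>
# Serializes overlapping work; everything else is recorded as running.
dispatch_or_queue() {
  if overlaps "$2" "$3"; then
    echo "queued $1"
  else
    printf '%s\t%s\t%s\n' "$1" "$2" "$3" >> "$ACTIVE"
    echo "dispatched $1"
  fi
}

# usage: release <task-id>
# Drops a finished task so queued work on the same subsystem can start.
release() {
  [ -f "$ACTIVE" ] || return 0
  awk -F '\t' -v id="$1" '$1 != id' "$ACTIVE" > "$ACTIVE.tmp" &&
    mv "$ACTIVE.tmp" "$ACTIVE"
}
```

With this shape, two tasks on the same app in different subsystems both dispatch, while a second task on the auth layer stays queued until the first is released.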

Delivery mode is per-project, and it changes what “done” means:

  • direct-PR — the agent pushes its branch and opens the PR itself; the human reviews and merges.
  • no-mistakes — the agent runs a full validation pipeline before the PR is allowed out.
  • local-only — the agent stops at a branch and the supervisor merges to a local main, no remote.

This blog post was written by an agent running in direct-PR mode, in a disposable worktree, reporting up a status file exactly like the one above.

What’s actually worth stealing

You don’t need my supervisor. You need the three ideas under it. Worktrees give you real isolation for free — never let two agents share a working tree. An append-only status file with a five-word vocabulary turns “what are my agents doing” from a parsing problem into a tail command. And an event-driven watcher with coalescing and a durable queue is the difference between an orchestrator that sleeps and one that melts your token budget watching paint dry. The agents got good a while ago. The leverage now is in the boring layer that runs a fleet of them without you in the loop for every step.