
The Write-Test Cycle

The most reliable autonomous pattern: Claude writes code, runs tests, checks output, iterates until green. For more examples, see the official common workflows. The key ingredient is a verifiable success condition. Without one, Claude can’t self-correct.
| Task type | Success condition |
|---|---|
| Unit tests | All tests pass |
| Integration tests | API returns expected responses |
| CLI tool | Command produces expected output |
| Refactor | All existing tests still pass |
| Bug fix | Failing test now passes, no regressions |
Give Claude the success condition upfront: “Write the migration and run npm test until all tests pass.”
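Every row in the table reduces to the same thing: a command whose exit status is 0 only when the condition holds. That is what lets Claude tell "green" from "not yet" without a judgment call. A minimal sketch, where `check_success` is a hypothetical helper name and the commented commands are assumptions about your project:

```shell
# Hypothetical helper (not part of Claude Code): run one success-condition
# command quietly and report the verdict. Exit status 0 means "done".
check_success() {
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $*"
    return 0
  fi
  echo "FAIL: $*"
  return 1
}

# Illustrative conditions (commands and the `mycli` tool are hypothetical):
#   check_success npm test                                  # unit tests / refactor
#   check_success sh -c './mycli greet | grep -qx "hello"'  # CLI tool output
```

The exit status, not the printed text, is the part that matters; the echo is only there for the human reading the log.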

tmux for Interactive Testing

Some tools can’t be tested with a simple command — they need interactive input, persistent sessions, or real-time output. Use tmux to let Claude drive interactive tools:
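```shell
# Start a detached session
tmux new-session -d -s test-session

# Send a command ("Enter" presses the Return key)
tmux send-keys -t test-session 'npm run dev' Enter

# Wait for output
sleep 2

# Capture what's on screen (-p prints the pane to stdout)
tmux capture-pane -t test-session -p

# Clean up when finished
tmux kill-session -t test-session
```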
This pattern works for CLIs, REPLs, dev servers, database consoles — anything that needs a persistent terminal session.
Claude can create and manage tmux sessions on its own. Just describe what you want tested and mention tmux as the approach.
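The fixed `sleep 2` in the tmux snippet is a guess: a slow dev server may not be ready yet, and a fast one wastes time. A sketch of polling instead, where `wait_until` and `pane_shows` are hypothetical helper names and the `ready|listening` pattern is an assumption about your server's log output:

```shell
# Hypothetical helper: poll a condition once per second instead of a fixed sleep.
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout expires first.
wait_until() {
  local timeout_s=$1 elapsed=0
  shift
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Hypothetical condition: succeeds when the pane's visible text matches an
# extended regex. Requires tmux and an existing session.
pane_shows() {
  tmux capture-pane -t "$1" -p | grep -qE "$2"
}

# Example (pattern is a guess at your dev server's startup line):
#   wait_until 30 pane_shows test-session 'ready|listening'
#   tmux capture-pane -t test-session -p
```

Polling returns as soon as the condition holds and fails loudly on timeout, which gives Claude a clear signal instead of a screen capture taken too early.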

Git Bisect with Claude

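When something broke and you don’t know which commit broke it:
  1. Give Claude a test script that fails on the current commit, plus a commit or tag where it still passed
  2. Tell it to use git bisect (git bisect run can drive the script directly) to find the breaking commit
  3. Claude runs the binary search automatically, testing only the commits the search picks (about log2 N for N candidate commits)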
This turns a manual debugging session into an autonomous task with a clear answer.
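What Claude automates here is git's own `git bisect run`: you mark one bad and one good commit, and git checks out midpoints and runs your script, reading exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. A self-contained sketch in a throwaway repository; the `find_first_bad` helper, the `status` and `version` files, and the commit history are all invented for illustration:

```shell
# Hypothetical demo: build a scratch repo where commit 4 introduces a bug,
# then let `git bisect run` find it. Prints the first bad commit's subject.
find_first_bad() {
  local repo
  repo=$(mktemp -d)
  (
    cd "$repo" || exit 1
    git init -q
    git config user.email demo@example.com
    git config user.name demo
    for i in 1 2 3 4 5 6; do
      echo "$i" > version                 # guarantees every commit has a change
      if [ "$i" -ge 4 ]; then echo broken > status; else echo ok > status; fi
      git add version status
      git commit -q -m "commit $i"
    done
    root=$(git rev-list --max-parents=0 HEAD)
    git bisect start HEAD "$root" >/dev/null    # bad = HEAD, good = first commit
    # The check: exit 0 = good, 125 = skip, other codes 1-127 = bad.
    git bisect run sh -c 'grep -qx ok status' >/dev/null
    # Once the search ends, refs/bisect/bad names the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset >/dev/null 2>&1
  )
  rm -rf "$repo"
}
```

On a real project the same steps are `git bisect start <bad> <good>`, `git bisect run ./your-test-script.sh`, and `git bisect reset` to return to where you started.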

Browser-Based Verification

Unit tests verify logic. Browser tests verify what users actually see. Use Playwright to let Claude verify rendered output:
  • “Run the app, navigate to /dashboard, and verify the chart renders with data”
  • “Fill out the signup form and verify the success message appears”
  • “Check that the mobile nav menu opens and closes correctly”
This catches CSS regressions, rendering bugs, and integration issues that unit tests miss entirely.

Background Testing

Run a second Claude session that continuously tests your app while you work on features in another session:
  1. Open a second terminal in the same project (or a worktree)
  2. Tell Claude: “Run the full test suite every 5 minutes and alert me if anything breaks”
  3. Continue working in your primary session
You get continuous regression feedback without interrupting your flow.
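Under the hood, "every 5 minutes" is just a loop. A sketch of what the background session is effectively doing, with `watch_tests` as a hypothetical name and a run cap so it terminates:

```shell
# Hypothetical helper: re-run a test command on a fixed interval and report each result.
# Usage: watch_tests INTERVAL_SECONDS MAX_RUNS COMMAND [ARGS...]
watch_tests() {
  local interval=$1 max_runs=$2 run=1
  shift 2
  while [ "$run" -le "$max_runs" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "run $run: PASS"
    else
      # The "alert me" point; swap in a terminal bell or desktop notification if you like.
      echo "run $run: FAIL ($*)"
    fi
    if [ "$run" -lt "$max_runs" ]; then
      sleep "$interval"   # no wait after the final run
    fi
    run=$((run + 1))
  done
}

# Example: every 5 minutes for up to 8 hours (96 runs).
#   watch_tests 300 96 npm test
```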

Designing for Overnight Autonomy

The Write-Test Cycle works for minutes. For hours-long autonomous runs, you need more structure. Verification-first architecture: Define the success condition before the agent starts. The agent loops on implementation until verification passes — not until it “feels done.”
"Implement the data pipeline. After each change, run `make test-pipeline && make lint`.
Do not stop until both pass with zero errors. If stuck after 3 attempts on the same
error, create a TODO with the blocker and move to the next task."
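The escape hatch in that prompt is a bounded retry loop. A simplified sketch, assuming a `TODO.md` file for blockers (the prompt only says "create a TODO") and counting attempts without checking whether the error was the same each time; `attempt_task` is a hypothetical name. In the real workflow Claude edits code between attempts; here the command is simply re-run:

```shell
# Hypothetical helper: retry a verification command up to MAX_ATTEMPTS times.
# On success, report the attempt number; on repeated failure, append a blocker
# to TODO.md and return non-zero so the caller can move to the next task.
# Usage: attempt_task TASK_NAME MAX_ATTEMPTS COMMAND [ARGS...]
attempt_task() {
  local task=$1 max=$2 n=1
  shift 2
  while [ "$n" -le "$max" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "$task: passed on attempt $n"
      return 0
    fi
    # In a real run, the agent changes code here before the next attempt.
    n=$((n + 1))
  done
  echo "- [ ] $task: blocked after $max attempts of: $*" >> TODO.md
  echo "$task: blocked, see TODO.md"
  return 1
}

# Example mirroring the prompt: three attempts at the pipeline gate.
#   attempt_task data-pipeline 3 sh -c 'make test-pipeline && make lint'
```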
Back-pressure patterns:
  • Stop hooks can enforce completion conditions — the agent can’t finish until tests pass or tasks are marked done. See the hooks playbook.
  • Escape hatches prevent infinite loops: cap retries, time-box exploration, create TODOs for blockers instead of spinning.
  • Progress files (e.g., PROGRESS.md) let fresh agents resume where the last one left off after context compaction.
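A Stop hook is a command Claude Code runs when Claude tries to finish. Per the Claude Code hooks documentation, exiting with status 2 blocks the stop and feeds the hook's stderr back to Claude, and the JSON on stdin includes a `stop_hook_active` flag that can also serve as a loop guard. A sketch of a gate that combines back-pressure with an escape hatch, assuming a counter file that caps how many times it blocks in a row; `stop_gate`, its arguments, and the counter path are hypothetical, so check the hooks playbook for how to register it:

```shell
# Hypothetical Stop-hook body: block finishing while COMMAND fails, but only
# MAX_BLOCKS times in a row, so a stuck agent can still stop.
# Exit 0 lets Claude stop; exit 2 blocks the stop and shows stderr to Claude.
# Usage: stop_gate MAX_BLOCKS STATE_FILE COMMAND [ARGS...]
stop_gate() {
  local max_blocks=$1 state=$2 count=0
  shift 2
  cat >/dev/null   # consume the hook's JSON input; this sketch doesn't parse it
  [ -f "$state" ] && count=$(cat "$state")
  if "$@" >/dev/null 2>&1; then
    rm -f "$state"   # passing: reset the counter and allow the stop
    return 0
  fi
  if [ "$count" -ge "$max_blocks" ]; then
    rm -f "$state"   # escape hatch: stop blocking after MAX_BLOCKS tries
    echo "Still failing after $max_blocks retries; allowing stop. Record the blocker as a TODO." >&2
    return 0
  fi
  echo $((count + 1)) > "$state"
  echo "Verification failed: '$*' is not passing. Fix it before finishing." >&2
  return 2
}
```

A hook script might end with `stop_gate 3 .claude/stop-gate-count make test-pipeline; exit $?` (the counter path is an assumption). Resetting the counter on success means the cap applies to consecutive failures only.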
The overnight checklist:
  1. Commit current state (rollback point)
  2. Branch for the autonomous work
  3. Define verifiable success conditions
  4. Set escape hatches (max retries, TODO-on-block)
  5. Use hooks to enforce quality gates
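Steps 1 and 2 of the checklist come down to two git commands. A sketch with a hypothetical `prepare_run` helper; note that `git add -A` stages everything not excluded by `.gitignore`, so check that file before relying on it:

```shell
# Hypothetical helper: snapshot the working tree as a rollback point, print its
# SHA, then switch to a fresh branch for the autonomous work.
# Usage: prepare_run BRANCH_NAME
prepare_run() {
  local branch=$1
  git add -A   # stages every change not excluded by .gitignore
  # --allow-empty still records a labeled checkpoint when the tree is clean.
  git commit -q --allow-empty -m "checkpoint: before autonomous run on $branch"
  git rev-parse HEAD
  git switch -q -c "$branch"
}

# Example:
#   prepare_run auto/data-pipeline
```

Because the checkpoint lands on your original branch and the agent's commits land on the new one, rolling back is just switching back to the original branch, optionally followed by `git branch -D <branch>`.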
Scheduled tasks in Claude Desktop can run agents on a cadence (hourly, daily) — useful for monitoring, audits, and recurring maintenance without interactive sessions.

Safety Rails

Autonomous agents are powerful but need guardrails. Before delegating unsupervised work:
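Always commit before delegating. If Claude breaks something, git checkout . throws away its unstaged edits to tracked files (git reset --hard also drops staged changes; git clean -fd removes new files). Without a commit, there is no clean rollback point, and those same commands would discard your own uncommitted work along with Claude’s.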
  • Commit first. Always. Non-negotiable.
  • Branch for risky work. Migrations, large refactors, dependency upgrades — branch so main stays clean.
  • Hooks block dangerous patterns. A git pre-commit hook can catch secret leaks before they are committed; Claude Code hooks can refuse commands such as rm -rf or force pushes. See the hooks playbook.
  • Use containers for --dangerously-skip-permissions. If you need to give Claude unrestricted access, run it in a container where the blast radius is limited.
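For the secret-leak case, a git pre-commit hook can scan what is about to be committed. A deliberately small sketch that checks only two well-known shapes, AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits) and PEM private key headers; a dedicated secret scanner covers far more. The `scan_added_lines` name is hypothetical:

```shell
# Hypothetical pre-commit check: read a unified diff on stdin and fail if any
# added line looks like a secret. Removed lines and "+++" file headers are ignored.
scan_added_lines() {
  local hits
  hits=$(grep -E '^\+' | grep -vE '^\+\+\+ ' |
    grep -E 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----' || true)
  if [ -n "$hits" ]; then
    echo "Commit blocked: staged changes look like they contain secrets:" >&2
    printf '%s\n' "$hits" >&2
    return 1
  fi
  return 0
}

# Wiring (inside an executable .git/hooks/pre-commit that defines the function):
#   git diff --cached | scan_added_lines || exit 1
```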
