The Write-Test Cycle
The most reliable autonomous pattern: Claude writes code, runs tests, checks output, iterates until green. For more examples, see the official common workflows. The key ingredient is a verifiable success condition. Without one, Claude can’t self-correct.

| Task Type | Success Condition |
|---|---|
| Unit tests | All tests pass |
| Integration tests | API returns expected responses |
| CLI tool | Command produces expected output |
| Refactor | All existing tests still pass |
| Bug fix | Failing test now passes, no regressions |
npm test until all tests pass.”
tmux for Interactive Testing
Some tools can’t be tested with a simple command: they need interactive input, persistent sessions, or real-time output. Use tmux to let Claude drive interactive tools.

Git Bisect with Claude
When something broke and you don’t know when:
- Give Claude a test script that fails on the current commit
- Tell it to use `git bisect` to find the breaking commit
- Claude runs the binary search automatically, testing each commit
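
A test script for this workflow might look like the sketch below. It relies on the documented `git bisect run` convention that exit code 0 marks a commit as good, 125 means "skip this commit", and other codes from 1 to 127 mark it as bad. The function names and the separate build step are assumptions made for illustration.

```python
import subprocess
from typing import Sequence

GOOD, BAD, SKIP = 0, 1, 125  # exit codes with special meaning to `git bisect run`


def bisect_verdict(build_rc: int, test_rc: int) -> int:
    """Translate build and test exit codes into a `git bisect run` verdict."""
    if build_rc != 0:
        return SKIP  # the commit cannot be tested at all, so let bisect skip it
    return GOOD if test_rc == 0 else BAD


def main(build_cmd: Sequence[str], test_cmd: Sequence[str]) -> int:
    """Build, then test, and return the exit code the script should use."""
    build_rc = subprocess.run(build_cmd).returncode
    # Only run the test when the build succeeded.
    test_rc = subprocess.run(test_cmd).returncode if build_rc == 0 else 1
    return bisect_verdict(build_rc, test_rc)
```

A script that ends with `sys.exit(main([...], [...]))`, filled in with the project's own build and test commands, can be handed to `git bisect run` after `git bisect start`, `git bisect bad`, and `git bisect good <known-good-commit>`. Returning 125 for unbuildable commits keeps a broken intermediate commit from being blamed for the regression.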
Browser-Based Verification
Unit tests verify logic. Browser tests verify what users actually see. Use Playwright to let Claude verify rendered output:
- “Run the app, navigate to `/dashboard`, and verify the chart renders with data”
- “Fill out the signup form and verify the success message appears”
- “Check that the mobile nav menu opens and closes correctly”
Background Testing
Run a second Claude session that continuously tests your app while you work on features in another session:
- Open a second terminal in the same project (or a worktree)
- Tell Claude: “Run the full test suite every 5 minutes and alert me if anything breaks”
- Continue working in your primary session
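
What the second session is being asked to do amounts to a polling loop. The sketch below is one way to picture it, under the assumption that the suite can be reduced to a pass/fail callable; `watch_tests`, `run_suite`, `alert`, and `max_cycles` are hypothetical names, and the 300-second default mirrors the "every 5 minutes" prompt.

```python
import time
from typing import Callable, List


def watch_tests(
    run_suite: Callable[[], bool],
    alert: Callable[[str], None],
    interval_seconds: float = 300.0,
    max_cycles: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Run the suite on a fixed interval and alert when it newly breaks."""
    history: List[bool] = []
    for cycle in range(max_cycles):
        passed = run_suite()
        # Alert on the first failure and on pass-to-fail transitions only,
        # so a long-broken suite does not produce an alert every cycle.
        if not passed and (not history or history[-1]):
            alert(f"cycle {cycle}: test suite is failing")
        history.append(passed)
        if cycle < max_cycles - 1:
            sleep(interval_seconds)
    return history
```

Capping the loop with `max_cycles` keeps the background session from running indefinitely once you stop paying attention to it.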
Designing for Overnight Autonomy
The Write-Test Cycle works for minutes. For hours-long autonomous runs, you need more structure.
Verification-first architecture: Define the success condition before the agent starts. The agent loops on implementation until verification passes, not until it “feels done.”
- Stop hooks can enforce completion conditions: the agent can’t finish until tests pass or tasks are marked done. See the hooks playbook.
- Escape hatches prevent infinite loops: cap retries, time-box exploration, create TODOs for blockers instead of spinning.
- Progress files (e.g., `PROGRESS.md`) let fresh agents resume where the last one left off after context compaction.
- Commit current state (rollback point)
- Branch for the autonomous work
- Define verifiable success conditions
- Set escape hatches (max retries, TODO-on-block)
- Use hooks to enforce quality gates
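
The escape hatches and progress file can be combined in one driver loop. The following is a rough sketch under stated assumptions: `run_tasks` and its parameters are hypothetical, `attempt` stands in for "implement, then run the verification" and returns True only when verification passes, and the checkbox format written to the progress file is an invented convention, not one the source prescribes.

```python
import time
from pathlib import Path
from typing import Callable, List


def run_tasks(
    tasks: List[str],
    attempt: Callable[[str], bool],
    progress_file: Path,
    max_retries: int = 3,
    time_budget_seconds: float = 8 * 3600,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Work through tasks, logging done and blocked items to a progress file.

    Returns the tasks that were left as TODOs.
    """
    deadline = clock() + time_budget_seconds
    blocked: List[str] = []
    for task in tasks:
        done = False
        for _ in range(max_retries):  # escape hatch 1: capped retries
            if clock() >= deadline:  # escape hatch 2: overall time box
                break
            if attempt(task):
                done = True
                break
        if not done:
            blocked.append(task)  # escape hatch 3: record a TODO, move on
        marker = "[x]" if done else "[ ] TODO (blocked)"
        # Append after every task so a fresh agent can resume from the file.
        with progress_file.open("a", encoding="utf-8") as fh:
            fh.write(f"- {marker} {task}\n")
    return blocked
```

Writing the file after each task, instead of once at the end, is what makes the run resumable if the session is interrupted partway through.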
Safety Rails
Autonomous agents are powerful but need guardrails. Before delegating unsupervised work:
- Commit first. Always. Non-negotiable.
- Branch for risky work. Migrations, large refactors, dependency upgrades: branch so main stays clean.
- Pre-commit hooks block dangerous patterns. Secret leaks, `rm -rf`, force pushes. See the hooks playbook.
- Use containers for `--dangerously-skip-permissions`. If you need to give Claude unrestricted access, run it in a container where the blast radius is limited.
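
The pattern-blocking guardrail can be approximated with a small scanner. This is an illustrative sketch only: the deny-list is deliberately short, the regular expressions are assumptions that would need tuning for a real project, and how the check is wired into a hook is left open.

```python
import re
from typing import List

# Illustrative patterns only; a real deny-list would be tuned to the project.
DANGEROUS_PATTERNS = {
    "recursive force delete": re.compile(
        r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"
    ),
    "force push": re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),
    # Strings shaped like an AWS access key ID: "AKIA" plus 16 characters.
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_violations(text: str) -> List[str]:
    """Return the name of every dangerous pattern found in `text`."""
    return [
        name
        for name, pattern in DANGEROUS_PATTERNS.items()
        if pattern.search(text)
    ]
```

A hook script built on this would pass in the command or staged diff it was given and exit non-zero whenever `find_violations` returns a non-empty list. Pattern matching of this kind reduces accidents but is easy to evade, which is why the container advice above still applies.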