Autonomous Task Patterns

The Write-Test Cycle

The most reliable autonomous pattern: Claude writes code, runs tests, checks output, iterates until green. For more examples, see the official common workflows. The key ingredient is a verifiable success condition. Without one, Claude can’t self-correct.

Task Type	Success Condition
Unit tests	All tests pass
Integration tests	API returns expected responses
CLI tool	Command produces expected output
Refactor	All existing tests still pass
Bug fix	Failing test now passes, no regressions

Give Claude the success condition upfront: “Write the migration and run npm test until all tests pass.”
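A success condition is easiest to act on when it is a single command whose exit code means pass or fail. The following is a minimal sketch, assuming a hypothetical helper named check_success (an illustrative name, not part of Claude or npm) that wraps whatever test command the project uses:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and report PASS/FAIL from its exit code.
# The command's own output is left visible so failures can be read and fixed.
check_success() {
  if "$@"; then
    echo "PASS: $*"
    return 0
  else
    echo "FAIL: $*"
    return 1
  fi
}

# Example (assumes the project defines an npm test script):
#   check_success npm test
```

Any command that exits non-zero on failure fits this shape, which is what lets the loop end on evidence rather than on a guess.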

tmux for Interactive Testing

Some tools can’t be tested with a simple command — they need interactive input, persistent sessions, or real-time output. Use tmux to let Claude drive interactive tools:

# Start a detached session
tmux new-session -d -s test-session

# Send a command
tmux send-keys -t test-session 'npm run dev' Enter

# Wait for output
sleep 2

# Capture what's on screen
tmux capture-pane -t test-session -p

This pattern works for CLIs, REPLs, dev servers, database consoles — anything that needs a persistent terminal session.

Claude can create and manage tmux sessions on its own. Just describe what you want tested and mention tmux as the approach.
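A fixed sleep can be too short for a slow dev server. One possible refinement, sketched here under assumptions, is to poll the pane until expected text appears. The helper name wait_for_pattern and the 'ready' marker are hypothetical; the tmux capture-pane call is the same one shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a command until its output contains a pattern,
# giving up after max_attempts tries spaced one second apart.
# Usage: wait_for_pattern <pattern> <max_attempts> <command...>
wait_for_pattern() {
  local pattern="$1"
  local max_attempts="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Succeed as soon as the polled output matches.
    if "$@" 2>/dev/null | grep -q -- "$pattern"; then
      return 0
    fi
    sleep 1
    attempt=$((attempt + 1))
  done
  return 1
}

# Example with the session above ('ready' is an assumption about what
# your dev server prints once it is up):
#   wait_for_pattern 'ready' 30 tmux capture-pane -t test-session -p
```

Because the polled command is passed in as arguments, the same helper can watch a log file or an HTTP health check instead of a tmux pane.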

Git Bisect with Claude

When something broke and you don’t know when:

Give Claude a test script that fails on the current commit
Tell it to use git bisect to find the breaking commit
Claude runs the binary search automatically, testing each commit

This turns a manual debugging session into an autonomous task with a clear answer.

Browser-Based Verification

Unit tests verify logic. Browser tests verify what users actually see. Use Playwright to let Claude verify rendered output:

“Run the app, navigate to /dashboard, and verify the chart renders with data”
“Fill out the signup form and verify the success message appears”
“Check that the mobile nav menu opens and closes correctly”

This catches CSS regressions, rendering bugs, and integration issues that unit tests miss entirely.

Background Testing

Run a second Claude session that continuously tests your app while you work on features in another session:

Open a second terminal in the same project (or a worktree)
Tell Claude: “Run the full test suite every 5 minutes and alert me if anything breaks”
Continue working in your primary session

You get continuous regression feedback without interrupting your flow.
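If you prefer a plain script to a prompt for the second session, the same loop can be sketched in shell. The helper name watch_tests and its arguments are hypothetical, and npm test in the example stands in for whatever the full suite is in your project.

```shell
#!/usr/bin/env bash
# Hypothetical watcher: rerun a test command on an interval and print an
# ALERT line for every failing run.
# Usage: watch_tests <interval_seconds> <max_runs> <command...>
watch_tests() {
  local interval="$1"
  local max_runs="$2"
  shift 2
  local run=1
  local failures=0
  while [ "$run" -le "$max_runs" ]; do
    if ! "$@"; then
      failures=$((failures + 1))
      echo "ALERT: run $run failed: $*"
    fi
    # Skip the pause after the final run.
    if [ "$run" -lt "$max_runs" ]; then
      sleep "$interval"
    fi
    run=$((run + 1))
  done
  # Succeed only if every run passed.
  [ "$failures" -eq 0 ]
}

# Example: every 5 minutes, for roughly 8 hours:
#   watch_tests 300 96 npm test
```

Capping the number of runs keeps the watcher from outliving the work session.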

Designing for Overnight Autonomy

The Write-Test Cycle works for minutes. For hours-long autonomous runs, you need more structure.

Verification-first architecture: Define the success condition before the agent starts. The agent loops on implementation until verification passes, not until it “feels done.”

"Implement the data pipeline. After each change, run `make test-pipeline && make lint`.
Do not stop until both pass with zero errors. If stuck after 3 attempts on the same
error, create a TODO with the blocker and move to the next task."

Back-pressure patterns:

Stop hooks can enforce completion conditions — the agent can’t finish until tests pass or tasks are marked done. See the hooks playbook.
Escape hatches prevent infinite loops: cap retries, time-box exploration, create TODOs for blockers instead of spinning.
Progress files (e.g., PROGRESS.md) let fresh agents resume where the last one left off after context compaction.
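The retry cap and TODO-on-block escape hatch can be sketched as a small wrapper. The name run_with_cap and the TODO line format are hypothetical. In a real run the agent would change the code between attempts, so this only illustrates the cap and the hand-off to a TODO file.

```shell
#!/usr/bin/env bash
# Hypothetical escape hatch: try a command up to max_attempts times; if it
# never passes, append a blocker note to a TODO file instead of looping on.
# Usage: run_with_cap <max_attempts> <todo_file> <command...>
run_with_cap() {
  local max_attempts="$1"
  local todo_file="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "- [ ] BLOCKED after $max_attempts attempts: $*" >> "$todo_file"
  return 1
}

# Example, using the make target from the prompt above:
#   run_with_cap 3 TODO.md make test-pipeline
```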

The overnight checklist:

Commit current state (rollback point)
Branch for the autonomous work
Define verifiable success conditions
Set escape hatches (max retries, TODO-on-block)
Use hooks to enforce quality gates
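The first two checklist items are plain git operations. This is a minimal sketch, assuming a hypothetical helper named prepare_autonomous_run and assuming you want every pending change included in the checkpoint (git add -A stages everything, so review what is untracked first).

```shell
#!/usr/bin/env bash
# Hypothetical helper: commit the current state as a rollback point, then
# create and switch to a branch for the autonomous work.
# Usage: prepare_autonomous_run <branch_name>
prepare_autonomous_run() {
  local branch="$1"
  git add -A || return 1
  # Commit only when something is staged; committing nothing would fail.
  if ! git diff --cached --quiet; then
    git commit -q -m "Checkpoint before autonomous run" || return 1
  fi
  git checkout -q -b "$branch"
}

# Example (the branch name is a placeholder):
#   prepare_autonomous_run overnight/data-pipeline
```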

Scheduled tasks in Claude Desktop can run agents on a cadence (hourly, daily) — useful for monitoring, audits, and recurring maintenance without interactive sessions.

Safety Rails

Autonomous agents are powerful but need guardrails. Before delegating unsupervised work:

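Always commit before delegating: if Claude breaks something, git checkout . reverts its unstaged edits to tracked files (use git reset --hard if changes were staged, and git clean -fd for new untracked files). Without a commit, there is nothing to roll back to and your work is gone.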

Commit first. Always. Non-negotiable.
Branch for risky work. Migrations, large refactors, dependency upgrades — branch so main stays clean.
Pre-commit hooks block dangerous patterns: secret leaks, rm -rf, force pushes. See the hooks playbook.
Use containers for --dangerously-skip-permissions. If you need to give Claude unrestricted access, run it in a container where the blast radius is limited.
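As one illustration of a hook that blocks secret leaks, a git pre-commit hook could scan the staged diff before allowing the commit. This is a sketch under assumptions: the function name scan_diff_for_secrets is hypothetical, and the two patterns (an AWS-style access key ID and a PEM private key header) are examples only, far from a complete scanner.

```shell
#!/usr/bin/env bash
# Hypothetical secret scan: read a diff on stdin and fail if an added line
# matches one of a few illustrative patterns.
scan_diff_for_secrets() {
  local patterns='AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----'
  # Inspect only added lines ("+" prefix), skipping "+++ " file headers.
  if grep '^+' | grep -v '^+++ ' | grep -Eq -- "$patterns"; then
    echo "Possible secret in staged changes; commit blocked." >&2
    return 1
  fi
  return 0
}

# In .git/hooks/pre-commit you might call:
#   git diff --cached | scan_diff_for_secrets || exit 1
```

A git hook of this kind only sees what is about to be committed; blocking shell commands such as rm -rf before they run needs a different mechanism.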
