The Cost Reality
| Approach | Approximate cost multiplier | When it’s worth it |
|---|---|---|
| Single session | 1x (baseline) | Sequential tasks, focused work |
| Subagents | 4-7x | Parallel research, independent file edits |
| Agent Teams | ~15x | Tasks requiring inter-agent coordination |
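To turn these multipliers into a rough budget, you can scale what a task would cost in a single session by the table's range. This is a minimal sketch. The baseline token count and the names `MULTIPLIERS` and `estimate_tokens` are hypothetical. The multipliers are the approximate figures from the table, not measured constants.

```python
# Approximate multipliers from the table above (low, high); treat them as rough ranges.
MULTIPLIERS = {
    "single_session": (1.0, 1.0),
    "subagents": (4.0, 7.0),
    "agent_teams": (15.0, 15.0),  # the table gives ~15x as a single point estimate
}


def estimate_tokens(baseline_tokens: int, approach: str) -> tuple[int, int]:
    """Return a (low, high) token estimate for `approach`, scaled from the
    number of tokens the same task would use in a single session."""
    low, high = MULTIPLIERS[approach]
    return round(baseline_tokens * low), round(baseline_tokens * high)


if __name__ == "__main__":
    baseline = 200_000  # hypothetical: tokens a single focused session would spend
    for approach in MULTIPLIERS:
        low, high = estimate_tokens(baseline, approach)
        print(f"{approach:15} {low:>12,} to {high:>12,} tokens")
```

The point of the exercise is the comparison. A task that would fit comfortably in one session becomes an order-of-magnitude larger spend once it is split across a team.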
Model Routing Strategy
The most impactful cost optimization: use the cheapest model that can do the job at each stage.
| Stage | Model | Why |
|---|---|---|
| Research / exploration | Haiku | Fast, cheap, only needs to read and summarize |
| Implementation | Sonnet | Good balance of capability and cost |
| Architecture / coordination | Opus | Complex reasoning, big-picture decisions |
| Validation | Sonnet | Needs to understand code, doesn’t need to generate much |
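The routing table amounts to a lookup from pipeline stage to model. The sketch below is a hypothetical dispatcher. `Stage`, `ROUTING`, `route` and `plan_pipeline` are illustrative names, not part of any Claude Code API. In Claude Code itself the choice is made per agent, for example with a `model:` field in agent frontmatter.

```python
from enum import Enum


class Stage(Enum):
    RESEARCH = "research"
    IMPLEMENTATION = "implementation"
    ARCHITECTURE = "architecture"
    VALIDATION = "validation"


# Cheapest model judged adequate for each stage, mirroring the routing table.
ROUTING = {
    Stage.RESEARCH: "haiku",          # read and summarize only
    Stage.IMPLEMENTATION: "sonnet",   # balance of capability and cost
    Stage.ARCHITECTURE: "opus",       # complex, big-picture reasoning
    Stage.VALIDATION: "sonnet",       # understands code, generates little
}


def route(stage: Stage) -> str:
    """Return the model short name to use for a given stage."""
    return ROUTING[stage]


def plan_pipeline(stages: list[Stage]) -> list[tuple[str, str]]:
    """Pair each stage of a task with the model it should run on."""
    return [(stage.value, route(stage)) for stage in stages]


if __name__ == "__main__":
    order = [Stage.ARCHITECTURE, Stage.RESEARCH, Stage.IMPLEMENTATION, Stage.VALIDATION]
    for stage, model in plan_pipeline(order):
        print(f"{stage:15} -> {model}")
```

Keeping the mapping in one table-like structure makes the cost policy explicit. Upgrading a stage becomes a one-line change rather than a scattered edit.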
The opusplan Pattern
Applied across a whole task, this routing becomes the highest-leverage pattern: Opus plans, Sonnet executes. Use Opus for architecture decisions and coordination, then route implementation to Sonnet agents. Compared with running everything on Opus, this cuts token costs by roughly 60-80% while keeping Opus-level reasoning where it matters.
| Technique | How | Savings |
|---|---|---|
| opusplan | Opus plans + Sonnet implements | 60-80% |
| Mid-session /model switch | Start with Opus, switch to Sonnet for execution | 40-60% |
| think harder / ultrathink | Extended thinking on Sonnet instead of switching to Opus | ~80% vs Opus |
| Haiku for research | Route exploration agents to Haiku via model: haiku in agent frontmatter | 90%+ vs Opus |
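You can check where savings like these come from with a blended-cost calculation. Cost relative to all-Opus is the Opus share of tokens plus the Sonnet share times the Sonnet-to-Opus price ratio. This is a minimal sketch. It assumes one price ratio for both input and output tokens and uses an illustrative 1:5 ratio, so plug in current list prices, which change over time. `blended_savings` is a hypothetical helper name.

```python
def blended_savings(sonnet_share: float, price_ratio: float) -> float:
    """Fraction of cost saved versus running every token on Opus.

    sonnet_share: fraction of all tokens routed to Sonnet (0 to 1).
    price_ratio:  Sonnet per-token price divided by Opus per-token price
                  (assumed identical for input and output tokens).
    """
    if not 0.0 <= sonnet_share <= 1.0:
        raise ValueError("sonnet_share must be between 0 and 1")
    opus_share = 1.0 - sonnet_share
    relative_cost = opus_share * 1.0 + sonnet_share * price_ratio
    return 1.0 - relative_cost


if __name__ == "__main__":
    ratio = 0.2  # assumption: Sonnet costs one fifth of Opus per token
    for share in (0.7, 0.8, 0.9):
        print(f"{share:.0%} of tokens on Sonnet -> {blended_savings(share, ratio):.0%} saved")
```

At a 1:5 ratio, routing 80% of tokens to Sonnet gives 1 - (0.2 + 0.8 × 0.2) = 0.64, or 64% saved. That falls inside the 60-80% band in the table. A smaller price gap between the models shrinks the savings proportionally.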
Practical Tips
Minimize context pollution
- Verbose output degrades agent performance and burns tokens. Keep agent prompts focused.
- Use the Explore agent (Haiku, read-only) for broad searches instead of having your main Opus session read dozens of files.
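One way to act on the first point is to condense tool output before it is handed back to an agent. The sketch below assumes a hypothetical wrapper script that runs a command and passes the result along. It keeps failure-looking lines and the tail of the output, where summaries usually appear, and drops the rest. The function name and default keywords are illustrative assumptions, not part of Claude Code.

```python
def condense_output(output: str, tail_lines: int = 20,
                    keywords: tuple[str, ...] = ("FAIL", "ERROR", "Traceback")) -> str:
    """Shrink verbose command output before it enters an agent's context.

    Keeps the last `tail_lines` lines unchanged, keeps any earlier line that
    contains one of `keywords`, and records how many lines were dropped.
    """
    lines = output.splitlines()
    if len(lines) <= tail_lines:
        return output  # already short enough; pass through untouched
    tail_start = len(lines) - tail_lines
    # Lines before the tail survive only if they look like a failure or error.
    flagged = [ln for ln in lines[:tail_start] if any(k in ln for k in keywords)]
    dropped = tail_start - len(flagged)
    kept = list(flagged)
    if dropped:
        kept.append(f"... [{dropped} lines omitted] ...")
    kept.extend(lines[tail_start:])
    return "\n".join(kept)
```

The agent still sees every failure and the final summary. A passing test log of thousands of lines costs only a few dozen tokens.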
Speed up test suites
- Include --fast options for test suites in your CLAUDE.md; agents waste time running full suites when a subset would suffice.
- Route test-running agents to Sonnet; they don’t need Opus-level reasoning.
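As an illustration, here is a sketch of the command a project's fast mode might expand to. It assumes a Python project tested with pytest whose slow tests carry a `slow` marker (`@pytest.mark.slow`). The `--fast` flag documented in CLAUDE.md would be the project's own convention, not a pytest option. `test_command` is a hypothetical helper. The pytest options it uses (`-q`, `-x`, `-m`) are standard.

```python
from __future__ import annotations

import shlex


def test_command(fast: bool = True, paths: list[str] | None = None) -> list[str]:
    """Build the pytest invocation an agent should run.

    Fast mode uses quiet output (-q), stops at the first failure (-x), and
    deselects tests marked 'slow' (-m "not slow"), assuming the project uses
    that marker for its long-running tests.
    """
    cmd = ["pytest"]
    if fast:
        cmd += ["-q", "-x", "-m", "not slow"]
    cmd += paths or []
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line that CLAUDE.md could point agents to.
    print(shlex.join(test_command(paths=["tests/unit"])))
```

Quiet, fail-fast output also serves the previous tip, because agents read less noise per run.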
Structure CLAUDE.md for multi-agent
- Include module boundaries so agents know which files they own.
- List verification commands so every agent can self-check.
- Mark shared files (e.g., package.json, tsconfig.json) as “coordinate before editing.”
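Module boundaries and shared files, as listed in CLAUDE.md, can be read as a small edit policy. The sketch below assumes ownership is expressed as glob patterns over repo-root-relative paths. The agent names, paths and the `edit_policy` function are hypothetical. It treats unlisted paths conservatively by refusing them.

```python
from fnmatch import fnmatchcase

# Hypothetical module boundaries, as they might be listed in CLAUDE.md.
OWNERSHIP = {
    "api-agent": ["src/api/*"],  # fnmatch's * also matches '/', so subfolders count
    "ui-agent": ["src/ui/*"],
}
# Files every agent must coordinate on before editing.
SHARED = ["package.json", "tsconfig.json"]


def edit_policy(agent: str, path: str) -> str:
    """Classify a proposed edit as 'allowed', 'coordinate', or 'forbidden'."""
    if any(fnmatchcase(path, pattern) for pattern in SHARED):
        return "coordinate"  # shared file: check with the other agents first
    if any(fnmatchcase(path, pattern) for pattern in OWNERSHIP.get(agent, [])):
        return "allowed"  # inside this agent's own module
    return "forbidden"  # owned by another agent, or not assigned to anyone
```

For example, `edit_policy("api-agent", "src/ui/App.tsx")` returns "forbidden". `edit_policy("ui-agent", "package.json")` returns "coordinate".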
Use /batch for independent work
/batch is dramatically cheaper than Agent Teams for work where agents don’t need to communicate.
Reference: Real-World Scale
For perspective: Anthropic’s C compiler project (16 agents over 2,000 sessions across two weeks) consumed approximately 2 billion input tokens and 140 million output tokens. Key takeaways:
- Write extremely high-quality tests: agents solve whatever the verifier validates
- Minimize verbose output: context pollution degrades performance
- Maintain README/progress files so each fresh agent gets orientation context quickly
- The constraint is always review bandwidth, not compute
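The headline totals become more intuitive as per-session averages. The arithmetic below treats the approximate figures as exact (2 billion input tokens, 140 million output tokens, 2,000 sessions, 16 agents). The results are therefore rough averages, not reported measurements. `scale_summary` is a hypothetical helper name.

```python
def scale_summary(input_tokens: int, output_tokens: int,
                  sessions: int, agents: int) -> dict[str, float]:
    """Average token usage implied by project-level totals."""
    return {
        "input_per_session": input_tokens / sessions,
        "output_per_session": output_tokens / sessions,
        "sessions_per_agent": sessions / agents,
        "input_to_output_ratio": input_tokens / output_tokens,
    }


if __name__ == "__main__":
    # Approximate totals quoted for the C compiler project.
    summary = scale_summary(2_000_000_000, 140_000_000, sessions=2_000, agents=16)
    for name, value in summary.items():
        print(f"{name:22} {value:,.1f}")
```

On these assumptions, each session averaged about 1,000,000 input tokens and 70,000 output tokens, and each agent ran about 125 sessions. Tokens read into context outnumbered tokens generated roughly 14 to 1. That ratio is consistent with the emphasis above on keeping context lean.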