Official reference — The official costs docs and model configuration docs cover pricing, caching, and the CLAUDE_CODE_SUBAGENT_MODEL setting. This page adds the opinionated routing strategy.

The Cost Reality

Approach       | Approximate cost multiplier | When it’s worth it
Single session | 1x (baseline)               | Sequential tasks, focused work
Subagents      | 4-7x                        | Parallel research, independent file edits
Agent Teams    | ~15x                        | Tasks requiring inter-agent coordination
The overhead has three sources: each agent loads project context independently, every message consumes tokens in both the sender’s and the receiver’s context, and broadcasts multiply cost by the number of teammates.
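To make the multipliers concrete, here is a minimal sketch that scales a baseline token budget by the approximate figures from the table. The function names and the broadcast model (one copy of the message in the sender's context plus one in each receiver's) are illustrative assumptions, not part of any official tooling.

```python
# Approximate multipliers from the table above, stored as (low, high) ranges.
# These are rough planning figures, not measured prices.
COST_MULTIPLIERS = {
    "single_session": (1.0, 1.0),
    "subagents": (4.0, 7.0),
    "agent_teams": (15.0, 15.0),
}


def estimate_tokens(baseline_tokens: int, approach: str) -> tuple[int, int]:
    """Return a (low, high) token estimate for the given approach."""
    low, high = COST_MULTIPLIERS[approach]
    return round(baseline_tokens * low), round(baseline_tokens * high)


def broadcast_tokens(message_tokens: int, receivers: int) -> int:
    """Simplified model (assumption): a broadcast lands once in the
    sender's context and once in each receiver's context."""
    return message_tokens * (1 + receivers)


if __name__ == "__main__":
    print(estimate_tokens(100_000, "subagents"))
    print(broadcast_tokens(500, receivers=4))
```

Under this simplified model, a 500-token broadcast to four teammates consumes 2,500 tokens in total, which is why broadcasts are the first thing to trim in a team setup.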

Model Routing Strategy

The most impactful cost optimization: use the cheapest model that can do the job at each stage.
Stage                       | Model  | Why
Research / exploration      | Haiku  | Fast, cheap, only needs to read and summarize
Implementation              | Sonnet | Good balance of capability and cost
Architecture / coordination | Opus   | Complex reasoning, big-picture decisions
Validation                  | Sonnet | Needs to understand code, doesn’t need to generate much
Example pipeline:
Research (Haiku) → Plan (Opus) → Implement (Sonnet) → Validate (Sonnet, read-only)
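The pipeline above can be expressed as a simple lookup from stage to model. This is a sketch only: the stage keys, the lowercase model aliases, and the helper names are assumptions chosen for illustration, and the code does not call any real Claude Code API.

```python
# Hypothetical stage-to-model routing table mirroring the example pipeline.
STAGE_MODEL = {
    "research": "haiku",     # read and summarize; cheapest tier is enough
    "plan": "opus",          # architecture and coordination decisions
    "implement": "sonnet",   # balance of capability and cost
    "validate": "sonnet",    # understands code, generates little
}

PIPELINE = ["research", "plan", "implement", "validate"]


def route(stage: str) -> str:
    """Return the model alias assigned to a pipeline stage."""
    try:
        return STAGE_MODEL[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


def describe_pipeline() -> str:
    """Render the pipeline as a readable 'Stage (Model)' chain."""
    return " -> ".join(
        f"{stage.capitalize()} ({route(stage).capitalize()})" for stage in PIPELINE
    )


if __name__ == "__main__":
    print(describe_pipeline())
```

Keeping the mapping in one place makes it easy to change a single stage's model without touching the rest of the workflow.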

The opusplan Pattern

Official reference — For model aliases, /model switching, and CLAUDE_CODE_SUBAGENT_MODEL, see the official model configuration docs. This section covers the routing strategy.
The highest-leverage routing pattern: Opus plans, Sonnet executes. Use Opus for architecture decisions and coordination, then route implementation to Sonnet agents. This can cut token costs by roughly 60-80% while maintaining quality where it matters.
Technique                 | How                                                                    | Savings
opusplan                  | Opus plans + Sonnet implements                                         | 60-80%
Mid-session /model switch | Start with Opus, switch to Sonnet for execution                        | 40-60%
think harder / ultrathink | Extended thinking on Sonnet instead of switching to Opus               | ~80% vs Opus
Haiku for research        | Route exploration agents to Haiku via model: haiku in agent frontmatter | 90%+ vs Opus
The key insight: model choice is a per-task decision, not a per-session one. Switch models as the work shifts from reasoning to execution.
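The arithmetic behind these savings can be sketched as a blended cost: the share of tokens moved to a cheaper model, weighted by how much cheaper that model is. Both inputs below are hypothetical parameters, not published prices; substitute the current ratios from the official pricing page.

```python
def blended_savings(cheap_fraction: float, cheap_relative_price: float) -> float:
    """Fraction of cost saved versus running everything on the expensive model.

    cheap_fraction: share of tokens routed to the cheaper model (0.0 to 1.0).
    cheap_relative_price: cheaper model's price as a fraction of the
        expensive model's price (hypothetical value, e.g. 0.2).
    """
    if not 0.0 <= cheap_fraction <= 1.0:
        raise ValueError("cheap_fraction must be between 0 and 1")
    if cheap_relative_price < 0.0:
        raise ValueError("cheap_relative_price must be non-negative")
    # Cost of the mixed workload, normalized so the expensive model costs 1.0.
    blended = (1.0 - cheap_fraction) + cheap_fraction * cheap_relative_price
    return 1.0 - blended


if __name__ == "__main__":
    # Assumed example: 90% of tokens on a model priced at 20% of the other.
    print(f"{blended_savings(0.9, 0.2):.0%}")
```

With those assumed inputs the saving is about 72%, which falls inside the 60-80% range quoted for opusplan. The result is sensitive to both parameters, so the real figure depends on how much of the session is planning versus execution.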

Practical Tips

Minimize context pollution

  • Verbose output degrades agent performance and burns tokens. Keep agent prompts focused.
  • Use the Explore agent (Haiku, read-only) for broad searches instead of having your main Opus session read dozens of files.

Speed up test suites

  • Document fast test options (such as a --fast flag, where your suite has one) in your CLAUDE.md; agents waste time running full suites when a subset would suffice.
  • Route test-running agents to Sonnet; they don’t need Opus-level reasoning.

Structure CLAUDE.md for multi-agent

  • Include module boundaries so agents know which files they own.
  • List verification commands so every agent can self-check.
  • Mark shared files (e.g., package.json, tsconfig.json) as “coordinate before editing.”

Use /batch for independent work

/batch is dramatically cheaper than Agent Teams for work where agents don’t need to communicate.

Reference: Real-World Scale

For perspective: Anthropic’s C compiler project — 16 agents over 2,000 sessions across two weeks — consumed approximately 2 billion input tokens and 140 million output tokens. Key takeaways:
  • Write extremely high-quality tests — agents solve whatever the verifier validates
  • Minimize verbose output — context pollution degrades performance
  • Maintain README/progress files so each fresh agent gets orientation context quickly
  • The constraint is always review bandwidth, not compute
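As a rough sense of scale, the totals quoted above can be reduced to per-session averages. This is back-of-the-envelope arithmetic on the approximate figures in this section, not data reported by Anthropic at that granularity.

```python
# Approximate totals quoted in this section.
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 140_000_000
SESSIONS = 2_000

# Average tokens per session (integer division; the inputs are round figures).
avg_input_per_session = INPUT_TOKENS // SESSIONS
avg_output_per_session = OUTPUT_TOKENS // SESSIONS

# How many input tokens were read for each output token written.
input_output_ratio = INPUT_TOKENS / OUTPUT_TOKENS

if __name__ == "__main__":
    print(f"avg input per session:  {avg_input_per_session:,}")
    print(f"avg output per session: {avg_output_per_session:,}")
    print(f"input:output ratio:     {input_output_ratio:.1f}:1")
```

The averages come to about one million input tokens and 70,000 output tokens per session, roughly 14 input tokens for every output token. That skew toward input is consistent with the advice to keep context lean.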
