Cost & Model Routing

Official reference — The official costs docs and model configuration docs cover pricing, caching, and the CLAUDE_CODE_SUBAGENT_MODEL setting. This page adds the opinionated routing strategy.

The Cost Reality

Approach	Approximate cost multiplier	When it’s worth it
Single session	1x (baseline)	Sequential tasks, focused work
Subagents	4-7x	Parallel research, independent file edits
Agent Teams	~15x	Tasks requiring inter-agent coordination

The overhead has three sources: each agent loads project context independently, every message consumes tokens in both the sender’s and the receiver’s context, and each broadcast multiplies its cost by the number of teammates.
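To make the three overhead sources concrete, here is a minimal back-of-the-envelope sketch. The `TeamRun` fields and every number in the usage example are invented for illustration; this is not how Claude Code meters usage, and it will not reproduce the multipliers in the table, which depend on real workloads.

```python
from dataclasses import dataclass


@dataclass
class TeamRun:
    agents: int            # number of agents, including the lead
    context_tokens: int    # project context each agent loads on startup
    work_tokens: int       # tokens of actual task work, summed over all agents
    direct_messages: int   # one-to-one messages exchanged
    broadcasts: int        # messages sent to every teammate
    message_tokens: int    # average tokens per message


def estimate_tokens(run: TeamRun) -> int:
    """Rough token total under the three overhead sources described above."""
    # Each agent loads the project context independently.
    context = run.agents * run.context_tokens
    # A direct message lands in both the sender's and the receiver's context.
    direct = run.direct_messages * run.message_tokens * 2
    # A broadcast lands in the sender's context plus every teammate's.
    broadcast = run.broadcasts * run.message_tokens * run.agents
    return context + run.work_tokens + direct + broadcast


# Hypothetical inputs: same amount of task work, one agent versus five.
single = TeamRun(agents=1, context_tokens=20_000, work_tokens=100_000,
                 direct_messages=0, broadcasts=0, message_tokens=500)
team = TeamRun(agents=5, context_tokens=20_000, work_tokens=100_000,
               direct_messages=40, broadcasts=10, message_tokens=500)
```

With these made-up inputs the fixed context load alone matches the task work once five agents are involved, which is why the multiplier grows with team size even before any coordination traffic.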

Model Routing Strategy

The most impactful cost optimization: use the cheapest model that can do the job at each stage.

Stage	Model	Why
Research / exploration	Haiku	Fast, cheap, only needs to read and summarize
Implementation	Sonnet	Good balance of capability and cost
Architecture / coordination	Opus	Complex reasoning, big-picture decisions
Validation	Sonnet	Needs to understand code, doesn’t need to generate much

Example pipeline:

Research (Haiku) → Plan (Opus) → Implement (Sonnet) → Validate (Sonnet, read-only)
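The same pipeline can be written down as a lookup table, which keeps the routing decision in one place. This is only a sketch: the stage names, the lowercase model labels, and the helper functions are assumptions for illustration, not identifiers taken from the official model configuration docs.

```python
# Stage -> model label, mirroring the routing table above.
STAGE_MODELS = {
    "research": "haiku",     # read and summarize
    "plan": "opus",          # architecture and coordination
    "implement": "sonnet",   # balance of capability and cost
    "validate": "sonnet",    # read-only review
}

PIPELINE = ["research", "plan", "implement", "validate"]


def model_for(stage: str) -> str:
    """Return the model label for a stage, failing loudly on typos."""
    try:
        return STAGE_MODELS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


def describe_pipeline() -> str:
    """Render the pipeline as 'stage (model) -> stage (model) -> ...'."""
    return " -> ".join(f"{stage} ({model_for(stage)})" for stage in PIPELINE)
```

Keeping the mapping as data means a change such as moving validation to a cheaper model is a one-line edit rather than a hunt through prompts.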

The opusplan Pattern

Official reference — For model aliases, /model switching, and CLAUDE_CODE_SUBAGENT_MODEL, see the official model configuration docs. This section covers the routing strategy.

The highest-leverage form of this routing is simple: Opus plans, Sonnet executes. Use Opus for architecture decisions and coordination, then route implementation to Sonnet agents. Compared with running everything on Opus, this can cut token costs by roughly 60-80% while keeping quality where it matters.

Technique	How	Savings
opusplan	Opus plans + Sonnet implements	60-80%
Mid-session `/model` switch	Start with Opus, switch to Sonnet for execution	40-60%
`think harder` / `ultrathink`	Extended thinking on Sonnet instead of switching to Opus	~80% vs Opus
Haiku for research	Route exploration agents to Haiku via `model: haiku` in agent frontmatter	90%+ vs Opus

The key insight: model choice is a per-task decision, not a per-session one. Switch models as the work shifts from reasoning to execution.
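A rough way to see where a figure in the 60-80% range can come from is to blend per-token prices. The sketch below assumes the total token count is the same whichever model runs, and it takes the Opus-to-Sonnet price ratio as an input; the ratio of 5 used in the comments is an assumption for illustration, so check current pricing before relying on it.

```python
def blended_savings(opus_share: float, price_ratio: float) -> float:
    """Fraction saved versus running every token on Opus.

    opus_share:  fraction of total tokens still handled by Opus (planning).
    price_ratio: assumed Opus price divided by Sonnet price, per token.
    """
    if not 0.0 <= opus_share <= 1.0:
        raise ValueError("opus_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    all_opus = price_ratio  # cost per token with Sonnet normalized to 1
    blended = opus_share * price_ratio + (1.0 - opus_share) * 1.0
    return 1.0 - blended / all_opus


# Assumed ratio of 5: keeping 10% of tokens on Opus saves about 72%,
# and keeping 25% on Opus saves about 60%.
light_planning = blended_savings(0.10, 5.0)
heavy_planning = blended_savings(0.25, 5.0)
```

The savings shrink as more of the session stays on Opus, which is the arithmetic behind treating model choice as a per-task decision.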

Practical Tips

Minimize context pollution

Verbose output degrades agent performance and burns tokens. Keep agent prompts focused.
Use the Explore agent (Haiku, read-only) for broad searches instead of having your main Opus session read dozens of files.

Speed up test suites

Document a fast test option (such as a --fast flag or a subset command) in your CLAUDE.md; otherwise agents waste time running the full suite when a subset would suffice.
Route test-running agents to Sonnet; they don’t need Opus-level reasoning.

Structure CLAUDE.md for multi-agent

Include module boundaries so agents know which files they own.
List verification commands so every agent can self-check.
Mark shared files (e.g., package.json, tsconfig.json) as “coordinate before editing.”
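One way to think about these three items is as a small ownership table that any agent could consult before editing. The sketch below is purely illustrative: the agent names, directory globs, and `edit_policy` helper are hypothetical, and CLAUDE.md itself is free-form text rather than anything Claude Code parses in this shape.

```python
from fnmatch import fnmatchcase

# Hypothetical module boundaries: which agent owns which paths.
OWNERS = {
    "api-agent": ["src/api/*"],
    "ui-agent": ["src/ui/*"],
}

# Shared files that need coordination before anyone edits them.
SHARED = ["package.json", "tsconfig.json"]


def edit_policy(path: str) -> str:
    """Classify a path as shared, owned by one agent, or unassigned."""
    if any(fnmatchcase(path, pattern) for pattern in SHARED):
        return "coordinate before editing"
    for agent, patterns in OWNERS.items():
        # fnmatch's '*' also matches '/', so nested files are covered.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return f"owned by {agent}"
    return "unassigned"
```

If a path comes back as unassigned, that is usually a sign the module boundaries in CLAUDE.md have a gap worth closing before agents run in parallel.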

Use `/batch` for independent work

/batch is dramatically cheaper than Agent Teams for work where agents don’t need to communicate.

Reference: Real-World Scale

For perspective: Anthropic’s C compiler project — 16 agents over 2,000 sessions across two weeks — consumed approximately 2 billion input tokens and 140 million output tokens. Key takeaways:

Write extremely high-quality tests — agents solve whatever the verifier validates
Minimize verbose output — context pollution degrades performance
Maintain README/progress files so each fresh agent gets orientation context quickly
The constraint is always review bandwidth, not compute
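The totals above are easier to reason about as per-session averages. The sketch below uses only the figures quoted in this section and treats the session count as a round 2,000, which is an approximation; no pricing is assumed.

```python
INPUT_TOKENS = 2_000_000_000   # approximate total input tokens
OUTPUT_TOKENS = 140_000_000    # approximate total output tokens
SESSIONS = 2_000               # treated as a round figure

# Average tokens per session.
input_per_session = INPUT_TOKENS / SESSIONS
output_per_session = OUTPUT_TOKENS / SESSIONS

# How many input tokens were read for each output token written.
input_to_output = INPUT_TOKENS / OUTPUT_TOKENS

print(f"input per session:  {input_per_session:,.0f}")
print(f"output per session: {output_per_session:,.0f}")
print(f"input:output ratio: {input_to_output:.1f}")
```

That works out to about a million input tokens and 70,000 output tokens per session, or roughly 14 input tokens for each output token, which is consistent with the advice to keep output terse and orientation files short.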
