The Cost Reality
| Approach | Approximate cost multiplier | When it’s worth it |
|---|---|---|
| Single session | 1x (baseline) | Sequential tasks, focused work |
| Subagents | 4-7x | Parallel research, independent file edits |
| Agent Teams | ~15x | Tasks requiring inter-agent coordination |
The overhead has three sources: each agent loads project context independently, every message consumes tokens in both the sender’s and the receiver’s context, and broadcasts multiply cost by the number of teammates.
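To make the multipliers concrete, here is a minimal Python sketch that scales a single-session token budget by the approximate ranges in the table and models broadcast overhead. The function names and the 200k-token baseline are hypothetical, and the broadcast model (one copy for the sender plus one per recipient) is a simplifying assumption rather than a measured figure.

```python
# Approximate (low, high) multipliers taken from the table above.
COST_MULTIPLIERS = {
    "single_session": (1.0, 1.0),
    "subagents": (4.0, 7.0),
    "agent_teams": (15.0, 15.0),
}


def estimate_tokens(baseline_tokens: int, approach: str) -> tuple[int, int]:
    """Return a rough (low, high) token estimate for a task under `approach`."""
    low, high = COST_MULTIPLIERS[approach]
    return round(baseline_tokens * low), round(baseline_tokens * high)


def broadcast_cost(message_tokens: int, teammates: int) -> int:
    """Tokens consumed by one broadcast (assumed model).

    The message is counted once in the sender's context and once in
    each recipient's context.
    """
    return message_tokens * (1 + teammates)


if __name__ == "__main__":
    # Hypothetical task that needs 200k tokens in a single session.
    for approach in COST_MULTIPLIERS:
        low, high = estimate_tokens(200_000, approach)
        print(f"{approach}: {low:,} to {high:,} tokens")
    # A 500-token broadcast to 4 teammates under the assumed model.
    print(broadcast_cost(500, 4))  # 2500
```

Treat the output as an order-of-magnitude planning aid; actual usage depends on how much context each agent loads and how often agents message each other.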
Model Routing Strategy
The most impactful cost optimization: use the cheapest model that can do the job at each stage.
| Stage | Model | Why |
|---|---|---|
| Research / exploration | Haiku | Fast, cheap, only needs to read and summarize |
| Implementation | Sonnet | Good balance of capability and cost |
| Architecture / coordination | Opus | Complex reasoning, big-picture decisions |
| Validation | Sonnet | Needs to understand code, doesn’t need to generate much |
Example pipeline:
Research (Haiku) → Plan (Opus) → Implement (Sonnet) → Validate (Sonnet, read-only)
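The routing table can be written down as a small lookup so that each stage of a pipeline resolves to a model in one place. This is an illustrative Python sketch: `STAGE_MODELS`, `route`, and `describe_pipeline` are hypothetical names, and the lowercase model labels are used here only as identifiers for the three tiers.

```python
# Stage-to-model mapping mirroring the routing table above.
STAGE_MODELS = {
    "research": "haiku",
    "plan": "opus",
    "implement": "sonnet",
    "validate": "sonnet",
}

# Order of stages in the example pipeline.
PIPELINE = ["research", "plan", "implement", "validate"]


def route(stage: str) -> str:
    """Return the model tier assigned to `stage`, or raise on an unknown stage."""
    try:
        return STAGE_MODELS[stage]
    except KeyError:
        raise ValueError(f"unknown stage: {stage!r}") from None


def describe_pipeline(stages=PIPELINE) -> str:
    """Render the pipeline as 'stage (model) -> stage (model) ...'."""
    return " -> ".join(f"{stage} ({route(stage)})" for stage in stages)


if __name__ == "__main__":
    print(describe_pipeline())
    # research (haiku) -> plan (opus) -> implement (sonnet) -> validate (sonnet)
```

Keeping the mapping in a single table makes it easy to downgrade one stage and see the effect without touching the rest of the pipeline.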
The opusplan Pattern
Official reference: for model aliases, /model switching, and CLAUDE_CODE_SUBAGENT_MODEL, see the official model configuration docs. This section covers the routing strategy.
The highest-leverage cost optimization: Opus plans, Sonnet executes. Use Opus for architecture decisions and coordination, then route implementation to Sonnet agents. This can cut token costs by 60-80% while maintaining quality where it matters.
| Technique | How | Savings |
|---|---|---|
| opusplan | Opus plans + Sonnet implements | 60-80% |
| Mid-session /model switch | Start with Opus, switch to Sonnet for execution | 40-60% |
| think harder / ultrathink | Extended thinking on Sonnet instead of switching to Opus | ~80% vs Opus |
| Haiku for research | Route exploration agents to Haiku via model: haiku in agent frontmatter | 90%+ vs Opus |
The key insight: model choice is a per-task decision, not a per-session one. Switch models as the work shifts from reasoning to execution.
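The savings figures follow from simple blending arithmetic: if only a small share of tokens stays on the expensive model, the blended cost approaches the cheap model's price. The Python sketch below shows that calculation. The 5x price ratio and the 10% planning share are illustrative assumptions, not actual pricing or measured usage, and `blended_savings` is a hypothetical helper.

```python
def blended_savings(expensive_share: float, price_ratio: float) -> float:
    """Fraction saved versus running every token on the expensive model.

    expensive_share: fraction of tokens still handled by the expensive model (0..1).
    price_ratio: expensive per-token price divided by cheap per-token price (>= 1).
    """
    if not 0.0 <= expensive_share <= 1.0:
        raise ValueError("expensive_share must be between 0 and 1")
    if price_ratio < 1.0:
        raise ValueError("price_ratio must be at least 1")
    # Cost in units of the cheap model's per-token price.
    blended = expensive_share * price_ratio + (1.0 - expensive_share)
    return 1.0 - blended / price_ratio


if __name__ == "__main__":
    # Assumed: planning is 10% of tokens, expensive model costs 5x the cheap one.
    print(f"{blended_savings(0.10, 5.0):.0%}")  # 72%
```

Under these assumptions the result lands inside the 60-80% range quoted in the table; a smaller price gap or a larger planning share reduces the savings accordingly.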
Practical Tips
Minimize context pollution
- Verbose output degrades agent performance and burns tokens. Keep agent prompts focused.
- Use the Explore agent (Haiku, read-only) for broad searches instead of having your main Opus session read dozens of files.
Speed up test suites
- Include --fast options for test suites in your CLAUDE.md; agents waste time running full suites when a subset would suffice.
- Route test-running agents to Sonnet; they don’t need Opus-level reasoning.
Structure CLAUDE.md for multi-agent
- Include module boundaries so agents know which files they own.
- List verification commands so every agent can self-check.
- Mark shared files (e.g., package.json, tsconfig.json) as “coordinate before editing.”
Use /batch for independent work
/batch is dramatically cheaper than Agent Teams for work where agents don’t need to communicate.
Reference: Real-World Scale
For perspective: Anthropic’s C compiler project — 16 agents over 2,000 sessions across two weeks — consumed approximately 2 billion input tokens and 140 million output tokens. Key takeaways:
- Write extremely high-quality tests — agents solve whatever the verifier validates
- Minimize verbose output — context pollution degrades performance
- Maintain README/progress files so each fresh agent gets orientation context quickly
- The constraint is always review bandwidth, not compute
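For a sense of per-session scale, the totals above can be turned into rough averages. This Python sketch treats the session count as exactly 2,000, which is an approximation of the figure quoted above, so the results are order-of-magnitude estimates only.

```python
# Totals quoted above; the session count is treated as exactly 2,000 (approximation).
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 140_000_000
SESSIONS = 2_000

avg_input_per_session = INPUT_TOKENS // SESSIONS    # 1,000,000
avg_output_per_session = OUTPUT_TOKENS // SESSIONS  # 70,000
input_to_output_ratio = INPUT_TOKENS / OUTPUT_TOKENS  # roughly 14.3

if __name__ == "__main__":
    print(f"avg input tokens per session:  {avg_input_per_session:,}")
    print(f"avg output tokens per session: {avg_output_per_session:,}")
    print(f"input:output ratio:            {input_to_output_ratio:.1f}:1")
```

The input-heavy ratio is consistent with the advice above: most of the spend is agents reading context, which is why concise output and good orientation files matter.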