Skip to main content

Parallel Agents & Subscription Cap

Running commands that spawn multiple agents simultaneously (like /test-counsel, /test-app, /feature-counsel) consumes your Claude subscription usage much faster than normal conversations. This guide explains why and how to use these commands responsibly.

Why Parallel Agents Drain Your Cap Fast

Every Claude Code API call sends the following with it:

  • CLAUDE.md — workspace instructions
  • MEMORY.md — persistent memory index
  • The full conversation history so far

When you run a command that launches 8 agents in parallel, all 8 agents start simultaneously, and each one makes many tool calls internally (file reads, browser snapshots, API calls). That means those files above get sent dozens to hundreds of times within a few minutes.

Example: /test-counsel on a single project

  • 8 agents × ~30 tool calls each = ~240 API calls
  • Each call carries CLAUDE.md + MEMORY.md + the agent's conversation history
  • This can consume as much as a full session (~5h rolling window) of normal usage in one run

When you see: You've hit your limit · resets 3pm (Europe/Amsterdam) — that's your Claude subscription's rolling usage cap, not a rate limit.

Commands That Use Parallel Agents

CommandAgentsCap Impact
/test-counsel8 agentsVery high — use sparingly
/feature-counsel8 agentsVery high — use sparingly
/test-app (Full mode)6 agentsHigh — use sparingly
/test-app (Quick mode)1 agentLow — fine to use regularly
/test-scenario-run (multiple)up to 5 agentsMedium — depends on scenario count
/test-scenario-run (single)1 agentLow — fine to use regularly
/opsx-pipeline1–5 agents per batchMedium to Very high — depends on number of changes and model selected
/opsx-apply1 agentLow — fine to use regularly
Single /test-persona-*1 agentLow — fine to use regularly

Guidelines for Careful Use

Open a fresh window before running: Start the command in a new Claude Code window (no prior conversation history). The full conversation history is sent with every API call — in a window with 30+ prior messages, that history alone multiplies the token cost significantly across all parallel agents. A fresh window has zero history overhead.

Before running a multi-agent command:

  • Check the clock — do you have enough session left, or will you need Claude for other work today?
  • Prefer Quick mode over Full mode in /test-app unless you need the full perspective sweep
  • Run individual persona testers (/test-persona-henk, etc.) instead of the full /test-counsel when you only need one perspective

Don't run these commands:

  • Multiple times in a row on the same day
  • Right before needing Claude for urgent implementation work
  • Just to "see what happens" — run them when you have a concrete need for the output

After hitting the cap:

  • The limit resets at a fixed time (shown in the message, e.g. resets 3pm (Europe/Amsterdam))
  • Wait for the reset before continuing — there's no workaround
  • Use the waiting time to review output that was already generated

Files to Keep Lean

These files are sent with every single API call in the workspace. In a parallel-agent run they are multiplied by the number of agents and the number of tool calls each agent makes. Keep them minimal.

FilePurposeTarget size
.claude/CLAUDE.mdWorkspace instructions for Claude< 100 lines
.claude/MEMORY.mdIndex of memory files< 50 lines (index only, no content)

Rules:

  • CLAUDE.md: Only include instructions Claude needs on every task. Move niche/infrequent knowledge to separate files in .claude/docs/ that can be read on demand.
  • MEMORY.md: This is an index only — one line per memory file with a brief description. Never write memory content directly into MEMORY.md.
  • Persona files (personas/*.md): These are only loaded when a sub-agent explicitly reads them — they don't auto-load. Keep them focused, but they don't need to be ultra-short.

Two Kinds of Token Limits

Claude has two separate token limits that are easy to confuse:

Context window (per-conversation limit)

The maximum tokens a model can process in a single conversation. This is a fixed technical limit of the model itself. If a conversation exceeds it, older messages are compressed or dropped. You never "run out" of context window across conversations — each conversation starts fresh.

ModelContext windowMax output
Haiku 4.5200k tokens64k tokens
Sonnet 4.61M tokens64k tokens
Opus 4.61M tokens128k tokens

Source: Anthropic models overview

Subscription quota (account-level limit)

The total tokens you can use across all conversations combined within a rolling time window. When you hit this, you see "You've hit your limit - resets 3pm". Cheaper models allow more total tokens before hitting the cap:

ModelSession (~5h)Weekly (~7d)
Haiku~1.2M tokens~6M tokens
Sonnet~400K tokens~2M tokens
Opus~200K tokens~1M tokens

These are approximate estimates — Anthropic does not publish exact numbers. Calibrate your own values using claude.ai/settings/usage (see the usage tracker setup guide).

Why this matters for parallel agents

A 3-agent parallel run with Haiku might use ~90K tokens total. That fits easily in Haiku's 200k context window per agent, but it consumes ~7.5% of your ~1.2M session quota. The same run with Opus uses similar tokens but that's ~45% of your ~200K session quota. The context window is rarely the bottleneck — the subscription quota is.

Model Selection

All parallel sub-agent skills ask which model to use at run time. Haiku is the default and recommended choice for parallel runs — it costs significantly less from your subscription quota than Sonnet or Opus.

ModelContext windowCap costBest for
Haiku 4.5 (default)200k tokensLowestMost parallel test runs — broad coverage, fast, quota-efficient
Sonnet 4.61M tokensHigherBrowser-heavy tasks with many snapshots, or nuanced analysis
Opus 4.61M tokensHighestFinal pre-release testing or critical targeted reviews

Skills that ask for a model choice when launching agents:

  • /test-counsel — 8 agents (one per persona)
  • /feature-counsel — 8 agents (one per persona)
  • /test-app (Full mode) — 6 agents (one per perspective)
  • /opsx-pipeline — 1–5 agents (model selectable per change or uniformly for all)
  • /test-scenario-run — Haiku by default, Sonnet optional (asked per run)

Choosing Sonnet or Opus: Both have a 1M context window vs Haiku's 200k. For browser-heavy tasks that process many page snapshots or read large files, Sonnet's larger context is an advantage. Reserve Opus for final pre-release sweeps or critical targeted reviews where maximum reasoning depth matters.

The main conversation (where you type commands) always uses whichever model you have active. Only the sub-agents use the model you select when prompted.

For guidance on which testing commands to use and when, see testing.md.

Monitor Your Live Usage

The usage tracker lets you watch your token consumption in real time from a terminal panel in VS Code — useful for knowing how much cap you have left before starting a parallel-agent run.

# One-line status check
python3 usage-tracker/claude-usage-tracker.py --status-bar

# Live monitoring (30s refresh)
python3 usage-tracker/claude-usage-tracker.py --monitor

The tracker reads Claude Code's session files (~/.claude/projects/) directly and is accurate for API token counts. The limit thresholds are approximate — verify your real cap at claude.ai/settings/usage. Setup instructions: usage-tracker/SETUP.md.