Claude Code Subagents: What Operators Actually Need to Know (2026)
I started using Claude Code subagents the week they shipped because my parent agent kept choking on its own context after long debugging sessions. The first version I wrote was a code-reviewer subagent that took the diff and returned findings. Anecdotally on a ~50K-LOC TypeScript repo (Sonnet 4.6, 2026-04, n≈6 turns, counting parent input tokens reported by the harness before vs after introducing the subagent), it trimmed parent input from roughly 18K-22K tokens per turn to 11K-13K. It also surfaced a failure mode I had not seen: the subagent invented file paths that did not exist, because nobody had told it the repo layout. No affiliate links appear on this page; everything below is field notes from my own setup.
TL;DR
Use Claude Code subagents when work needs parallel independent streams, when isolating a tool-restricted task (read-only review, sandboxed search, untrusted output parsing) is worth the setup, or when a long-running session is about to exhaust parent context. Skip them for one-shot helpers, simple tool calls, or any work where the orchestration cost outweighs the context savings. The cost is real: cold-start latency per Task call (single-digit seconds in informal use on Claude Code 2.x as of 2026-04-28, not benchmarked), duplicated system-prompt and CLAUDE.md tokens billed fresh on every invocation, routing errors when the parent picks the wrong specialist based on a vague description, and debugging overhead because the parent only sees the subagent's final message and not the trail of tool calls behind it. The mental model is "subprocess with its own context window," not "specialist colleague." Configure subagents in .claude/agents/ with a tight description so the parent picks them correctly, restrict tools to the minimum, and pass state through files because conversation history does not cross the boundary.
The mental model
A Claude Code subagent is a separate Claude inference session, spawned by the parent agent, with its own fresh context window and its own (configurable) tool allowlist. The parent invokes it through the harness's Agent tool (the tool was renamed from Task to Agent in Claude Code 2.1.63 per code.claude.com/docs/en/sub-agents; the legacy Task(...) form still works as an alias in settings and in subagent tools: fields), hands it a task prompt, and receives one final assistant message back. That final message is the only subagent output the parent's context ingests, per the Claude Code subagents documentation at code.claude.com/docs/en/sub-agents (read 2026-04-28). The subagent's tool-call transcript, the file reads it ran, the Grep walks it abandoned: none of it lands in the parent.
Three properties follow directly from that design:
- Context isolation is the headline feature. The parent does not pay token cost for the subagent's tool noise. A subagent that runs 30 Grep calls reports back as a single result message in the parent's context.
- State is not shared. Per the same docs (read 2026-04-28), each Task invocation starts a fresh context window; the subagent does not see the parent's prior turns, and I have verified this on my own setup by spawning a subagent after a multi-turn parent session and confirming via verbose harness logging that none of the parent's earlier turns appear in the subagent's input. CLAUDE.md is loaded fresh. Workers are blank slates by default.
- The contract is the prompt and the description. The parent uses the description frontmatter field to decide whether to delegate. Bad descriptions cause wrong delegation choices, a class of bug that does not show up in any log unless you go looking for it.
The concrete boundary against the Anthropic Agent SDK: a Claude Code subagent is a markdown file in .claude/agents/ with frontmatter (name, description, tools, model), invoked by the parent through the harness's Agent tool, with tools drawn from the harness's built-in set plus any MCP servers the harness loads. An Agent SDK agent is a runtime object instantiated in Python or TypeScript code, with tools defined as SDK-shaped schemas, invoked programmatically by your application rather than by a parent agent's tool call. Same conceptual family, different configuration surface, different invocation contract, different tool model. A prompt written for one is portable; the surrounding adapter is not.
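To make that boundary concrete, here is a minimal sketch of the SDK side, following the agent-definition API documented at docs.claude.com/en/api/agent-sdk/subagents (read 2026-04-28). I have not shipped this exact snippet; verify the field names against your installed claude-agent-sdk version before relying on them.

```python
# Minimal sketch: the read-only reviewer as a programmatic Agent SDK subagent.
# Assumes the Python claude-agent-sdk package; AgentDefinition's fields mirror
# the markdown frontmatter (description, prompt, tools, model) per the SDK docs.
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

options = ClaudeAgentOptions(
    agents={
        "code-reviewer": AgentDefinition(
            description="Use after writing or modifying code, before committing.",
            prompt="You review code changes. Report findings; do not rewrite.",
            tools=["Read", "Grep", "Glob"],  # same allowlist idea as frontmatter
            model="sonnet",
        )
    }
)

async def main():
    # The session can now delegate to code-reviewer the way the CLI harness
    # would. The prompt content is portable; this adapter layer is not.
    async for message in query(prompt="Review the staged changes.", options=options):
        print(message)

asyncio.run(main())
```

The point of the sketch is the shape, not the API details: the durable artifact is the description and prompt text, which can stay byte-identical across both runtimes.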
The landscape
In my own work and in a handful of public Claude Code subagent definitions I happened to read while building my own setup, three patterns show up repeatedly. These are personal field notes from a biased sample, not a survey.
- Built-in single subagent dispatch. The parent decides per-turn whether to call a named subagent based on its description. This is the default Claude Code experience.
- Parallel task dispatch with parent-side synthesis. The parent emits N Task calls in a single response, the harness runs them concurrently, and the parent collects and synthesizes results in its next turn.
- Plugin-distributed subagent libraries. Plugins ship pre-built subagent definitions that become available via Task once installed. Verify the plugin's source repo and its .claude/agents/ files at the version you install before trusting the description.
Outside Claude Code itself, the comparison points that matter are runtime, configuration, context boundary, and tool boundary.
| Runtime (verified 2026-04-28) | Configuration surface | Context isolation | Tool execution boundary |
|---|---|---|---|
| Claude Code subagents (docs at code.claude.com/docs/en/sub-agents) | Markdown files in .claude/agents/ with frontmatter (name, description, tools, model) | Separate inference session per Task call, fresh context window | Per-subagent tools allowlist; defaults to inheriting parent's tools and MCP servers |
| Cursor agent mode (per Cursor's docs at docs.cursor.com, read 2026-04-28) | IDE settings; I have not found a documented user-facing way to define named subagents the way Claude Code does | Best I can tell, the agent runs as one IDE-managed session rather than spawning isolated child sessions, but Cursor ships rapidly and this may be stale | Tools managed at the IDE-session level |
| Aider architect mode (Aider 0.x as of 2026-04-28, docs at aider.chat/docs/usage/modes.html) | --architect flag pairs an architect model with an editor model for a two-pass plan-then-edit flow; a mode, not a named-subagent abstraction | Two sequential model calls in one Aider session, not isolated context windows | Aider's built-in edit and shell flows, same set across the pair |
| Anthropic Claude Agent SDK (Python claude-agent-sdk, docs at docs.claude.com/en/api/agent-sdk/subagents read 2026-04-28) | Subagents defined either as filesystem markdown (the same .claude/agents/ shape) or programmatically via the SDK's agent-definition API | Separate agent instance with its own context window when invoked, mirroring the CLI harness model | Tools defined per agent in the SDK configuration |
The high-level pattern rhymes across all four. The runtimes, configuration surfaces, and tool models do not, so a subagent prompt is portable only after the surrounding adapter layer is rewritten.
What actually matters operationally
The vendor framing is "specialist agents for specialized tasks." That framing is mostly marketing. The dimensions that actually drive whether subagents pay off are different.
Context budget arithmetic (simplified model). A subagent costs one final response in parent context plus its own full session billed independently. If the work would have cost the parent 8K tokens of tool noise and the response is 600 tokens, that is roughly 7.4K parent tokens saved at the price of duplicating the system prompt. Treat that as a back-of-envelope sketch. It excludes per-token pricing differences, latency from cold starts, prompt-cache hit rates (which can flip the economics dramatically when the parent's prefix is hot and the subagent's is cold), and output quality. The narrower claim that survives: on long sessions where parent-context pressure is the binding constraint, the arithmetic above is the first thing I check. For a single-shot task it almost never pays.
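The check, as a back-of-envelope calculator with the example numbers from the paragraph above. Every input is an estimate you supply; nothing here models pricing tiers, cache hits, or latency.

```python
# Back-of-envelope: does wrapping noisy work in a subagent save parent context?
def subagent_break_even(
    tool_noise_tokens: int,     # tokens the work would dump into parent context
    final_message_tokens: int,  # subagent's final response the parent ingests
    setup_tokens: int,          # system prompt + CLAUDE.md re-billed per call
    fan_out: int = 1,           # number of Task/Agent calls dispatched
) -> dict:
    parent_saved = tool_noise_tokens - final_message_tokens * fan_out
    return {
        "parent_context_saved": parent_saved,
        "extra_setup_billed": setup_tokens * fan_out,
        "saves_parent_context": parent_saved > 0,
    }

# The example from the prose: 8K of tool noise avoided, a 600-token response,
# ~4K of system prompt + CLAUDE.md re-billed once.
print(subagent_break_even(8_000, 600, 4_000))
# {'parent_context_saved': 7400, 'extra_setup_billed': 4000, 'saves_parent_context': True}
```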
Description quality drives routing. The parent's decision to delegate is a tool-routing decision based on the description field. Vague descriptions ("helps with code review") cause the parent to either over-delegate or under-delegate. Descriptions written like trigger conditions ("Use after writing or modifying code, before committing") route correctly. This is the routing knob I test first when a subagent is firing at the wrong moments.
Tool restriction is a security primitive, not a nicety. A subagent with no Edit, Write, or Bash tools cannot mutate state. That makes it safe to fan out across untrusted inputs, scan dependency manifests, or run on log content from external systems. I often see tools: * in example definitions and plugin-shipped subagents, which throws away the strongest argument for using subagents at all.
Cold-start cost (anecdote, not benchmark). Each invocation reloads the system prompt, project CLAUDE.md, and harness preludes. There is a non-trivial constant-factor latency per Task call. Wall-clock-eyeballed dispatch-to-first-token times on my own laptop on a medium TypeScript repo felt like single-digit seconds in informal use, but I would not plan capacity against that. Measure on your own setup before architecting around it.
Debuggability is worse than the parent. When a subagent fails, the parent receives the final message and not the trail. If the subagent hallucinated a path or misread a file, the only recourse is rerunning with verbose harness logging. Plan for this before shipping subagents into a workflow humans rely on.
Lock-in is real but bounded. Subagent markdown files are tied to the Claude Code harness's frontmatter schema. Migrating to the Agent SDK is a rewrite, not a port. The reusable part is usually the task rubric, not the full prompt.
Detailed teardowns
Pattern 1: The read-only reviewer
A subagent with tools: Read, Grep, Glob (no Bash, no Edit, no Write) that takes a diff supplied by the parent and returns structured findings. The tool restriction is the actual security primitive here. Granting Bash would let the subagent execute arbitrary shell commands and silently turn the "read-only" label into a false control.
Architecture. Parent runs the implementation and computes the diff itself. It then dispatches a Task to the reviewer with the diff content embedded directly in the prompt, or written to a scratch file path under .claude/scratch/ that the subagent can Read. The reviewer walks referenced files, follows neighboring code with Grep, and emits a final message with severity-tagged findings. Because the subagent has no Bash, Edit, or Write, there is no path to mutation.
Tradeoffs. Cheap context win on long sessions because review noise stays out of parent context. Cheap to test because the subagent is essentially stateless given the diff. The most common failure mode I see is operators who keep Bash in the allowlist "for convenience" and ask the subagent to find its own diff. That is fine as a productivity pattern, but call it what it is: a review-only subagent that depends on shell command policy and repo trust, not a tool-restricted security boundary.
When it is the right call. Codebases where review is repetitive and rule-driven, sessions long enough that parent context pressure is real, and the work is naturally batchable.
When it is the wrong call. Single-file edits where the change is eyeball-sized, projects without a stable review checklist, or workflows where the parent already has the diff in context.
Pattern 2: The parallel research fan-out
The parent dispatches N Task calls in a single response, each to the same general-purpose research subagent, each with a different scoped query. The harness runs them concurrently.
Architecture. Parent is the planner. It splits a research goal into independent sub-questions, fires N tasks in parallel, and synthesizes results in its next turn.
Tradeoffs. This is the pattern where subagents can improve wall-clock latency, when the harness actually runs Task calls concurrently and the per-task work is large enough to dominate setup overhead. I have seen meaningful wins on fan-outs of 3 to 5 independent searches across a medium TypeScript repo on Claude Code 2.x with Sonnet 4.6 subagents, but I have not run a controlled benchmark, so I will not put a sequential-vs-parallel multiplier on the page. The structural tradeoff: parallel fan-out trades duplicate setup tokens across N subagents (one CLAUDE.md and system prompt per subagent) and harder synthesis on the parent side for whatever wall-clock win the concurrency delivers. The harness also caps concurrent Task execution, so beyond a handful of parallel calls the gains flatten.
When it is the right call. Multi-source research, codebase audits across modules that do not interact, log analysis across unrelated services, anything where the sub-questions are genuinely independent.
When it is the wrong call. Sub-questions with shared state because you end up serializing through the parent. Also wrong for single deep questions: one focused subagent beats five shallow ones.
Pattern 3: The orchestrator with named specialist subagents
The parent acts as a router that delegates by domain to differently-configured subagents (frontend reviewer, backend reviewer, security scanner, test generator). Each specialist has a tight description and a restricted toolset.
Architecture. This is the pattern most plugin authors ship. The parent rarely does work itself; it picks the right subagent and forwards the request.
Tradeoffs. The measurable upside is consistency: each domain gets a tuned prompt that does not have to be re-explained turn-by-turn, so review criteria stay stable across invocations. In my own setup, that meant security findings stopped drifting in format between sessions. When this pattern fails, it fails subtly: the parent picks the wrong specialist, the wrong specialist returns plausible-but-misaligned output, and the user does not notice until they try to apply the change. The fix is ruthless description discipline plus a should-not-use clause where appropriate.
When it is the right call. Mature teams with stable conventions, large codebases with genuinely different review criteria per area, plugins shipped to many users.
When it is the wrong call. Solo projects, early-stage codebases without stable conventions, or any setup where you cannot articulate what each specialist should refuse to do.
Pattern 4: The long-task delegator
A subagent that takes on a multi-step task end-to-end and reports a single final result. The parent essentially yields the turn.
Architecture. Parent prepares context (file paths, constraints, success criteria) and dispatches one Task. The subagent runs for many turns internally, makes its own tool calls, and produces a final message. Parent then validates and either accepts or re-dispatches with corrections.
Tradeoffs. Maximum context savings because the entire multi-turn workflow runs invisibly. Maximum debuggability cost because if something went wrong in turn 7 of 20, you cannot see it. The Claude Code harness returns the subagent's final message and nothing else, so there is no built-in "ask the human" channel mid-run. If the subagent gets stuck, the only escalation is to stop and emit a final message describing what it needs. That has to be instructed explicitly ("If a required input is missing or ambiguous, stop and return a single message starting with BLOCKED: describing what you need"), and the parent has to recognize that pattern and re-dispatch. Without that convention, a confused subagent will guess, and the guess will look like a confident final answer.
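A sketch of how I word that convention on both sides of the boundary; the exact phrasing is mine, not from any docs.

```
# In the subagent's prompt body:
If a required input is missing or ambiguous, stop immediately and return a
single message starting with "BLOCKED:" followed by exactly what you need.
Do not guess, and do not produce partial work products in that case.

# In the parent's CLAUDE.md or dispatch instructions:
If a subagent result starts with "BLOCKED:", do not treat it as a deliverable.
Gather the missing input it names, then re-dispatch the same subagent with the
original prompt plus the new information.
```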
When it is the right call. Tasks with crisp acceptance criteria and bounded scope. Background work where humans will validate the artifact before accepting.
When it is the wrong call. Anything ambiguous, anything where a human needs to be in the loop on intermediate decisions, or anything where partial progress matters and full failure is unacceptable.
The standards layer: MCP, Skills, and the Agent SDK
Subagents do not exist in isolation. The harness ships several adjacent abstractions that operators routinely confuse with each other.
MCP (Model Context Protocol). The open protocol for exposing tools and data sources to agents. Subagents inherit the parent's MCP servers by default. If a subagent does not see a tool you expected, check that the parent's MCP config covers it and that the subagent's tools allowlist includes it. Documented at modelcontextprotocol.io.
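MCP tools appear in the allowlist under per-server names using the harness's mcp__server__tool pattern; a minimal sketch (the github server and its search_issues tool are placeholders for whatever your MCP config actually loads; confirm the resolved names with /agents):

```
tools: Read, Grep, mcp__github__search_issues
```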
Skills. Per-turn invokable workflows distinct from subagents. In Claude Code as of 2026-04-28, verified against code.claude.com/docs/en/skills, invoking a skill loads its content into the current session and the parent follows the instructions inline rather than spawning a separate inference. Use skills for "when X, do Y" recipes that need to live in parent context, and use subagents for work that benefits from isolation.
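The configuration surfaces make the contrast visible. A skill is a folder under .claude/skills/ containing a SKILL.md whose body the parent executes inline; a hypothetical sketch with invented name and steps:

```
---
name: release-checklist
description: Use when the user asks to cut a release or tag a version.
---
1. Confirm CI is green on the main branch before anything else.
2. Bump the version field in package.json and commit it.
3. Tag the commit, push the tag, and report the tag name back.
```

Same frontmatter-plus-instructions shape as a subagent definition, but there is no fresh context window and no separate inference session: the parent runs these steps itself, in its own context.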
Agent SDK. Anthropic's Python and TypeScript SDK for building agents outside the Claude Code harness. The SDK has its own subagent concept that mirrors but does not duplicate the CLI harness's. If you are migrating subagent prompts from Claude Code to the SDK, expect to rewrite the invocation contract.
The protocol situation in 2026 is healthier than it was 12 months ago because MCP has consolidated tool exposure, but agent-to-subagent invocation is still proprietary per harness. There is no portable standard for "this is my subagent definition" today. If interop matters, keep the prompt content in plain markdown and treat the frontmatter as a per-runtime adapter.
Things nobody talks about
Subagent context is not actually free. Every Task invocation re-bills the system prompt, the project CLAUDE.md, and the harness prelude as fresh input tokens. On a medium repo with a 4K-token CLAUDE.md, that is roughly 4K input tokens of repeated overhead per Task call. Fan out 5 subagents and you have just paid 20K tokens for setup before any work happens. The fix is to write a tight CLAUDE.md and prefer one larger Task over many small ones when the work is sequential.
Description-routing is silent and untestable. The parent's choice to delegate is a model decision based on the description text. There is no test harness for "does the parent route correctly given this user prompt." Operators ship subagents, hope routing works, and only find out it does not when results look wrong. My workaround is to add a one-line "Use when..." sentence at the top of every description and to verify routing manually for the top 5 use cases before considering the subagent done.
Subagents do not inherit conversation memory. I have watched smart engineers lose half a day debugging why their subagent "forgot" the architectural context they explained to the parent three turns ago. The subagent never knew. The parent must re-state every constraint that matters in the Task prompt itself.
Tool allowlists silently widen on harness updates. When a new built-in tool ships in Claude Code, existing subagent definitions that use tools: * automatically get access to it. If you rely on tool restriction as a security boundary, use an explicit allowlist (tools: Read, Grep, Glob) and accept the maintenance cost.
Plugins shipping subagents can shadow your own. When a plugin defines a subagent with the same name as a project-local one, the resolution order matters and is not loudly documented. Run /agents to confirm which definition is winning before debugging behavior that "doesn't match the file I edited."
Implementation patterns
The companion repo at github.com/MPIsaac-Per/agentinfra-examples is the intended home for working versions of each pattern; as of 2026-05-04 the example files have not all landed there yet, so treat the snippets here as the authoritative reference.
Pattern 1: The read-only reviewer (skeleton)
---
name: code-reviewer
description: Use after writing or modifying code, before committing or creating a PR. Reviews a diff supplied by the parent for correctness, project conventions, and potential bugs. Do not use for whole-codebase audits or for changes the parent has not staged and passed in.
tools: Read, Grep, Glob
model: sonnet
---
You review code changes for a senior engineer who values precision over politeness. The parent agent will pass you the staged diff in the task prompt or as a file path under .claude/scratch/. You have no shell access; if the diff is missing, return a single message asking the parent to provide it rather than attempting to fetch it yourself.
Process:
1. Read the diff from the prompt or the scratch file path the parent provided.
2. For each modified file, Read the surrounding context (50 lines above and below the change).
3. For each non-trivial change, Grep for callers and verify assumptions.
4. Output findings as a numbered list. Each finding: severity (blocker | warning | nit), file:line, one-sentence description, suggested fix.
Refuse to suggest changes you have not verified by reading the relevant code. If you cannot verify a concern, label it as a question, not a finding.
The parent-side glue is two consecutive parent assistant turns: one that runs Bash to compute the diff, then a second that emits an Agent tool call carrying the diff as input.
# Pseudo-code transcript of two consecutive parent assistant turns.
# The parent first emits a Bash tool-use block; the harness executes it and
# returns a tool_result, which the parent embeds in an Agent tool-use block
# on its next turn.
assistant turn (parent):
tool_use(name="Bash", input={
command: "git diff --staged",
description: "capture staged diff for review"
})
# Harness returns the diff as a tool_result. On the next turn the parent
# embeds that text into the Agent prompt:
assistant turn (parent, next):
tool_use(name="Agent", input={
subagent_type: "code-reviewer",
description: "review staged diff",
prompt: "Review this staged diff. Findings only, no rewrites.\n\n<diff>\n${staged_diff_from_previous_tool_result}\n</diff>"
})
The operator-facing instruction is: "Run git diff --staged, then dispatch the code-reviewer subagent on the result." Operators who prefer to keep the diff out of the parent's context entirely can write the diff to .claude/scratch/review.diff via Bash and pass only the path in the Task prompt, as sketched below.
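The scratch-file variant, in the same pseudo-transcript notation as the block above (the scratch path is a convention of this setup, not a harness feature):

```
assistant turn (parent):
tool_use(name="Bash", input={
command: "git diff --staged > .claude/scratch/review.diff",
description: "write staged diff to a scratch file for the reviewer"
})
# The redirect keeps the diff on disk; no diff text returns to the parent.
assistant turn (parent, next):
tool_use(name="Agent", input={
subagent_type: "code-reviewer",
description: "review staged diff from scratch file",
prompt: "Review the staged diff at .claude/scratch/review.diff. Read that file first. Findings only, no rewrites."
})
```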
If Bash has to live inside the subagent (some workflows genuinely need it), drop the "read-only" framing and treat the subagent as review-only. Document the mutation risk in the description, and lean on harness-level shell command policy and repo trust as the actual controls.
Pattern 2: The parallel research fan-out (parent prompt fragment)
Ask the parent agent to emit multiple Agent tool calls in a single assistant turn, and the harness will run them concurrently.
# Pseudo-code transcript of a single parent assistant turn.
# The parent emits three Agent tool-use blocks in one message; the harness
# dispatches them concurrently and returns three tool_result blocks before
# the parent's next turn.
assistant turn (parent):
tool_use(name="Agent", input={
subagent_type: "general-purpose",
description: "find parseConfig call sites",
prompt: "Find every call site of `parseConfig` in src/. Report file:line and surrounding context."
})
tool_use(name="Agent", input={
subagent_type: "general-purpose",
description: "list env vars read in src/server",
prompt: "List all environment variables read in src/server/. Report variable name, file:line, default value if any."
})
tool_use(name="Agent", input={
subagent_type: "general-purpose",
description: "find TODO/FIXME comments",
prompt: "Find all TODO and FIXME comments in src/. Report file:line and full comment text."
})
# Next parent turn receives three tool_result blocks (one per Task) and
# synthesizes a single report from them.
The operator-facing instruction is: "Dispatch three Task calls in parallel in your next turn, one for each of the following sub-questions, then synthesize the results in the turn after." Spelling out the sub-questions is what makes the fan-out reliable rather than aspirational.
Pattern 3: The orchestrator with allowlisted specialists
---
name: security-scanner
description: Use when reviewing code that handles authentication, authorization, secrets, user input, or external API calls. Read-only; reports findings without modifying code. Skip for pure UI changes, refactors that do not touch security-relevant paths, or generated code.
tools: Read, Grep, Glob
model: sonnet
---
You scan code for exploitable security issues across five risk classes: broken authentication or authorization, exposed secrets and credentials, injection (shell, SQL, path), sensitive-data exposure in logs or responses, and audit-trail gaps on privileged operations.
Focus areas (in order of priority):
1. Hardcoded secrets, API keys, or credentials.
2. Unvalidated user input flowing into shell commands, SQL, or file paths.
3. Authorization checks missing on sensitive endpoints.
4. Timing-sensitive comparisons (token equality, password checks).
5. Logging that captures secrets or PII.
For each finding: severity, file:line, one-paragraph explanation including the attacker scenario and which risk class it falls under, and the specific change that would fix it. If the codebase already has a pattern that addresses the concern (e.g., a constant-time compare util), reference it by file:line.
Pin your CLAUDE.md, pin your subagent definitions in git, and test routing on representative prompts before trusting the parent to delegate correctly.
Decision framework
I use this rough order when deciding whether a piece of work should become a subagent:
- Is the work independent of the parent's ongoing state? If no, do not split it. Subagents do not share memory.
- Will the parent context overflow within this session without the split? If yes, a subagent that wraps the noisy work is the right call.
- Is there a security or trust boundary that matters? If yes, a tool-restricted subagent is the strongest enforcement primitive available.
- Are there 2+ genuinely independent sub-tasks that could run in parallel? If yes, parallel fan-out earns its keep on wall-clock latency.
- None of the above? Stay in the parent unless the expected saved parent-context tokens exceed the duplicated setup tokens (roughly 4K per Task call on a medium repo with a 4K-token CLAUDE.md), or the task has a real isolation boundary worth enforcing through a restricted toolset.
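The same ordering, encoded as a sketch. The 4K setup constant is the rough figure from the overhead discussion above, not a measured value, and the inputs are judgment calls, not metrics.

```python
# Rough decision order for "should this work become a subagent?"
def should_use_subagent(
    independent_of_parent_state: bool,
    parent_context_at_risk: bool,
    needs_trust_boundary: bool,
    parallel_subtasks: int,
    est_parent_tokens_saved: int,
    setup_tokens: int = 4_000,  # ~CLAUDE.md + system prompt re-billed per call
) -> str:
    if not independent_of_parent_state:
        return "stay in parent: subagents do not share memory"
    if parent_context_at_risk:
        return "subagent: wrap the noisy work to protect parent context"
    if needs_trust_boundary:
        return "subagent: enforce it with a restricted tool allowlist"
    if parallel_subtasks >= 2:
        return "subagent: parallel fan-out for wall-clock latency"
    if est_parent_tokens_saved > setup_tokens:
        return "subagent: context arithmetic favors the split"
    return "stay in parent"
```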
Where this space is headed: I expect MCP to absorb more of the cross-runtime tool exposure problem within 2026, which will make the prompts in subagent definitions more portable. I expect subagent-to-subagent communication to remain proprietary and frankly to remain a place I avoid; the orchestration tax compounds at every level. What I would bet on is treating the subagent prompts themselves as the durable artifact, version-controlled and tested, with the runtime adapter as a thin layer underneath.
The thing nobody told me when I started: most production agent infrastructure is one well-written parent agent with two or three carefully chosen subagents, not a sprawling specialist hierarchy. Start there. Add a subagent only when you can name the specific context-budget or independence problem it solves.