# Claude Code Skills: What Operators Actually Need to Know (2026)
I shipped my first Claude Code skill in late 2025 to stop rewriting the same "verify before you write" preamble in every page-generation prompt. Two months in, I had eleven skills, three of them broken in ways I did not notice until a user complained the agent was doing the wrong thing entirely. The skill description was fine. The trigger logic was the problem, and nothing in the docs warns you about it. This page is the article I wish I had read before I started.
Scope: I cover skills as they exist in Claude Code as of 2026-04-28, how they differ from adjacent primitives (slash commands, subagents, MCP servers, plugins), and the operational realities I have hit running them. I do not cover the IDE extension UX or the web app surface.
## TL;DR
- When to reach for skills. Skills fit when the agent should autonomously load a workflow based on conversational context. Slash commands fit when a human should invoke explicitly. MCP servers fit when the workflow needs persistent typed tools. Skills are the cheapest extension primitive in Claude Code: a YAML frontmatter block plus a markdown body in `~/.claude/skills/<name>/SKILL.md`, no install step, no daemon.
- The routing model in one line. Per https://code.claude.com/docs/en/skills (verified 2026-04-28), the model sees skill metadata at session start and routes on the combined `description` plus `when_to_use` text, filtered by `paths` if set, removed entirely from the autonomous pool if `disable-model-invocation: true`.
- Trap 1, vague trigger text. A description that says "helps with testing" fires on every test-adjacent prompt or none. The combined trigger needs the trigger condition, the user signals that should activate the skill, and at least one example phrasing.
- Trap 2, silent skill conflicts. Two skills with overlapping routing text both appear in the always-loaded routing surface. The model picks one, and the harness logs no record of which others it considered.
- Trap 3, scale tax. The routing block is always-loaded and grows linearly with skill count until truncation kicks in. At 47 skills loaded I measured 1,712 tokens of routing metadata before any user turn.
- Decision rule. Build a skill when the workflow is autonomous-triggered, describable in 50-150 words of natural language, and mostly prompt scaffolding rather than new tool capability. Otherwise pick the primitive that matches the constraint.
## The mental model
A skill is a context bundle plus a routing entry. The bundle is everything in the skill directory: the SKILL.md body, any references/ markdown files, any scripts/ utilities, any assets/ referenced by the body. The routing entry is what the harness advertises to the model at session start, assembled from several frontmatter fields.
Here is the breakdown, verified against https://code.claude.com/docs/en/skills as of 2026-04-28:
- The autonomous-trigger text is the concatenation of `description` and `when_to_use`. The harness joins them into a single routing string. Front-load the load-bearing keywords; the harness applies a per-entry character cap (1,536 characters) and the full listing is also subject to a dynamic budget, so a long combined trigger with phrasing buried at the end can silently lose its routing keywords to truncation.
- `paths` is a file-context gate. When set, the skill is offered for autonomous routing only when the active context contains files matching the listed globs.
- `disable-model-invocation: true` removes the skill from the autonomous-trigger pool entirely while keeping it callable as `/name`. This is the gate to use for privileged or destructive workflows.
- `user-invocable` controls whether the operator can fire the skill directly via `/name`.
- `context: fork` runs the body in a forked context with a chosen `agent:` rather than appending it to the parent.
So the routing model in one line: combined description + when_to_use text, filtered by paths if present, filtered out entirely if disable-model-invocation is true.
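To make those fields concrete, here is a minimal sketch; the field names follow the docs cited above, while the skill itself, its trigger phrasing, and the glob are hypothetical:

```bash
# Hypothetical skill illustrating the routing-relevant fields above; field
# names per https://code.claude.com/docs/en/skills, everything else is made up.
mkdir -p ~/.claude/skills/review-migrations
cat > ~/.claude/skills/review-migrations/SKILL.md <<'EOF'
---
name: review-migrations
description: Use when the user is writing or reviewing a database migration.
when_to_use: Triggers on "add a migration", "schema change", "alter table".
paths:
  - "db/migrations/**"
user-invocable: true
---
Check every migration for reversibility and a matching down step before approving.
EOF
```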
Scope and precedence are a separate axis. Skills load from four sources, and the harness does not flatten them into an undifferentiated pool:
- Enterprise scope: managed-policy paths, shipped via MDM or org policy. Highest precedence.
- Personal scope: `~/.claude/skills/<name>/`, available everywhere for that operator.
- Project scope: `.claude/skills/<name>/` inside the project, walking up from cwd.
- Plugin scope: `${CLAUDE_PLUGIN_ROOT}/skills/<name>/`, namespaced as `plugin-name:skill-name`.
When two skills share a name, enterprise wins over personal, personal wins over project. Plugin-shipped skills coexist cleanly because they live under their plugin namespace.
When a user message arrives, Claude Code presents the agent with a system reminder containing the names plus combined description+when_to_use text for every advertised skill. Bodies stay on disk. If the model picks one, it calls the Skill tool, the harness loads SKILL.md into context, and the agent follows the instructions.
Three properties of this model are worth internalizing:
- Autonomous invocation is not deterministic. Whether a skill fires from a conversational trigger is a model decision over the routing entries the harness chose to advertise. Same input twice can produce different invocations. Slash invocation is deterministic.
- Skills are progressive disclosure. Only the routing entry is always-loaded. Bodies load on invocation. References and scripts load only when the body tells the agent to read them.
- Skills are not tools. They do not expose typed parameters. They are prompt scaffolding the agent reads and acts on. The agent's existing tools are what actually do the work.
This is fundamentally different from MCP, where the server exposes typed tools with schemas the model picks via structured tool-calling. Skills sit one layer up: they tell the model how to use the tools it already has.
## The landscape
Four extension primitives live inside Claude Code, verified against https://code.claude.com/docs/en/plugins-reference and https://code.claude.com/docs/en/skills as of 2026-04-28.
Skills are workflow scaffolding the model can invoke autonomously based on description match, or that the operator can invoke explicitly by typing /skill-name. If a skill should never auto-fire (privileged or destructive operations), set disable-model-invocation: true. The legacy .claude/commands/ markdown files still load, but Anthropic's docs route new work to skills.
Subagents are spawnable specialists with their own tool whitelist and system prompt, invoked via the Agent tool or via a skill that declares context: fork. Best for parallel work, isolated context windows, or tasks where the parent's tool surface is wrong for the job.
MCP servers are external processes exposing typed tools over the Model Context Protocol. Best for stateful integrations and anything where a typed schema beats prose instructions.
Plugins are the distribution wrapper, and they ship considerably more than skills, commands, agents, hooks, and MCP servers. A plugin can also bundle monitors, LSP servers, themes, a bin/ directory prepended to the agent's PATH, and persistent plugin data. The bin/ directory is the sharpest edge: an installed plugin's bin/ can shadow system binaries for any Bash call the agent makes, so vetting a plugin is not "read the SKILL.md files." It is reading the manifest, the bin entries, the monitor scripts, the hooks, and the MCP server config.
## What actually matters operationally
Vendor docs emphasize the format and schema. The production failures I have seen cluster around three axes: routing ambiguity, permission preapproval misread as sandboxing, and stale distribution.
Routing is a combined string, not a single field, and two other fields gate it. The model's autonomous-trigger decision reads the concatenation of description and when_to_use. paths is a file-context gate; disable-model-invocation: true is an invocation gate. A description that says "helps with testing" will fire on every test-adjacent prompt or none, depending on what else is loaded. The combined trigger text needs the trigger condition, the user signals that should activate the skill, and ideally an example phrasing.
As a calibration point, I sampled SKILL.md files in Jesse Vincent's community-maintained Superpowers plugin at https://github.com/obra/superpowers, commit SHA a3f1e9d (HEAD of main when I pulled on 2026-04-15). I counted words per description with awk against extracted frontmatter:
```bash
for f in skills/*/SKILL.md; do
  awk '/^description:/ {flag=1; sub(/^description: */,""); printf "%s ", $0; next}
       /^---/ && flag      {flag=0}   # stop at the frontmatter terminator
       /^[a-z_-]+:/ && flag {flag=0}  # stop at the next frontmatter key
       flag {printf "%s ", $0}        # multi-line description continuation
       END {print ""}' "$f" | wc -w
done
```
Word counts ran 18 to 71, median 34. Descriptions that fire most reliably for me read like routing rules with embedded example phrasings, not keyword bags. There is no "official" length; aim for enough to disambiguate from neighboring skills.
Always-loaded surface area scales with skill count. I measured this on Claude Code 2.4.1 with 47 skills loaded across two plugins plus ~/.claude/skills/. Method: started a fresh session, captured the system prompt via --debug, extracted the routing-metadata block, and counted with wc -c and the public Anthropic tokenizer. Result: 6,840 characters, 1,712 tokens. Cold-session first-token latency on a trivial prompt went from 412ms (4 personal skills, 198-token routing block) to 487ms (47 skills, 1,712-token routing block) averaged over 10 runs each. Warm-cache turns absorb the delta. Set whatever retention cadence fits the workload; mine is a manual sweep when I notice the metadata block crossing 2,000 tokens.
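For anyone repeating the measurement, the shape is roughly this; `--debug` is the flag mentioned above, but the delimiters around the skill-metadata block are version-dependent, so the extraction step below is a placeholder to adapt, not a documented interface:

```bash
# Rough reproduction of the routing-block measurement. Assumes `claude --debug -p`
# writes the assembled context to stderr; the grep filter is a placeholder
# heuristic, not a documented delimiter. Inspect your own debug output first.
claude --debug -p "hello" 2> debug.log
grep -i "skill" debug.log > routing-block.txt   # crude: keep skill-metadata lines
wc -c routing-block.txt                         # characters; run a tokenizer separately
```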
Skill conflicts are observable only at the invocation step. When a skill fires, the harness surfaces it via the Skill tool call. What the harness does not expose: the ranked list of skills the model considered, the ones it ruled out, or why. Two skills with overlapping descriptions both appear in the system reminder, the agent picks one, and only the winner is visible. If a new skill silently steals triggers from an older skill, the failure mode is "the old skill stops firing" with no log entry. Test new skills against the prompts that should fire your existing skills, not just the new ones.
Bundled scripts execute through several distinct paths, and they are not equally restricted:
- Agent-driven Bash invocations. Gated by the same permission model as any other Bash call: prompt-on-use unless preapproved.
- Pre-rendered shell injection (`!command`). A skill body can embed `!`-prefixed shell that the harness expands before the model sees the body. This bypasses the agent's tool-call gate. The `disableSkillShellExecution` setting disables this expansion, but its scope is qualified: it applies to user-scope, project-scope, plugin-shipped, and `--add-dir` skills/commands. It does not apply to bundled or managed skills shipped by Anthropic. For org-wide enforcement, deploy via managed-policy `settings.json`.
- `allowed-tools` frontmatter. This is preapproval, not restriction. Listing `Bash(python3:*)` lets that pattern run without prompting; it does not prevent the agent from using any other tool. Treating this as a sandbox is a security-relevant misread.
- Settings deny rules. The actual block-list lives in `permissions.deny` in `settings.json`. Deny rules apply globally and override any skill's preapproval (see the sketch after this list).
- Subagent dispatch with a narrow agent definition. The strongest restriction is dispatching to a subagent whose `tools:` whitelist excludes everything the script should not touch.
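As a sketch of that deny layer, a project-scope settings file might carry rules like these; the matcher shape follows the permissions docs, and the specific patterns are illustrative:

```bash
# Illustrative deny rules: these apply globally and override any skill's
# allowed-tools preapproval, per the list above. Patterns are examples only.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Bash(curl:*)",
      "Bash(rm -rf:*)"
    ]
  }
}
EOF
```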
Versioning is on you, and the plugin path has its own gotcha. A raw skill in ~/.claude/skills/ has no version field. The plugin path looks like the answer until you hit how Claude Code resolves plugin updates. Plugin resolution keys on the version field in plugin.json. If a maintainer ships a code change without bumping version, every operator with the prior version cached keeps running stale code. The operational rule has two valid shapes: bump plugin.json version on every shipped change, or pin via commit-SHA by installing through a git ref. The failure mode I keep seeing is the worst of both: a versioned marketplace entry the maintainer forgets to bump, leaving every consumer on last week's code while HEAD has the fix.
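One cheap guard against the forgot-to-bump failure is a pre-push hook that refuses skill changes without a manifest change; a sketch, assuming the repo layout described above and an origin/main upstream:

```bash
# Pre-push guard (e.g. .git/hooks/pre-push): if skills/ changed relative to
# upstream but plugin.json did not, refuse the push. Paths and the upstream
# ref are assumptions about your repo layout.
if ! git diff --quiet origin/main -- skills/ && \
     git diff --quiet origin/main -- plugin.json; then
  echo "skills/ changed but plugin.json (version) was not touched" >&2
  exit 1
fi
```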
Skills that dispatch to subagents have first-class frontmatter support. The relevant fields are context: fork, agent: general-purpose (or any agent defined in ~/.claude/agents/), and the inverse skills: array on a subagent definition for preloading skills into the fork. A subagent does not inherit the parent's skill stack; anything the parent's skills were enforcing must be inlined into the dispatch body or preloaded via the subagent's skills: field.
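A sketch of that preload path: a restricted agent definition that carries the enforcing skill into every fork. The `skills:` field is as described above; the agent name, tool list, and skill name are hypothetical:

```bash
# Hypothetical read-only research agent that preloads an enforcing skill via
# the skills: field described above. Verify the frontmatter field set against
# your Claude Code version before relying on it.
mkdir -p ~/.claude/agents
cat > ~/.claude/agents/readonly-researcher.md <<'EOF'
---
name: readonly-researcher
description: Read-only codebase research; reports findings, never edits.
tools: Read, Grep, Glob
skills:
  - read-only-research
---
Answer with file:line citations. Never modify files or run shell commands.
EOF
```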
## Detailed teardowns
### Jesse Vincent's Superpowers community plugin
Listed in Claude's plugin directory at https://claude.com/plugins/superpowers as authored by Jesse Vincent (not Anthropic, despite a common misread). The plugin ships a meta-skill (using-superpowers) that instructs the agent to check for skills before responding, plus dozens of workflow skills covering brainstorming, TDD, debugging, code review, and parallel agent dispatch.
Architecture: the meta-skill is loaded at session start, biasing the agent to invoke skills aggressively. This is the design intent: catch a 1% probability that a skill applies and pay the cost of loading it.
Tradeoffs. With the full bundle installed I counted 47 auto-invocable skills in the system reminder, putting the always-loaded routing block at roughly 6,840 characters, around 1,700 tokens. On the same five-prompt smoke set, Skill tool calls in the transcript went from 0 with the plugin disabled to 4 with it enabled, which matches the design intent rather than measuring it independently. The plugin also installs SessionStart hooks (unconditional code execution) and ships allowed-tools preapprovals (passive UX shortcuts). Conflating those two is easy and worth getting right.
When it is the right call: operators who want the meta-skill's invoke-aggressively bias and are willing to spend the per-turn token budget on routing metadata for ~50 skills. When it is the wrong call: token-sensitive deployments, or any setup where the operator wants to control which skills auto-fire. In those cases, cherry-pick individual skills into ~/.claude/skills/.
### Plugin-distributed skills (third-party)
Most distribution today happens via plugins. A plugin manifest declares skills (and other primitives) and lives in a git repo or registry. The plugin loader resolves skill paths via the ${CLAUDE_PLUGIN_ROOT} macro.
Tradeoffs: clean distribution, real versioning, easy uninstall. But plugins are an attack surface. A skill body can instruct the agent to run a bundled script that does anything the harness has permission to do.
When right: skills shared across machines or teams, or skills with companion scripts that need bundling. When wrong: one-off personal skills.
### Hand-rolled personal skills
A directory under ~/.claude/skills/ with a SKILL.md file. The harness scans for skills at session start across several locations:
| Source | Path | Restart required |
|---|---|---|
| Personal | ~/.claude/skills/<name>/ | Yes, on add or rename |
| Project | .claude/skills/<name>/ (cwd) | Yes |
| Nested project | walking up from cwd | Yes |
| Added directory | inside any --add-dir path | Yes, and re-pass --add-dir |
| Plugin | ${CLAUDE_PLUGIN_ROOT}/skills/<name>/ | Yes |
Live edits to a SKILL.md body take effect on the next invocation because bodies load on demand, but adds, renames, and frontmatter changes only re-register on a fresh session. A skill dropped in mid-session does not appear in the routing surface until restart.
Tradeoffs: no manifest, no registry, no update channel. Drift across machines is silent.
### Skills as a thin wrapper over slash commands
A pattern I have used twice and abandoned once: write the skill body to do almost nothing except invoke a slash command. The skill matches conversational triggers; the command handles direct invocation. The actual workflow lives in the command file.
Minimal shape, with the skill at ~/.claude/skills/run-editorial/SKILL.md:
```markdown
---
name: run-editorial
description: Use when the user asks to run the editorial pass, lint a generated page, or check voice/em-dash/banned-phrase compliance on a draft. Triggers on phrases like "editorial pass", "voice lint", "check this draft".
---
# Run Editorial

Invoke the project's editorial command:

Run `/editorial-pass` with the page path the user is referring to. Do not duplicate the steps here; the command is the source of truth.
```
Why this works: two invocation paths without two copies of the workflow. Why it bites: the skill description and the command body drift apart. Operators update the command, forget the skill description still advertises old trigger phrasing, and the autonomous path quietly stops firing.
Drift control: keep the skill body to a single sentence that names the command and nothing else; never restate workflow steps in the skill. When the workflow has only one real invocation path, fold the skill into the command and skip the wrapper entirely.
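A minimal drift check for this pattern, as a sketch: confirm every wrapper skill still points at a command file that exists. It assumes wrapper bodies name exactly one `/command` in backticks, per the convention above, and that commands live in .claude/commands/:

```bash
# For each wrapper skill, extract the /command it names and verify the command
# file exists. Skills that name no command are skipped.
for f in ~/.claude/skills/*/SKILL.md; do
  cmd=$(grep -oE '`/[a-z0-9-]+`' "$f" | head -1 | tr -d '`/')
  [ -n "$cmd" ] && [ ! -f ".claude/commands/${cmd}.md" ] &&
    echo "$f -> /$cmd has no command file"
done
```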
## The standards layer
The skill file format is becoming a de facto standard for AI-agent extensions. Copilot CLI and Gemini CLI have both adopted compatible loaders.
What the open Agent Skills specification (https://agentskills.io/specification, verified 2026-04-28) covers: file layout, name and description as required, license, compatibility, and metadata as optional, allowed-tools as experimental, and the progressive-disclosure pattern. Claude Code extends this with harness-specific fields including model, disable-model-invocation, user-invocable, context, agent, and arguments. None of those are guaranteed to be honored by other harnesses.
What the open spec does not cover with consistent semantics: trigger logic, tool-permission behavior, and conflict resolution. The `allowed-tools` field is the sharpest example. In Claude Code it is a preapproval mechanism; it does NOT block any other tool the agent already has. To restrict what a skill can do, configure permission deny rules, or dispatch to a subagent with a restricted tool surface.
Stability, dated to 2026-04-28: the file layout and the two load-bearing frontmatter fields, name and description, are stable across the Claude Code docs and the public Agent Skills spec. I have run the same skill bundle unmodified across Claude Code 2.0 through 2.4 without a frontmatter break on those two fields. Trigger semantics, plugin manifest layout, and per-harness extensions sit outside the stable surface.
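Given that stability surface, one cheap portability lint is to flag frontmatter keys that fall outside the open spec's field set; the allowlist below mirrors the spec fields named above, and anything flagged is a Claude Code extension that other harnesses may ignore:

```bash
# Flag frontmatter keys that are harness extensions rather than open-spec
# fields (spec set per the summary above). Purely informational.
for f in ~/.claude/skills/*/SKILL.md; do
  awk '/^---$/ {n++; next} n==1 && /^[a-zA-Z_-]+:/ {print FILENAME ": " $1}' "$f"
done | grep -vE ': (name|description|license|compatibility|metadata|allowed-tools):'
```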
## Things nobody talks about
The description field is parsed by the model, not by a regex. Operators write descriptions like keyword bags ("testing, jest, vitest, mocha"). The descriptions that fire most reliably read like instructions: "Use when the user is writing or modifying tests, especially when adding test coverage to existing code." Concrete consequence: a keyword-bag description for a deployment skill missed deployments framed as "ship this" or "push to prod" because neither word was in the bag. Action: write descriptions as natural-language trigger rules, include 2-3 example user phrasings, run a test prompt that should fire the skill before shipping it.
Skill bodies are not isolated from the rest of the conversation. When a skill fires, its body enters the conversation context as a message and stays there until the harness compacts or evicts it. Skill content is loaded into the ongoing context rather than installed as a higher-priority directive, which means later user messages can override it. Concrete consequence: a skill that says "always commit after this step" keeps nudging the agent to commit on subsequent turns until the body rolls out of context. Action: write skills with explicit scope boundaries, prefer per-step instructions over global rules.
Plugin marketplaces blur the trust boundary. A plugin you install ships skills, scripts, hooks, and MCP servers. The hooks alone can intercept every tool call. A malicious plugin could exfiltrate every prompt and every file the agent reads. Action: pin plugin versions to commit SHAs, audit the manifest before install, never install from unverified sources, and isolate AI-agent harnesses to a non-privileged user account.
Skill discoverability for the model is not the same as for you. You see the skill in ~/.claude/skills/. The model sees only what the system reminder advertises. If the harness drops or truncates the skill list, the model will not know the skill exists. I had a skill that worked fine until I installed a plugin that pushed total skill count past a threshold, after which the original skill stopped firing. Action: keep total skill count modest, prefer plugins over many loose individual skills.
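A crude inventory of what a fresh session will try to advertise from the personal and project scopes (plugin skills need a separate pass over each plugin root):

```bash
# Count SKILL.md files in the personal and project scopes; a rough proxy for
# the routing entries a new session will carry before any truncation.
find ~/.claude/skills .claude/skills -maxdepth 2 -name SKILL.md 2>/dev/null | wc -l
```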
Transcript visibility is asymmetric: invocations are logged, deliberation is not. When a skill fires, the Skill tool call shows up in the transcript. What is not exposed: the ranked candidate list the model considered, aggregate usage counters across sessions, or per-skill hit rates over time. Dead skills accumulate invisibly. Action: bake your own counters into skill bodies (have the skill instruct the agent to append a one-line entry to a log file when it fires), grep transcripts periodically.
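The counter trick in practice: each skill body ends with an instruction along the lines of "append a line with today's date and this skill's name to ~/.claude/skill-hits.log before proceeding", after which hit rates are a one-liner. The log path and two-column format are my convention, nothing the harness provides:

```bash
# Per-skill hit counts from the self-reported log described above
# (format: "<date> <skill-name>" per line, by convention).
awk '{print $2}' ~/.claude/skill-hits.log | sort | uniq -c | sort -rn
```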
## Implementation patterns
### Pattern 1: A minimal personal skill
Drop this in ~/.claude/skills/verify-before-shipping/SKILL.md:
```markdown
---
name: verify-before-shipping
description: Use when the user is about to commit, push, deploy, ship, or merge code. Triggers on phrases like "let's ship this", "ready to commit", "push to prod", "merge it", "looks good let's go". Enforces a verification checklist before any irreversible action.
---
# Verify Before Shipping

Before any commit, push, deploy, or merge, run through this checklist:

1. Read the diff. Confirm it matches what the user asked for.
2. Run the test suite. If tests fail, stop and report.
3. Run the type checker / linter. If errors, stop and report.
4. If the change touches an external API, verify the API call against current docs.
5. Confirm with the user: "Verified [what], ran [tests], confirmed [API]. Ship?"

Do not skip steps. Evidence before assertions.
```
The description is the load-bearing element. It enumerates trigger phrases, names the action class, and tells the model what the skill enforces.
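Before trusting the autonomous path, I run the trigger phrases headlessly and check whether the skill actually fired. A sketch, assuming `claude -p` print mode and that the Skill invocation is visible in `--debug` output on your version:

```bash
# Smoke test: each phrase below should route to verify-before-shipping.
# Grepping debug output for the skill name is a heuristic, not a documented API.
for phrase in "let's ship this" "ready to commit" "push to prod"; do
  if claude --debug -p "$phrase" 2>&1 | grep -q "verify-before-shipping"; then
    echo "fired: $phrase"
  else
    echo "MISSED: $phrase"
  fi
done
```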
### Pattern 2: Skill with bundled script
When a skill needs a deterministic helper, ship the script in scripts/ and have the skill body invoke it via Bash.
````markdown
---
name: run-benchmark
description: Use when the user asks to benchmark, profile, or measure the performance of code in this repo. Runs the standard benchmark harness with a declared workload and iteration count.
arguments:
  - name: workload
    description: One of "http", "db", or "mixed"
  - name: iterations
    description: Positive integer, max 10000
allowed-tools:
  - Bash(python3 ${CLAUDE_SKILL_DIR}/scripts/benchmark.py:*)
---
# Run Benchmark

Execute the bundled benchmark harness with the declared arguments. The Python script validates each value against an allowlist:

```bash
python3 "${CLAUDE_SKILL_DIR}/scripts/benchmark.py" --workload "$workload" --iterations "$iterations"
```

Report the JSON output as a markdown table.
````
Three details matter. First, `${CLAUDE_SKILL_DIR}` resolves to whichever directory the harness loaded the skill from. A hard-coded `~/.claude/...` path silently breaks the moment the skill is shipped via a plugin or checked into a project.
Second, the harness exposes three distinct argument shapes, and conflating them is how injection bugs land. `$ARGUMENTS` is the entire raw argument string, unquoted, unsplit, unvalidated. Indexed shorthand gives positional access after whitespace splitting. Named placeholders only exist when the skill declares an `arguments:` list. None of these forms quote the value before splicing it into the rendered command text. Substitution happens in the markdown, before Bash ever sees the line, so a value containing `;` or `$(...)` becomes shell syntax at execution time.
Third, mitigation is layered. Quote every substituted value. Prefer declared named arguments over `$ARGUMENTS`. Validate every slot against an allowlist inside the script itself. For richer payloads, write a JSON file under `${CLAUDE_SKILL_DIR}` and pass the path. Pin the preapproval pattern in `allowed-tools` to the exact script path, not a wildcard.
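Sketched on the shell side, the validation layer looks like this; variable names mirror the declared arguments above, and the same allowlist belongs inside benchmark.py as well:

```bash
# Validate each declared argument against an allowlist before it reaches any
# command line; quoting alone does not make a hostile value safe to act on.
case "$workload" in
  http|db|mixed) ;;
  *) echo "invalid workload: $workload" >&2; exit 2 ;;
esac
if ! [[ "$iterations" =~ ^[0-9]+$ ]] || (( iterations > 10000 )); then
  echo "invalid iterations: $iterations" >&2; exit 2
fi
python3 "${CLAUDE_SKILL_DIR}/scripts/benchmark.py" \
  --workload "$workload" --iterations "$iterations"
```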
### Pattern 3: Skill that dispatches to a subagent
When the workflow is heavy enough to warrant context isolation, declare the dispatch in the skill's frontmatter.
```markdown
---
name: deep-codebase-research
description: Use when the user asks an open-ended question about the codebase that requires reading many files (architecture questions, "how does X work", "where is Y handled"). Dispatches to a research subagent to protect the main context window.
context: fork
agent: general-purpose
---
# Deep Codebase Research
The forked agent receives this body as its starting instruction. Operate on the user's exact question, use any directory hints from the dispatch payload, and read files with Grep, Glob, and Read.
Return findings under 500 words. Include file paths with line numbers (file_path:line_number) for every claim. Do not modify code. Do not spawn further subagents.
```
`context: fork` tells the harness to run the skill in a forked context. The parent gets the subagent's final report, not the intermediate reads. `agent: general-purpose` selects which agent definition the fork uses; the fork inherits that agent's tool whitelist and system prompt, not the parent's.
The older pattern of having the skill body call the Agent tool with a `subagent_type` argument is what most third-party tutorials still show; that was an SDK-level construct, not the harness-level frontmatter contract.
Gotcha: the forked agent does not inherit the parent's loaded skills. Anything the parent's skill stack was enforcing has to be inlined into the dispatch body. I learned this the hard way when a research fork happily wrote files because the parent's "read-only research" skill had not propagated.
## Decision framework
Build a skill when: the workflow is autonomous-triggered, the trigger condition is describable in 50-150 words of natural language, the workflow is mostly prompt scaffolding, and you want it loaded across multiple sessions or projects.
Build a slash command when: the operator should explicitly invoke (privileged operations, expensive operations, anything irreversible), or when the workflow is project-specific and should not auto-fire elsewhere.
Build a subagent when: you need context isolation, a different tool whitelist than the parent, or parallel execution of independent work.
Build an MCP server when: the workflow needs typed tool schemas, persistent connections, auth state, or anything the model should call as a structured tool rather than read as prose instructions.
Build a plugin when: you have 2+ related primitives that should travel together, or when you want versioned distribution across machines or teams.
Where the space is going: skills will eat more of the workflow-scaffolding surface that currently lives in CLAUDE.md files, because skills are conditionally loaded and CLAUDE.md is always loaded. The skill format will harden into a standard across harnesses, but trigger semantics will stay harness-specific. I would bet on plugins as the durable distribution unit and against any single skill marketplace as the durable index. The format is too easy to host yourself.
What I would avoid: large skill bundles installed by default. Every always-loaded byte is a tax on every turn. Curate ruthlessly. The agent that knows about ten well-described skills outperforms the agent that knows about a hundred mediocre ones, every time.
Companion code, including a starter skill bundle and the verification harness from Pattern 1, lives in the agentinfra-examples repo.