Writing Skills
How to create, structure, and improve Claude Code skills in this repository.
Skills live in .claude/skills/<skill-name>/ and are invoked with /<skill-name>. Each skill is a folder containing a SKILL.md entry point and optional subfolders for supporting files.
Related docs:
- Skill Patterns & Subfolder Guide — common patterns, subfolder conventions, description writing
- Skill Evaluation Guide — evals.json schema, Skill Creator setup, baseline_score
- Skill Checklist — quick validation checklist per maturity level
Skill Maturity Levels
Skills evolve through 7 maturity levels. Each level builds on the previous — a skill is at the highest level where all criteria are met. Use these levels to assess existing skills, plan improvements, and set quality targets.
Source: This framework is based on Simon Scrapes' "Every Level of Claude Code Skills" (March 2026), validated against Anthropic's official skill authoring best practices (platform.claude.com), the Barry Zhang & Mahesh Murag talk "Don't Build Agents, Build Skills Instead" (AI Engineer Summit, November 2025), and the Agent Skills Open Standard (agentskills.io).
Levels Are Cumulative but Not Always Sequential
Levels 1-7 build on each other in terms of criteria. However, in practice skills can exhibit higher-level patterns while skipping intermediate levels. For example, a skill that orchestrates 8 parallel agents (L7 structure) but has never been formally evaluated (L5) or given a learnings pipeline (L6) is "structurally L7 but maturity L4."
When assessing skills, note both the structural level (highest level pattern present) and the maturity level (highest level where ALL criteria through that level are met). The goal is to close gaps — add measurement and self-improvement to skills that already have orchestration.
Not every skill needs to reach Level 7. A simple utility skill (clean-env) may peak at L3-L4. The target level depends on how critical and frequently-used the skill is.
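The difference between the two levels can be made concrete with a small sketch. It assumes the per-level criteria have already been evaluated into a dict of booleans; the function names and the dict layout are illustrative and are not taken from any script in this repository.

```python
def maturity_level(levels_met: dict[int, bool]) -> int:
    """Highest level N such that every level 1..N is met (0 if L1 is not met)."""
    level = 0
    for n in range(1, 8):
        if not levels_met.get(n, False):
            break
        level = n
    return level


def structural_level(levels_met: dict[int, bool]) -> int:
    """Highest level whose pattern is present, regardless of gaps below it."""
    met = [n for n in range(1, 8) if levels_met.get(n, False)]
    return max(met, default=0)


# A skill that orchestrates agents (L7) but has no evals (L5) or learnings (L6).
orchestrator = {1: True, 2: True, 3: True, 4: True, 5: False, 6: False, 7: True}
print(structural_level(orchestrator), maturity_level(orchestrator))  # prints "7 4"
```

The gap between the two numbers is the work list: here, L5 and L6.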
Level 1: Anatomy — What Good Skills Are Made Of
The skill exists as a properly structured folder with a valid SKILL.md.
Criteria:
- Has SKILL.md with valid frontmatter (name, description, metadata)
- Has numbered, self-contained steps
- Has guardrails (what the skill must NOT do)
- Folder name matches name field and slash command
Anti-patterns:
- No frontmatter or missing description
- Steps are vague paragraphs instead of numbered instructions
- No guardrails — skill can do anything without constraints
Reference: Anthropic's Skill Authoring Best Practices defines this baseline structure. The Agent Skills Open Standard (adopted by 20+ tools including OpenAI Codex, Gemini CLI, GitHub Copilot, Cursor) uses the same SKILL.md format.
Level 2: The Golden Rule — Context Management & Triggering
The skill uses progressive disclosure and has a description optimized for reliable auto-triggering.
The Golden Rule: Only the skill's description field is loaded into context initially. Claude decides whether to load the full skill based on that description alone. If the description is vague, the skill never fires automatically.
Source: Confirmed by Barry Zhang & Mahesh Murag (Anthropic) in "Don't Build Agents, Build Skills Instead" — only metadata is loaded initially, the LLM decides whether to load the full skill. Also confirmed by Anthropic's official docs: descriptions are capped at 250 characters in skill listings.
Criteria (in addition to L1):
- Description is written in third person (it gets injected into the system prompt)
- Description front-loads the key use case in the first 250 characters (longer gets truncated in listings)
- Description includes specific trigger terms, not vague phrases like "helps with code"
- SKILL.md body is under 500 lines; heavy content is extracted to subfolders
- Progressive disclosure is actively used (see Size Limits & Progressive Disclosure below)
Example — bad vs good description:
# Bad: vague, passive, no trigger terms
description: A skill that helps with various document-related tasks
# Good: specific, third-person, front-loaded action + trigger terms
description: Create a Pull Request from the current branch — runs local checks, picks target branch, and opens the PR on GitHub
Known limitation: Multiple independent sources report ~50% auto-activation rates for skills. The SLASH_COMMAND_TOOL_CHAR_BUDGET defaults to 1% of context window, limiting how many descriptions fit. With large skill libraries, explicit /skill-name invocation is more reliable than auto-triggering.
Sources: Jesse Vincent (blog.fsck.com), Scott Spence (scottspence.com), GitHub community discussions.
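Some of the description criteria above can be checked mechanically. The following is a minimal sketch, not an existing tool: the function name and the list of vague phrases are assumptions chosen for illustration, and the "good" string is adapted from the example above.

```python
VAGUE_PHRASES = ("helps with", "various", "a skill that")  # illustrative, not exhaustive
LISTING_LIMIT = 250  # characters visible in skill listings, per the text above


def lint_description(description: str) -> list[str]:
    """Return warnings for a skill description (an empty list means no warnings)."""
    warnings = []
    if len(description) > LISTING_LIMIT:
        warnings.append(
            f"{len(description)} chars; text after {LISTING_LIMIT} is truncated in listings"
        )
    lowered = description.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            warnings.append(f"vague phrase: {phrase!r}")
    return warnings


bad = "A skill that helps with various document-related tasks"
good = (
    "Create a Pull Request from the current branch: runs local checks, "
    "picks target branch, and opens the PR on GitHub"
)
```

A check like this only catches surface problems; whether a description actually triggers reliably still has to be measured with trigger prompts (see Level 5).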
Level 3: Import & Improve — Build on Proven Foundations
The skill is built on recognized patterns, community best practices, or existing validated templates — then customized and improved.
Why this matters: Research shows ~80% of community-created skills make output worse. The 20% that work are built by domain experts using iterative evaluation (Corporate Waters, 2026). Starting from proven foundations avoids reinventing broken wheels.
Criteria (in addition to L2):
- Built on a recognized pattern: Anthropic official patterns, validated community skill, or your own proven pattern library
- Has at least one supporting subfolder: examples/ (output format demos), references/ (standards docs), or templates/ (fillable scaffolds)
- Uses at least one common pattern consistently (model guard, AskUserQuestion, destructive action confirmation, quality gates; see skill-patterns.md)
- References standards documents where applicable (in references/)
What the script auto-detects for L3: at least one common pattern keyword present in SKILL.md from any of these categories: (1) model guard (model:, On Haiku, active model, ...), (2) AskUserQuestion, (3) quality gates (composer check, phpcs, phpstan, make check, ruff, psalm), (4) subfolder references (examples/, refs/, references/, or templates/ as text in SKILL.md), or (5) destructive/browser patterns (confirm.*before, browser_snapshot, browser_navigate, ## Hard Rule, ## Verification, acceptance_criteria), AND the existence of at least one of examples/, references/, or templates/ on disk. These are structural proxies for the full criteria above.
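The L3 detection rule described above can be sketched roughly as follows. This is not the repository's actual script: the function name is hypothetical, and the pattern list contains only the keywords named in the text (the elided model-guard keywords are left out), so the real patterns may differ.

```python
import re
from pathlib import Path

# Only the keywords named in the text above; the real script's list may be longer.
L3_PATTERNS = [
    r"model:", r"On Haiku", r"active model",                      # model guard
    r"AskUserQuestion",                                           # user prompts
    r"composer check", r"phpcs", r"phpstan", r"make check", r"ruff", r"psalm",  # quality gates
    r"examples/", r"refs/", r"references/", r"templates/",        # subfolder references
    r"confirm.*before", r"browser_snapshot", r"browser_navigate",
    r"## Hard Rule", r"## Verification", r"acceptance_criteria",  # destructive/browser
]
L3_DIRS = ("examples", "references", "templates")


def looks_like_l3(skill_dir: Path) -> bool:
    """Structural proxy: a pattern keyword in SKILL.md AND a supporting folder on disk."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return False
    text = skill_md.read_text(encoding="utf-8")
    has_keyword = any(re.search(pattern, text) for pattern in L3_PATTERNS)
    has_folder = any((skill_dir / name).is_dir() for name in L3_DIRS)
    return has_keyword and has_folder
```

Both conditions are required: a keyword alone, or a folder alone, does not pass.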
Sources for proven patterns:
- Anthropic's official /skill-creator bundled plugin (GitHub)
- Agent Skills Open Standard: agentskills.io/specification
- Community repos: awesome-claude-code, awesome-claude-skills
- Your own skill library — reuse patterns that work across skills
Level 4: Personalization — Your Business Context
The skill contains domain-specific knowledge that makes it uniquely effective for your project. It couldn't be dropped into another company and work as-is.
Criteria (in addition to L3):
- Has business-specific reference files (e.g., references/dutch-gov-backend-standards.md, persona cards, ADR documents)
- Uses project-specific templates (PR format, spec format matching your workflow)
- References your architecture, ADRs, coding standards — not just generic best practices
- Skill output is recognizably tailored to your domain
Example — generic vs personalized:
# Generic (L3): "Run linting checks after code changes"
# Personalized (L4): "Run `composer check:strict` from the app directory.
# This runs PHPCS, PHPMD, Psalm, and PHPStan.
# Fix all issues before continuing — do not skip.
# Follow PSR-4 structure with Conduction naming conventions."
In this repository, L4 personalization includes:
- Dutch government standards (GEMMA, ZGW, BIO2, Common Ground)
- NL Design System CSS variables and WCAG AA compliance
- Conduction app patterns (openregister, opencatalogi, etc.)
- 8 Dutch citizen/professional personas for testing
- OpenSpec workflow integration (specs, ADRs, changes)
Level 5: Measurement — Evaluated and Optimized with Data
The skill has been systematically tested with evaluation scenarios. Its performance has been measured and improved based on data, not just intuition.
Why most skills plateau at L4: A skill that "feels right" but has never been measured may have blind spots, false confidence, or suboptimal triggering. Measurement turns intuition into evidence.
Criteria (in addition to L4):
- Has 3+ evals with input prompts, expected output characteristics, and assertion criteria
- Description trigger testing: 10+ should_trigger + 10+ should_not_trigger prompts
- Evals have been run: last_validated is set in evals.json
- Baseline measurement exists: what does Claude produce without the skill?
- Skill has been through at least one improve cycle based on eval results
- Has an evals/ folder with evals.json; timing.json and grading.json are produced after running evals
What the script auto-detects for L5: 3+ evals (checks the evals key, falls back to scenarios), 10+/10+ trigger tests, last_validated non-null in evals.json, grading.json present in the evals/ folder, and timing.json found anywhere under evals/. Both files are produced by running evals via /skill-creator; grading.json alone (from a read-only simulation) is not sufficient. timing.json is evidence that the skill was actually executed via the eval runner, not just graded from a static spec read. Baseline measurement and improve cycles are required for true L5 but are not auto-checked by the script.
For the full evals.json schema, Skill Creator setup guide, and baseline_score usage, see skill-evals.md.
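As a rough illustration of the L5 auto-detection described above, the sketch below checks the same signals. It is not the repository's script. In particular, it assumes that the trigger prompts are stored as top-level should_trigger and should_not_trigger lists in evals.json; the authoritative layout is in skill-evals.md.

```python
import json
from pathlib import Path


def looks_like_l5(skill_dir: Path) -> bool:
    """Structural proxy for L5 based on the files under evals/."""
    evals_dir = skill_dir / "evals"
    evals_file = evals_dir / "evals.json"
    if not evals_file.is_file():
        return False
    data = json.loads(evals_file.read_text(encoding="utf-8"))
    # "evals" is checked first, with "scenarios" as the fallback key.
    scenarios = data.get("evals") or data.get("scenarios") or []
    # Assumed layout: trigger prompts as top-level lists (see lead-in).
    should = data.get("should_trigger") or []
    should_not = data.get("should_not_trigger") or []
    return (
        len(scenarios) >= 3
        and len(should) >= 10
        and len(should_not) >= 10
        and data.get("last_validated") is not None
        and (evals_dir / "grading.json").is_file()
        and any(evals_dir.rglob("timing.json"))  # anywhere under evals/
    )
```

Note that grading.json on its own does not pass; timing.json must also exist somewhere under evals/.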
Level 6: Self-Improvement — Skills That Learn from Experience
The skill captures learnings during execution and periodically consolidates them into improved standing rules.
The Learnings-to-Rules Pipeline:
Execution -> Capture observations -> learnings.md -> Consolidation -> Updated SKILL.md rules
^ |
└──────────────────────────────┘
Criteria (in addition to L5):
- Has a learnings.md file in the skill folder
- SKILL.md includes a "capture learnings" step near the end of execution
- Each learning entry is dated and atomic (one insight per bullet)
- Learnings have 5 sections: Patterns That Work, Mistakes to Avoid, Domain Knowledge, Open Questions, Consolidated Principles
- Consolidation triggers at ~80-100 entries: remove outdated, merge duplicates, extract cross-entry patterns, promote validated principles to SKILL.md guardrails/rules
Improvement: Learning Candidates Ledger
Rather than writing every observation directly to learnings.md, use a two-stage buffer to prevent garbage learnings from polluting context:
learning-candidates.md -> (promotion criteria met?) -> learnings.md -> SKILL.md rules
↓ (no)
discarded after 30 days
Promotion criteria: observation confirmed across 3+ executions, or resolves a measured eval failure.
Source: The learnings-to-rules pattern is documented by MindStudio (mindstudio.ai/blog) and implemented in community tools: claude-reflect-system, claude-meta, turbo. The two-stage buffer improvement was suggested by @jacksonporter1949 in community discussion.
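The two-stage buffer can be expressed as a small triage rule. The sketch below is one possible reading of the promotion criteria: the Candidate fields and the function name are hypothetical, and it assumes the 30-day window is counted from the date the observation was first recorded.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    text: str
    first_seen: date
    confirmations: int = 1            # executions in which the observation held
    resolves_eval_failure: bool = False


def triage(candidate: Candidate, today: date) -> str:
    """Return 'promote', 'discard', or 'keep' for one learning candidate."""
    if candidate.confirmations >= 3 or candidate.resolves_eval_failure:
        return "promote"              # moves to learnings.md
    if today - candidate.first_seen > timedelta(days=30):
        return "discard"              # never confirmed within the window
    return "keep"                     # stays in learning-candidates.md
```

Promotion is checked before expiry, so an old candidate that finally reaches three confirmations is still promoted rather than dropped.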
Example learnings.md:
# Learnings — create-pr
## Patterns That Work
- 2026-03-15: Branch protection on `main` requires status checks — always verify checks exist first
- 2026-03-20: Including GitHub issue number in PR title improves traceability
## Mistakes to Avoid
- 2026-03-18: Do NOT create PR with uncommitted changes — causes confusion about what's included
- 2026-03-22: Lock file conflicts (composer.lock) -> run `composer update` locally, don't accept either side
## Domain Knowledge
- 2026-03-19: Conduction repos use `development` as primary integration branch, not `main`
## Open Questions
- Should PRs auto-assign reviewers based on CODEOWNERS?
## Consolidated Principles
- (promoted after 3+ confirmations)
- Always run `composer check:strict` before creating PR — catches 90% of review feedback
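Because each entry is a dated bullet, the consolidation trigger can be checked by counting them. This is a minimal sketch that assumes entries follow the "- YYYY-MM-DD: text" shape used in the example above; the helper names are illustrative.

```python
import re

# One learning entry = a bullet that starts with an ISO date and a colon.
ENTRY = re.compile(r"^- \d{4}-\d{2}-\d{2}: ", re.MULTILINE)

SAMPLE = """\
## Patterns That Work
- 2026-03-15: Branch protection on `main` requires status checks
- 2026-03-20: Including GitHub issue number in PR title improves traceability
## Open Questions
- Should PRs auto-assign reviewers based on CODEOWNERS?
"""


def count_entries(learnings_md: str) -> int:
    """Count dated entries; undated bullets such as open questions are ignored."""
    return len(ENTRY.findall(learnings_md))


def needs_consolidation(learnings_md: str, threshold: int = 80) -> bool:
    """True once the number of dated entries reaches the consolidation threshold."""
    return count_entries(learnings_md) >= threshold


print(count_entries(SAMPLE))  # prints 2
```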
Level 7: AI Workforce — Multi-Agent Orchestration
The skill orchestrates multiple agents or is part of a coordinated workflow where specialized skills work together autonomously.
Criteria (in addition to L6):
- Orchestrates sub-agents (spawns parallel workers) or is orchestrated by a parent skill
- Part of a defined workflow chain with explicit handoff points: opsx-new -> opsx-ff -> opsx-plan-to-issues -> opsx-apply -> opsx-verify -> opsx-archive
- Hands off context to the next skill (shows "Next steps: run /opsx-verify")
- Uses isolated execution contexts when needed (git worktrees, Docker containers)
- Has autonomous operation capability for defined scope
- Participates in parallel execution (e.g., 8 agents simultaneously)
Orchestration patterns in this repository:
| Pattern | Example | Description |
|---|---|---|
| Pipeline | opsx-pipeline | Full lifecycle for 1+ changes in parallel via subagents |
| Fan-out/Fan-in | test-counsel, feature-counsel | Spawn N agents in parallel, synthesize results |
| Sequential Chain | opsx-new -> ... -> opsx-archive | Each skill hands off to the next |
| Autonomous Loop | opsx-apply-loop | Runs apply->verify cycle with retry logic, auto-archives |
| Multi-perspective | test-app | Spawns 6 specialized test agents simultaneously |
Reference: Claude Code Agent Teams (experimental) and the Agent SDK support these patterns natively. See Claude Code Agent Teams docs.
Important — structural vs mature L7: A skill can exhibit L7 orchestration patterns (spawning subagents, workflow chains) while lacking L5 measurement and L6 self-improvement. Such a skill is "structurally L7, maturity L4" — it has the architecture of a workforce but the self-awareness of a static tool. The goal is to close the L5-L6 gap so the orchestration is not just complex but also measurably effective and continuously improving.
Maturity Assessment Quick Reference
| Check | Yes -> at least | No -> stuck at |
|---|---|---|
| Has SKILL.md with frontmatter, steps, guardrails? | L1 | Below L1 |
| Description optimized for triggering, progressive disclosure used, <500 lines? | L2 | L1 |
| Built on proven patterns, has examples/references? | L3 | L2 |
| Contains business-specific domain context? | L4 | L3 |
| Has eval scenarios, measured and optimized with data? | L5 | L4 |
| Has learnings.md with consolidation process? | L6 | L5 |
| Orchestrates agents or part of workflow chain? | L7 | L6 |
Maintaining Skill Maturity
Skills degrade over time. Schedule periodic reviews:
Monthly:
- Run eval scenarios for L5+ skills — have pass rates changed?
- Review learnings.md for L6+ skills; consolidate if >80 entries
- Check description trigger rates for frequently-used skills
Quarterly:
- Re-evaluate L4 skills: has the business context changed? (new ADRs, standards, deprecated patterns)
- Check if L7 workflow chains still work end-to-end
- Prune skills that are never invoked
When something breaks:
- Add the failure to learnings.md (L6 skills) or create an eval scenario (L5 skills)
- Don't just fix the symptom; update the skill's standing rules to prevent recurrence
- If a skill consistently fails, consider splitting it (simpler skills trigger more reliably)
Common Upgrade Paths
L4 -> L5 (most common need): Create 3 eval scenarios from real usage. Use /skill-creator to run the evals — it spawns with-skill and baseline subagents, captures timing.json from task notifications, grades assertions into grading.json, and launches the benchmark viewer. Both files must exist under evals/ for the script to grant L5. Identify one weakness from the benchmark, improve the skill, re-eval.
L5 -> L6: Add learnings.md and a "capture learnings" step to SKILL.md. After 5-10 executions, review learnings and promote validated patterns to standing rules.
L4 -> L7 (standalone -> workflow): Identify which workflow chain the skill belongs to. Add "Next steps" guidance. Add context handoff. Test the full chain end-to-end.
Fixing "structurally L7 but maturity L4": Add L5 evals and L6 learnings to the orchestrator skill first — its improvements cascade to all sub-agents.
Folder Structure
.claude/skills/
  <skill-name>/
    SKILL.md                 <- required: the skill logic (L1+)
    templates/               <- files Claude fills in and writes to disk (L2+)
    references/              <- standards and guides Claude reads for context (L3+)
    examples/                <- worked output demonstrations and few-shot patterns (L3+)
    assets/                  <- non-markdown static files (SVG, JS, YAML, JSON)
    scripts/                 <- executable code the skill invokes (.py, .sh); see below
    evals/                   <- evaluation scenarios and results (L5+)
    learnings.md             <- accumulated execution insights (L6+)
    learning-candidates.md   <- unverified observations awaiting promotion (L6+)
Not every skill needs all subfolders. Create them only when content qualifies. A skill with no supporting files is just a SKILL.md — no subfolders needed.
Script convention: Any executable that the skill invokes (.py, .sh, etc.) lives in scripts/. SKILL.md and reference files invoke them via python -m scripts.<name> or bash scripts/<name>.sh. Don't put scripts in assets/ (which is for non-executable static files like SVG/JSON) or references/ (which is for content Claude reads, not runs). The one exception is vendored upstream skills (e.g., skill-creator) where the layout is dictated by upstream and may differ.
Extraction threshold: Extract content from SKILL.md into a subfolder when the block is:
- 10%+ of the total file size, AND
- A standalone unit (not tightly coupled to surrounding procedural steps)
Do not extract short inline code snippets, conditional logic, or step-by-step instructions — those belong in SKILL.md.
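The extraction threshold reduces to a two-part test. The sketch below measures size in characters, which is an assumption (the rule above says only "file size"), and leaves the standalone judgment to the author as a boolean input.

```python
def should_extract(block: str, skill_md: str, standalone: bool) -> bool:
    """True when the block is 10%+ of SKILL.md (by characters) AND a standalone unit."""
    if not skill_md:
        return False
    return standalone and len(block) / len(skill_md) >= 0.10
```

For example, a 100-character block in a 1,000-character SKILL.md sits exactly on the threshold and qualifies only if it is also standalone.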
Size Limits & Progressive Disclosure
Claude loads skill content in three layers. Respecting these limits keeps skills fast and context-efficient.
| Layer | What | Budget | When loaded |
|---|---|---|---|
| Metadata | name + description in frontmatter | ~100 words / 250 characters visible in listings | Always — part of system prompt |
| SKILL.md body | Steps, guardrails, inline instructions | < 500 lines | When skill triggers (auto or /slash) |
| Reference files | references/, templates/, examples/, assets/ | Unlimited (but each file adds to context on read) | On demand during execution |
Source: Anthropic's Skill Authoring Best Practices — "Concise is key: the context window is a shared resource. Only add what Claude does not already know."
When SKILL.md exceeds 500 lines:
- Move large reference blocks to references/ and link them
- Move output format examples to examples/
- Move templates with placeholders to templates/
- Keep in SKILL.md: procedural steps, conditional logic, guardrails, short inline snippets (<20 lines)
Context budget with many skills: The SLASH_COMMAND_TOOL_CHAR_BUDGET defaults to 1% of context window (with 8,000 character fallback). With 50+ skills, descriptions compete for space — keep them concise and front-loaded.
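A back-of-the-envelope check shows why descriptions compete for space. The sketch below rests on several assumptions that the text does not confirm: that the window size is expressed in characters, that the fallback applies when the window size is unknown, and that only description lengths count against the budget (names and per-entry overhead are ignored).

```python
from typing import Optional

FALLBACK_BUDGET = 8_000  # characters, per the text above


def description_budget(context_window_chars: Optional[int]) -> int:
    """1% of the context window, or the fallback when the window size is unknown."""
    if context_window_chars is None:
        return FALLBACK_BUDGET
    return context_window_chars // 100


def fits(descriptions: list[str], budget: int) -> bool:
    """True when the combined description length stays within the budget."""
    return sum(len(d) for d in descriptions) <= budget
```

Under these assumptions, the 8,000-character fallback holds 32 descriptions of 250 characters each; a 33rd no longer fits.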
SKILL.md Format
---
name: <skill-name>
description: <one-line action description — shown in skill picker, max 250 chars visible>
metadata:
  category: <Workflow | Testing | Development | Delivery>
  tags: [tag1, tag2]
---
# Skill Title
Brief explanation of what the skill does.
**Input**: How to invoke the skill and what arguments it accepts.
**Steps**
1. **Step name**
Instructions...
2. **Step name**
Instructions...
**Guardrails**
- What the skill must never do
- What to check before destructive actions
Frontmatter Rules
- name must match the folder name exactly (e.g. folder test-counsel -> name: test-counsel)
- description is what users see in the skill picker AND what Claude uses to decide whether to load the skill; make it action-oriented, specific, and written in third person
- Front-load the key use case in the first 250 characters
- Include specific trigger terms (verbs and nouns a user would naturally say)
- tags are free-form; use them for filtering and grouping
Advanced Frontmatter Fields
Beyond name, description, and metadata, several optional frontmatter fields control skill behavior:
---
name: my-skill
description: "..."
allowed-tools: [Read, Glob, Grep, Bash] # restrict which tools the skill can use
context: fork # run in isolated subagent context
paths: ["openspec/**", "docs/**"] # only auto-trigger when working in these paths
disable-model-invocation: true # prevent auto-triggering; slash-only invocation
---
| Field | Purpose | When to use |
|---|---|---|
| allowed-tools | Restricts which tools the skill can call | Safety-critical skills, e.g., a read-only audit skill should not have Write access |
| context: fork | Runs the skill in an isolated subagent context | Skills that should not pollute the parent conversation's context window |
| paths | Limits auto-activation to specific file patterns | Skills that only apply to certain parts of the repo (e.g., openspec/**) |
| disable-model-invocation | Blocks auto-triggering entirely | Skills that should only run when explicitly invoked via /skill-name |
Source: Anthropic's Skill Authoring Best Practices and the Agent Skills Open Standard.
Dynamic Content in Skills
Skills can inject dynamic content at invocation time:
| Syntax | Purpose | Example |
|---|---|---|
| $ARGUMENTS | Full argument string passed after /skill-name | /app-create my-app -> $ARGUMENTS = "my-app" |
| $ARGUMENTS[0] | Individual positional argument | First argument after the skill name |
| ${CLAUDE_SKILL_DIR} | Absolute path to the skill's own folder | Useful for referencing bundled scripts or assets |
| !`command` | Shell command output injected before skill loads | !`git branch --show-current` injects the current branch name |
The !`command` syntax is particularly powerful for injecting runtime context into skills — the command runs before the skill content is loaded into context.
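To make the $ARGUMENTS semantics in the table concrete, here is a toy model of the substitution. It is not Claude Code's implementation: the function name is hypothetical, shell-style splitting of the argument string is an assumption, and so is the choice to expand an out-of-range index to an empty string.

```python
import re
import shlex


def expand_arguments(skill_text: str, raw_args: str) -> str:
    """Substitute $ARGUMENTS[n] and $ARGUMENTS in skill text (illustrative model only)."""
    parts = shlex.split(raw_args)  # assumed: arguments split like a shell command line

    def positional(match: re.Match) -> str:
        index = int(match.group(1))
        return parts[index] if index < len(parts) else ""

    # Replace indexed forms first so "$ARGUMENTS[0]" is not consumed by the plain form.
    expanded = re.sub(r"\$ARGUMENTS\[(\d+)\]", positional, skill_text)
    return expanded.replace("$ARGUMENTS", raw_args)


print(expand_arguments("Create app $ARGUMENTS[0] from: $ARGUMENTS", "my-app --force"))
# prints "Create app my-app from: my-app --force"
```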
Degrees of Freedom
Match the skill's specificity to its task fragility:
| Task type | Freedom level | Skill style |
|---|---|---|
| Exploration, code review, brainstorming | High | Provide goals and constraints, let Claude decide approach |
| Feature implementation, refactoring | Medium | Provide steps with decision points, let Claude adapt |
| Database migrations, production deploys, CI config | Low | Prescribe exact commands, explicit confirmation gates |
A skill for /app-explore (thinking mode) should have high degrees of freedom — it's about creativity and investigation. A skill for /app-apply (config -> code sync) should have low freedom — it must apply changes predictably and safely.
Source: Anthropic's Skill Authoring Best Practices — "Set appropriate degrees of freedom: match specificity to task fragility."
Path Conventions in Skill Content
When a SKILL.md instructs Claude to read a file (persona card, shared doc, spec), the path is resolved against Claude's current working directory at invocation time — not relative to the SKILL.md file itself. This makes paths workspace-sensitive, and getting them wrong is one of the most common silent failures in this project.
Rules by target location
| Target | Canonical path form | Example |
|---|---|---|
| Personas in hydra/personas/ (Nextcloud workspace) | hydra/personas/<slug>.md | hydra/personas/henk-bakker.md |
| Personas inside a .claude-centered workspace (where .claude/ lives at the project root) | .claude/personas/<slug>.md | .claude/personas/<role>.md |
| Company-wide specs (hydra openspec/specs/) | hydra/openspec/specs/<capability>/spec.md | hydra/openspec/specs/nextcloud-app/spec.md |
| Company-wide ADRs (hydra openspec/architecture/) | hydra/openspec/architecture/adr-<NNN>-*.md | hydra/openspec/architecture/adr-001-data-layer.md |
| Developer/Claude docs (lives in ConductionNL/.github repo, cloned separately, not a subdir of any project workspace) | Full GitHub URL | https://github.com/ConductionNL/.github/blob/main/docs/claude/writing-specs.md |
| App-specific docs (inside the current app) | Workspace-relative | {APP_DIR}/docs/features/README.md |
| App-specific specs | Workspace-relative | openspec/specs/<capability>/spec.md |
Why not .claude/personas/ or .claude/docs/ in hydra skills?
- .claude/personas/ exists in .claude-centered workspaces but not in hydra; hydra keeps personas at hydra/personas/. Using the .claude/ prefix in a hydra skill silently fails whenever the skill is invoked from a workspace where that directory doesn't exist.
- .claude/docs/ is a workspace-local cache of dev docs. Some workspaces keep copies of the .github guides locally; hydra does not. Skills in hydra that read dev docs should link to the canonical .github repo URL, which is the only form that resolves everywhere.
Exception: skills intended to be copied into a workspace
Skills authored inside a .claude-centered workspace may legitimately use .claude/personas/ and .claude/docs/ — those paths match the workspace they run in. When such a skill is copied to another workspace, the paths must be rewritten to match the destination convention. Don't cross-pollinate paths: a skill living in hydra's .claude/skills/ uses hydra conventions; a skill in another workspace's .claude/skills/ uses that workspace's conventions.
Checklist when authoring a new skill
- Where will this skill be invoked from? (workspace root CWD = what?)
- Does every Read / file reference match that CWD?
- If the target is in .github, use a GitHub URL; never assume the .github repo is cloned as a subdir.
- If the target is in sibling workspaces (like personas in hydra/personas/), use the workspace-prefixed path.
- Run /sync-docs dev → Phase 6 Part B to catch any drift after the skill is added.
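The second checklist item can be partly automated. The sketch below assumes the file references have already been collected from SKILL.md into a list (extracting them is out of scope here) and checks each one against the directory the skill will be invoked from; the function name is hypothetical.

```python
from pathlib import Path


def missing_references(references: list[str], workspace_root: Path) -> list[str]:
    """Return the references that do not resolve from the invocation directory."""
    missing = []
    for ref in references:
        if ref.startswith(("http://", "https://")):
            continue  # URLs do not depend on the working directory
        if not (workspace_root / ref).exists():
            missing.append(ref)
    return missing
```

Run against a hydra workspace, a reference such as .claude/personas/henk-bakker.md would be reported as missing, while hydra/personas/henk-bakker.md and any GitHub URL would pass.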
Naming Conventions
Skills use the namespace-action format with lowercase letters, numbers, and hyphens only.
| Namespace | Covers |
|---|---|
| opsx- | OpenSpec workflow steps (opsx-new, opsx-apply, opsx-archive) |
| app- | Nextcloud app lifecycle (app-design, app-create, app-explore, app-apply) |
| test- | Testing: counsel, persona agents, regression (test-counsel, test-persona-henk) |
| team- | Scrum team agents (team-backend, team-qa, team-reviewer) |
| swc- | Softwarecatalogus-specific (swc-test, swc-update) |
| ecosystem- | Ecosystem research (ecosystem-investigate, ecosystem-propose-app) |
| tender- | Tender intelligence (tender-scan, tender-status) |
The folder name, name frontmatter field, and the slash command all must match exactly.
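The naming rules can be checked with a short helper. This is a sketch under one assumption beyond the text: the regular expression requires at least one hyphen, which is how "namespace-action" is read here. The function name is illustrative.

```python
import re

# Lowercase letters, numbers, and hyphens; at least two hyphen-separated parts (assumed).
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")


def check_name(folder: str, frontmatter_name: str) -> list[str]:
    """Return naming problems for a skill (an empty list means the name is fine)."""
    problems = []
    if not NAME_RE.fullmatch(frontmatter_name):
        problems.append(f"{frontmatter_name!r} is not in namespace-action format")
    if folder != frontmatter_name:
        problems.append(f"folder {folder!r} does not match name {frontmatter_name!r}")
    return problems
```

Since the slash command is derived from the same name, a folder and name that pass this check also give a matching /<skill-name> command.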
What NOT to Put in SKILL.md
Extract to subfolders when a block qualifies (10%+, standalone). Leave in SKILL.md:
- Short inline code snippets (< 20 lines, embedded in a step)
- Conditional logic that references the step context
- Procedural steps that only make sense in sequence
- Guardrails and constraints (they're part of the skill's contract)