How We Ship Statnive Using Claude Code Without Burning Tokens
A WordPress plugin team's actual token budget — 80+ skills, 24 MCP connectors, and a 200K context window. What we measured, what we cut, and the four numbers that now gate every release.
The First Time We Ran /context, We Had 12% Left
Statnive is a small team shipping a privacy-first WordPress analytics plugin. Our codebase has two git submodules (the plugin and the marketing site), 80+ Claude Code skills, 24 MCP connectors, and a release process that runs 248 tests and 22 release gates before anything ships.
For the first two months, AI-assisted development felt magical. Then it started feeling expensive. Sessions timed out mid-task. The model seemed to forget things it had read five minutes earlier. Our Anthropic bill climbed past $6 a day for one engineer.
We ran /context for the first time and understood why. Before we had typed a single prompt, we were already using 88% of the context window. Twelve percent left for actual work.
This post is how we cut that overhead by roughly two thirds — without dropping any skills or connectors — and the four numbers that now gate every release.
The headline numbers: ~54K tokens of baseline overhead (down from ~175K), ~73% of the context window available for real work, and daily spend cut from ~$6 to ~$2–3.
What Actually Lives In Those 200K Tokens
Claude Code gives you a 200K-token context window. That sounds generous until you understand what’s eating it before your first message.
| Component | What it is | Unoptimized | Our target |
|---|---|---|---|
| System prompt | Built-in Claude Code instructions | ~3,200 | ~3,200 |
| Built-in tools | Read, Write, Bash, Grep, Glob, Edit | ~11,600 | ~11,600 |
| Root CLAUDE.md | Project instructions, always loaded | 8,000+ | ≤ 1,500 |
| Skill metadata | <available_skills> entries | 4,000+ | ≤ 2,500 |
| MCP tool schemas | 24 connectors × many tools | 48,000–120,000 | ≤ 3,000 |
| Auto-compact buffer | Reserved headroom | 32,000 | 32,000 |
Three of these rows are the entire fight: the always-loaded CLAUDE.md, the skill metadata registry, and the MCP tool schema dump. Everything else is fixed by the harness.
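To make the arithmetic behind the table explicit, here is a minimal sketch that sums the "Our target" column and reports how much of the window is left. The figures are copied from the table; the dictionary keys and function names are our own labels for illustration, not anything Claude Code exposes.

```python
# Token figures copied from the "Our target" column of the table above.
CONTEXT_WINDOW = 200_000

TARGET_BUDGET = {
    "system_prompt": 3_200,
    "builtin_tools": 11_600,
    "root_claude_md": 1_500,
    "skill_metadata": 2_500,
    "mcp_tool_schemas": 3_000,
    "auto_compact_buffer": 32_000,
}


def baseline_overhead(budget: dict[str, int]) -> int:
    """Total tokens consumed before the first prompt is typed."""
    return sum(budget.values())


def available_fraction(budget: dict[str, int], window: int = CONTEXT_WINDOW) -> float:
    """Share of the context window left for real work."""
    return 1 - baseline_overhead(budget) / window


if __name__ == "__main__":
    print(f"baseline: {baseline_overhead(TARGET_BUDGET):,} tokens")
    print(f"available: {available_fraction(TARGET_BUDGET):.0%}")
```

The targets add up to 53,800 tokens, which is where the ~54K baseline and ~73% availability figures come from.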
The underlying mechanism is progressive disclosure. Claude Code’s skills system loads only the name and description fields of each skill at startup — roughly 30–50 tokens per skill — and defers the full SKILL.md body until the skill is actually invoked. The same trick works for MCP tool schemas and reference documentation, if you configure it. If you don’t, every tool definition, every rule, every instruction sits in context forever.
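As a rough illustration of why progressive disclosure is cheap, the sketch below compares the startup cost of a skill (name and description only) with the cost of its full body. It is not how Claude Code parses skills internally: the frontmatter parser is deliberately simplistic, and the four-characters-per-token estimate is an assumption standing in for a real tokenizer.

```python
def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading '---' fences."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def startup_cost(skill_md: str) -> int:
    """Estimated tokens loaded at session start: name and description only."""
    meta = read_frontmatter(skill_md)
    return estimate_tokens(meta.get("name", "") + " " + meta.get("description", ""))


if __name__ == "__main__":
    # Hypothetical skill file: short frontmatter, long body.
    sample = (
        "---\n"
        "name: statnive-release\n"
        "description: Package and ship a plugin release\n"
        "---\n"
    ) + "Detailed protocol step.\n" * 200
    print("at startup:", startup_cost(sample), "tokens (estimated)")
    print("when invoked:", estimate_tokens(sample), "tokens (estimated)")
```

The gap between the two printed numbers is the saving that deferral buys for every skill that is never invoked in a session.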
MCP Tool Overhead Was Our Biggest Leak
Running /context for the first time is a humbling experience. Here’s what we saw before we touched anything:
| MCP connector | Tools | Tokens consumed |
|---|---|---|
| GitHub | 35 | ~26,000 |
| Playwright (browser automation) | 21 | ~13,647 |
| Slack | 11 | ~21,000 |
| Context7 (library docs) | ~15 | ~8,000 |
| Other 20 connectors | ~200 | ~60,000+ |
Those five rows alone consumed roughly 60% of the context window before we opened a file. The problem is the architecture: every MCP tool schema — name, description, full JSON parameter definitions — is injected into context at session start by default. Docker’s MCP server ships 135 tools and consumes ~126,000 tokens by itself.
The fix that did 85% of the work for us was turning on MCP Tool Search. Shipped in Claude Code v2.1.7, Tool Search builds a lightweight ~5K-token index of tool names and descriptions and loads the full schema for a tool only when Claude actually calls it. Anthropic’s internal testing showed a reduction from 134K to ~5K tokens, which by those two figures is a cut of roughly 96%, while accuracy on MCP evaluations went up (Opus 4: 49% → 74%).
Activation happens automatically when tool descriptions exceed roughly 10% of the context window, but we verify it’s active on every session via /context and watch for the “tool search enabled” line.
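The threshold logic can be sketched with the connector figures from our table. This is an illustration of the rule as described here, not Claude Code's implementation: the exact cutoff ("roughly 10%") and the function names are assumptions, and the last table row is a lower bound.

```python
CONTEXT_WINDOW = 200_000
AUTO_THRESHOLD = 0.10  # "roughly 10%" per the text; the exact cutoff is an assumption

# Token figures from the connector table above.
MCP_TOOL_TOKENS = {
    "github": 26_000,
    "playwright": 13_647,
    "slack": 21_000,
    "context7": 8_000,
    "other_20_connectors": 60_000,  # listed as 60,000+, so a lower bound
}


def schema_share(tool_tokens: dict[str, int], window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the window taken by eagerly loaded tool schemas."""
    return sum(tool_tokens.values()) / window


def would_auto_defer(
    tool_tokens: dict[str, int],
    window: int = CONTEXT_WINDOW,
    threshold: float = AUTO_THRESHOLD,
) -> bool:
    """True when schemas exceed the threshold and deferral should kick in."""
    return schema_share(tool_tokens, window) > threshold


if __name__ == "__main__":
    print(f"schema share: {schema_share(MCP_TOOL_TOKENS):.0%}")
    print("auto-defer expected:", would_auto_defer(MCP_TOOL_TOKENS))
```

With all 24 connectors the share is about 64%, far above the cutoff. A setup with only a small connector such as Context7 (about 4%) would stay below it, which is why we check /context rather than assume deferral is on.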
We wrote more about the before/after numbers and the three connectors we kept eager-loaded in a dedicated post on MCP Tool Search.
CLAUDE.md: 162 Lines, Not 800
Unlike skills and MCP tools, every byte of CLAUDE.md loads into context at every session start with no lazy loading. This includes the root file, any imports via the @path/to/file syntax (recursive up to 5 levels), and all global and enterprise files.
Our first CLAUDE.md was 820 lines. It documented every skill, every workflow, every coding standard, every release gate, every nuance of our WordPress-coding-standards configuration. It was thorough. It also consumed roughly 12% of the context window on every single session, including sessions that had nothing to do with most of what it described.
We stripped it to 162 lines by moving protocols out and replacing them with a trigger table — a compact skill-lookup pattern that replaces verbose per-skill prose:
## Skill triggers
| Trigger keywords | Skill | Domain |
|------------------|-------|--------|
| sprint, backlog, iteration | pm-sprint-plan | PM |
| deploy, release, ship | statnive-release | Dev |
| security, audit | sec-audit-remediate | Security |
This pattern costs ~800 tokens instead of 3,000+ for verbose documentation. Detailed protocols live in the individual SKILL.md files, loaded only when Claude routes to them. Path-scoped rules under .claude/rules/ pick up domain-specific constraints (React conventions, PHP coding standards, release-gate rules) only when Claude works with matching files.
The full before/after is documented in our CLAUDE.md redesign post, but the single biggest anti-pattern we removed was @-importing large reference files into the root CLAUDE.md. Every @import loads the full target file every session — we had three of them, adding roughly 6,000 tokens of permanent overhead for content the model rarely needed.
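A quick way to audit a root CLAUDE.md for this anti-pattern is to scan it for `@path` tokens. The sketch below is a heuristic of our own, not Claude Code's import resolver: it treats any `@` at the start of a line or after whitespace, followed by path-like characters, as an import, so it can flag things like @-mentions too.

```python
import re

# Heuristic: '@' at line start or after whitespace, followed by a path-like token.
# Email addresses are skipped because their '@' follows a non-space character.
IMPORT_PATTERN = re.compile(r"(?:^|\s)@([\w./~-]+)")


def find_imports(claude_md: str) -> list[str]:
    """Return path-like @-references found in the given CLAUDE.md text."""
    return [
        match.group(1)
        for line in claude_md.splitlines()
        for match in IMPORT_PATTERN.finditer(line)
    ]


if __name__ == "__main__":
    # Hypothetical file contents for illustration.
    text = (
        "See @docs/standards.md for the full rules\n"
        "Questions: dev@example.com\n"
        "@~/.claude/extra.md\n"
    )
    for path in find_imports(text):
        print("imports:", path)
```

Anything this reports is worth a second look: either the content belongs in a skill or a path-scoped rule, or it is small enough to inline.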
Skill Tiering: Four Buckets, One Rule
We have more than 80 skills covering product management, backend scaffolding, QA, security auditing, WordPress-specific patterns, release packaging, and more. Naively loaded, 80 skills × ~50 tokens of metadata each is 4,000 tokens of permanent overhead. Growing to 141 skills (as the jaan.to framework we build on does) can push that past 14,000 once descriptions run closer to 100 tokens each.
The fix is the four-bucket tiering model defined by Claude Code’s skill system:
| Bucket | Frontmatter | Metadata cost | When to use |
|---|---|---|---|
| Always-on | (default) | ~40 tokens | Core workflows the model should route to automatically |
| Auto-invocable | (default, concise description) | ~40 tokens | Domain skills with strong trigger keywords |
| Manual-only | disable-model-invocation: true | 0 tokens | Slash-command-only skills — rare or destructive |
| Fork / subagent | context: fork | ~40 tokens | Reviews, audits, multi-step analysis that should not pollute main context |
The one-question test: does the main conversation need to see the output? If no — if the skill is self-contained and returns a summary — it’s a fork/subagent candidate and its internal token use disappears from main context. Anthropic documents subagents returning ~500–1,000 tokens from 10,000+ of internal work — roughly a 37% main-context reduction on complex tasks.
We mark roughly half our skills as disable-model-invocation: true — they’re reachable only via slash commands. This alone saved about 2,000 tokens of baseline metadata, and it actually improved routing quality for the remaining auto-invocable skills because Claude wasn’t choosing between near-duplicates.
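The saving is simple arithmetic, sketched below under the assumption that each auto-invocable skill costs ~50 tokens of metadata (the upper end of the range quoted earlier) and each manual-only skill costs none. The `Skill` class and the skill names are hypothetical.

```python
from dataclasses import dataclass

TOKENS_PER_SKILL = 50  # assumed: upper end of the 30-50 token range


@dataclass(frozen=True)
class Skill:
    name: str
    manual_only: bool = False  # mirrors `disable-model-invocation: true`


def metadata_cost(skills: list[Skill], per_skill: int = TOKENS_PER_SKILL) -> int:
    """Baseline metadata tokens: manual-only skills contribute nothing."""
    return sum(per_skill for skill in skills if not skill.manual_only)


if __name__ == "__main__":
    # Hypothetical library: 80 skills, every second one marked manual-only.
    untiered = [Skill(f"skill-{i}") for i in range(80)]
    tiered = [Skill(f"skill-{i}", manual_only=(i % 2 == 0)) for i in range(80)]
    print("untiered:", metadata_cost(untiered))
    print("tiered:", metadata_cost(tiered))
```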
The full bucket-by-bucket breakdown — including how we classify Statnive’s actual skill library — is in the skill tiering post.
Subagent Isolation For The Heavy Work
Three categories of work never touch our main context anymore: code reviews, security audits, and exploratory research. They run in subagents — separate Claude instances with their own 200K-token context window — and return a summary message.
The economics are subtle. Subagent sessions consume more total tokens than inline work: Anthropic documents agent teams using approximately 7× more tokens overall because each agent spawns a new Claude instance with its own system-prompt loading and tool initialization overhead.
But total token spend is not what we optimize for. We optimize for:
- Main-context cleanliness. A security audit that reads 40 files and finds 3 issues returns a 600-token summary. Without isolation, the full read-loop would eat 40K tokens of main context, pushing us toward the “lost in the middle” zone where retrieval quality degrades 15–47%.
- Model routing. Subagents can run on Haiku 4.5 ($1/$5 per MTok) while the main session uses Sonnet or Opus. Read-only exploration doesn’t need the top model, and Haiku’s 3× cost advantage over Sonnet ($3/$15 per MTok) compounds fast on audits that read hundreds of files.
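To put a number on the routing point, here is a small cost sketch. The Haiku price comes from the text above; the Sonnet price of $3/$15 per MTok is an assumption based on list pricing, and the token counts for the audit are made up for illustration.

```python
# (input, output) prices in USD per million tokens.
PRICES_PER_MTOK = {
    "haiku-4.5": (1.00, 5.00),  # from the text
    "sonnet": (3.00, 15.00),    # assumed list price, not stated in the post
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one job at flat per-token pricing (no caching, no discounts)."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical read-heavy audit: 400K tokens read, 5K tokens written.
    for model in PRICES_PER_MTOK:
        print(f"{model}: ${cost_usd(model, 400_000, 5_000):.3f}")
```

Because both the input and output prices differ by the same factor, the ratio holds regardless of how read-heavy the job is.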
What One Normal Release Day Looks Like Now
On a typical Statnive release day, we burn through roughly 400K–600K tokens of actual work. Here’s how that work is routed:
| Work | Model | Pattern |
|---|---|---|
| Morning code review on an open PR | Sonnet (main) + Haiku (fork subagent) | Fork the review, return summary |
| Writing a new feature in the React dashboard | Sonnet (main) | Auto-invocable frontend-scaffold skill, references loaded on demand |
| Running the release gate | Sonnet (main) | statnive-release skill, bash-driven — no extra context |
| Writing one of these blog posts | Sonnet (main) | Draft inline, fork a review pass |
Prompt caching handles the rest. Claude Code caches the stable prefix — system prompt, tool definitions, skill metadata, the root CLAUDE.md — which repeats every turn. Cache reads cost 0.1× the base price, delivering roughly 90% cost reduction on that stable prefix. Ordering content static-first and dynamic-last maximizes cache hits, so we keep the root CLAUDE.md above any injected dynamic context.
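A back-of-the-envelope sketch of what the 0.1× read price means for a stable prefix that is re-sent every turn. It is deliberately simplified: it ignores the one-time cache-write premium and cache expiry, and the $3 per MTok input price and 50-turn session are assumptions chosen for illustration.

```python
CACHE_READ_MULTIPLIER = 0.1  # from the text: cache reads cost 0.1x the base price


def prefix_cost_usd(
    prefix_tokens: int, turns: int, price_per_mtok: float, cached: bool
) -> float:
    """Cost of re-sending a stable prefix on every turn.

    Simplification: ignores the cache-write premium and cache expiry.
    """
    multiplier = CACHE_READ_MULTIPLIER if cached else 1.0
    return prefix_tokens * turns * price_per_mtok * multiplier / 1_000_000


if __name__ == "__main__":
    # Assumed: ~54K-token baseline, 50 turns, $3 per MTok input price.
    uncached = prefix_cost_usd(54_000, 50, 3.0, cached=False)
    cached = prefix_cost_usd(54_000, 50, 3.0, cached=True)
    print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
```

The smaller the baseline, the less there is to cache in the first place, which is why trimming overhead and caching stack rather than compete.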
What We Did Not Optimize
Transparency about limitations, not just capabilities:
- We haven’t moved to heavy hook-driven context. Research suggests SessionStart hooks can inject dynamic context (current branch, changed files, running services) to replace static CLAUDE.md content, and community case studies show a further ~62% reduction. We tried it, reverted. The risk of exit code 2 accumulating error text in context spooked us. We’ll revisit after Claude Code’s hook diagnostics mature.
- We still use Opus for some architectural tasks. Research says to default to Sonnet for 80% of work and reserve Opus for complex reasoning. We do this for features, but we over-index on Opus for releases because the cost of a broken release exceeds the marginal Anthropic bill.
- We haven’t built CI gates for token budgets yet. The research playbook (fail the PR if the root CLAUDE.md exceeds ~1,500 tokens, if unscoped rules exceed 400, or if any SKILL.md exceeds 500 lines) would prevent regression. It’s on the roadmap. For now we enforce with manual /context checks on every session.
- Our numbers are self-reported. We’re one small team. Anthropic’s public numbers (134K → 5K for Tool Search, 37% for subagent isolation, 90% for prompt caching) hold up in our measurements, but we haven’t published a rigorous benchmark the way we did for WordPress analytics plugin performance.
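For the CI gate we haven't built yet, a first pass could look something like the sketch below. It is not something we run today. It covers two of the three thresholds from the playbook (the unscoped-rules limit is left out because its unit isn't specified), and it estimates tokens with a crude four-characters-per-token heuristic rather than a real tokenizer. The function names and file paths are hypothetical.

```python
MAX_ROOT_TOKENS = 1_500  # playbook threshold for the root CLAUDE.md
MAX_SKILL_LINES = 500    # playbook threshold for any SKILL.md


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return len(text) // 4


def budget_violations(root_claude_md: str, skills: dict[str, str]) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes.

    `skills` maps a SKILL.md path to its file contents.
    """
    problems = []
    tokens = estimate_tokens(root_claude_md)
    if tokens > MAX_ROOT_TOKENS:
        problems.append(f"CLAUDE.md: ~{tokens} tokens exceeds {MAX_ROOT_TOKENS}")
    for path, body in sorted(skills.items()):
        line_count = len(body.splitlines())
        if line_count > MAX_SKILL_LINES:
            problems.append(f"{path}: {line_count} lines exceeds {MAX_SKILL_LINES}")
    return problems


if __name__ == "__main__":
    # Hypothetical inputs: an oversized root file and one oversized skill.
    issues = budget_violations("x" * 8_000, {"skills/demo/SKILL.md": "line\n" * 501})
    for issue in issues:
        print("FAIL:", issue)
    raise SystemExit(1 if issues else 0)
```

In CI, the non-zero exit code is what would fail the PR; reading the real files from disk is left out to keep the sketch self-contained.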
The Compounding Effect Is Real
The four optimizations — prompt caching, model routing, subagent isolation, and MCP tool deferral — are multiplicative, not additive. Each one alone looks modest. Stacked, they turn a 200K context window from cramped into comfortable, and they turn a $6/day habit into a ~$2–3/day tool. The full cost-accounting walkthrough is in the economics post.
What this means for Statnive’s users: the same team that ships a privacy-first analytics plugin can work at the scope of a much larger team, without trading off on test coverage or compliance rigor. Every release still passes the same 248 tests and 22 release gates. The AI workflow is scaffolding, not a shortcut.
Why We Published This
We write posts like how we built the fastest WordPress tracker and how we test Statnive because we think the WordPress ecosystem deserves more honest engineering narratives. The same applies to AI-assisted development: plenty of content claims Claude Code will transform your team, almost none of it shows the token accounting.
If you’re a WordPress plugin team, or any small engineering team, running Claude Code at scale: run /context today. See what’s eating your window. The four numbers that gate every one of our releases are now baseline overhead under 30%, root CLAUDE.md under 1,500 tokens, MCP Tool Search verified active, and zero @-imports in the root config. Those are achievable in one afternoon.
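The four numbers can be written down as a single check. This is a sketch of the checklist rather than tooling we ship: the `SessionSnapshot` fields are hypothetical, and the values would be read off /context and the root config by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionSnapshot:
    """Values read manually from /context and the root config (hypothetical shape)."""
    baseline_tokens: int
    root_claude_md_tokens: int
    tool_search_active: bool
    root_import_count: int


def gate_failures(snapshot: SessionSnapshot, window: int = 200_000) -> list[str]:
    """Return the gates that fail; an empty list means all four pass."""
    failures = []
    if snapshot.baseline_tokens / window >= 0.30:
        failures.append("baseline overhead is not under 30%")
    if snapshot.root_claude_md_tokens >= 1_500:
        failures.append("root CLAUDE.md is not under 1,500 tokens")
    if not snapshot.tool_search_active:
        failures.append("MCP Tool Search is not active")
    if snapshot.root_import_count != 0:
        failures.append("root config contains @-imports")
    return failures


if __name__ == "__main__":
    # Roughly our "before" and "after" states, using figures from this post.
    before = SessionSnapshot(175_000, 8_000, False, 3)
    after = SessionSnapshot(54_000, 1_400, True, 0)
    print("before:", gate_failures(before))
    print("after:", gate_failures(after))
```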
Try Statnive
The privacy-first WordPress analytics plugin built with this workflow is free on WordPress.org. Full source on GitHub — including our CLAUDE.md, our release-gate skill, and the complete test suite. Your data stays on your server. Ours stays on ours.