How We Ship Statnive Using Claude Code Without Burning Tokens

The First Time We Ran `/context`, We Had 12% Left

Statnive is a small team shipping a privacy-first WordPress analytics plugin. Our codebase has two git submodules (the plugin and the marketing site), 80+ Claude Code skills, 24 MCP connectors, and a release process that runs 248 tests and 22 release gates before anything ships.

For the first two months, AI-assisted development felt magical. Then it started feeling expensive. Sessions timed out mid-task. The model seemed to forget things it had read five minutes earlier. Our Anthropic bill climbed past $6 a day for one engineer.

We ran `/context` for the first time and understood why. Before we had typed a single prompt, we were already using 88% of the context window. Twelve percent left for actual work.

This post is how we cut that overhead by roughly two thirds — without dropping any skills or connectors — and the four numbers that now gate every release.

The headline numbers: ~54K tokens of baseline overhead (down from ~175K), ~73% of the context window available for real work, and daily spend cut from ~$6 to ~$2–3.

What Actually Lives In Those 200K Tokens

Claude Code gives you a 200K-token context window. That sounds generous until you understand what’s eating it before your first message.

Component	What it is	Unoptimized (tokens)	Our target (tokens)
System prompt	Built-in Claude Code instructions	~3,200	~3,200
Built-in tools	Read, Write, Bash, Grep, Glob, Edit	~11,600	~11,600
Root CLAUDE.md	Project instructions, always loaded	8,000+	≤ 1,500
Skill metadata	`<available_skills>` entries	4,000+	≤ 2,500
MCP tool schemas	24 connectors × many tools	48,000–120,000	≤ 3,000
Auto-compact buffer	Reserved headroom	32,000	32,000

Three of these rows are the entire fight: the always-loaded CLAUDE.md, the skill metadata registry, and the MCP tool schema dump. Everything else is fixed by the harness.
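To make the arithmetic behind the table explicit, here is a minimal sketch that sums each column and converts the totals into a share of the 200K window. The figures are copied from the table; where the table gives a floor ("8,000+", "4,000+"), the floor is used for both bounds, which is an assumption of this sketch.

```python
# Baseline context budget, using the per-component figures from the table.
CONTEXT_WINDOW = 200_000  # tokens

# (component, unoptimized low, unoptimized high, target)
# "8,000+" and "4,000+" are floors in the table; the floor is used for both bounds.
BUDGET = [
    ("System prompt",        3_200,   3_200,  3_200),
    ("Built-in tools",      11_600,  11_600, 11_600),
    ("Root CLAUDE.md",       8_000,   8_000,  1_500),
    ("Skill metadata",       4_000,   4_000,  2_500),
    ("MCP tool schemas",    48_000, 120_000,  3_000),
    ("Auto-compact buffer", 32_000,  32_000, 32_000),
]


def totals(rows):
    """Return (unoptimized low, unoptimized high, target) token totals."""
    low = sum(r[1] for r in rows)
    high = sum(r[2] for r in rows)
    target = sum(r[3] for r in rows)
    return low, high, target


def share(tokens, window=CONTEXT_WINDOW):
    """Percentage of the context window that `tokens` occupies."""
    return 100 * tokens / window


low, high, target = totals(BUDGET)
print(f"unoptimized: {low:,} to {high:,} tokens")
print(f"target: {target:,} tokens, {share(target):.1f}% of the window")
```

The target column sums to 53,800 tokens, which is where the ~54K headline figure comes from, and leaves roughly 73% of the window free.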

The underlying mechanism is progressive disclosure. Claude Code’s skills system loads only the name and description fields of each skill at startup — roughly 30–50 tokens per skill — and defers the full SKILL.md body until the skill is actually invoked. The same trick works for MCP tool schemas and reference documentation, if you configure it. If you don’t, every tool definition, every rule, every instruction sits in context forever.
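The idea of progressive disclosure can be sketched in a few lines. This is only an illustration of the pattern, not Claude Code's implementation: the `Skill` and `SkillRegistry` classes, the descriptions, and the body strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str
    load_body: Callable[[], str]  # deferred: only called when the skill is invoked


class SkillRegistry:
    """Hypothetical registry: metadata up front, full body on demand."""

    def __init__(self, skills: List[Skill]) -> None:
        self._skills: Dict[str, Skill] = {s.name: s for s in skills}
        self._loaded: Dict[str, str] = {}

    def startup_context(self) -> str:
        """Only name and description reach the context at startup."""
        return "\n".join(
            f"{s.name}: {s.description}" for s in self._skills.values()
        )

    def invoke(self, name: str) -> str:
        """Load the full body on first use and reuse it afterwards."""
        if name not in self._loaded:
            self._loaded[name] = self._skills[name].load_body()
        return self._loaded[name]

    def loaded(self) -> List[str]:
        """Names of skills whose bodies have actually been loaded."""
        return sorted(self._loaded)


registry = SkillRegistry([
    Skill("statnive-release", "Package and ship a release.",
          lambda: "FULL RELEASE PROTOCOL ..."),
    Skill("sec-audit-remediate", "Audit and fix security issues.",
          lambda: "FULL AUDIT PROTOCOL ..."),
])
print(registry.startup_context())
```

The point of the design is that `startup_context()` grows with the number of skills at a few dozen tokens each, while the cost of a body is paid only by sessions that call `invoke()` for it.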

MCP Tool Overhead Was Our Biggest Leak

Running /context for the first time is a humbling experience. Here’s what we saw before we touched anything:

MCP connector	Tools	Tokens consumed
GitHub	35	~26,000
Playwright (browser automation)	21	~13,647
Slack	11	~21,000
Context7 (library docs)	~15	~8,000
Other 20 connectors	~200	~60,000+

Together, those rows add up to roughly 128,000 tokens, about 64% of the context window, before we opened a file. The problem is the architecture: by default, every MCP tool schema (name, description, full JSON parameter definitions) is injected into context at session start. Docker’s MCP server ships 135 tools and consumes ~126,000 tokens by itself.

The fix that did 85% of the work for us was turning on MCP Tool Search. Shipped in Claude Code v2.1.7, Tool Search builds a lightweight ~5K-token index of tool names and descriptions and loads the full schema for a tool only when Claude actually calls it. Anthropic’s internal testing showed a reduction from 134K to ~5K tokens (Anthropic quotes an 85% cut, though those two endpoints work out closer to 96%), while accuracy on MCP evaluations went up (Opus 4: 49% → 74%).

Activation happens automatically when tool descriptions exceed roughly 10% of the context window, but we verify it’s active on every session via /context and watch for the “tool search enabled” line.
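As a rough check on why the threshold trips for a setup like ours, the sketch below sums the connector table and compares it against the 10% mark. The threshold constant mirrors the "roughly 10%" in the text, the `should_defer` helper is hypothetical, and "60,000+" is treated as exactly 60,000.

```python
CONTEXT_WINDOW = 200_000        # tokens
DEFER_THRESHOLD = 0.10          # "roughly 10%" of the window, per the text
TOOL_SEARCH_INDEX = 5_000       # approximate size of the lightweight index

# Token counts from the connector table ("60,000+" taken at its floor).
CONNECTORS = {
    "GitHub": 26_000,
    "Playwright": 13_647,
    "Slack": 21_000,
    "Context7": 8_000,
    "Other 20 connectors": 60_000,
}


def should_defer(schema_tokens: int,
                 window: int = CONTEXT_WINDOW,
                 threshold: float = DEFER_THRESHOLD) -> bool:
    """True when eager tool schemas would exceed the threshold share."""
    return schema_tokens > window * threshold


eager = sum(CONNECTORS.values())
print(f"eager schemas: {eager:,} tokens ({100 * eager / CONTEXT_WINDOW:.1f}%)")
print(f"deferral kicks in: {should_defer(eager)}")
print(f"index only: {100 * TOOL_SEARCH_INDEX / CONTEXT_WINDOW:.1f}% of the window")
```

With these inputs the eager schemas come to 128,647 tokens, far above the 20,000-token threshold, while the index alone occupies 2.5% of the window.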

We wrote more about the before/after numbers and the three connectors we kept eager-loaded in a dedicated post on MCP Tool Search.

CLAUDE.md: 162 Lines, Not 800

Unlike skills and MCP tools, every byte of CLAUDE.md loads into context at every session start with no lazy loading. This includes the root file, any imports via the @path/to/file syntax (recursive up to 5 levels), and all global and enterprise files.

Our first CLAUDE.md was 820 lines. It documented every skill, every workflow, every coding standard, every release gate, every nuance of our WordPress-coding-standards configuration. It was thorough. It also consumed roughly 12% of the context window on every single session, including sessions that had nothing to do with most of what it described.

We stripped it to 162 lines by moving protocols out and replacing them with a trigger table — a compact skill-lookup pattern that replaces verbose per-skill prose:

## Skill triggers
| Trigger keywords | Skill | Domain |
|------------------|-------|--------|
| sprint, backlog, iteration | pm-sprint-plan | PM |
| deploy, release, ship | statnive-release | Dev |
| security, audit | sec-audit-remediate | Security |

This pattern costs ~800 tokens instead of 3,000+ for verbose documentation. Detailed protocols live in the individual SKILL.md files, loaded only when Claude routes to them. Path-scoped rules under .claude/rules/ pick up domain-specific constraints (React conventions, PHP coding standards, release-gate rules) only when Claude works with matching files.
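The trigger table is essentially a keyword-to-skill map. In practice the model does the routing by reading the table, so the following is only an illustration of the mapping it encodes; the `route` function is hypothetical and uses plain word matching.

```python
import re
from typing import List

# Rows copied from the trigger table: (keywords, skill, domain).
TRIGGERS = [
    ({"sprint", "backlog", "iteration"}, "pm-sprint-plan", "PM"),
    ({"deploy", "release", "ship"}, "statnive-release", "Dev"),
    ({"security", "audit"}, "sec-audit-remediate", "Security"),
]


def route(prompt: str) -> List[str]:
    """Return the skills whose trigger keywords appear in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return [skill for keywords, skill, _domain in TRIGGERS if words & keywords]


print(route("Ship the next release after the security audit"))
# -> ['statnive-release', 'sec-audit-remediate']
```

Each row costs a handful of tokens in CLAUDE.md regardless of how long the skill's own protocol is, which is why the table stays cheap as the library grows.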

The full before/after is documented in our CLAUDE.md redesign post, but the single biggest anti-pattern we removed was @-importing large reference files into the root CLAUDE.md. Every @import loads the full target file every session — we had three of them, adding roughly 6,000 tokens of permanent overhead for content the model rarely needed.
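A check for stray imports is easy to script. The sketch below is a hypothetical lint, not an official tool: it treats any `@token` containing a `/` or `.` as an import, and it assumes imports inside code spans and fenced blocks are not evaluated, so it skips those.

```python
import re
from typing import List

# "@" not preceded by a word character or backtick, followed by a path-like token.
IMPORT_RE = re.compile(r"(?<![\w`])@([\w./~-]+)")
FENCE = "`" * 3  # fenced code block marker


def find_imports(markdown: str) -> List[str]:
    """Return path-like @-imports found outside code spans and fences."""
    found: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on fence open/close
            continue
        if in_fence:
            continue
        line = re.sub(r"`[^`]*`", "", line)  # drop inline code spans
        for match in IMPORT_RE.finditer(line):
            path = match.group(1).rstrip(".,;:")  # trim sentence punctuation
            if "/" in path or "." in path:
                found.append(path)
    return found


sample = "See @docs/reference.md for details.\nContact a@b.com with questions.\n"
print(find_imports(sample))
# -> ['docs/reference.md']
```

An empty result on the root CLAUDE.md corresponds to the "zero @-imports" rule; the email address in the sample is ignored because the `@` follows a word character.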

Skill Tiering: Four Buckets, One Rule

We have more than 80 skills covering product management, backend scaffolding, QA, security auditing, WordPress-specific patterns, release packaging, and more. Naively loaded, 80 skills × ~50 tokens of metadata each is 4,000 tokens of permanent overhead. Growing to 141 skills (as the jaan.to framework we build on does) takes that to roughly 7,000 at the same rate, and past 14,000 once descriptions run to ~100 tokens each.

The fix is the four-bucket tiering model defined by Claude Code’s skill system:

Bucket	Frontmatter	Metadata cost	When to use
Always-on	(default)	~40 tokens	Core workflows the model should route to automatically
Auto-invocable	(default, concise description)	~40 tokens	Domain skills with strong trigger keywords
Manual-only	`disable-model-invocation: true`	0 tokens	Slash-command-only skills — rare or destructive
Fork / subagent	`context: fork`	~40 tokens	Reviews, audits, multi-step analysis that should not pollute main context

The one-question test: does the main conversation need to see the output? If no — if the skill is self-contained and returns a summary — it’s a fork/subagent candidate and its internal token use disappears from main context. Anthropic documents subagents returning ~500–1,000 tokens from 10,000+ of internal work — roughly a 37% main-context reduction on complex tasks.

We mark roughly half our skills as disable-model-invocation: true — they’re reachable only via slash commands. This alone saved about 2,000 tokens of baseline metadata, and it actually improved routing quality for the remaining auto-invocable skills because Claude wasn’t choosing between near-duplicates.
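The saving is simple to reproduce. This sketch classifies skills by the two frontmatter keys from the table and counts metadata tokens for everything that is not manual-only. The `bucket` and `metadata_tokens` helpers are hypothetical, and ~50 tokens per entry is the upper end of the range quoted earlier. Always-on and auto-invocable skills share the default frontmatter, so the sketch cannot tell them apart and groups them together.

```python
from typing import Dict, List

TOKENS_PER_ENTRY = 50  # upper end of the 30-50 token range per skill


def bucket(frontmatter: Dict[str, object]) -> str:
    """Classify a skill by its frontmatter keys."""
    if frontmatter.get("disable-model-invocation") is True:
        return "manual-only"
    if frontmatter.get("context") == "fork":
        return "fork"
    return "auto-invocable"  # also covers always-on (same default frontmatter)


def metadata_tokens(skills: List[Dict[str, object]],
                    per_entry: int = TOKENS_PER_ENTRY) -> int:
    """Baseline metadata cost: manual-only skills contribute nothing."""
    return sum(per_entry for fm in skills if bucket(fm) != "manual-only")


all_default = [{} for _ in range(80)]
half_manual = ([{} for _ in range(40)]
               + [{"disable-model-invocation": True} for _ in range(40)])

print(metadata_tokens(all_default))   # 4000
print(metadata_tokens(half_manual))   # 2000
```

Marking 40 of 80 skills manual-only halves the registry from 4,000 to 2,000 tokens, matching the roughly 2,000 tokens saved.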

The full bucket-by-bucket breakdown — including how we classify Statnive’s actual skill library — is in the skill tiering post.

Subagent Isolation For The Heavy Work

Three categories of work never touch our main context anymore: code reviews, security audits, and exploratory research. They run in subagents, separate Claude instances that each have their own 200K-token context window, and return only a summary message.

The economics are subtle. Subagent sessions consume more total tokens than inline work: Anthropic documents agent teams using approximately 7× more tokens overall because each agent spawns a new Claude instance with its own system-prompt loading and tool initialization overhead.

But total token spend is not what we optimize for. We optimize for:

Main-context cleanliness. A security audit that reads 40 files and finds 3 issues returns a 600-token summary. Without isolation, the full read-loop would eat 40K tokens of main context, pushing us toward the “lost in the middle” zone where retrieval quality degrades 15–47%.
Model routing. Subagents can run on Haiku 4.5 ($1/$5 per MTok) while the main session uses Sonnet or Opus. Read-only exploration doesn’t need the top model, and Haiku’s 3× cost advantage over Sonnet ($3/$15 per MTok) compounds fast on audits that read hundreds of files.
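Both effects can be put into numbers with a small sketch. The 40K read loop, the 600-token summary, and the Haiku input price come from the text; the Sonnet input price of $3 per MTok is an assumption of this sketch. It deliberately ignores output tokens and the subagent's own startup overhead.

```python
HAIKU_INPUT_PER_MTOK = 1.00   # from the text: Haiku 4.5 input price
SONNET_INPUT_PER_MTOK = 3.00  # assumption: Sonnet input price, not stated in the text


def input_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of sending `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


def main_context_saved(inline_tokens: int, summary_tokens: int) -> int:
    """Tokens kept out of the main context when only a summary comes back."""
    return inline_tokens - summary_tokens


audit_read_tokens = 40_000
summary_tokens = 600

print(main_context_saved(audit_read_tokens, summary_tokens))           # 39400
print(f"{input_cost(audit_read_tokens, SONNET_INPUT_PER_MTOK):.2f}")  # 0.12
print(f"{input_cost(audit_read_tokens, HAIKU_INPUT_PER_MTOK):.2f}")   # 0.04
```

A single audit is cheap either way. The difference that matters is the 39,400 tokens that never enter the main context, with the 3× price gap adding up across many audits.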

What One Normal Release Day Looks Like Now

On a typical Statnive release day, we burn through roughly 400K–600K tokens of actual work. Here’s where they go:

Work	Model	Pattern
Morning code review on an open PR	Sonnet (main) + Haiku (fork subagent)	Fork the review, return summary
Writing a new feature in the React dashboard	Sonnet (main)	Auto-invocable `frontend-scaffold` skill, references loaded on demand
Running the release gate	Sonnet (main)	`statnive-release` skill, bash-driven — no extra context
Writing one of these blog posts	Sonnet (main)	Draft inline, fork a review pass

Prompt caching handles the rest. Claude Code caches the stable prefix — system prompt, tool definitions, skill metadata, the root CLAUDE.md — which repeats every turn. Cache reads cost 0.1× the base price, delivering roughly 90% cost reduction on that stable prefix. Ordering content static-first and dynamic-last maximizes cache hits, so we keep the root CLAUDE.md above any injected dynamic context.
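To see how the 0.1× read price plays out over a session, here is a minimal sketch. The 54K prefix is the baseline figure from this post; the $3 per MTok input price and the 50-turn session are assumptions chosen for illustration, and the premium charged for writing to the cache is ignored.

```python
CACHE_READ_MULTIPLIER = 0.1  # cache reads cost 0.1x the base input price


def prefix_cost(prefix_tokens: int, turns: int,
                price_per_mtok: float, cached: bool) -> float:
    """Dollar cost of resending a stable prefix on every turn (turns >= 1)."""
    per_turn = prefix_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return per_turn * turns
    # First turn pays full price (cache-write premium ignored);
    # every later turn is billed as a cache read.
    return per_turn + per_turn * CACHE_READ_MULTIPLIER * (turns - 1)


uncached = prefix_cost(54_000, 50, 3.00, cached=False)
cached = prefix_cost(54_000, 50, 3.00, cached=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
print(f"saving: {100 * (1 - cached / uncached):.0f}%")
```

Under these assumptions the saving comes out a little under 90%, because the first turn still pays full price. The longer the session, the closer it gets to the 90% ceiling.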

What We Did Not Optimize

Transparency about limitations, not just capabilities:

We haven’t moved to heavy hook-driven context. Research suggests SessionStart hooks can inject dynamic context (current branch, changed files, running services) to replace static CLAUDE.md content, and community case studies show a further ~62% reduction. We tried it and reverted: the risk of hooks exiting with code 2 and accumulating error text in context spooked us. We’ll revisit after Claude Code’s hook diagnostics mature.
We still use Opus for some architectural tasks. Research says to default to Sonnet for 80% of work and reserve Opus for complex reasoning. We do this for features, but we over-index on Opus for releases because the cost of a broken release exceeds the marginal Anthropic bill.
We haven’t built CI gates for token budgets yet. The research playbook — fail the PR if the root CLAUDE.md exceeds ~1,500 tokens, if unscoped rules exceed 400, or if any SKILL.md exceeds 500 lines — would prevent regression. It’s on the roadmap. For now we enforce with manual /context checks on every session.
Our numbers are self-reported. We’re one small team. Anthropic’s public numbers (134K → 5K for Tool Search, 37% for subagent isolation, 90% for prompt caching) hold up in our measurements, but we haven’t published a rigorous benchmark the way we did for WordPress analytics plugin performance.
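A token-budget gate of the kind described in the list above could start as small as the sketch below. It is a hypothetical script, not something we run today. It estimates tokens at four characters each, which is a rough heuristic rather than a real tokenizer, and it covers only the CLAUDE.md and SKILL.md limits, leaving the unscoped-rules check out.

```python
from pathlib import Path
from typing import List

MAX_CLAUDE_MD_TOKENS = 1_500  # root CLAUDE.md budget
MAX_SKILL_LINES = 500         # per-SKILL.md line budget
CHARS_PER_TOKEN = 4           # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def check_repo(root: Path) -> List[str]:
    """Return a list of human-readable budget violations under `root`."""
    failures: List[str] = []
    claude_md = root / "CLAUDE.md"
    if claude_md.is_file():
        tokens = estimate_tokens(claude_md.read_text(encoding="utf-8"))
        if tokens > MAX_CLAUDE_MD_TOKENS:
            failures.append(f"{claude_md}: ~{tokens} tokens > {MAX_CLAUDE_MD_TOKENS}")
    for skill in sorted(root.rglob("SKILL.md")):
        lines = len(skill.read_text(encoding="utf-8").splitlines())
        if lines > MAX_SKILL_LINES:
            failures.append(f"{skill}: {lines} lines > {MAX_SKILL_LINES}")
    return failures


def main(root: str = ".") -> int:
    """Print violations and return a CI-style exit code (0 = pass)."""
    problems = check_repo(Path(root))
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

In CI, the return value of `main()` would be passed to `sys.exit` so that any violation fails the PR.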

The Compounding Effect Is Real

The four optimizations — prompt caching, model routing, subagent isolation, and MCP tool deferral — are multiplicative, not additive. Each one alone looks modest. Stacked, they turn a 200K context window from cramped into comfortable, and they turn a $6/day habit into a ~$2–3/day tool. The full cost-accounting walkthrough is in the economics post.

What this means for Statnive’s users: the same team that ships a privacy-first analytics plugin can work at the scope of a much larger team, without trading off on test coverage or compliance rigor. Every release still passes the same 248 tests and 22 release gates. The AI workflow is scaffolding, not a shortcut.

Why We Published This

We write posts like how we built the fastest WordPress tracker and how we test Statnive because we think the WordPress ecosystem deserves more honest engineering narratives. The same applies to AI-assisted development: plenty of content claims Claude Code will transform your team, almost none of it shows the token accounting.

If you’re a WordPress plugin team, or any small engineering team, running Claude Code at scale: run /context today. See what’s eating your window. The four numbers that gate every one of our releases are now baseline overhead under 30%, root CLAUDE.md under 1,500 tokens, MCP Tool Search verified active, and zero @-imports in the root config. Those are achievable in one afternoon.

Try Statnive

The privacy-first WordPress analytics plugin built with this workflow is free on WordPress.org. Full source on GitHub — including our CLAUDE.md, our release-gate skill, and the complete test suite. Your data stays on your server. Ours stays on ours.