ctxweight
X-ray your AI agent's context — health and token cost in one command.
ctxweight is an offline, developer-first auditor for the context files that drive AI coding agents (CLAUDE.md, AGENTS.md, .cursorrules, copilot-instructions.md, skills, and connected MCP servers). It tells you two things most teams are flying blind on:
- Is my context healthy? — oversized, truncated files and leaked secrets (redundancy, drift, and contradiction checks are on the roadmap).
- What is my context costing me? — how many tokens (and dollars) each piece eats on every single agent run.
Think npm audit, but for the context window. No config leaves your machine.
Example output
Running ctxweight on a real repo (browser-use/browser-use):
ctxweight · browser-use
GRADE D · always-on 11,647 tok/turn (5.8%) · on-demand 11,720 tok
always-on = read on every message · on-demand = only loaded when a skill/rule runs
2 files too long · 1 truncated (>32 KiB) · 1 email in context
truncated = too big; the agent cuts it off and won't read all of it
Suggestions
→ Trim AGENTS.md to the essentials, or split it via @imports and move task-specific guidance into on-demand skills.
→ Split AGENTS.md (38.4 KiB): move stable, rarely-read sections into @imported files or on-demand skills so the always-on core stays under 32 KiB.
→ Remove or redact this — context files are committed and sent to the agent on every run.
AGENTS.md 9,141 (39%) ██████·········· [always-on]
CLAUDE.md 2,506 (11%) ██·············· [always-on]
skills/x402/SKILL.md 4,123 (18%) ███············· [on-demand]
skills/qa/SKILL.md 2,250 (10%) ██·············· [on-demand]
skills/remote-browser/SKILL.md 1,839 ( 8%) █··············· [on-demand]
browser_use/skills/browser-use/SKILL.md 1,145 ( 5%) █··············· [on-demand]
skills/browser-use/SKILL.md 1,145 ( 5%) █··············· [on-demand]
… and 2 more (1,218 tokens, --full)
In a real terminal the grade badge and bars are colored (green A/B · amber C/D · red F). Add --full for every finding and every file.
Why this exists
Every AI coding agent reads a context file before it does anything. The instinct is to make that file bigger — and that's exactly the trap.
A 2026 ETH Zürich study found that auto-generated, redundant context files reduced task success rates and increased inference cost by over 20%, mostly by duplicating what the agent could already read from the code and README. The failure modes are always the same:
- Bloat — files grow past the model's effective instruction budget; the rest is silently ignored ("lost in the middle").
- Redundancy — rules that restate the README or things a linter already enforces.
- Drift — CLAUDE.md, AGENTS.md, and .cursorrules slowly disagree.
- Leaked secrets — these files are committed and end up in the agent's logged context, so an API key or internal hostname in there is a real exposure.
- Invisible token cost — config files, skills, and every connected MCP server's tool schema all consume the window on every run, and nobody is measuring it.
The market is full of generators for these files. ctxweight is the opposite: an auditor. It doesn't write your context for you — it tells you what's wrong with the context you have and what it's costing you.
What it does
ctxweight health — context quality
Scans every agent-context source in the repo and reports:
| Check | What it flags |
|---|---|
| Budget / length | Files over the recommended size (200 lines) or past the 32 KiB hard-truncation limit some agents enforce |
| Secrets & PII | API keys, tokens, private IPs and emails committed into context — with placeholder + entropy awareness, so it won't flag your_api_key_here, m0-your-api-key, or RFC 2606 example.com |
Output is a single Context Health score (A–F) plus an itemized, fixable list. A repo with no agent-context files at all scores N/A — "nothing to audit", not a perfect A.
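The budget/length row in the table above can be pictured as a small pure function. This is a minimal sketch and not ctxweight's actual implementation: the 200-line and 32 KiB thresholds come from the table, while `checkLength`, `utf8ByteLength`, `LengthFinding`, and the two `budget/...` finding codes are hypothetical names chosen for illustration.

```typescript
// Thresholds taken from the table above.
const RECOMMENDED_MAX_LINES = 200;
const HARD_TRUNCATION_BYTES = 32 * 1024; // 32 KiB

// Hypothetical finding shape; the real tool's codes may differ.
interface LengthFinding {
  code: string;
  message: string;
}

// Counts UTF-8 bytes without relying on Node or browser globals.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const cp = text.codePointAt(i) ?? 0;
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3;
    else {
      bytes += 4;
      i++; // an astral code point occupies two UTF-16 units
    }
  }
  return bytes;
}

function checkLength(path: string, text: string): LengthFinding[] {
  const findings: LengthFinding[] = [];
  const lines = text.split("\n").length;
  const bytes = utf8ByteLength(text);
  if (lines > RECOMMENDED_MAX_LINES) {
    findings.push({
      code: "budget/too-long", // hypothetical code
      message: `${path}: ${lines} lines (recommended max ${RECOMMENDED_MAX_LINES})`,
    });
  }
  if (bytes > HARD_TRUNCATION_BYTES) {
    findings.push({
      code: "budget/truncated", // hypothetical code
      message: `${path}: ${(bytes / 1024).toFixed(1)} KiB exceeds the 32 KiB truncation limit`,
    });
  }
  return findings;
}
```

A file can trip both checks at once, which is why the function returns a list rather than a single verdict.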
Planned (roadmap — not yet implemented):
- Redundancy — content duplicated from the README, the code, or another context file
- Drift — the same rule present in one context file but missing or reworded in another
- Contradictions — conflicting rules (heuristic; optional semantic pass)
- Linter overlap — rules a formatter/linter already enforces deterministically
ctxweight budget — token cost
Not all context is loaded the same way, so budget reports two numbers instead of one — and this split is the whole point:
- Always-on — your root context files (CLAUDE.md, AGENTS.md, .cursorrules, …) plus every connected MCP server's tool schemas. This is loaded on every single turn: it's your fixed per-turn token cost (and the only part that's really competing for the context window).
- On-demand — SKILL.md files and .cursor/rules/*.mdc, which load only when that skill or rule is invoked. A 600-line skill isn't bloat — you pay for it when you use it, not every turn.
This matters because tools that sum everything into one "tokens/run" number lie to you: 40k tokens of skills you rarely trigger is fine, while 40k tokens in AGENTS.md is a tax on every request. Real example — running on a repo with 39 skills and Cursor rules reports ~1,559 always-on tokens/turn but ~41,000 on-demand across those 39 files: the headline "43k" would be alarming and wrong.
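To make the split concrete, here is a minimal sketch of how sources might be bucketed and summed. The classification rule paraphrases the two bullets above and is an assumption, not the tool's real logic; `classify`, `splitBudget`, and `Source` are hypothetical names. The token counts are the ones listed in the browser-use example output earlier in this README.

```typescript
type LoadMode = "always-on" | "on-demand";

interface Source {
  path: string;
  tokens: number;
}

// Assumed rule: SKILL.md files and .cursor/rules/*.mdc load on demand,
// everything else is treated as always-on.
function classify(path: string): LoadMode {
  const name = path.split("/").pop() ?? path;
  if (name === "SKILL.md") return "on-demand";
  if (path.includes(".cursor/rules/") && name.endsWith(".mdc")) return "on-demand";
  return "always-on";
}

function splitBudget(sources: Source[]): Record<LoadMode, number> {
  const totals: Record<LoadMode, number> = { "always-on": 0, "on-demand": 0 };
  for (const s of sources) totals[classify(s.path)] += s.tokens;
  return totals;
}

// Per-file token counts copied from the example output.
const sample: Source[] = [
  { path: "AGENTS.md", tokens: 9141 },
  { path: "CLAUDE.md", tokens: 2506 },
  { path: "skills/x402/SKILL.md", tokens: 4123 },
  { path: "skills/qa/SKILL.md", tokens: 2250 },
  { path: "skills/remote-browser/SKILL.md", tokens: 1839 },
  { path: "browser_use/skills/browser-use/SKILL.md", tokens: 1145 },
  { path: "skills/browser-use/SKILL.md", tokens: 1145 },
];

const totals = splitBudget(sample);
// totals["always-on"] is 11647, matching the header of the example output.
// totals["on-demand"] is 10502 here; the reported 11,720 also includes the
// two files hidden behind "… and 2 more" (1,218 tokens).
```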
For each source, budget shows its token weight and whether it is always-on or on-demand. It also reports the always-on share of the context window and (with --model) the estimated always-on input cost per turn.
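The share and cost figures are simple arithmetic. The sketch below assumes a 200,000-token context window, which is consistent with the 5.8% shown for 11,647 tokens in the example output, and uses a placeholder price of 10 USD per million input tokens that is not any real model's rate. `ModelInfo`, `alwaysOnShare`, and `costPerTurnUsd` are hypothetical names.

```typescript
interface ModelInfo {
  contextWindow: number;   // tokens
  inputUsdPerMTok: number; // USD per 1,000,000 input tokens
}

// Fraction of the context window consumed by always-on context.
function alwaysOnShare(tokens: number, model: ModelInfo): number {
  return tokens / model.contextWindow;
}

// Estimated input cost of the always-on context for a single turn.
function costPerTurnUsd(tokens: number, model: ModelInfo): number {
  return (tokens / 1000000) * model.inputUsdPerMTok;
}

// Placeholder values for illustration only.
const hypotheticalModel: ModelInfo = { contextWindow: 200000, inputUsdPerMTok: 10 };

const share = alwaysOnShare(11647, hypotheticalModel);
const cost = costPerTurnUsd(11647, hypotheticalModel);
// (share * 100).toFixed(1) is "5.8"; cost is roughly 0.116 USD per turn
// under the placeholder price.
```

Because the always-on cost is paid on every turn, multiplying `cost` by the number of turns in a session gives a rough lower bound on what the static context alone adds to the bill.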
MCP tool-schema accounting is the part no other tool gives you: connecting ten MCP servers can quietly burn thousands of always-on tokens on every turn before your prompt is even read. ctxweight counts the schemas a server declares statically and — staying offline-first — flags servers that only expose tools at runtime instead of connecting to them.
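A rough sketch of that accounting follows. Everything about the config shape is an assumption: the `mcpServers` map mirrors the layout commonly used in .mcp.json files, but the static `tools` array is a hypothetical field used here to stand in for "schemas a server declares statically". The four-characters-per-token estimate is a crude stand-in for a real tokenizer, and `accountMcp` is an illustrative name.

```typescript
interface ToolSchema {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

interface ServerEntry {
  command?: string;
  args?: string[];
  tools?: ToolSchema[]; // hypothetical static declaration
}

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

interface ServerReport {
  server: string;
  tokens: number;       // estimated always-on tokens for this server
  runtimeOnly: boolean; // true when no schema is visible offline
}

// Crude estimate: about four characters per token.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

function accountMcp(config: McpConfig): ServerReport[] {
  return Object.entries(config.mcpServers).map(([server, entry]) => {
    const tools = entry.tools ?? [];
    const tokens = tools.reduce((sum, t) => sum + roughTokens(JSON.stringify(t)), 0);
    // Servers with no static tools are flagged, never connected to.
    return { server, tokens, runtimeOnly: tools.length === 0 };
  });
}

const report = accountMcp({
  mcpServers: {
    files: {
      command: "files-server",
      tools: [{ name: "read_file", description: "Read a file", inputSchema: { type: "object" } }],
    },
    search: { command: "search-server" }, // exposes tools only at runtime
  },
});
```

In this sketch the `search` entry comes back with `tokens: 0` and `runtimeOnly: true`, which mirrors the offline-first choice of flagging rather than launching the server.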
Quickstart
No install required:
npx ctxweight . # health + budget for the current directory
Or install it and use the short ctxweight command:
npm i -g ctxweight
ctxweight health . # quality checks only
ctxweight budget . --model claude-opus --mcp .mcp.json # token cost only
ctxweight . --json # machine-readable
ctxweight . --sarif > ctxweight.sarif # GitHub code scanning
ctxweight . --md # writes ctxweight-report.md
ctxweight . --fail-on secrets/aws-key,error # CI exit gate (see below)
Commands: ctxweight [path] (health + budget), ctxweight health [path], ctxweight budget [path].
Flags: --model <name>, --mcp <file>, --json, --sarif, --md, --fail-on <list>.
Detecting problems
Point it at a CLAUDE.md that committed an AWS key, a real contact email, and a doc example (user@example.com):
ctxweight · my-repo
GRADE D
1 AWS key in context · 1 email in context
Suggestions
→ Remove this value and inject it at runtime instead — context files are committed AND logged.
→ Remove or redact this — context files are committed and sent to the agent on every run.
The AWS key and the real contact email are flagged; the user@example.com doc example is not (RFC 2606 placeholder). Every finding carries a stable code (secrets/aws-key, secrets/email, …) you can target with --fail-on to gate CI. Add --full for per-finding detail with file:line.
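The placeholder behaviour described above can be sketched with two patterns and a reserved-domain list. This is an illustration under stated assumptions, not the tool's detector: the regular expressions are simplified, entropy scoring is left out, and `scan`, `allMatches`, and `isReservedDomain` are hypothetical names. The reserved names are the ones RFC 2606 sets aside for documentation and testing, and the key and contact address in the sample are made up.

```typescript
interface Finding {
  code: string;
  line: number;
  match: string;
}

// Simplified shapes: an AWS access key ID and an email address.
const AWS_KEY_ID = /\bAKIA[0-9A-Z]{16}\b/;
const EMAIL = /\b[\w.+-]+@([\w-]+(?:\.[\w-]+)+)\b/;

// Names reserved by RFC 2606.
const RESERVED_DOMAINS = ["example.com", "example.net", "example.org"];
const RESERVED_TLDS = [".example", ".test", ".invalid", ".localhost"];

function isReservedDomain(domain: string): boolean {
  const d = domain.toLowerCase();
  return (
    RESERVED_DOMAINS.some(r => d === r || d.endsWith("." + r)) ||
    RESERVED_TLDS.some(t => d.endsWith(t))
  );
}

// Collects every match of a pattern using a fresh global copy of it.
function allMatches(re: RegExp, text: string): RegExpExecArray[] {
  const out: RegExpExecArray[] = [];
  const global = new RegExp(re.source, "g");
  let m: RegExpExecArray | null;
  while ((m = global.exec(text)) !== null) out.push(m);
  return out;
}

function scan(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const m of allMatches(AWS_KEY_ID, lineText)) {
      findings.push({ code: "secrets/aws-key", line: i + 1, match: m[0] });
    }
    for (const m of allMatches(EMAIL, lineText)) {
      // Skip documentation placeholders such as user@example.com.
      if (!isReservedDomain(m[1])) {
        findings.push({ code: "secrets/email", line: i + 1, match: m[0] });
      }
    }
  });
  return findings;
}

// Fake key assembled from parts so no key-shaped literal sits in the docs.
const fakeKey = "AKIA" + "1234567890ABCDEF";
const sampleText = [
  "AWS_ACCESS_KEY_ID=" + fakeKey,
  "Contact: dev@acme-internal.io for access.",
  "Docs example: user@example.com",
].join("\n");

const secretFindings = scan(sampleText);
// Two findings: secrets/aws-key on line 1 and secrets/email on line 2.
// The example.com address on line 3 is skipped.
```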
Output formats
- Terminal — human-readable summary (default)
- --json — machine-readable, for scripts
- --sarif — drops findings straight into the GitHub Security tab
- --md — a shareable ctxweight-report.md
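For readers unfamiliar with SARIF, the sketch below shows the general shape of a SARIF 2.1.0 log built from a list of findings. It is an assumption about how such a mapping could look, not ctxweight's reporter: the `Finding` fields and `toSarif` are hypothetical, and only the SARIF property names themselves come from the standard.

```typescript
// Hypothetical finding shape for this illustration.
interface Finding {
  code: string; // e.g. "secrets/aws-key"
  message: string;
  file: string;
  line: number;
  level: "error" | "warning" | "note";
}

function toSarif(findings: Finding[]) {
  // One rule entry per distinct finding code.
  const ruleIds = Array.from(new Set(findings.map(f => f.code)));
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "ctxweight", rules: ruleIds.map(id => ({ id })) } },
        results: findings.map(f => ({
          ruleId: f.code,
          level: f.level,
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line },
              },
            },
          ],
        })),
      },
    ],
  };
}

const sarifLog = toSarif([
  { code: "secrets/aws-key", message: "AWS key in context", file: "CLAUDE.md", line: 12, level: "error" },
  { code: "secrets/email", message: "Email in context", file: "CLAUDE.md", line: 30, level: "warning" },
  { code: "secrets/email", message: "Email in context", file: "AGENTS.md", line: 4, level: "warning" },
]);
// JSON.stringify(sarifLog, null, 2) yields a document that code-scanning
// tools expecting SARIF 2.1.0 can ingest.
```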
CI / GitHub Actions
This repo dogfoods itself — see .github/workflows/ci.yml. The pattern is CI-safe in two halves: a self-scan step writes the SARIF with continue-on-error so findings never fail that step, while a separate gate step uses --fail-on on secret codes to actually break the build. The SARIF upload runs with if: always(), so findings always reach the GitHub Security tab even when the gate fails.
- name: Context audit (self) — generate SARIF
  continue-on-error: true
  run: node dist/cli.js . --sarif > ctxweight.sarif
- name: Context audit gate — fail on leaked secrets
  run: node dist/cli.js . --fail-on secrets/private-key,secrets/aws-key,secrets/openai-key,secrets/generic-token
- name: Upload SARIF
  if: always()
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: ctxweight.sarif
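The gate step's decision can be pictured as a simple match between the --fail-on list and the findings. This sketch assumes exact matching on either a finding code or a severity name (as in the Quickstart example `secrets/aws-key,error`) and a non-zero exit code of 1. The severity names other than `error`, and the function names, are hypothetical.

```typescript
type Severity = "error" | "warning" | "info"; // "warning" and "info" are assumed

interface Finding {
  code: string;
  severity: Severity;
}

// Splits "secrets/aws-key,error" into ["secrets/aws-key", "error"].
function parseFailOn(list: string): string[] {
  return list
    .split(",")
    .map(s => s.trim())
    .filter(s => s.length > 0);
}

// A finding trips the gate if its code or its severity is listed.
function shouldFail(findings: Finding[], failOn: string[]): boolean {
  return findings.some(f => failOn.includes(f.code) || failOn.includes(f.severity));
}

// Assumed convention: 0 passes the CI step, 1 breaks the build.
function exitCode(findings: Finding[], failOnList: string): number {
  return shouldFail(findings, parseFailOn(failOnList)) ? 1 : 0;
}

const emailOnly: Finding[] = [{ code: "secrets/email", severity: "warning" }];
const withKey: Finding[] = [...emailOnly, { code: "secrets/aws-key", severity: "warning" }];

const passes = exitCode(emailOnly, "secrets/private-key,secrets/aws-key"); // 0
const breaks = exitCode(withKey, "secrets/private-key,secrets/aws-key");   // 1
```

Listing only secret codes, as the workflow above does, keeps emails and length warnings visible in the Security tab without letting them break the build.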
Philosophy
- Offline-first. Your context never leaves your machine. No telemetry, ever. (Every cloud scanner asks you to upload the very config you're trying to keep private — ctxweight doesn't.)
- Auditor, not generator. It measures and explains; it never silently rewrites your files.
- GDPR-aware by default. Secrets and PII detection is a first-class check, not an afterthought, because committed-and-logged context is a real data-exposure path.
Limitations — what ctxweight doesn't see
ctxweight measures the static context on disk: CLAUDE.md, AGENTS.md, skills, .cursorrules, and the MCP tool schemas declared in your config. That's the part you can audit before a single turn runs.
It does not see context injected at runtime:
- Dynamic memory systems (MemPalace, mem0, and friends) that retrieve and inject content per query.
- Runtime RAG that pulls documents into the prompt on the fly.
- MCP servers that only expose their tools on connect — their schemas aren't in the static config, so ctxweight reports them as 0 rather than guessing.
A one-line CLAUDE.md that points at a memory system will score light even though it injects thousands of tokens on every turn.
Read the result as the cost of your static context — not the real total if you rely on dynamic memory or runtime RAG.
Study — 33 popular repos
We scanned the agent context of 33 widely-used AI dev tools (Codex, Cline, Continue, crewAI, mem0, LibreChat, …). Always-on cost — the tokens loaded on every turn — ranged from 0 to ~31,700 tokens, with 30% over 8,000 tokens/turn. And the honest headline on security: zero real leaked credentials. What naive scanners flag as "secrets" is almost always documentation — contact emails, example IPs, and env-var references.
Full aggregate report (offline, reproducible with npm run study): scripts/study/STUDY.md.
Roadmap
- health checks: budget/length + secrets & PII
- budget real tokenizer (gpt-tokenizer, o200k_base)
- budget MCP tool-schema accounting (static schemas, offline)
- SARIF + Markdown reporters
- --fail-on CI exit gate
- Follow @import references (Claude Code's @AGENTS.md / @docs/x.md) and count imported files transitively
- Actionable fix suggestions per finding (split the file, move it to on-demand, fix the broken import, …)
- Placeholder + entropy awareness for secret detection (skips your_api_key_here, m0-your-api-key, RFC 2606 example.com)
- health redundancy + drift checks (duplication vs. README / cross-file, CLAUDE.md / AGENTS.md drift)
- Treat process.env.* references and env-var names as non-secrets (placeholder awareness v2)
- Memory-system awareness — estimate per-turn token injection from MemPalace / mem0-style stores and runtime RAG (today only static context is measured)
- Target budgets — "always-on uses X of Y recommended tokens", with a configurable per-turn ceiling
- --mcp-connect: measure real MCP schemas by launching each server in a sandbox and calling tools/list (opt-in)
- Optional --llm semantic pass for contradictions (local model supported)
- Shareable Context Health badge + web report card
Contributing
Issues and PRs welcome. If ctxweight should catch something it doesn't, open an issue with a minimal repro.
License
Apache-2.0.
Author: GonzaloPeriane.