contextdevkit 3.5.1 (CLI) · Licence: MIT · Dependencies: 0 · Size: 8.8 MB

ContextDevKit


A portable, business-driven, level-based AI-assisted development platform that runs natively on Claude Code, Antigravity, and Codex. It connects business intent to governed engineering execution. Drop it into any project — greenfield or existing, any stack — and the harness starts enforcing durable project memory, automatic context loading, drift detection, specialized sub-agents, governed deliberation, and a workflow journey it won't let you skip. Turn on as much or as little as you want.

ContextDevKit treats "AI-assisted coding" as engineering. A plain CLAUDE.md is just instructions the model is free to ignore; ContextDevKit makes the harness enforce the rules with hooks and gates, and records the why in version control so any future session — human or AI — can pick up exactly where the last one left off.


What ContextDevKit is

ContextDevKit is a business-driven development platform: it ties business intent to governed engineering execution, and every feature in the kit exists to keep that line intact. You start from a business case — the problem, the value hypothesis, the investment decision — and the platform drives the work from that intent rather than from ad-hoc tickets: requirements become workflows, workflows become governed tasks, and each decision is recorded with the why so the next session continues the same thread. A hard rule keeps it honest — proposals follow a draft → approve → revise → reject lifecycle, and the AI cannot self-approve.

Around that spine sit the systems that make it trustworthy:

  • Durable memory keeps decisions, sessions, and domain language in your repo.
  • Automatic context loading opens every session with the current state loaded.
  • Drift detection refuses to let work go unrecorded.
  • Hook-enforced governance makes the rules the harness runs, not requests the AI may ignore.
  • Specialized squads route each concern to the right domain agent.
  • Three economies — token, cost, autonomy — keep AI-assisted work efficient and accountable.
  • The level system lets you adopt all of it gradually.

The result: development that stays accountable to business value at every step, with a paper trail any future session — human or AI — can pick up.

What's active by default, what's automatic, what's manual

You decide how much runs on its own. The autonomy dial (/autonomy, grades 1–4) is the master knob: it turns automatic behaviour back into manual confirmation whenever you want — so even what is automatic can always be made manual. The level sets which capabilities exist; the autonomy dial sets how much of them runs without you.

| Capability | Active by default | Automatic | Run it manually |
| --- | --- | --- | --- |
| Context loading at session start | L1+ | boot hook, always on | |
| Edit tracking + drift detection | L2+ | ledger + Stop nudge | /context-stats · /watch |
| Session registration | nudge, L2+ | nudged; you confirm | /log-session |
| Conventional Commits + quality gates | L3+ | git hooks, enforced on commit/push | |
| Auto-format on edit | L4+ | PostToolUse | |
| Squad routing | L4+ | at boot | /squad |
| High-risk blast-radius gate | L5+ | before flagged edits | /simulate-impact |
| Deliberation before a decision | grade ≥ 3 | auto-convened | /debate |
| Request orchestration | L7 | per prompt | /workflow · /new-adr · /ship |
| Token / cost economy | advisory, L6+ | measured | /token-report · toggle to blocking |
| Feature-reference docs | | regenerated + CI gate | /docs-reindex |
| Decisions · workflows · shipping | opt-in | you drive | /new-adr · /workflow · /ship · /swarm · /pipeline |

Lower the grade to turn the automatic rows into manual confirmations; raise it to let the harness drive. A non-negotiable floor keeps secrets, force-push, gate self-edits, ADRs, and grade changes human at every grade.


The 60-second mental model

Everything in the kit serves one thesis: don't depend on the AI's goodwill — make the environment enforce it. Four durable artifacts, all in your repo, all plain text:

| Artifact | Question it answers | Where |
| --- | --- | --- |
| ADRs | Why did we decide this? | memory/decisions/ inside the installed kit |
| Session logs | What happened, session by session? | memory/sessions/ inside the installed kit |
| Glossary | What does this domain word mean in code? | memory/GLOSSARY.md inside the installed kit |
| Changelog | What shipped, and when? | CHANGELOG.md |

Around those, hooks inject context at session start, track every edit, and block the session from ending with unregistered work — while gates stop a high-risk edit, a half-finished workflow, or an unreviewed decision from sliding through. You adopt it gradually through seven levels, and a separate autonomy dial decides how much the AI may do without asking.

The seven systems

1. Durable memory

Every decision, session, and domain term lives in plain-text files under version control. ADRs record the why behind choices; session logs record the what of each working session; the glossary maps UI language to code identifiers; the changelog records what shipped and when. No database, no external service — just files your team owns, reads, and diffs.

2. Automatic context loading

The boot hook fires before the first message of every session. It reads the ledger state, detects drift (edits since the last session log), identifies the active workflow and its phase, and assembles a compact context packet — so the AI is never starting cold from a blank CLAUDE.md. Every session opens with the current decisions, open tasks, and relevant ADRs already loaded.

3. Drift detection

Every file edit is tracked in an append-only session ledger. When a session ends without a /log-session, the Stop hook flags it. /context-stats reports the drift rate over time. The goal is zero unregistered sessions, because an unregistered session is context the next session can never recover.
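As a rough illustration of the idea, drift can be thought of as "edits recorded after the last registered session". The sketch below assumes a hypothetical ledger entry shape (`type`, `path`) and a hypothetical `detectDrift` helper; neither is taken from the kit's actual ledger format.

```javascript
// Hypothetical ledger entries, for illustration only:
//   { type: "edit", path: "src/app.ts" }
//   { type: "session-logged" }
function detectDrift(ledger) {
  // Find the most recent point at which a session was registered.
  let lastLogged = -1;
  ledger.forEach((entry, index) => {
    if (entry.type === "session-logged") lastLogged = index;
  });

  // Every edit after that point is unregistered work.
  const unregistered = ledger
    .slice(lastLogged + 1)
    .filter((entry) => entry.type === "edit")
    .map((entry) => entry.path);

  const paths = [...new Set(unregistered)]; // de-duplicate repeated edits
  return { drifted: paths.length > 0, paths };
}

const ledger = [
  { type: "edit", path: "src/a.ts" },
  { type: "session-logged" },
  { type: "edit", path: "src/b.ts" },
  { type: "edit", path: "src/b.ts" },
];

console.log(detectDrift(ledger)); // drifted: true, paths: ["src/b.ts"]
```

Because the ledger is append-only, a check like this only ever needs to scan forward from the last registration marker.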

4. Specialized sub-agents (squads)

Level 4 installs six domain squads — devteam, qa-team, design-team, security-team, compliance-team, ops-team — each with a router agent that picks the right specialist by intent. Level 6 adds the agent-forge squad, the "agent that builds agents" pipeline. Squads are active: the squad director reads the current diff at boot and activates only the postures the session actually needs, keeping context lean.

5. Hook-enforced governance

The harness enforces rules that don't rely on the AI remembering. Git hooks enforce Conventional Commits and run multi-language quality gates on push. Claude Code hooks inject boot context, track edits, nudge session registration, and at Level 5 run the /simulate-impact blast-radius check before any edit on a flagged high-risk path. The Stop hook refuses to let a session close silently with unregistered work.
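To make the Conventional Commits rule concrete, here is a minimal sketch of the kind of header check a commit-msg hook could run. The type list follows the common commitlint "conventional" set, which is an assumption; `checkCommitHeader` is a hypothetical name and is not the kit's actual hook code.

```javascript
// Assumed type list (the commonly used commitlint set); the kit's own list may differ.
const TYPES = ["feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert"];

// type, optional (scope), optional "!" for breaking changes, then ": description"
const HEADER = new RegExp(`^(${TYPES.join("|")})(\\([^()\\r\\n]+\\))?(!)?: \\S.*$`);

function checkCommitHeader(message) {
  // Only the first line of the commit message is the header.
  const header = message.split(/\r?\n/, 1)[0];
  const match = HEADER.exec(header);
  if (!match) {
    return { ok: false, reason: `header is not "type(scope): description": ${header}` };
  }
  return {
    ok: true,
    type: match[1],
    scope: match[2] ? match[2].slice(1, -1) : null, // strip the parentheses
    breaking: Boolean(match[3]),
  };
}

console.log(checkCommitHeader("feat(hooks)!: block unregistered sessions"));
console.log(checkCommitHeader("updated stuff"));
```

A hook built this way would exit non-zero when `ok` is false, which is what turns the convention into something the environment enforces rather than a request.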

6. The level system

Seven levels, each adding capability without removing anything below it:

| Level | Name | Adds |
| --- | --- | --- |
| 1 | Memory | Boot context injection, /log-session, ADRs, changelog |
| 2 | Ledger | Drift detection: tracks edits, nudges you to register the session |
| 3 | Multi-session | /claim · /worktree-new, derived indices, git hooks (Conventional Commits + conflict-blocking pre-push + multi-language quality gates) |
| 4 | Squads | Specialized sub-agents (devteam, qa-team, design-team, security-team, compliance-team, ops-team) + PostToolUse auto-format |
| 5 | Proactive | /simulate-impact gate on high-risk paths, branch-scoped workflow guard, /tech-debt-sweep, /contract-check, auto-distill nudge |
| 6 | Autonomy & Insight | /ship, /swarm, /pipetest, the auto-invoked deliberation council, /retro, /context-stats, agent-forge squad |
| 7 | Ecosystem | /fleet multi-repo control plane, /tune-agents, visual tests, playbook runner, multi-platform context bridges |

Change level anytime from inside the project:

node contextkit/tools/scripts/context-level.mjs        # show
node contextkit/tools/scripts/context-level.mjs 4      # move to L4 (or /context-level 4)

Going up adds capability; going down cleanly removes the now-disabled hooks. See docs/LEVELS.md.
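The cumulative nature of the levels can be sketched as a lookup that composes hooks from every level at or below the target. The hook names below are hypothetical labels loosely derived from the level descriptions, not the identifiers the kit writes into settings.json.

```javascript
// Hypothetical hook labels, keyed by the level that introduces them.
const HOOKS_BY_LEVEL = {
  1: ["boot-context"],
  2: ["edit-ledger", "stop-nudge"],
  3: ["git:commit-msg", "git:pre-push"],
  4: ["post-tool-use:auto-format"],
  5: ["simulate-impact-gate"],
};

function hooksForLevel(level) {
  if (!Number.isInteger(level) || level < 1 || level > 7) {
    throw new RangeError("level must be an integer from 1 to 7");
  }
  // A level includes everything introduced at or below it.
  return Object.entries(HOOKS_BY_LEVEL)
    .filter(([introducedAt]) => Number(introducedAt) <= level)
    .flatMap(([, hooks]) => hooks);
}

function diffLevels(from, to) {
  const before = new Set(hooksForLevel(from));
  const after = new Set(hooksForLevel(to));
  return {
    install: [...after].filter((hook) => !before.has(hook)),
    remove: [...before].filter((hook) => !after.has(hook)),
  };
}

console.log(diffLevels(5, 3));
// install: [], remove: ["post-tool-use:auto-format", "simulate-impact-gate"]
```

Computing the difference between two levels, rather than rebuilding from scratch, is one way a downgrade could remove only the hooks that are no longer enabled.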

7. The DevPipeline

Two different artifacts manage work. roadmap.md (in the kit's memory folder) is the product/business plan (capabilities, the what/why). The DevPipeline (contextkit/pipeline/, board in devpipeline.md) is execution control — bugs, increments, chores, and roadmap items broken into tasks with priority, SLA, DAG dependencies, and complexity, flowing backlog → working → testing → conclusion. The roadmap says what to build; the pipeline runs the work.
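One way to picture the DAG dependencies is a "what can start now?" query over the board. This is a sketch under assumptions: the task shape (`id`, `lane`, `priority`, `dependsOn`) and the convention that a lower priority number is more urgent are both hypothetical.

```javascript
// Hypothetical task shape: { id, lane, priority, dependsOn: [ids] }.
// Assumption: a lower priority number means more urgent.
function readyTasks(tasks) {
  const done = new Set(
    tasks.filter((task) => task.lane === "conclusion").map((task) => task.id)
  );
  return tasks
    // A backlog task is ready only when every dependency has reached conclusion.
    .filter((task) => task.lane === "backlog" && task.dependsOn.every((dep) => done.has(dep)))
    .sort((a, b) => a.priority - b.priority)
    .map((task) => task.id);
}

const board = [
  { id: "T1", lane: "conclusion", priority: 1, dependsOn: [] },
  { id: "T2", lane: "backlog", priority: 2, dependsOn: ["T1"] },
  { id: "T3", lane: "backlog", priority: 1, dependsOn: ["T2"] }, // blocked by T2
  { id: "T4", lane: "backlog", priority: 1, dependsOn: [] },
];

console.log(readyTasks(board)); // ["T4", "T2"]
```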

Three economies

Token economy

The kit is built on a discipline of spending tokens only where they change the outcome. The squad director assembles context from only the matched squads' playbooks — not the whole library. Cost-tiered model routing assigns cheap models to execution (read, scaffold, package), mid-tier to building, and the reasoning tier to architecture and security — so expensive capacity is reserved for decisions that genuinely require it.

Autonomy economy

The autonomy dial (autonomy.grade 1–4) decides what the AI may do without asking, at any level. Grade 1 is fully manual; grade 2 (the default) suggests but waits for confirmation; grade 3 auto-executes most actions but defers decisions to a human quorum; grade 4 is full-auto with a deliberation quorum at each gate. A non-negotiable floor in code keeps secrets, force-push, gate self-edits, ADRs, and grade changes human at every grade, regardless of the dial setting.

Set with /autonomy. See docs/explanation/value-and-impact.md for the engineering rationale.
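The grade semantics and the floor can be read as a small decision function. The sketch below is an interpretation of the prose, not the kit's code: the action kinds, the `isDecision` flag, and the returned labels are all hypothetical.

```javascript
// Hypothetical action kinds for the non-negotiable floor described above.
const HUMAN_ONLY = new Set(["secrets", "force-push", "gate-self-edit", "adr", "grade-change"]);

function decide(action, grade) {
  if (![1, 2, 3, 4].includes(grade)) throw new RangeError("grade must be 1-4");

  // The floor is checked first, so no grade can override it.
  if (HUMAN_ONLY.has(action.kind)) return "human";

  if (grade === 1) return "manual";   // fully manual
  if (grade === 2) return "confirm";  // suggest, then wait for confirmation
  if (grade === 3) return action.isDecision ? "human-quorum" : "auto";
  return action.isDecision ? "deliberation-quorum" : "auto"; // grade 4
}

console.log(decide({ kind: "force-push" }, 4));                  // "human"
console.log(decide({ kind: "edit", isDecision: false }, 3));     // "auto"
console.log(decide({ kind: "choose-db", isDecision: true }, 3)); // "human-quorum"
```

Checking the floor before the grade is the design point: the dial only ever moves behaviour within the space the floor leaves open.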

Cost economy

The economy runtime (Level 6+) measures every session's token spend, attributes it per command and per agent, and surfaces the data on the Execution Contract after each run. Advisory mode reports; blocking mode enforces a budget gate. The goal is making AI-assisted development costs observable before they compound.
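The difference between advisory and blocking mode can be illustrated with a small budget check. Everything named here (`attributeSpend`, `budgetGate`, the event shape, the token figures) is a hypothetical sketch rather than the economy runtime's real interface.

```javascript
// Hypothetical spend events: { command, agent, tokens }.
function attributeSpend(events, key) {
  const totals = {};
  for (const event of events) {
    const bucket = event[key] ?? "unknown";
    totals[bucket] = (totals[bucket] ?? 0) + event.tokens;
  }
  return totals;
}

function budgetGate(events, budgetTokens, mode) {
  const total = events.reduce((sum, event) => sum + event.tokens, 0);
  const over = total > budgetTokens;
  // Advisory mode only reports; blocking mode refuses once over budget.
  return { total, over, allowed: mode === "blocking" ? !over : true };
}

const events = [
  { command: "/ship", agent: "devops", tokens: 1200 },
  { command: "/workflow", agent: "architect", tokens: 800 },
  { command: "/ship", agent: "code-reviewer", tokens: 500 },
];

console.log(attributeSpend(events, "command"));    // { "/ship": 1700, "/workflow": 800 }
console.log(budgetGate(events, 2000, "advisory")); // total 2500, over, still allowed
console.log(budgetGate(events, 2000, "blocking")); // total 2500, over, not allowed
```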

The workflow spine

For larger features, /workflow creates a spec pack under the kit's memory folder (memory/workflows/<slug>/) carrying prd.md, spec.md, ADR and task indexes, and dated completion reports. The engine enforces the lifecycle:

intake → prd → spec → adr → roadmap(if feature) → pipeline → ship → testing → conclusion

advance refuses to leave a phase with missing deliverables — it names the gaps. --force is the explicit, recorded escape. Pipeline cards link back with --workflow <slug>; moving a card to testing stamps implemented: YYYY-MM-DD; QA sign-off is the governed path into conclusion.
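The "names the gaps, unless forced" behaviour can be sketched as follows. The phase order comes from the lifecycle above; the per-phase deliverables map, the `isFeature` flag, and the `advance` signature are assumptions made for illustration.

```javascript
const PHASES = ["intake", "prd", "spec", "adr", "roadmap", "pipeline", "ship", "testing", "conclusion"];

// Hypothetical deliverables per phase; the real engine's list may differ.
const REQUIRED = { prd: ["prd.md"], spec: ["spec.md"] };

function advance(workflow, { force = false } = {}) {
  const index = PHASES.indexOf(workflow.phase);
  if (index === -1) throw new Error(`unknown phase: ${workflow.phase}`);
  if (index === PHASES.length - 1) {
    return { moved: false, phase: workflow.phase, gaps: [], forced: false };
  }

  // Name every missing deliverable instead of failing on the first one.
  const gaps = (REQUIRED[workflow.phase] ?? []).filter((file) => !workflow.files.includes(file));
  if (gaps.length > 0 && !force) {
    return { moved: false, phase: workflow.phase, gaps, forced: false };
  }

  // The roadmap phase only applies to features.
  let next = PHASES[index + 1];
  if (next === "roadmap" && !workflow.isFeature) next = "pipeline";

  // A forced advance still reports the gaps so the escape can be recorded.
  return { moved: true, phase: next, gaps, forced: gaps.length > 0 };
}

console.log(advance({ phase: "prd", files: [], isFeature: true }));
console.log(advance({ phase: "prd", files: [], isFeature: true }, { force: true }));
```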

Requirements

  • Node.js ≥ 18 — the hooks/scripts are plain .mjs; Levels 1–3 need zero npm packages. Node 20.6+ unlocks --env-file for the media-gen credentials flow.
  • git — for divergence detection and the Level 3 git hooks.
  • Claude Code, Antigravity, or Codex (IDE agent, CLI, desktop, or web).
  • Optional: gh (GitHub CLI) for PR/sync awareness; GOOGLE_AI_API_KEY for /media-gen.

Quickstart

One command, from anywhere — the repo is the installer.

First, pick how the kit lives in git (you can switch later — it's non-destructive):

| Mode | When | What it does |
| --- | --- | --- |
| Local-only (default) | Solo work, an experiment, or trying the kit | Writes a managed .git/info/exclude block so the installed artifacts (contextkit/, .claude/, CLAUDE.md, …) stay out of your git history, so updates never flood your commits. Your teammates and CI won't see the kit. |
| Tracked (--tracked) | A team, multiple machines, or CI that needs the kit | Skips the exclude block so you can git add and commit the kit; everyone who clones gets the same memory, agents, and governance. |

Not sure? Start local-only (just run the command below). Move to tracked the moment a second person or machine needs the kit: re-run with --tracked and git add the artifacts — switching only toggles the exclude block, it never touches your index or edits. /context-doctor reports your current mode and flags a local-only kit in a repo that already has a remote.
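To show why switching modes can be non-destructive, here is a sketch that treats the exclude file as text with one marker-delimited block. The marker strings and the `setMode` helper are hypothetical; the installer's real markers are not documented here.

```javascript
// Hypothetical markers; lines starting with "#" are comments in gitignore syntax.
const BEGIN = "# >>> contextdevkit (managed) >>>";
const END = "# <<< contextdevkit (managed) <<<";

function stripBlock(text) {
  const lines = text.split("\n");
  const start = lines.indexOf(BEGIN);
  const end = lines.indexOf(END);
  if (start === -1 || end === -1 || end < start) return text; // no managed block
  lines.splice(start, end - start + 1);
  return lines.join("\n");
}

function setMode(text, mode, entries) {
  // Always start from the user's own lines, with the managed block removed.
  const base = stripBlock(text).replace(/\n+$/, "");
  const userPart = base === "" ? "" : base + "\n";
  if (mode === "tracked") return userPart;
  return userPart + [BEGIN, ...entries, END].join("\n") + "\n";
}

const local = setMode("*.log\n", "local-only", ["contextkit/", ".claude/", "CLAUDE.md"]);
console.log(local);
console.log(JSON.stringify(setMode(local, "tracked", []))); // "*.log\n"
```

Only the text between the markers is ever rewritten, so the user's own exclude rules survive a switch in either direction and re-running the same mode is idempotent.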

# from npm (recommended) — auto-picks L3 for an empty folder, L7 if it already has code
npx contextdevkit --target . --yes

# or straight from GitHub (no npm needed)
npx github:reiTavares/ContextDevKit --target . --yes

# team / multi-machine / CI — commit the kit instead of keeping it local-only
npx contextdevkit --target . --tracked --yes

Greenfield? Run it in an empty (or git init-ed) folder and it scaffolds the whole thing. Existing project? It detects your stack, never clobbers your CLAUDE.md (it writes CLAUDE.contextdevkit.md to merge by hand), and preserves any hooks you already had.

Then, one-shot self-configuration — open the project in Claude Code, approve the hooks once, and the boot hook tells you it isn't configured yet. Run:

/setupcontextdevkit

This inspects the project, tunes the config to your stack (ledger path lists, high-risk paths), fills in CLAUDE.md (rules, stack, glossary), scaffolds domain sub-agents, records a baseline ADR, and logs the session — going from "kit installed" to "kit fitted to this project" in a single pass.

$ npx contextdevkit --target . --yes
✓ .claude/settings.json wired for L7
✓ engine installed (contextkit/runtime, contextkit/tools)
✓ slash commands installed · agents installed · providers installed
✓ CLAUDE.md created  ·  CHANGELOG.md created
✅ ContextDevKit installed at Level 7 (existing project — full toolkit)

> /setupcontextdevkit
  Phase 1 — Inspect ……  detected: TypeScript · Vite · React · vitest
  Phase 3 — Apply ……    ledger tuned (src/, tests/); high-risk: src/db/schema.ts
  Phase 4 — CLAUDE.md …  stack + immutable rules filled in
  Phase 7 — baseline ADR-0001 recorded; session logged
  ✅ ContextDevKit fitted to this project.

Security & trust — read before installing. ContextDevKit is a code-execution tool: install it like any dependency you run. npx writes git hooks under .git/hooks/ (L≥3) and Claude Code hooks into .claude/settings.json, which then run node on each session/commit/push. Pin a tag for a reproducible install: npx github:reiTavares/ContextDevKit#v3.0.0 --target . --yes. An existing git hook is never clobbered (backed up to <hook>.bak). /fleet and custom contextkit/detectors/*.mjs execute with full Node privileges — only register repos and add detectors you trust.

Governance — what the harness enforces

This is the part that doesn't rely on the AI remembering. Three layers, each documented in its own explanation doc:

  • Hooks & gates. Boot context injection, edit tracking, a Stop hook that blocks ending with unregistered work, the L5 /simulate-impact gate on high-risk paths, and PostToolUse auto-format + multi-language pre-push quality gates.
  • Deliberation council (explanation). At grade ≥ 3, opening a feature or recording a decision auto-convenes a deterministic, named specialist council that argues the question before the ADR is written — evidence gathered cheaply, voices never downgraded.
  • Workflow journey (explanation). /workflow won't let advance leave a phase with empty deliverables; --force is the explicit, recorded escape. Workflows are numbered NNNN-slug, and the mutation guard is branch-scoped so parallel sessions don't block each other.

Squads — sub-agents organised by domain

Each squad has a router agent that picks specialists by intent. As of v2.6 the squads are active: routed deterministically, given stack-aware playbooks, and audited at the pre-commit gate — see docs/explanation/active-squads.md.

| Squad | Specialists | When |
| --- | --- | --- |
| devteam | architect, code-reviewer, context-keeper, test-engineer | Cross-cutting design + PR review + memory hygiene |
| qa-team | qa-orchestrator + qa-unit / qa-integration / qa-fuzzer / qa-perf / qa-e2e | Testing strategy + execution |
| design-team | ui-designer, ux-designer, accessibility, seo-specialist, landing-architect, conversion-strategist, tracking-integrator | UI/UX, WCAG AA, SEO + AISO, high-conversion landing pages |
| security-team | security, code-security, infra-security | Auth, secrets, dependencies, IaC, supply chain |
| compliance-team | privacy-lgpd, governance-officer | LGPD (Brazilian data protection), policy |
| ops-team | devops | CI/CD, deploys, environments, observability |
| agent-forge (L6+) | forge-orchestrator, model-router, prompt-engineer, tool-designer, eval-designer, packager, rag-designer, agent-architect | The "agent that builds agents": produces portable Agent Packages |

Grow your own — or new squads — from _BRIEFING.md.tpl via /squad. See docs/SQUADS/design-team.md and docs/SQUADS/agent-forge.md for two squads in depth.

What gets installed into your project

your-project/
  CLAUDE.md                          # boot context + your coding constitution
  .claude/
    settings.json                    # hook wiring (composed for your level)
    commands/                        # the slash-command set, organised in packs
      audit/ pipeline/ qa/ vcs/ forge/ setup/   # domain packs (see Slash commands)
    agents/                          # the sub-agent archetypes, each with a cost tier (L4+)
  .agents/                           # Antigravity host (skills, personas, playbooks — built from Claude sources)
  INSTRUCTIONS.md  ·  ctx.mjs        # Antigravity boot context + central CLI runner (agy)
  .codex/  ·  AGENTS.md  ·  cdx.mjs  # Codex host (hooks + TOML subagents + boot context + runner)
  contextkit/
    .env.example                     # optional credentials template (media-gen)
    runtime/hooks/                   # the engine: boot, ledger, drift, L5 gate, auto-format, deliberation-nudge
    runtime/config/                  # zero-dep loader, defaults, settings composer
    runtime/git-hooks/               # pre-commit (reindex), commit-msg, pre-push (conflicts + quality gates)
    runtime/providers/review/        # PR/review CLI adapters (gh)
    runtime/providers/media/         # Veo + Nano Banana adapters
    runtime/state/                   # canonical append-only state.json substrate
    tools/scripts/                   # reindex, dashboard, sync-check, guard, swarm, deliberation-council, audits, …
    memory/decisions/                # ADRs (the why)
    memory/sessions/                 # one file per session (the what)
    memory/workflows/                # /workflow spec packs (NNNN-slug)
    memory/GLOSSARY.md
    pipeline/                        # DevPipeline lanes: backlog / working / testing / conclusion
    workflows/playbooks/             # tanstack, landing-page, seo-aiso, tech-debt-sweep, squads/…
    squads/agent-forge/              # the "agent that builds agents" (L6+)
    config.json                      # level + ledger path lists + L5 params + autonomy grade
  CHANGELOG.md

Slash commands

Organised into domain packs so the / menu doesn't read as a 60-file scroll. The basename resolver is path-agnostic — /qa-signoff finds qa/qa-signoff.md exactly the same as a flat layout.
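A path-agnostic lookup of this kind can be sketched as a match on the file's basename. `resolveCommand` and the file list are hypothetical; the sketch also assumes that two files with the same basename should be treated as an error.

```javascript
// Resolve "/qa-signoff" to whichever pack contains qa-signoff.md.
function resolveCommand(name, files) {
  const wanted = name.replace(/^\//, "") + ".md";
  const matches = files.filter((file) => file.split("/").pop() === wanted);
  if (matches.length > 1) {
    // Assumption: duplicate basenames are ambiguous and should fail loudly.
    throw new Error(`ambiguous command ${name}: ${matches.join(", ")}`);
  }
  return matches[0] ?? null;
}

console.log(resolveCommand("/qa-signoff", ["qa/qa-signoff.md", "vcs/git.md"])); // "qa/qa-signoff.md"
console.log(resolveCommand("/qa-signoff", ["qa-signoff.md", "git.md"]));        // "qa-signoff.md"
```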

Setup: /aidevtool-from0 (empty project) · /setupcontextdevkit (existing project)

Daily (root pack): /state · /log-session · /new-adr · /debate · /advise · /close-version · /context-refresh · /project-map · /bug-hunt · /dashboard · /watch · /landing-page · /media-gen · /playbook · /predictions-review · /squad · /context-budget · /token-report · /tune-agents · /context-stats · /fleet · /distill-sessions · /distill-apply · /simulate-impact · /roadmap · /claude-md · /docs-reindex

pipeline/: /pipeline · /ship · /swarm · /pipetest · /dev-start · /plan-week · /retro · /runs · /workflow · /workflow-assist · /resume

vcs/: /git · /claim · /release · /worktree-new · /gh-triage · /draft-changelog · /changelog-social

qa/: /qa-signoff · /test-plan · /scaffold-tests · /visual-test

audit/: /audit · /deep-analysis · /security-setup · /deps-audit · /tech-debt-sweep · /analyze-code-ia-practices · /contract-check · /seo-audit · /validate-doc

forge/ (L6+, agent-forge squad): /forge-new and 13 lifecycle commands (forge-{list,show,doctor,policy,budget,audit,eval,redteam,route,fallback-test,refresh-matrix,killswitch,deprecate})

setup/: /setupcontextdevkit · /aidevtool-from0 · /autonomy · /context-doctor · /context-level · /context-config

On Antigravity every command is a skill under .agents/skills/ (same names, no / prefix), run through the agy runner — see docs/ANTIGRAVITY.md. On Codex, generated skills live under .agents/skills/, subagents under .codex/agents/*.toml, and the same scripts run through node cdx.mjs <command> — see docs/CODEX.md.

Beyond governance — the rest of the toolkit

Playbooks: reusable procedures (/playbook run)

| Playbook | What it covers |
| --- | --- |
| landing-page.md | Fold rules, anti-Lovable refusals, dated package recs, Core Web Vitals budget |
| seo-aiso.md | SEO + AISO checklist (llms.txt, FAQ schema, semantic HTML5, AI-crawler robots.txt) |
| tanstack.md | TanStack family, cache-key discipline, typed router params |
| simulate-impact.md / tech-debt-sweep.md / distillation-cycle.md | Blast-radius mapping, constitution scan, CLAUDE.md refinement |
| security-batch.md | Batch security findings → ADRs + backlog |
| squads/*.md | Stack-aware posture guide per active squad |

Provider adapters: zero-dep, refuse-on-missing-creds

Pluggable runtime adapters (Node's built-in fetch / child_process.spawn) with a typed error contract.

  • Review (contextkit/runtime/providers/review/): gh CLI for PR creation, review-comment listing, and posting. Add glab.mjs / bb.mjs / tea.mjs for GitLab / Bitbucket / Gitea — same _adapter.mjs contract; detect.mjs resolves from git remote get-url origin.
  • Media (contextkit/runtime/providers/media/): nano-banana (Imagen 3 image, ~$0.04/image) and veo (Veo 3 video, ~$0.50/s), both on GOOGLE_AI_API_KEY. Cap per-process spend with CONTEXTDEVKIT_MEDIA_MAX_USD=5.00; --dry-run never charges.

node --env-file=contextkit/.env contextkit/tools/scripts/media-gen.mjs image \
  --prompt "editorial product hero, asymmetric grid" --out public/hero.png

SEO + AISO audit: two static analysers, refuse-on-SPA

node contextkit/tools/scripts/seo-audit.mjs           # 8 SEO codes, exit 1 on SPA_ENTRYPOINT
node contextkit/tools/scripts/aiso-audit.mjs --json   # 8 AISO codes, machine-readable

SEO: SPA_ENTRYPOINT, MISSING_TITLE, MISSING_DESCRIPTION, MULTIPLE_H1, MISSING_CANONICAL, MISSING_ALT, MISSING_SITEMAP, MISSING_ROBOTS. AISO: MISSING_LLMS_TXT, MISSING_FAQ_SCHEMA, MISSING_ORG_SCHEMA, DIV_SOUP, JS_RENDERED_CONTENT, MISSING_AUTHOR_SCHEMA, MISSING_DATE_STAMP, BLOCKS_AI_CRAWLERS. See docs/explanation/value-and-impact.md for the rationale behind AISO.
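For a feel of what a static check behind three of these codes might look like, here is a deliberately naive sketch using regular expressions over an HTML string. It is not the kit's analyser: `auditHtml` is hypothetical, and a real implementation would likely use a proper HTML parser.

```javascript
// Naive static checks for three of the SEO codes listed above.
function auditHtml(html) {
  const findings = [];

  // A <title> element with at least one non-whitespace character.
  if (!/<title>\s*\S[\s\S]*?<\/title>/i.test(html)) findings.push("MISSING_TITLE");

  // A <meta name="description" ...> tag anywhere in the document.
  if (!/<meta\s[^>]*name=["']description["'][^>]*>/i.test(html)) {
    findings.push("MISSING_DESCRIPTION");
  }

  // More than one opening <h1> tag.
  if ((html.match(/<h1[\s>]/gi) ?? []).length > 1) findings.push("MULTIPLE_H1");

  return findings;
}

const page = "<html><head><title>Hi</title></head><body><h1>a</h1><h1>b</h1></body></html>";
console.log(auditHtml(page)); // ["MISSING_DESCRIPTION", "MULTIPLE_H1"]
```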

Visual surfaces: /dashboard + /watch

node contextkit/tools/scripts/dashboard.mjs              # snapshot → dashboard.html
node contextkit/tools/scripts/dashboard.mjs --watch      # live on 127.0.0.1:4242 (SSE)
node contextkit/tools/scripts/watch.mjs --follow         # tail the ledger

/dashboard renders pipeline lanes + ADRs + sessions + roadmap + [Unreleased] changelog as self-contained HTML; /watch tails the active session ledger.

Maintenance

# diagnose an install (node, config, hook wiring vs level, git hooks, onboarding)
/context-doctor          # or: agy doctor / node contextkit/tools/scripts/doctor.mjs

# safe update — refresh engine, commands, agents, configs
# (never modifies user-authored memory, CLAUDE.md, or custom settings;
#  project-map may be generated/refreshed when safe — deferred on active sessions)
npx contextdevkit@latest --target . --update

# change level (rewires settings.json, installs git hooks at L≥3)
/context-level 4

# uninstall — keeps memory (ADRs, sessions) and CLAUDE.md; add --purge to also remove the engine
node /path/to/contextdevkit/install.mjs --target . --uninstall

--update runs a conflict-safe 3-way merge against a sha256 manifest (personalized commands/agents are never clobbered), refreshes the installed contextkit/README.md, regenerates docs/README.md, and runs the workflow-numbering migration — but does not take ownership of your project's root README.md.
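The three-way decision can be sketched per file by comparing three hashes: the one recorded in the manifest at install time, the one of the file currently on disk, and the one shipped by the new version. The function name and the returned labels are hypothetical, and the hashes are passed in as plain strings to keep the sketch dependency-free.

```javascript
// manifestHash: hash recorded when the file was last installed (the "base")
// installedHash: hash of the file as it exists in the project now
// incomingHash: hash of the file in the version being installed
function planUpdate(manifestHash, installedHash, incomingHash) {
  if (installedHash === incomingHash) return "unchanged"; // already up to date
  if (installedHash === manifestHash) return "overwrite"; // user never touched it
  if (incomingHash === manifestHash) return "keep";       // only the user changed it
  return "conflict";                                      // both sides changed
}

console.log(planUpdate("aaa", "aaa", "bbb")); // "overwrite"
console.log(planUpdate("aaa", "ccc", "aaa")); // "keep"
console.log(planUpdate("aaa", "ccc", "bbb")); // "conflict"
```

Under this reading, a personalized command or agent is never clobbered because any file whose on-disk hash differs from the manifest falls into "keep" or "conflict", never "overwrite".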

Develop the kit itself

Test scripts
| Script | When to run | What it does |
| --- | --- | --- |
| npm run test:smoke | Inner loop, after every local edit | Hermetic, no-install suites (~1.5 s) |
| npm run test:impact | Inner loop, conservative auto-selector | Runs only the suites touched by changed files; falls back to full on any uncertainty |
| npm run test:selfcheck | After wiring changes | Static engine checks (660+ assertions); quiet on pass (selfcheck: N/N) |
| npm run test:unit | Alias for smoke + selfcheck | test:smoke then test:selfcheck |
| npm run test:integration | Before opening a PR | All six integration clusters (core / installer / hosts / workflow / enforcement / ecosystem) |
| npm run test:integration:<cluster> | Closing a card in that area | One cluster: core, installer, hosts, workflow, enforcement, ecosystem |
| npm run test:full | Named alias for the full run | Identical to npm test: every suite, serial, fail-fast |
| npm test | Pre-push / CI baseline | Full suite; behavior preserved, external callers unaffected |
| npm run ci:fast | PR gate (CI runs this) | test:impact + tech-debt RED-line; single Node version; uploads runs/ logs |
| npm run ci:full | Main/release gate (CI + pre-publish) | Full suite + tech-debt; runs on Node 18/20/22; mandatory before release |
| npm run ci | Alias for ci:full | Same as ci:full; legacy callers are safe |

npm test, npm run ci, and npm run check keep their exact meaning — external npx/automation callers are unaffected. Logs land in the gitignored runs/ directory; --verbose on any suite restores full output; --legacy on run-suites.mjs executes the literal pre-TEA serial chain (rollback escape hatch).

npm run test:smoke            # fast hermetic pass after an edit
npm run test:impact           # conservative selector — inner loop for larger changes
npm test                      # full suite (selfcheck + all integration tests)
npm run ci:full               # full gate + tech-debt RED-line (validate before pushing)
node tools/selfcheck.mjs      # static: loads the engine, asserts wiring per level
node tools/integration-test.mjs  # end-to-end: installs to a temp dir, drives real hooks
npm run build:antigravity     # regenerate .agents skills/personas from templates/claude
npm run build:codex           # regenerate .codex agents + source-command skills

The kit dogfoods itself, so the SOURCE lives under templates/ and tools/ — never edit the installed contextkit/ copies. See CONTRIBUTING.md for the immutable rules (zero hot-path deps, hooks never break work, add a test for anything you add).

Docs

Organized by Diátaxis — see docs/README.md for the full index.

Explanation — the why:

Reference & how-to:

Portuguese guide: instrucoes.md.

License

MIT — see LICENSE.