A logician's-pass linter for software artifacts. Entailer checks whether prose that argues actually follows, and whether a set of requirements is consistent, by formalizing the load-bearing claims into logic and verifying them deterministically. It reports validity and consistency separately from truth, and it never ships a verdict without showing the formalization it judged.
Think "markdownlint or Vale, but for logical validity and consistency," running over claims, prompts, docs, and whole repos.
Contents
- The example that matters
- Try it live
- What is honest about it
- Architecture
- The five tiers
- The concept-faithfulness lens
- Packages
- Editors and agents
- Examples
- Develop
- Roadmap
- Contributing · License
The example that matters
A linter for style cannot catch this. Two requirements in a spec, written pages apart, that cannot both hold:
R1. Every request must be logged.
R2. Health-check requests must not be logged.
R3. /healthz is a request.
These three are mutually inconsistent: R3 makes /healthz a request, R1 forces it to
be logged, R2 forbids it. Entailer surfaces that as an INCONSISTENT verdict with the
minimal conflicting subset {R1, R2, R3}, not a vibe and not a style nag. A
cross-file contradiction like this is the thing a single-sentence "is this a fallacy"
toy structurally cannot produce, and Tiers 3 and 4 catch it across a document or a
whole repository.
Try it live
The Playground runs the real
@entailer/core in your browser: live truth tables, counter-models, and the
markdown/domain checkers with no install.
# CLI (no install)
npx @entailer/cli markdown spec.md # Tier 3: within-doc consistency
npx @entailer/cli repo . # Tier 4: cross-file consistency
npx @entailer/cli pr 42 # Tier 5: base to head delta over PR #42 (gh + git)
npx @entailer/cli pr --base main # Tier 5: working tree vs a local git ref
npx @entailer/cli sentence "a -> a" # Tier 1: classify a claim
npx @entailer/cli check --ir argument.json # validity of a supplied argument
npx @entailer/cli domain --lens access.yaml --repo . # Lens: concept faithfulness (not a tier)
// Library
import { evaluateMarkdown } from "@entailer/core";
const report = evaluateMarkdown({ markdown, uri: "spec.md" });
console.log(report.verdict, report.consistency.minimalConflictingSubset);
Exit codes: 0 no issue / valid, 1 invalid or inconsistent, 2 malformed input,
3 UNKNOWN-blocked. Runnable demos live in examples/.
What is honest about it
The whole tool exists to protect one distinction:
validity ≠ truth ≠ faithful-formalization: three independent fields, never collapsed.
- Validity / consistency is what the deterministic core licenses (
Γ ∪ {¬φ}unsatisfiable means valid; a witnessed model means invalid or consistent). It is sound by construction and reproducible. - Faithfulness (does the formula capture the English?) is attestable only by a human or an LLM in the loop. It is the weak link, so the formalization is always shown. A verdict with no visible symbol dictionary fails schema validation; it cannot ship.
- Premise truth is a question about the world and is explicitly out of scope.
There is no "soundness score." unknown / out-of-fragment / timeout is a first-class
UNKNOWN verdict, never silently VALID or INVALID.
Architecture
A strict two-layer split. A pluggable translator (LLM-in-the-loop or rule-based) emits a typed IR carrying an auditable symbol dictionary. A fully deterministic verifier core consumes only the IR, never raw text, so the same checker runs identically whether a human or a model produced the formalization.
flowchart LR
A["prose · spec · repo · PR"] --> B["translator<br/>(LLM or rule-based,<br/>untrusted proposer)"]
B -->|"typed IR +<br/>symbol dictionary"| C["verifier core<br/>(deterministic, pure,<br/>the trusted kernel)"]
C --> D["LogicReport<br/>validity · consistency · UNKNOWN<br/>+ certificate"]
Logic lives in exactly one place: @entailer/core (pure TypeScript, zero LLM or
solver dependencies). See DESIGN.md for the full pipeline, IR schema,
tier model, and milestone plan.
The five tiers
Every tier runs over the one deterministic core. The IR and source-adapter seam is what lets each tier extend the same checker instead of re-implementing logic.
| Tier | Input | Status |
|---|---|---|
| 1 · Sentence | one claim | Built |
| 2 · Prompt | prose with an implicit conclusion | Built |
| 3 · Markdown | one .md |
Built |
| 4 · Repo | a path, cross-file | Built |
| 5 · PR | base + head file sets (+ PR body) | Built , base→head delta / regression gate |
Tier 5 is a regression gate: it diffs the base claim-set against the head, attributes
each contradiction as introduced / fixed / pre-existing, and (under --gate introduced)
fails only on what a change newly breaks. An inconsistent-but-not-regressed head passes
honestly. The deterministic core takes the formalization as supplied input; the LLM
translator (@entailer/translate) produces it from prose and
degrades to UNKNOWN when the translation is shaky.
The concept-faithfulness lens
The five tiers ask does the logic hold? The domain lens asks a different question on a different axis: does the code stay faithful to its own concepts? You declare a small concept cluster on its four sides (relationships, a defining rule, examples, and vocabulary), and the lens flags where a codebase drifts from it.
# access.yaml: a concept cluster
cluster: access
boundary: ["src/**"]
concepts:
- concept: member
vocabulary: [member, subscriber]
- concept: guest
vocabulary: [guest, anonymous]
relationships:
- member is-not guest # declared mutually exclusive
npx @entailer/cli domain --lens access.yaml --repo .
# ⛔ concept-fusion src/access.ts:1 identifier `grantMemberGuest` names both
# `member` and `guest`, which the cluster declares mutually exclusive. exit 1
It is honest the same way the tiers are. Only a rank-1 finding is a verdict: a
deterministic contradiction, one identifier fusing two is-not concepts, or a
self-contradictory declaration (caught before any file is read). Ranks 2–3 are reader
hints, they assert nothing, can never block, and never render as a certificate,
enforced by the DomainReport schema. An is-a overlap (a member that is a user)
is satisfiable by construction, so it stays silent, no false positive. The one weak link,
classification ("is this token a member-use?"), is lexical and named as such: a human
confirms each site. --gate introduced reports only leaks new versus a base ref. Also
available as the MCP tool evaluate_domain.
Packages
All six publish together and are live on npm. @entailer/core is the only one you need
to check a supplied formalization.
| Package | Role | npm |
|---|---|---|
| @entailer/core | The trusted kernel: Formula AST, DSL parser, propositional tableau + DPLL, classification, the IR, and the LogicReport schema. Pure, dependency-light. |
|
| @entailer/cli | The entailer binary: evaluate a formalization, --json output, honest exit codes including a distinct UNKNOWN-blocked code. |
|
| @entailer/mcp | Stdio MCP server exposing the deterministic check_* / classify / evaluate_* tools with structured output. Stateless; the caller formalizes in-context, then calls a tool. |
|
| @entailer/translate | The autonomous LLM seam, the only package that touches a model. For scripts and CI outside an editor: turns prose into the typed IR via Claude (bring your own ANTHROPIC_API_KEY), then the core verifies it. A parse failure or undeclared atom forces UNKNOWN over a confident-but-wrong verdict. The formalizer is injectable, so the engine is fully testable offline. From Claude Code / Codex you don't need this package or a key — the plugin formalizes in-context. |
|
| @entailer/viz | Deterministic visualizers: typed view-models plus dependency-free text / SVG / Mermaid renderers. Colors a verdict only when its certificate (proof or counter-model) is present, so a picture can never launder a hint into a verdict. | |
| @entailer/solver | Opt-in SMT escalation: a pure SMT-LIB emitter plus a lazily-probed Z3 backend that degrades to UNKNOWN when z3-solver is absent. An amplifier, never a gate. |
Editors and agents
Entailer ships an MCP server plus two editor plugins, so an agent can call the
deterministic checker directly. From an editor this is fully key-less: the agent
is the model, so it formalizes prose in-context (no ANTHROPIC_API_KEY, no network,
no @entailer/translate) and then calls the deterministic tools for the verdict. The
key-carrying @entailer/translate is only for autonomous use
outside an editor.
- MCP server:
npx @entailer/mcp(declared inplugin/.mcp.json). Exposescheck_validity,check_consistency,classify_formula,evaluate_sentence,evaluate_argument,evaluate_prompt,evaluate_markdown,evaluate_repo,evaluate_pull_request, andevaluate_domain. - Claude Code plugin: in
plugin/— an MCP server plus aformalizeskill that formalizes in-context and then runs a real deterministic check (MCP tools or the CLI). - Codex plugin: in
codex/, the same server and skill vianpx -y @entailer/mcp.
Why an MCP tool instead of "just ask the model"?
The model is a good translator and a poor verifier. The MCP path keeps that split:
the agent formalizes in-context (the untrusted step, always shown) and the deterministic
core decides validity/consistency (the trusted step, reproducible). The tool returns a
structured LogicReport whose honesty invariants are schema-enforced, so a confident
hallucinated "VALID" cannot come back without the proof that licenses it.
Examples
Runnable, no keys required, in examples/:
| Command | Shows |
|---|---|
check --ir examples/affirming-consequent.ir.json |
Tier 2: INVALID, counter-model p=F, q=T |
markdown examples/inconsistent-spec.md |
Tier 3: within-doc contradiction, back to path:line |
repo examples/repo-demo |
Tier 4: cross-file conflict spanning README and SPEC |
domain --lens examples/domain-lens/access.yaml --repo examples/domain-lens |
Lens: a rank-1 concept-fusion leak |
node examples/programmatic.mjs |
The library API across sentence / argument / markdown / repo |
Develop
pnpm install
pnpm check # build + typecheck + every test suite + the drift gate
Requires Node >=20.19 and pnpm; built with pnpm workspaces and TypeScript. CI runs the
same pnpm check on every push (ci.yml).
pnpm check is the full local gate. The formalize skill is
vendored one-way via pnpm vendor:skill, and the machine-readable taxonomy is generated
from its references (pnpm gen), never hand-copied; the gate fails on any drift.
Roadmap
Shipped: the deterministic core, all five tiers, the concept-faithfulness lens, six published packages, an MCP server, two editor plugins, and a live playground. Deferred (designed, not yet wired):
- the LLM-grounded rank-3 prose-vs-practice detector on the domain lens,
- execution-grounded leak detection (beyond lexical classification),
- the standing-rule adjudication write-back loop.
See CHANGELOG.md for the release history.
Contributing
Issues and PRs are welcome. Please read CONTRIBUTING.md and the
CODE_OF_CONDUCT.md; security reports go through
SECURITY.md.
License
MIT.