1.2.1 • Published yesterday

@entailer/solver

Licence

MIT

Version

1.2.1

Deps

Size

17 kB

Vulns

Weekly

Summary Dependency Versions

entailer

A logician's-pass linter for software artifacts. Entailer checks whether prose that argues actually follows, and whether a set of requirements is consistent, by formalizing the load-bearing claims into logic and verifying them deterministically. It reports validity and consistency separately from truth, and it never ships a verdict without showing the formalization it judged.

Think "markdownlint or Vale, but for logical validity and consistency", over claims, prompts, docs, and whole repos.

The example that matters

A linter for style cannot catch this. Two requirements in a spec, written pages apart, that cannot both hold:

R1. Every request must be logged.
R2. Health-check requests must not be logged.
R3. /healthz is a request.

These three are mutually inconsistent: R3 makes /healthz a request, R1 forces it to be logged, R2 forbids it. Entailer's job is to surface that as an INCONSISTENT verdict with the minimal conflicting subset {R1, R2, R3}, not a vibe and not a style nag. A cross-file contradiction like this is the thing a single-sentence "is this a fallacy" toy structurally cannot produce, and it is where Entailer is headed (Tiers 3 and 4 below).

Quickstart

# CLI
npx @entailer/cli markdown spec.md           # Tier 3: within-doc consistency
npx @entailer/cli repo .                      # Tier 4: cross-file consistency
npx @entailer/cli pr 42                        # Tier 5: base→head delta over PR #42 (gh + git)
npx @entailer/cli pr --base main              # Tier 5: working tree vs a local git ref
npx @entailer/cli sentence "a -> a"           # Tier 1: classify a claim
npx @entailer/cli check --ir argument.json    # validity of a supplied argument
npx @entailer/cli domain --lens access.yaml --repo .   # Lens: concept faithfulness (not a tier)

# Library
npm i @entailer/core

import { evaluateMarkdown } from "@entailer/core";
const report = evaluateMarkdown({ markdown, uri: "spec.md" });
console.log(report.verdict, report.consistency.minimalConflictingSubset);

Exit codes: 0 no issue / valid, 1 invalid or inconsistent, 2 malformed input, 3 UNKNOWN-blocked. Runnable demos live in examples/. The MCP server is npx @entailer/mcp (declared in plugin/.mcp.json).

What is honest about it

The whole tool exists to protect one distinction:

validity ≠ truth ≠ faithful-formalization — three independent fields, never collapsed.

Validity / consistency is what the deterministic core licenses (Γ ∪ {¬φ} unsatisfiable means valid; a witnessed model means invalid or consistent). It is sound by construction and reproducible.
Faithfulness (does the formula capture the English?) is attestable only by a human or an LLM in the loop. It is the weak link, so the formalization is always shown. A verdict with no visible symbol dictionary fails schema validation; it cannot ship.
Premise truth is a question about the world and is explicitly out of scope.

There is no "soundness score". unknown / out-of-fragment / timeout is a first-class UNKNOWN verdict, never silently VALID or INVALID.

Architecture

A strict two-layer split. A pluggable translator (LLM-in-the-loop or rule-based) emits a typed IR carrying an auditable symbol dictionary. A fully deterministic verifier core consumes only the IR, never raw text, so the same checker runs identically whether a human or a model produced the formalization.

Logic lives in exactly one place: @entailer/core (pure TypeScript, zero LLM or solver dependencies, the trusted kernel). See DESIGN.md for the full pipeline, IR schema, tier model, and milestone plan.

Status

v0.1 shipped the deterministic core plus a CLI. v0.2 and v0.3 added the MCP server, the LLM translator, the tier adapters, and the viz and solver packages; 1.1.0 added the Tier-5 pull-request evaluator. All five tiers run over the one deterministic core. The IR and source-adapter seam is what lets every tier extend the same core instead of re-implementing logic.

Tier	Input	status
1 Sentence	one claim	built
2 Prompt	prose with an implicit conclusion	built
3 Markdown	one `.md`	built
4 Repo	a path, cross-file	built
5 PR	base + head file sets (+ PR body)	built — base→head delta / regression gate

The deterministic core takes the formalization as supplied input; the LLM translator (@entailer/translate) produces it from prose and degrades to UNKNOWN when the translation is shaky.

The concept-faithfulness lens (an orthogonal axis, not a tier)

The five tiers ask does the logic hold? The domain lens asks a different question on a different axis: does the code stay faithful to its own concepts? You declare a small concept cluster on its four sides — relationships, a defining rule, examples, and vocabulary — and the lens flags where a codebase drifts from it.

# access.yaml — a concept cluster
cluster: access
boundary: ["src/**"]
concepts:
  - concept: member
    vocabulary: [member, subscriber]
  - concept: guest
    vocabulary: [guest, anonymous]
relationships:
  - member is-not guest   # declared mutually exclusive

npx @entailer/cli domain --lens access.yaml --repo .
# ⛔ concept-fusion  src/access.ts:1 — identifier `grantMemberGuest` names both
#    `member` and `guest`, which the cluster declares mutually exclusive.  exit 1

It is honest the same way the tiers are. Only a rank-1 finding is a verdict — a deterministic contradiction: one identifier fusing two is-not concepts, or a self-contradictory declaration (caught before any file is read). Ranks 2–3 are reader hints — they assert nothing, can never block, and never render as a certificate. An is-a overlap (a member that is a user) is satisfiable by construction, so it stays silent — no false positive. The one weak link, classification ("is this token a member-use?"), is lexical and named as such: a human confirms each site as the adjudication step. --gate introduced reports only leaks new versus a base ref. Also available as the MCP tool evaluate_domain.

Packages

Package	What it is
`@entailer/core`	The trusted kernel: Formula AST, DSL parser, propositional tableau + DPLL, classification, the IR, and the `LogicReport` schema. Pure, dependency-light.
`@entailer/cli`	`entailer` binary: evaluate a supplied formalization, `--json` output, honest exit codes including a distinct `UNKNOWN`-blocked code.
`@entailer/mcp`	Stdio MCP server exposing deterministic tools (`check_validity`, `check_consistency`, `classify_formula`, `evaluate_sentence`, `evaluate_argument`, `evaluate_prompt`, `evaluate_markdown`, `evaluate_repo`, `evaluate_pull_request`) with structured output. The default path is one-call: the caller formalizes in-context, then calls these tools.
`@entailer/translate`	The LLM seam (the only package that touches a model). Turns prose into the typed IR via Claude, then the deterministic core verifies it. A parse failure or undeclared atom forces the confidence band low, so the core returns `UNKNOWN` rather than a confident-but-wrong verdict. The formalizer is injectable, so the engine is fully testable offline.

Develop

pnpm install
pnpm check        # typecheck + all tests across the workspace

Requires Node >=20.19 and pnpm. Built with pnpm workspaces and TypeScript.

pnpm check is the full local gate: typecheck, every test suite, and the taxonomy

schema drift gate (pnpm gen then a clean git diff). The formalize skill is vendored one-way into plugin/skills/formalize/ via pnpm vendor:skill, and the machine-readable taxonomy is generated from its references, never hand-copied. A GitHub Actions CI lane running pnpm check lands when the repo gets its public remote.

License

MIT.