contextplusplus

Version 1.1.0 · MIT licence · 6 dependencies · 1.4 MB

Convert public URLs and provider responses into clean, LLM-friendly Markdown — through a CLI, REST API, or MCP server.


npm install contextplusplus   # or: npx contextplusplus process <url>

Overview

contextplusplus takes a public URL (or a provider's raw JSON), figures out which platform endpoint it belongs to, fetches a structured response through the Scrape Creators API, validates it against a schema, and renders it as readable Markdown. It exists so agent workflows, research pipelines, and internal tools get clean context instead of raw HTML scrapes or noisy JSON dumps.

  • URL-only auto-dispatch — paste a URL, get Markdown; the right endpoint is detected from the URL shape.
  • Explicit endpoint hints — force a specific endpoint and pass arguments when a URL is ambiguous or search-style.
  • Provider caching — Redis, filesystem, or in-memory, with read-through / refresh / bypass / cache-only modes.
  • Saved run artifacts — request, raw upstream, decoded payload, rendered Markdown, and metadata to disk.
  • Generic-URL fallbacks — for URLs no provider claims: a staged escalation ladder — scrape.do T0 normal → T1 render → T2 super proxy, then a T3 Kernel headless-browser last resort.
  • Authenticated MCP tool — expose the same URL-to-Markdown engine to MCP clients through mcp-use with Supabase OAuth and quota metering.

Built on Effect-TS v3 with a strict four-layer architecture, OpenTelemetry tracing, and an optional in-app Sentry integration.

How it works

URL or provider JSON
      │
      ▼
detect endpoint ──► build outbound request ──► fetch (cache-aware) ──► decode (schema) ──► render Markdown
 (pure provider        (per-strategy,            (Redis/FS/memory,        (typed envelope)    (Handlebars template →
  definitions)          SSRF-guarded)             dedupe, retries)                             SC formatter → JSON)

The Markdown renderer tries, in order: an endpoint-specific Handlebars template, then the Scrape Creators success formatter, then a JSON code-fence fallback. When the resolved URL matches no structured provider, the generic-URL ladder runs instead. It escalates to the next tier only when the current tier returns empty, thin, or bot-blocked Markdown: T0 scrape.do normal → T1 scrape.do rendered (render) → T2 scrape.do super proxy (super, US geo) → T3 Kernel headless browser (the last-resort tier). T0–T2 need SCRAPEDO_API_KEY; T3 also needs KERNEL_API_KEY.
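The escalation rule can be sketched as a loop that stops at the first tier whose output is usable. This is only an illustration, not the project's code: the type names, the `assess` heuristics (a 200-character "thin" threshold and a block-page pattern), and the injected `fetchTier` function are all assumptions, since the README does not define how thin or bot-blocked output is detected.

```typescript
// Hypothetical names throughout: this is not the contextplusplus implementation.
type Tier = "T0" | "T1" | "T2" | "T3";
type Verdict = "ok" | "empty" | "thin" | "bot-blocked";
type FetchTier = (tier: Tier, url: string) => Promise<string>;

const LADDER: readonly Tier[] = ["T0", "T1", "T2", "T3"];

// Assumed heuristics: the README does not define "thin" or "bot-blocked".
const THIN_BELOW_CHARS = 200;
const BLOCK_PATTERN = /captcha|access denied|verify you are human/i;

function assess(markdown: string): Verdict {
  const text = markdown.trim();
  if (text.length === 0) return "empty";
  if (BLOCK_PATTERN.test(text)) return "bot-blocked";
  if (text.length < THIN_BELOW_CHARS) return "thin";
  return "ok";
}

interface LadderResult {
  markdown: string | null;
  attempts: { tier: Tier; verdict: Verdict }[];
}

// Try each tier in order and stop at the first one whose output is usable.
async function runLadder(url: string, fetchTier: FetchTier): Promise<LadderResult> {
  const attempts: LadderResult["attempts"] = [];
  for (const tier of LADDER) {
    const markdown = await fetchTier(tier, url);
    const verdict = assess(markdown);
    attempts.push({ tier, verdict });
    if (verdict === "ok") return { markdown, attempts };
  }
  // Every tier returned unusable output.
  return { markdown: null, attempts };
}
```

Keeping the per-tier verdicts in `attempts` mirrors the idea of reporting which tier was decisive; a cheap tier that succeeds means the more expensive ones are never called.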

Supported platforms

31 URL-routable platforms covering 146 endpoints, all served through Scrape Creators. Per-endpoint details (arguments, spec paths, URL examples) are in docs/ENDPOINTS.md.

Platform serviceId Host examples Endpoints
TikTok tiktok tiktok.com, m.tiktok.com 20
YouTube youtube youtube.com, youtu.be 15
Instagram instagram instagram.com 13
Facebook facebook facebook.com, m.facebook.com 10
GitHub github github.com 9
Reddit reddit reddit.com, old.reddit.com, v.redd.it 8
Twitter / X twitter x.com, twitter.com 6
TikTok Shop tiktok-shop tiktok.com/shop, shop.tiktok.com 5
Threads threads threads.net, threads.com 5
Rumble rumble rumble.com, www.rumble.com 5
LinkedIn linkedin linkedin.com, www.linkedin.com 5
Twitch twitch twitch.tv, clips.twitch.tv 4
Pinterest pinterest pinterest.com, pinterest.co.uk 4
Facebook Ad Library facebook-ad-library facebook.com/ads/library 4
Truth Social truth-social truthsocial.com 3
Spotify spotify open.spotify.com 3
SoundCloud soundcloud soundcloud.com 3
Google Ad Library google-ad-library adstransparency.google.com 3
Facebook Marketplace facebook-marketplace facebook.com/marketplace 3
Facebook Events facebook-events facebook.com/events 3
Bluesky bluesky bsky.app, staging.bsky.app 3
LinkedIn Ad Library linkedin-ad-library linkedin.com/ad-library 2
Snapchat snapchat snapchat.com/add 1
Pillar pillar pillar.io 1
Linktree linktree linktr.ee 1
Linkme linkme link.me 1
Linkbio linkbio link.bio 1
Komi komi komi.io 1
Kick kick kick.com 1
Google Search google google.com/search 1
Amazon Shop amazon-shop amazon.com, amazon.co.uk, amazon.de 1

Plus one inference endpoint, age-gender (the detect-age-gender hint), which estimates audience age/gender for a profile and is reached by explicit hint rather than URL auto-dispatch.
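To make URL auto-dispatch concrete, here is a minimal sketch of host-based service detection over a small subset of the table above. The matching rules (stripping a leading `www.`, checking path-scoped services before host-only ones) and the function name `detectService` are assumptions for illustration; the real provider definitions also pick an endpoint, which this sketch does not attempt.

```typescript
// A subset of the host table above. The rule format is hypothetical.
const HOST_SERVICES = new Map<string, string>([
  ["tiktok.com", "tiktok"],
  ["m.tiktok.com", "tiktok"],
  ["shop.tiktok.com", "tiktok-shop"],
  ["youtube.com", "youtube"],
  ["youtu.be", "youtube"],
  ["facebook.com", "facebook"],
  ["m.facebook.com", "facebook"],
  ["x.com", "twitter"],
  ["twitter.com", "twitter"],
]);

// Services that share a host with a broader platform and are told apart by path.
const PATH_SERVICES: { host: string; prefix: string; serviceId: string }[] = [
  { host: "tiktok.com", prefix: "/shop", serviceId: "tiktok-shop" },
  { host: "facebook.com", prefix: "/ads/library", serviceId: "facebook-ad-library" },
  { host: "facebook.com", prefix: "/marketplace", serviceId: "facebook-marketplace" },
  { host: "facebook.com", prefix: "/events", serviceId: "facebook-events" },
];

function detectService(rawUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable absolute URL
  }
  const host = url.hostname.toLowerCase().replace(/^www\./, "");
  // Path-scoped services win over the host-only match.
  for (const rule of PATH_SERVICES) {
    const underPrefix = url.pathname === rule.prefix || url.pathname.startsWith(rule.prefix + "/");
    if (host === rule.host && underPrefix) return rule.serviceId;
  }
  return HOST_SERVICES.get(host) ?? null;
}
```

A `null` result corresponds to the case where no structured provider claims the URL, which is where the generic-URL fallbacks would take over.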

Quickstart — HTTP server

pnpm install
cp .env.example .env          # then set SCRAPE_CREATORS_API_KEY
pnpm --filter @contextplusplus/api serve   # starts on PORT (default 3000)

POST /v1/process is authenticated by default (AUTH_MODE=required): pass either a Supabase JWT as Authorization: Bearer <jwt> or a programmatic key as X-Api-Key: <key>. An anonymous request is rejected with 401. For local development you can opt into the legacy anonymous path with AUTH_MODE=optional-dev (the examples below then work without a credential). /health and GET /v1/services are always public.

URL-only auto-dispatch:

curl -s http://localhost:3000/v1/process \
  -H 'authorization: Bearer <supabase-jwt>' \
  -H 'content-type: application/json' \
  -d '{"url":"https://www.tiktok.com/@stoolpresidente"}'

URL plus an endpoint hint (here authenticating with an API key instead of a JWT):

curl -s http://localhost:3000/v1/process \
  -H 'x-api-key: <api-key>' \
  -H 'content-type: application/json' \
  -d '{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ","endpoint":"video"}'

Endpoint hint plus endpointArgs (for search-style endpoints):

curl -s http://localhost:3000/v1/process \
  -H 'authorization: Bearer <supabase-jwt>' \
  -H 'content-type: application/json' \
  -d '{"url":"https://www.tiktok.com/","endpoint":"search-users","endpointArgs":{"query":"openai"}}'

Read your account's quota and subscription state (no quota is reserved):

curl -s http://localhost:3000/v1/usage -H 'authorization: Bearer <supabase-jwt>'

Send Accept: text/markdown to get the rendered Markdown directly for a single successful item (instead of the JSON envelope).

Quickstart — MCP server

The MCP server runs side-by-side with the HTTP API and serves the mcp-use endpoint at /mcp on MCP_PORT. Clients authenticate through Supabase OAuth (handled by mcp-use); the server reads the caller identity from ctx.auth.user.userId. Before the tool runs, the caller's subscription and quota are checked and a unit is reserved — the call is rejected when there is no active subscription or the monthly quota is exhausted, and the reserved unit is released if the engine fails. Point a client (Claude Desktop, MCP Inspector, or any mcp-use client) at ${MCP_URL}/mcp and complete the OAuth consent flow.
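The reserve-then-release behaviour described above can be sketched with an in-memory account. Everything here is hypothetical (the `Account` shape, `reserve`, `release`, `withReservedUnit`); it is only meant to show the ordering: check subscription and quota, reserve one unit, run the engine, and give the unit back if the engine throws.

```typescript
// Hypothetical in-memory stand-in for the usage store.
interface Account {
  active: boolean;       // has an active subscription
  monthlyLimit: number;
  used: number;
}

type Denied = "no-subscription" | "quota-exhausted";

// Reserve one unit, or report why the call must be rejected.
function reserve(account: Account): Denied | null {
  if (!account.active) return "no-subscription";
  if (account.used >= account.monthlyLimit) return "quota-exhausted";
  account.used += 1;
  return null;
}

function release(account: Account): void {
  account.used = Math.max(0, account.used - 1);
}

type Guarded<T> = { ok: true; value: T } | { ok: false; denied: Denied };

// Run the engine only after a unit is reserved; release it if the engine fails.
async function withReservedUnit<T>(account: Account, engine: () => Promise<T>): Promise<Guarded<T>> {
  const denied = reserve(account);
  if (denied !== null) return { ok: false, denied };
  try {
    return { ok: true, value: await engine() };
  } catch (error) {
    release(account);
    throw error;
  }
}
```

A real store would need the check and the increment to be atomic; this sketch ignores concurrency entirely.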

pnpm install
cp .env.example .env
# set SCRAPE_CREATORS_API_KEY plus the MCP_USE_OAUTH_SUPABASE_* values
pnpm --filter @contextplusplus/mcp dev   # starts on MCP_PORT (default 3001)

The server exposes one tool:

| Tool | Input | Output |
| --- | --- | --- |
| convert_url | { "url": "https://..." } | Markdown content plus minimal runId, status, and optional cacheState metadata. |

Offline CI tests construct the MCP server with fake OAuth, fake usage, and a fake runtime. They do not require live Supabase OAuth or provider network access.

Quickstart — CLI

The binary is contextplusplus. Core commands are process (fetch + render a URL or file), render (re-render a saved artifact), login, logout, whoami, and api-key create|list|revoke. There is no default subcommand — name process explicitly.

# from a checkout
pnpm --filter contextplusplus dev -- process --url https://www.linkedin.com/in/williamhgates --format markdown
pnpm --filter contextplusplus dev -- process --url https://www.tiktok.com/ --endpoint search-users --endpoint-args 'query=taylor swift' --format markdown
pnpm --filter contextplusplus dev -- process --url https://www.youtube.com/@MrBeast --explain --format markdown
printf '%s' '{"url":"https://www.google.com/search?q=contextplusplus","endpoint":"search"}' | pnpm --filter contextplusplus dev -- process - --format markdown

# after publish / npm link
npx contextplusplus process https://www.reddit.com/r/typescript/comments/1abcde/example/ --format markdown
contextplusplus render --raw artifacts/runs/2026-06-13/<run-id>/upstream.json

When CONTEXTPLUSPLUS_API_URL is set, process sends paid/default work to the authenticated backend and attaches credentials in this order: --api-key, CONTEXTPLUSPLUS_API_KEY, then the stored Supabase login session. Use --local for explicit developer-only in-process execution from the checkout. This keeps provider keys and quota enforcement server-side for distributed CLI use.

contextplusplus login
contextplusplus whoami
contextplusplus api-key create --name ci
CONTEXTPLUSPLUS_API_URL=https://api.example.com contextplusplus process --url https://example.com --format markdown
contextplusplus process --local --url https://example.com --format markdown
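
The credential order described above (flag, then environment variable, then stored login) can be expressed as a small resolver. This is a sketch under assumptions: the type and function names are hypothetical, and sending the stored session as a Bearer token is inferred from the HTTP API's two accepted credentials rather than stated for the CLI.

```typescript
// Hypothetical types: not the CLI's actual internals.
type CliCredential =
  | { kind: "api-key"; source: "flag" | "env"; value: string }
  | { kind: "session"; value: string }
  | { kind: "none" };

interface CredentialInputs {
  flagApiKey?: string;          // --api-key
  envApiKey?: string;           // CONTEXTPLUSPLUS_API_KEY
  storedSessionToken?: string;  // token saved by `contextplusplus login`
}

// First non-empty source wins, in the documented order.
function resolveCredential(inputs: CredentialInputs): CliCredential {
  if (inputs.flagApiKey) return { kind: "api-key", source: "flag", value: inputs.flagApiKey };
  if (inputs.envApiKey) return { kind: "api-key", source: "env", value: inputs.envApiKey };
  if (inputs.storedSessionToken) return { kind: "session", value: inputs.storedSessionToken };
  return { kind: "none" };
}

// Map the resolved credential onto the two headers the HTTP API accepts.
function authHeaders(credential: CliCredential): Record<string, string> {
  switch (credential.kind) {
    case "api-key":
      return { "x-api-key": credential.value };
    case "session":
      return { authorization: `Bearer ${credential.value}` };
    case "none":
      return {};
  }
}
```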

process flags

| Flag | Default | Meaning |
| --- | --- | --- |
| input (positional) | | A URL or a path to a JSON file, or - for stdin. Mutually exclusive with --url; one of the two is required. |
| --url <url> | | Alternative to the positional input. |
| --endpoint <hint> | auto | Force a specific endpoint instead of URL detection. |
| --endpoint-args key=value | | Repeatable. Adds arguments (e.g. query=...). Quote values with spaces. |
| --cache-mode <mode> | read-through | read-through, refresh, bypass, or cache-only. refresh is live and paid; it requires CONFIRM_REFRESH=1. |
| --save-artifacts / -a | off | Persist request / upstream / decoded / Markdown / metadata to ARTIFACT_ROOT. |
| --concurrency <n> | 10 | Batch concurrency. |
| --format <format> | json | Output format: json or markdown. |
| --explain | off | Prepend the resolved strategy / service / endpoint / attempt status. |
| --api-key <key> | | Explicit backend credential. Takes priority over CONTEXTPLUSPLUS_API_KEY and stored login. |
| --local | off | Run the in-process developer engine even when CONTEXTPLUSPLUS_API_URL is configured. |

render takes a single --raw <path> and prints the re-rendered Markdown for a saved artifact.

Exit codes: 0 success · 2 input validation error · 3 auth failure · 4 subscription required · 5 quota exceeded · 124 provider timeout · 1 any other failure. The CLI never throws a raw stack at the user.
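For scripts that wrap the CLI, the exit codes above can be kept in one lookup table. The outcome labels below are hypothetical names chosen for this sketch; only the numeric codes come from the list above.

```typescript
// Outcome labels are illustrative; the numeric codes are the documented ones.
type Outcome =
  | "success"
  | "input-validation"
  | "auth-failure"
  | "subscription-required"
  | "quota-exceeded"
  | "provider-timeout"
  | "other-failure";

const EXIT_CODES: Record<Outcome, number> = {
  "success": 0,
  "other-failure": 1,
  "input-validation": 2,
  "auth-failure": 3,
  "subscription-required": 4,
  "quota-exceeded": 5,
  "provider-timeout": 124,
};

function exitCodeFor(outcome: Outcome): number {
  return EXIT_CODES[outcome];
}

// Reverse lookup for a wrapper script: any unlisted code is treated as a generic failure.
function outcomeForExitCode(code: number): Outcome {
  const entry = (Object.entries(EXIT_CODES) as [Outcome, number][]).find(([, value]) => value === code);
  return entry ? entry[0] : "other-failure";
}
```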

HTTP API

The service contract is in docs/openapi.yaml. (The root scrape-creators-openapi.yaml is the upstream Scrape Creators reference, not this service's contract.)

Method Path Purpose
POST /v1/process Process one request or a batch; returns the run envelope (or Markdown with Accept: text/markdown).
GET /v1/services Discovery: public Scrape Creators services and endpoints. Generic-URL fallback strategies are private implementation details.
GET /v1/metrics JSON snapshot of runtime counters.
GET /v1/metrics/prom The same counters in Prometheus exposition format.
GET /v1/usage Authenticated. Read the caller's plan tier, monthly usage, remaining quota, period end, subscription status, and entitlement state.
GET /v1/customer/dashboard Authenticated. One call: the caller's own usage snapshot plus available tiers and current plan, for the self-serve account page.
GET /v1/customer/usage-history Authenticated. Paginated ledger of the caller's own usage events (?start&end&limit&offset), newest first.
POST /v1/api-keys Authenticated. Create a programmatic API key ({ "name": "...", "expiresAt"?: "...", "scopes"?: ["process", "account-read", "api-key-manage"] }); returns the raw key once plus its metadata (201).
GET /v1/api-keys Authenticated. List the caller's API-key metadata (never the raw secret).
DELETE /v1/api-keys/:keyId Authenticated. Revoke one of the caller's API keys (204).
PATCH /v1/api-keys/:keyId/rotate Authenticated (api-key-manage scope). Mint a new secret on the same key — preserving its name, scopes, expiry and creation time — and invalidate the old one immediately; returns the new raw key once (200). A missing/foreign/revoked key is 404.
POST /v1/billing/checkout Authenticated. Create a Stripe Checkout Session for a standard/pro plan; returns the redirect url.
POST /v1/billing/portal Authenticated. Create a Stripe Customer Portal session (upgrade/cancel/payment method).
POST /v1/billing/webhook Public but Stripe-Signature-verified against the raw body; upserts subscription state idempotently.
GET /health Liveness probe (no /v1 prefix, always public).

Authentication

Authenticated routes accept either credential, resolved per request:

  • Authorization: Bearer <jwt> — a Supabase JWT, verified against SUPABASE_JWT_SECRET (local HS256) or the project JWKS (hosted), with the audience claim checked against SUPABASE_JWT_AUD (default authenticated).
  • X-Api-Key: <key> — a programmatic API key minted via POST /v1/api-keys (or the CLI). The raw key is shown once at creation; the backend persists only sha256(key) and looks the caller up by that hash, so the plaintext is never stored.

A missing/invalid credential returns 401 (AuthenticationRequiredError / InvalidCredentialError). The only unauthenticated routes are GET /health, GET /v1/services, GET /v1/metrics, GET /v1/metrics/prom, and POST /v1/billing/webhook (which is instead signature-verified).

API-key scopes

API keys carry capability scopes: process (run /v1/process), account-read (read /v1/usage, /v1/plans, /v1/customer/*), and api-key-manage (manage /v1/api-keys). A key created without an explicit scopes array, like every key minted before scopes existed, is granted the full set, so enforcement is fully backward-compatible. Bearer (Supabase JWT) sessions are always full-scope. A key that lacks the scope a route requires is rejected with 403 (InsufficientScopeError); the admin and billing routes are not scope-gated (admin stays gated by the ADMIN_USER_IDS allowlist).
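A minimal sketch of the scope check follows. The `Caller` shape and function names are hypothetical, and treating an explicitly empty scopes array as "no scopes" (rather than "full set") is an assumption, since the text above only covers a missing array.

```typescript
type Scope = "process" | "account-read" | "api-key-manage";

const ALL_SCOPES: readonly Scope[] = ["process", "account-read", "api-key-manage"];

// Hypothetical caller shape: Bearer sessions carry no scope list at all.
type Caller =
  | { kind: "bearer" }
  | { kind: "api-key"; scopes?: readonly Scope[] | null };

// A key with no stored scopes array (including pre-scope keys) gets the full set.
function effectiveScopes(stored: readonly Scope[] | null | undefined): readonly Scope[] {
  return stored == null ? ALL_SCOPES : stored;
}

// Returns the HTTP status the route guard would produce.
function authorize(caller: Caller, required: Scope): 200 | 403 {
  if (caller.kind === "bearer") return 200; // JWT sessions are always full-scope
  return effectiveScopes(caller.scopes).includes(required) ? 200 : 403;
}
```

For example, a key minted with only the process scope can call /v1/process but gets 403 on /v1/usage, while a legacy key with no scopes array passes both checks.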

Rate limiting

When the product server is run (pnpm --filter @contextplusplus/api serve), IP-scoped rate limiting is on by default and sits in front of the per-user quota gate. Limits are env-tunable; over-limit requests get a sanitized 429 with Retry-After and RateLimit-* (draft-8) headers. Defaults:

| Scope | Routes | Default limit |
| --- | --- | --- |
| global | every /v1/* route | 600 requests / 15 min per IP |
| auth | /v1/api-keys, /v1/usage, /v1/plans, /v1/customer/*, /v1/billing/{checkout,portal} | 30 requests / 15 min per IP |
| webhook | /v1/billing/webhook | 600 requests / 1 min per IP |

/health is never throttled. Disable with HTTP_RATE_LIMIT_ENABLED=false; tune individual windows/limits with the HTTP_RATE_LIMIT_* variables (see the env table). Behind a reverse proxy, set HTTP_TRUST_PROXY_HOPS to the real hop count so the client IP (and thus the limit bucket) cannot be spoofed via X-Forwarded-For.
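Why the hop count matters can be seen in a small sketch of client-IP derivation. It follows the common "trusted hop count" convention (walk X-Forwarded-For from the right, skipping one entry per trusted proxy); whether the server derives the IP exactly this way is an assumption, and `clientIp` is a hypothetical helper.

```typescript
// Derive the client IP given the socket peer, the X-Forwarded-For header,
// and how many reverse-proxy hops in front of the server are trusted.
function clientIp(socketAddress: string, forwardedFor: string | undefined, trustedHops: number): string {
  const forwarded = (forwardedFor ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  // Nearest hop first: the socket peer, then X-Forwarded-For read right to left.
  const chain = [socketAddress, ...forwarded.reverse()];
  const hops = Math.max(0, Math.floor(trustedHops));
  const index = Math.min(hops, chain.length - 1);
  return chain[index] ?? socketAddress;
}
```

With zero trusted hops the header is ignored, so a client cannot pick its own bucket. With one hop the entry appended by the proxy is used and anything the client prepended is ignored. Setting the count higher than the real number of proxies reaches into client-supplied entries, which is the spoofing risk described above.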

Request body (POST /v1/process) is either a single request or a batch:

// single
{ "url": "https://...", "endpoint": "video", "endpointArgs": { "query": "..." }, "timeoutMs": 30000 }

// batch (1–100 items)
{ "requests": [ { "url": "https://..." }, ... ],
  "options": { "saveArtifacts": true, "cacheMode": "read-through", "concurrency": 10 } }

url must be http(s), ≤ 2048 chars, and may not contain credentials. Response is a ProcessRunResponse envelope (runStatus, summary, results[]). Failures use a stable error envelope: { "error": { "tag": string, "message": string, "context"?: object } } — never a raw cause. With Accept: text/markdown, a single successful item returns its Markdown plus x-contextplusplus-* response headers (including x-contextplusplus-request-id); a non-success run returns 502.
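The three URL constraints can be checked with the standard URL parser, as in this sketch. The `InvalidUrlError` tag and the `validateUrl` helper are hypothetical; only the envelope shape ({ error: { tag, message, context? } }) and the rules themselves come from the paragraph above.

```typescript
interface ErrorEnvelope {
  error: { tag: string; message: string; context?: object };
}

type UrlCheck = { ok: true; href: string } | { ok: false; envelope: ErrorEnvelope };

const MAX_URL_LENGTH = 2048;

function reject(message: string, context?: object): UrlCheck {
  // "InvalidUrlError" is a made-up tag for this sketch.
  return { ok: false, envelope: { error: { tag: "InvalidUrlError", message, context } } };
}

function validateUrl(raw: string): UrlCheck {
  if (raw.length > MAX_URL_LENGTH) {
    return reject("url must be at most 2048 characters", { length: raw.length });
  }
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return reject("url is not a valid absolute URL");
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return reject("url must use http or https", { protocol: parsed.protocol });
  }
  if (parsed.username !== "" || parsed.password !== "") {
    return reject("url must not contain credentials");
  }
  return { ok: true, href: parsed.href };
}
```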

Each item carries a flattened provenance summary — { provider, strategyId, tier?, elapsedMs, cacheState?, cost?, estimatedUsd? } — derived from the decisive provider attempt (no extra computation), plus fetchedAt (when the decisive attempt completed) and a stable contentHash (sha256 hex of the item's markdown, or of its decoded data via an order-independent stringify) for change detection. cost/estimatedUsd reflect internal upstream spend and are redacted for non-admin callers — stripped from both provenance and every attempts[].cost / attempts[].usage.estimatedUsd; admin callers (the ADMIN_USER_IDS allowlist) see full cost. provider/strategyId/tier/latency/cache stay visible to everyone.
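An order-independent stringify is what makes a hash of decoded data stable: two payloads with the same content but different key order must serialize identically. The sketch below shows one way to do that by sorting object keys recursively. It is an assumed approach, not the project's actual function, and the SHA-256 step that would follow is left out.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every depth; array order is preserved.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const members = keys.map((key) => JSON.stringify(key) + ":" + stableStringify(value[key] ?? null));
  return "{" + members.join(",") + "}";
}
```

Feeding the resulting string to a SHA-256 function would give a digest that changes only when the content changes, which is the property change detection needs.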

For a single-item response the boundary also sets additive headers: X-Cache (the cache state normalized to HIT, MISS, or STALE), a weak ETag (the item contentHash), and X-Fetched-At (the fetch timestamp) — usable for conditional/incremental fetching by crawlers and caches.

When a single-item run is rate-limited upstream (ProviderRateLimitedError) and the provider supplied a Retry-After, the response is an HTTP 429 with that Retry-After value passed through, so clients can honor the upstream backoff. (Per-user quota 429s carry no Retry-After.)
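A client honoring the passed-through value needs to handle both forms HTTP allows for Retry-After: a number of seconds or an HTTP date. The helper below is a sketch with a hypothetical name; it returns null when the header is missing or unparseable, which also covers the per-user quota 429 case.

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// `now` is injectable so the function stays deterministic in tests.
function retryAfterMs(header: string | null | undefined, now: number = Date.now()): number | null {
  if (header == null) return null;
  const value = header.trim();
  // Form 1: delay in whole seconds.
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // Form 2: an HTTP date; wait until that moment, never a negative delay.
  const at = Date.parse(value);
  if (Number.isNaN(at)) return null;
  return Math.max(0, at - now);
}
```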

Billing & quota

Billing follows Model A: two flat recurring Stripe Prices (standard $7 / pro $17) map app-side to plan_tiers, and per-tier usage limits are enforced in Postgres via check_and_increment_quota — there is no Stripe metered usage. On the authenticated /v1/process path the active subscription is checked and a quota unit is reserved before any provider dispatch: no active subscription returns 402, over-quota returns 429. Customers can read their current account snapshot with GET /v1/usage, which returns { planTier, monthlyLimit, used, remaining, periodEnd, subscriptionStatus, entitled } without reserving quota. Subscribe with POST /v1/billing/checkout ({ "plan": "standard" | "pro" }), manage with POST /v1/billing/portal, and let Stripe drive POST /v1/billing/webhook, which verifies the Stripe-Signature header against the raw request body and upserts subscription state idempotently. Set the STRIPE_* variables below to enable it.

Tests are offline and deterministic: webhooks are exercised with stripe.webhooks.generateTestHeaderString + constructEvent against a fixed whsec_test_secret, and checkout/portal/quota run through fakes. There is no live Stripe call or spend in pnpm run verify. Hosted Supabase/Stripe provisioning and live payment credentials are out of scope here.

TypeScript SDK

A typed client lives in sdk/typescript/. Its types are generated from docs/openapi.yaml (so they never drift from the API), and the client is hand-authored over the global fetch — it adds no runtime dependency.

import { ContextPlusPlusClient } from "@contextplusplus/sdk";

const client = new ContextPlusPlusClient({ baseUrl: "https://api.example.com", apiKey: process.env.CONTEXTPLUSPLUS_API_KEY });
const run = await client.process({ url: "https://example.com/article" });
console.log(run.results[0]?.markdown);

Regenerate the types after any spec change with pnpm run sdk:gen; pnpm run sdk:check (a leg of pnpm run verify) fails CI on any drift, mirroring the live-fixtures discipline. The SDK is committed as source and is excluded from npm pack (it is a repo artifact, not part of the published runtime).

Python SDK

A typed Python client lives in sdk/python/. Its dataclass models are generated from docs/openapi.yaml by scripts/generate_sdk_py.py, and the client is hand-authored over the standard-library urllib — it has no third-party runtime dependency (the models are stdlib dataclasses).

from contextplusplus import ContextPlusPlusClient

client = ContextPlusPlusClient("https://api.example.com", api_key="atmd_...")
run = client.process({"url": "https://example.com/article"})
print(run["results"][0]["markdown"])

Regenerate the models with pnpm run sdk:gen:py (the generator toolchain is dev-only and fully pinned in sdk/python/requirements-dev.txt). Drift is gated by the dedicated python-sdk CI job running pnpm run sdk:check:py, which keeps the Node pnpm run verify gate Python-free. Like the TypeScript SDK, it is committed as source and excluded from npm pack.

Any change to docs/openapi.yaml must regenerate BOTH SDKs: pnpm run sdk:gen:all runs the TypeScript (sdk:gen) and Python (sdk:gen:py) generators together. Two drift gates enforce it: TypeScript sdk:check inside pnpm run verify, and Python sdk:check:py in the python-sdk job.

Playground

The HTTP server serves a zero-dependency "try it" page at GET /playground (public, no /v1 prefix, so it is not throttled by the /v1 limiters — like /health). Paste a URL, optionally supply your API key, and it calls the same-origin POST /v1/process and renders the returned Markdown, plus copy-paste curl / CLI / MCP snippets for that URL.

Security: the page is a fixed HTML constant with no server-side interpolation, every dynamic value is inserted via textContent (never innerHTML), and it ships a strict CSP (default-src 'none'; connect-src 'self') so it can only talk to this origin. The API key is sent only to this server as X-Api-Key, is not persisted unless you opt in (a checkbox, with a warning), and is never echoed into the generated snippets (which use a $CONTEXTPLUSPLUS_API_KEY placeholder). There is no anonymous/keyless compute path.

Caching

Provider responses are cached under a deterministic, secret-free key. Choose the backend with CACHE_BACKEND:

  • filesystem (default) — artifacts under ARTIFACT_ROOT.
  • redis — requires REDIS_URL; connection is established eagerly at startup.
  • memory — in-process, ephemeral (used by tests and ad-hoc CLI runs).

Per-request behavior is controlled by cacheMode: read-through (default), refresh (re-fetch and overwrite — paid), bypass (ignore cache), cache-only (never hit the network). Cache and key details: CACHE.md.
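The four modes differ only in when the cache is read and when it is written, which a short sketch makes explicit. The function, the `Map`-backed cache, and the cacheState labels are hypothetical, and the choice that bypass does not write back to the cache is an assumption; CACHE.md is the authority on the real behavior.

```typescript
type CacheMode = "read-through" | "refresh" | "bypass" | "cache-only";
type CacheState = "hit" | "miss" | "refreshed" | "bypassed"; // hypothetical labels

interface CacheResult {
  value: string | null;
  cacheState: CacheState;
}

async function fetchWithCache(
  mode: CacheMode,
  key: string,
  cache: Map<string, string>,
  fetchLive: () => Promise<string>,
): Promise<CacheResult> {
  // Only read-through and cache-only consult the cache before fetching.
  const readsCache = mode === "read-through" || mode === "cache-only";
  const cached = readsCache ? cache.get(key) : undefined;
  if (cached !== undefined) return { value: cached, cacheState: "hit" };
  // cache-only never touches the network, even on a miss.
  if (mode === "cache-only") return { value: null, cacheState: "miss" };
  const value = await fetchLive();
  // Assumed: bypass neither reads nor writes the cache.
  if (mode === "bypass") return { value, cacheState: "bypassed" };
  cache.set(key, value);
  return { value, cacheState: mode === "refresh" ? "refreshed" : "miss" };
}
```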

Observability

  • Tracing is OpenTelemetry-native (@effect/opentelemetry + OTLP/HTTP). Set OTEL_EXPORTER_OTLP_ENDPOINT (default http://localhost:4318); spans post to ${endpoint}/v1/traces.
  • Sentry is optional and gated on SENTRY_DSN (unset → no-op). It is wired into the OTel span pipeline, receives sanitized Effect logs through Sentry Logs, and is initialized before HTTP, CLI, and MCP runtime construction.
  • Metrics are process-local counters at GET /v1/metrics (JSON) and GET /v1/metrics/prom (Prometheus, text/plain; version=0.0.4), prefixed contextplusplus_, with service / endpoint / provider / error_tag labels.

# Prometheus scrape
- job_name: contextplusplus
  static_configs:
    - targets: ['contextplusplus:3000']
  metrics_path: /v1/metrics/prom

Environment variables

Variable Required Default Purpose
SCRAPE_CREATORS_API_KEY Yes API key for all scrape-creators.* strategies.
SCRAPEDO_API_KEY Go-live scrape.do key for the generic-URL T0/T1/T2 tiers (the deployment-readiness check requires it).
KERNEL_API_KEY Generic-URL T3 only Kernel key for the generic-URL T3 headless-browser last-resort tier.
REDIS_URL When CACHE_BACKEND=redis Redis connection string.
REDIS_PASSWORD No Used by the bundled docker-compose.yml Redis service.
CACHE_BACKEND No filesystem redis, filesystem, or memory.
CONTEXTPLUSPLUS_CACHE_PROVIDER No filesystem Backward-compatible alias used only when CACHE_BACKEND is unset.
ARTIFACT_ROOT No ./artifacts Directory for saved run artifacts.
DEFAULT_CONCURRENCY No 10 Default batch concurrency.
UTILIZATION_TARGET No 0.95 Rate-budget utilization target (0 < x ≤ 1).
HOST No Node default Optional HTTP bind host. make local sets 127.0.0.1; make local-lan sets 0.0.0.0.
PORT No 3000 HTTP server port.
MCP_PORT No 3001 MCP server port for apps/mcp's dev / start scripts (pnpm --filter @contextplusplus/mcp dev / start).
MCP_URL No Public MCP base URL used by mcp-use for generated OAuth/widget URLs.
MCP_ALLOWED_ORIGINS No Comma-separated mcp-use origin allow-list / host validation input.
MCP_USE_OAUTH_SUPABASE_PROJECT_ID Hosted MCP OAuth Supabase project ID for mcp-use OAuth.
MCP_USE_OAUTH_SUPABASE_URL Local/self-hosted MCP OAuth Supabase base URL alternative to project ID.
MCP_USE_OAUTH_SUPABASE_PUBLISHABLE_KEY MCP OAuth Supabase publishable key consumed by mcp-use OAuth routes.
CONTEXTPLUSPLUS_API_URL Hosted CLI processing Backend base URL for authenticated CLI process and api-key commands.
CONTEXTPLUSPLUS_API_KEY CLI backend auth Programmatic API key; takes priority over stored login.
CONTEXTPLUSPLUS_AUTH_FILE No ~/.config/contextplusplus/credentials.json Override CLI token-store path; useful for tests/CI.
SUPABASE_PUBLISHABLE_KEY CLI login Supabase publishable key for PKCE login.
SUPABASE_URL Live auth/usage Supabase project URL; backend reads api_keys and writes usage/quota.
SUPABASE_SERVICE_ROLE_KEY Live auth/usage Supabase service-role key (secret). Never expose to clients.
SUPABASE_JWT_SECRET Live JWT verify HS256 signing secret for local Supabase JWT verification (hosted uses JWKS).
SUPABASE_JWT_AUD No authenticated Expected audience claim for verified Supabase JWTs.
STRIPE_SECRET_KEY Live billing Stripe secret key for checkout/portal/webhook (secret).
STRIPE_WEBHOOK_SECRET Live billing Verifies Stripe-Signature on /v1/billing/webhook (secret).
STRIPE_PRICE_STANDARD Live billing Stripe Price id mapped to the standard ($7) tier.
STRIPE_PRICE_PRO Live billing Stripe Price id mapped to the pro ($17) tier.
STRIPE_CHECKOUT_SUCCESS_URL No http://localhost:3000/billing/success Post-checkout success redirect.
STRIPE_CHECKOUT_CANCEL_URL No http://localhost:3000/billing/cancel Post-checkout cancel redirect.
STRIPE_PORTAL_RETURN_URL No http://localhost:3000/billing Customer Portal return URL.
STRIPE_MOCK_HOST / STRIPE_MOCK_PORT / STRIPE_MOCK_PROTOCOL Tests only Point the Stripe SDK at a local stripe-mock for request-shape tests.
HTTP_REQUEST_TIMEOUT_MS No 90000 Outer /v1/process timeout; returns 504 when exceeded.
AUTH_MODE No required required rejects anonymous /v1/process with 401; optional-dev re-enables the local anonymous path.
ALLOW_UNAUTHENTICATED_PROCESS No Legacy alias for AUTH_MODE, honoured only when AUTH_MODE is unset (1/true maps to optional-dev).
HTTP_TRUST_PROXY_HOPS No 0 Trusted reverse-proxy hop count for client-IP derivation. Set 1 behind exactly one proxy; never higher than the real hop count.
HTTP_RATE_LIMIT_ENABLED No true (server) Master switch for the IP rate limiters (server.ts enables; the in-process test app leaves them off).
HTTP_RATE_LIMIT_GLOBAL_WINDOW_MS / _GLOBAL_LIMIT No 900000 / 600 Global /v1/* window and per-IP limit.
HTTP_RATE_LIMIT_AUTH_WINDOW_MS / _AUTH_LIMIT No 900000 / 30 Stricter window/limit for credential-verifying routes.
HTTP_RATE_LIMIT_WEBHOOK_WINDOW_MS / _WEBHOOK_LIMIT No 60000 / 600 Window/limit for the Stripe webhook route.
CORS_ALLOWED_ORIGINS No * Single value placed verbatim in Access-Control-Allow-Origin.
OTEL_EXPORTER_OTLP_ENDPOINT No http://localhost:4318 OTLP/HTTP collector endpoint.
SERVICE_VERSION No dev OTel resource version + Sentry release fallback.
NODE_ENV No local Environment label.
SENTRY_DSN No Enables Sentry when set.
SENTRY_ENVIRONMENT No NODE_ENV Sentry environment label.
SENTRY_RELEASE No SERVICE_VERSION Sentry release identifier.
SENTRY_TRACES_SAMPLE_RATE No 0.05 Sentry trace sample rate.
SENTRY_LOGS_ENABLED No true Forward Effect.log* records to Sentry Logs when SENTRY_DSN is set.
SENTRY_LOGS_MIN_LEVEL No info Minimum Sentry log level: trace, debug, info, warn, error, or fatal.

See .env.example for commented examples.

Architecture

The code is organised as Effect-TS layers, with imports flowing downward only. Domain, application, and infrastructure live together in packages/core/src/; the three interfaces are separate apps (apps/cli, apps/api, apps/mcp) that depend on packages/core:

interfaces/      apps/cli + apps/api + apps/mcp — the only place an Effect is run
   │
application/     packages/core/src/application: workflows, ports (service contracts), scheduling, cache-key, config schema — no IO
   │
infrastructure/  packages/core/src/infrastructure: Live Layers: cache backends, outbound HTTP + URL safety, providers, markdown, artifacts, observability
   │
domain/          packages/core/src/domain: tagged errors, branded ids, schemas, the ProviderStrategy contract — pure

Each provider has two 1:1 trees: pure definitions under packages/core/src/providers/definitions/<service>/ (URL parsing, endpoint detection, query builders) and Effect strategies under packages/core/src/infrastructure/providers/scrape-creators/<service>/. The catalog is assembled in packages/core/src/infrastructure/providers/catalog-live.ts; the runtime is composed in packages/core/src/infrastructure/runtime.ts. Agent-facing guidance lives in per-folder AGENTS.md files. See docs/ARCHITECTURE.md.

Development

git clone <repo-url> && cd contextplusplus
pnpm install
pnpm --filter contextplusplus dev -- --help

Agents should not use this workstation for package verification. Build, test, fixture, pack, and deploy-prebuild checks run in GitHub Actions on Ubicloud runners; local commands are for interactive development and lightweight agent-feedback checks only. Run pnpm run agent:check locally before pushing to catch whitespace/workflow mistakes without spending CI minutes.

Agent work should normally happen in a task worktree. In Herdr-managed sessions, use native Herdr worktree commands, or $herdr-pm-agent's pm.py spawn-exec helper when available, so each task has an isolated branch and cleanup path. The same optimized GitHub Actions workflow is the proof path for worktrees and normal repository work.

This is a pnpm + Turborepo monorepo: use pnpm run <script> for root scripts and pnpm --filter <pkg> <script> to target one workspace member (package names: @contextplusplus/core, contextplusplus (CLI), @contextplusplus/api, @contextplusplus/mcp, @contextplusplus/website, @contextplusplus/dashboard).

Script Purpose
pnpm --filter contextplusplus dev -- --help Run the CLI from source.
pnpm --filter @contextplusplus/api serve Start the HTTP server (tsx, Sentry preloaded).
pnpm --filter @contextplusplus/mcp dev Start the MCP server (tsx, Sentry preloaded) on MCP_PORT.
pnpm run build turbo run build — builds every buildable app/package (packages/core, apps/cli, apps/api, apps/mcp via tsc -b; apps/website/apps/dashboard via next build); CI-owned for proof.
pnpm run test / pnpm run test:e2e Root unit + integration / end-to-end suites; CI-owned for proof.
pnpm run fixtures:check-live Re-render every fixture and assert it matches rendered.md; CI-owned for proof.
pnpm run verify:fast Fast every-push CI gate: workspace typecheck + each app/package's tests + root test + e2e. This is what the Verify (fast, Node 24) CI job runs on every push.
pnpm run lint:rules Project rule guard (scripts/check-rules.mjs): enforces Effect clean rules (runPromise confinement, no console/logFatal, .js imports, no raw throw outside providers, no legacy imports); workspace-aware — skips the Next.js frontend packages. Also invoked by agent:check and the CI verify step.
pnpm run agent:check Fast local-only agent hygiene: git diff --check, lint:rules, plus actionlint when installed.
pnpm run verify Full CI gate: verify:fast + fixtures + sdk:check + build + CLI smoke + CLI pack. Runs as an explicit step in release.yml on tag publish.

The Makefile is a small convenience layer over the root scripts, but it has not been updated for the pnpm/Turborepo split: it still shells out to npm (npm ci, npm run serve, npm run dev -- --help, npm pack --dry-run), and the root package is no longer installed with npm and no longer has serve or dev scripts. Use the pnpm/pnpm --filter commands above instead until the Makefile is regenerated:

| Target | Purpose | Status |
| --- | --- | --- |
| make install | Install exact locked dependencies. | Broken: npm ci needs a package-lock.json; this repo has only pnpm-lock.yaml |
| make local | Start the HTTP API on 127.0.0.1:3456. | Broken: calls npm run serve (no longer a root script) |
| make local-lan | Start the HTTP API on 0.0.0.0:3456 for explicit LAN testing. | Broken: calls npm run serve (no longer a root script) |
| make cli | Run CLI help from source. | Broken: calls npm run dev (no longer a root script) |
| make stop / make clean | Stop this repo's dev process on PORT; clean also removes dist/. | OK |
| make build / make test / make fixtures / make verify | Developer conveniences for the matching package checks; agents use GitHub Actions for proof. | Resolve if node_modules already exists (root retains these script names), but assume pnpm install, not make install, populated it |
| make pack | Preview npm package contents. | Not meaningful: packs the private root workspace, not the published CLI (apps/cli); use pnpm --filter contextplusplus pack |

Override local API settings with PORT=<port> and, for make local, HOST=<host>.

Behavior changes should land with a test. The fixture corpus (test/fixtures/live/) holds 145 captured upstream payloads and their rendered Markdown; regenerate with pnpm run fixtures:render-live only for an intentional template or schema change, never by hand. Final verification is the GitHub Actions run for the committed SHA.

Local git hooks

The repo ships lightweight local hooks in .githooks/:

  • commit-msg — runs commitlint to enforce Conventional Commits with a required scope (types: feat|fix|docs|refactor|test|chore|perf; subject ≤ 50 chars; bare fix: without a scope is rejected).
  • pre-push — runs pnpm run agent:check (whitespace diff + rule-guard + actionlint; sub-second).
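
The commit-msg rule can be approximated in a few lines, which may help when writing messages by hand. This is a sketch of the stated constraints only (allowed types, required scope, subject of at most 50 characters); the real hook runs commitlint, whose configuration may enforce more, and `lintHeader` is a hypothetical helper.

```typescript
const ALLOWED_TYPES: readonly string[] = ["feat", "fix", "docs", "refactor", "test", "chore", "perf"];
const MAX_SUBJECT_LENGTH = 50;

// type, optional "(scope)", then ": subject"
const HEADER = /^([a-z]+)(?:\(([^()\s]+)\))?: (.+)$/;

// Returns a list of problems; an empty list means the header passes this sketch.
function lintHeader(header: string): string[] {
  const match = HEADER.exec(header);
  if (match === null) return ["header must look like type(scope): subject"];
  const type = match[1] ?? "";
  const scope = match[2];
  const subject = match[3] ?? "";
  const problems: string[] = [];
  if (!ALLOWED_TYPES.includes(type)) problems.push(`type "${type}" is not allowed`);
  if (scope === undefined) problems.push("scope is required");
  if (subject.length > MAX_SUBJECT_LENGTH) problems.push("subject is longer than 50 characters");
  return problems;
}
```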

Hooks are intentionally minimal: no tsc, no vitest, no build — those are CI-only and would thrash under multiple worktrees.

Wire them once after pnpm install (or manually if you rarely install):

git config core.hooksPath .githooks

pnpm install also runs the prepare script, which sets core.hooksPath automatically. The setting lives in the shared .git/config, so it applies to all worktrees of this repo.

Docker

cp .env.example .env          # set deploy env; see DEPLOYMENT.md
docker compose up --build
curl http://localhost:3000/health
curl -s http://localhost:3000/v1/services | jq '.services | length'

The multi-stage Dockerfile + docker-compose.yml run the API plus Redis. The container listens on PORT, persists artifacts in the api-artifacts volume, and ships a /health healthcheck. See DEPLOYMENT.md for go-live env validation, Supabase migrations, Stripe webhook setup, and the one-proxy HTTP_TRUST_PROXY_HOPS=1 setting.

CI & publishing

  • CI (.github/workflows/ci.yml) runs on every push and manual dispatch on Ubicloud arm64 runners; a changes paths-filter gates the heavy suites. Every push: Verify (Node 24 — pnpm run verify:fast = workspace typecheck + tests + e2e, plus the project rule guard) and a gitleaks Secret scan. On main / dispatch: Package & smoke (build + CLI smoke:cli + sdk:check + CLI pack), parallel to Verify. Path-gated jobs run only on the relevant change (or workflow_dispatch for a full matrix): Compatibility (Node 22), RPC regression (real Postgres), Supabase types drift, Python SDK drift, Web build (website + dashboard). Typical wall-time < 10 min. Production deploys (deploy.yml, deploy-web.yml) are manual, confirm-gated workflow_dispatch only.
  • Completion rule: a non-trivial change is not complete until the committed SHA is green in the intended CI jobs. Keep main clean, current, and pushed after merging verified work.
  • Publishing runs pnpm run verify as an explicit "Verify package" CI step before publish (no prepublishOnly lifecycle hook is used). Pushing a v*.*.* tag triggers .github/workflows/release.yml, which publishes both @contextplusplus/core and contextplusplus to npm with NPM_TOKEN and creates a GitHub Release.

Security & license

Report vulnerabilities per SECURITY.md. Outbound requests are guarded against SSRF (private/loopback/link-local/CGNAT addresses and DNS-rebinding are rejected). Secrets are Redacted and stripped from cache entries, artifacts, logs, and error envelopes.
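
As an illustration of the address classes named above, the sketch below checks an IPv4 literal against the private, loopback, link-local, and CGNAT ranges. It is deliberately narrow and not the project's guard: it ignores IPv6, does not resolve hostnames, and does nothing about DNS rebinding, which needs the check to run on the address actually connected to. Function names are hypothetical.

```typescript
// Parse dotted-quad IPv4 into a number, or null if it is not a valid literal.
function parseIPv4(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// [network, prefix length] for each blocked range.
const BLOCKED_RANGES: [string, number][] = [
  ["10.0.0.0", 8],      // private
  ["172.16.0.0", 12],   // private
  ["192.168.0.0", 16],  // private
  ["127.0.0.0", 8],     // loopback
  ["169.254.0.0", 16],  // link-local
  ["100.64.0.0", 10],   // CGNAT (shared address space)
];

function inRange(address: number, network: number, prefixBits: number): boolean {
  const blockSize = 2 ** (32 - prefixBits);
  return Math.floor(address / blockSize) === Math.floor(network / blockSize);
}

// Fail closed: anything that does not parse as IPv4 is treated as blocked.
function isBlockedIPv4(ip: string): boolean {
  const address = parseIPv4(ip);
  if (address === null) return true;
  return BLOCKED_RANGES.some(([network, bits]) => inRange(address, parseIPv4(network) ?? 0, bits));
}
```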

MIT — see LICENSE.