@zakkster/lite-profiler
Always-on frame and per-phase profiling for HTML5 game loops. performance.now() into power-of-two ring buffers, single-pass percentiles, a frame-time classifier that tells a GC spike apart from a sustained throttle, deterministic per-frame command counters that gate exactly, and a binary capture format you can serialize, ship, and read back.
The in-engine profiler stats.js doesn't have and DevTools can't keep running.
The core is the engine-agnostic, zero-GC capture surface. 1.1.0 added capture comparison + regression gating; 1.2.0 added deterministic per-frame counters. The reactive
lite-signalbridge, thelite-scheduler/lite-ecsadapters, and a GPU HUD are companion packages built on this surface (see Roadmap).
Why lite-profiler?
| Capability | lite-profiler | stats.js | DevTools Performance | Phaser built-in | hand-rolled |
|---|---|---|---|---|---|
| Per-phase (named sub-frame) timing | Yes | No | Manual (User Timing) | No | Manual |
| Zero-GC hot path | Yes | No | n/a | Partial | Depends |
| Percentile stats (p50 / p99) | Yes | No | Manual | No | Manual |
| GC-spike vs throttle classifier | Yes | No | No | No | No |
| Shareable binary capture + reader | Yes (.litecap) |
No | Partial (JSON) | No | No |
| Headless capture (no DOM) | Yes | No | n/a | No | Yes |
| Worker / OffscreenCanvas HUD | Yes | No | n/a | No | Manual |
| Framework-agnostic | Yes | Yes | Yes | No (Phaser only) | Yes |
| TypeScript declarations | Yes | Partial | n/a | Yes | n/a |
| Reactive (signal) surface | Via bridge | No | No | No | No |
Chrome DevTools' Performance panel is the right tool for a deep, one-off flame-graph investigation with full call stacks. lite-profiler is for the other job: telemetry that runs every frame, in production, that you can classify and serialize. Different tools, different moments.
Installation
npm install @zakkster/lite-profiler
ESM only. Requires @zakkster/lite-ring-buffer, @zakkster/lite-stats-math, and @zakkster/lite-canvas-graph at >= 1.0.1 (the patch that fixes their package exports for strict Node ESM resolution). npm pulls them automatically.
Architecture at a Glance
The hot path is imperative and allocation-free: it does nothing but push performance.now() deltas into pre-allocated ring buffers. Everything that reasons about the data — percentiles, the histogram, capture export, the HUD — runs off the hot path, reading from those buffers on your schedule.
flowchart LR
L[game loop] -->|beginFrame / begin / end / endFrame| P[Profiler]
P -->|push - zero alloc, zero signals| R[(Float32 ring buffers)]
R --> S[StatsMath<br/>avg / min / max / p01 / p99]
R --> H[FrameHistogram<br/>bins + classify]
R --> C[encodeCapture<br/>.litecap]
C -.->|round-trip| D[decodeCapture]
R --> U[MeterHud<br/>CanvasGraph]
Quick Start
import { Profiler, FrameHistogram, FrameClass } from '@zakkster/lite-profiler';
const profiler = new Profiler(1024, ['input', 'physics', 'render']);
const hist = new FrameHistogram();
function frame() {
profiler.beginFrame();
profiler.begin('input'); /* read input */ profiler.end('input');
profiler.begin('physics'); /* step world */ profiler.end('physics');
profiler.begin('render'); /* draw */ profiler.end('render');
profiler.endFrame();
requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
// Off the hot path (a timer, every N frames, a devtools panel):
hist.update(profiler.frame);
if (hist.classify() === FrameClass.SPIKING) {
console.warn('intermittent hitches, spikeRatio =', hist.spikeRatio.toFixed(3));
}
Live dashboard demo
demo/index.html is a single-file diagnostic dashboard in the oscilloscope theme: a boids flock instrumented across four phases (spatial, steer, integrate, render) with the full telemetry surface beside it — the classifier verdict, fps, p50 / p99, a per-phase breakdown, and a live frame-time histogram. Drag the boid count up until frames stay over budget (watch it turn THROTTLED); hit inject GC to scatter sparse hitches while the phase bars stay calm (SPIKING — a pause your code didn't cause). Export the session to .litecap from the toolbar.
npm install # the demo loads the deps from node_modules through an import map
npx serve . # then open http://localhost:3000/demo/
The reactive-profiler trap
A profiler measures the hot path, so it must never be a cost on the hot path. The trap is the obvious-but-wrong move: writing a signal (or firing a callback) on every begin/end. At a few hundred phase boundaries per frame that is allocation and propagation churn inside the very loop you are trying to measure — your profiler shows the overhead of your profiler.
The core sidesteps it structurally: begin/end/beginAt/endAt only write a float32 into a fixed ring buffer — no allocation, no callbacks, no reactivity. The reactive surface (the lite-signal bridge package) follows the same rule the lite-ws transport does: the hot path stays silent, and a single rAF-aligned pulse lifts coarse summaries (fps, frameMs.p99, per-phase rollups) into signals about ten times a second. The capture buffers stay flat; only the summary signals move.
Frame Classifier
FrameHistogram buckets frame times on a log-ish scale and then labels the shape of the distribution. Two high-latency signatures look very different in the buckets:
- Sparse spikes — most frames are on budget, a few land in the
33-66/>=66ms buckets. Classic GC pause / occasional hitch. - Sustained elevation — a large share of frames sit just above budget in the
16-33ms bucket. CPU-bound, thermal throttle, or a background tab.
Buckets (ms): [0] <2 [1] 2-4 [2] 4-8 [3] 8-16 [4] 16-33 [5] 33-66 [6] >=66. The 16 and 33 ms edges are the 60fps and 30fps budgets.
classify() is a deliberately simple, documented heuristic over two ratios (jankRatio = fraction >= 16ms, spikeRatio = fraction >= 33ms). It is a label, not a verdict — the raw bins, jankRatio, and spikeRatio are all exposed so you can write your own rule.
stateDiagram-v2
[*] --> STEADY
STEADY --> SPIKING: jankRatio >= 0.05
SPIKING --> THROTTLED: jankRatio >= 0.25
THROTTLED --> SPIKING: jankRatio < 0.25
SPIKING --> STEADY: jankRatio < 0.05
Deterministic counters
Timing is noisy, so you gate it by tolerance. Command counts are not: the draw calls a frame issued, or the floats it uploaded, are integers your code already has at frame end — identical on every host and run. They gate at zero tolerance, exactly, in CI, with no median-of-N and no GPU.
Counters register as an optional third argument and use the same handle / At hot-path idiom as phases (no per-call string hashing); each flushes one value per endFrame():
const p = new Profiler(1024, ['sim', 'draw'], ['drawCalls', 'floatsUploaded']);
const UP = p.counterHandle('floatsUploaded');
// per frame:
p.count('drawCalls'); // by tag
p.countAt(UP, batch.floats); // by handle (hot path)
summarize() reports a counters block — { [tag]: { sum, avg, min, max, p01, p99, last, count } }, with sum an exact integer total — and the counts that must not drift gate at zero:
assertNoRegression(baseline, candidate, {
'counter.floatsUploaded.max': 0, // dirty-range guard: a one-instance change must not re-upload the buffer
'counter.drawCalls.max': 0 // batching guard: the draw-call count must not climb
});
A counter the baseline tracked that the candidate stops reporting is a regression (reason: 'metric missing in candidate') — a gate that can't see the metric it guards must fail, not pass. Per-frame values are exact to 2^24 (Float32 rings); sum is an exact Float64 total beyond that. This is the seam a GPU render layer emits into: recordDraw / recordUpload become count calls that gate headlessly alongside timing.
Full Module Reference
Profiler — capture core
Zero-GC frame and per-phase timing. Phases are registered once at construction; the hot path allocates nothing.
| Member | Description |
|---|---|
new Profiler(capacity?, phases?, counters?) |
capacity rounds up to a power of two (e.g. 600 -> 1024); phases and the optional counters are string tag arrays. |
beginFrame() / endFrame() |
Bracket a frame; endFrame records total frame time and flushes counter accumulators. |
begin(tag) / end(tag) |
Time a phase by tag. No-op for unknown tags. |
beginAt(handle) / endAt(handle) |
Hot-path form using an integer handle — no string hashing per call. |
handle(tag) / tagOf(handle) |
Resolve a tag to a stable handle (-1 if unknown) and back. |
frame / phase(tag) / phaseAt(handle) |
The underlying RingBuffers for stats, histogram, export. |
phaseCount / capacity |
Counts and the resolved buffer capacity. |
count(tag, n?) / countAt(handle, n?) |
Add to a counter for the current frame (accumulates; flushed on endFrame). The At form skips per-call string hashing. |
counterHandle(tag) / counterTagOf(handle) |
Resolve a counter tag to a stable handle (-1 if unknown) and back. |
counter(tag) / counterAt(handle) / counterCount |
The counter RingBuffers and their count. |
reset() / destroy() |
Clear buffers / release them. |
FrameHistogram — distribution + classifier
update(buffer) (zero-alloc, reuses bins), bins: Uint32Array(7), total, modeIndex, jankRatio, spikeRatio, classify(): FrameClass. FrameClass = { STEADY, SPIKING, THROTTLED }.
encodeCapture / decodeCapture — .litecap binary IO
encodeCapture(profiler, scratch?, meta?) -> ArrayBuffer | null (pass a reusable Float32Array scratchpad to avoid allocation; meta is optional JSON-serializable provenance such as { engine, label }; returns null if no frames). decodeCapture(arrayBufferOrView) -> { version, count, numPhases, frames, phases, tags, meta, counters, counterTags } — validates magic, version, and byte length before reading (a v3 capture carries counter data; v1/v2 decode with counters: []). downloadCapture(buffer, filename?) for browsers. LITECAP exposes the format constants.
FrameBudget / budgetMs / isOverBudget — budgets
FrameBudget.{FPS_30, FPS_60, FPS_120} in ms, budgetMs(targetFps), isOverBudget(frameMs, targetFps).
MeterHud — CPU overlay
new MeterHud(canvas, profiler, options?), render(), resize(w, h), setMaxMs(ms), destroy(). Renders the frame-time envelope via lite-canvas-graph with a small ms/fps readout. Worker / OffscreenCanvas friendly via the dpr option.
summarize / diffCaptures / checkRegression / assertNoRegression — comparison
summarize(profiler, meta?) -> CaptureSummary reduces a window to a small JSON-serializable object (frame avg/min/max/p01/p99/fps, jankRatio, spikeRatio, frameClass, histogram bins, per-phase stats, and a per-counter block { [tag]: { sum, avg, min, max, p01, p99, last, count } }) tagged with optional { label, engine, budgetMs }. summarizeCapture(decoded, meta?) does the same from a decoded .litecap. diffCaptures(baseline, candidate) returns per-metric { base, cand, delta, pct }. checkRegression(baseline, candidate, tolerances?) returns { ok, regressions, diff }; assertNoRegression(...) throws a RegressionError (carrying err.report) when a gated metric worsens beyond tolerance. DEFAULT_TOLERANCES gates frame.avg and frame.p99 at +10%.
The .litecap format
A flat little-endian buffer. Frames and each phase are stored oldest-first.
| Offset | Type | Field | Notes |
|---|---|---|---|
0 |
uint8[4] |
magic | 'L' 'C' 'A' 'P' |
4 |
uint8 |
version | 1, 2, or 3 |
5 |
uint32 LE |
count | samples per buffer |
9 |
uint8 |
phases | number of phase buffers |
10 |
float32[count] LE |
frames | total frame times |
10 + 4*count |
float32[count] LE |
phase p | repeated for p = 0 .. phases-1 |
Total size is 10 + (count * 4) + (phases * count * 4) bytes for v1. v2 appends a self-describing trailer after the float data: each phase tag as a uint8 length + UTF-8 bytes (registration order), then a uint32 length + UTF-8 JSON metadata blob. v3 appends one more trailer after the meta blob when counters are present: a uint8 counter count, each counter's count float32 samples (oldest-first), then each counter tag. A capture with no counters still emits v2, so older readers decode it unchanged. decodeCapture bounds-checks every trailer read and still decodes v1/v2 buffers (returning tags: [] / counters: [], meta: null), so older captures keep working with no consumer changes.
Capture comparison & regression gating
summarize() turns a rolling window into a small, self-describing snapshot; comparing two snapshots is how you prove a change did not cost performance. Because the profiler is engine-agnostic, the same workload can be captured under different builds and diffed — which is exactly how this pairs with the reactive stack: profile one graph under lite-signal 1.3.0, again under 1.4.0 (and later 1.7.0), and gate on the delta. A conformance suite says the engine is still correct; this says it is still fast.
import { summarize, assertNoRegression } from '@zakkster/lite-profiler';
// baseline: profile the workload on the current engine, save the summary as JSON
const baseline = summarize(profiler, { label: 'fan-out-1k', engine: 'lite-signal@1.3.0' });
// fs.writeFileSync('baseline.json', JSON.stringify(baseline));
// candidate: the same workload on the next engine
const candidate = summarize(profiler2, { label: 'fan-out-1k', engine: 'lite-signal@1.4.0-beta.1' });
// gate: throws RegressionError if a gated metric regressed beyond tolerance
assertNoRegression(baseline, candidate, {
'frame.avg': 0.10,
'frame.p99': 0.10,
'phase.render.p99': 0.15
});
The summary is plain JSON, so a baseline is git-diffable and lives next to the test. checkRegression() is the non-throwing form when you want to report rather than fail. fps is higher-is-better and gated in the opposite direction automatically; counters are lower-is-better. Metric paths are frame.<metric>, phase.<tag>.<metric>, or counter.<tag>.<metric> (see Deterministic counters).
Recipes
Hot-path handles (skip string hashing in the loop)
const PHYS = profiler.handle('physics');
const DRAW = profiler.handle('render');
// per frame:
profiler.beginAt(PHYS); step(); profiler.endAt(PHYS);
profiler.beginAt(DRAW); draw(); profiler.endAt(DRAW);
Percentiles for a phase
import { StatsMath } from '@zakkster/lite-stats-math';
const stats = new StatsMath(profiler.capacity);
const out = { avg: 0, min: 0, max: 0, p01: 0, p99: 0 };
stats.compute(profiler.phase('render'), out);
console.log('render p99 =', out.p99.toFixed(2), 'ms');
Capture, share, read back
import { encodeCapture, decodeCapture, downloadCapture } from '@zakkster/lite-profiler';
const scratch = new Float32Array(profiler.capacity); // reuse across captures, no per-call alloc
const buf = encodeCapture(profiler, scratch, { engine: 'lite-signal@1.4.0', label: 'session' });
if (buf) downloadCapture(buf, 'session.litecap');
// in an analysis tool later (v2 captures also carry tags + meta):
const { frames, phases, tags, meta } = decodeCapture(buf);
Live overlay
import { MeterHud } from '@zakkster/lite-profiler';
const hud = new MeterHud(document.querySelector('#meter'), profiler, { maxMs: 33 });
// after endFrame():
hud.render();
Roadmap
This package is the focused core. Each layer below ships separately so the core stays dependency-light and signal-free:
lite-profiler-signal— the reactive boundary (mirrorslite-camera->lite-camera-max): coarse telemetry as signals via a rAF-alignedlite-throttlepulse, pluslite-watch-expredicate / rolling-history watchers foronJankand per-phase regression alerts.lite-profiler-scheduler(headline adapter) — profilelite-schedulerpriority lanes: budget used vs allotted, overruns, per-lane percentiles. No private-field access; it rides the scheduler's public surface.lite-profiler-ecs— a documented adapter forlite-ecssystem timing.lite-profiler-gl— alite-glHUD backend rendering thousands of frames across many phases in a single instanced draw.- Diagnostic dashboard — a live showcase profiling a real workload (a
lite-soa-particle-engine/lite-fxstorm), withlite-hotkeyto toggle the overlay.
Capture comparison and regression gating (summarize / diffCaptures / assertNoRegression) landed in the core in 1.1.0; deterministic per-frame counters (count / countAt, counter.<tag>.<metric> gating, LiteCap v3) landed in 1.2.0. A Perfetto / Chrome-trace exporter is under consideration as a fast-follow, but it needs a timeline-capture layer first: the rings store durations, not absolute timestamps or a frame-to-phase association, so a faithful flame chart is a data-model addition rather than a formatting shim.
Testing
npm test # node --test (57 tests, zero dependencies)
npm run bundle-check # esbuild ESM bundle sanity check
prepublishOnly runs both. The suite uses the native node:test runner with no test-framework dependency. The hot path's zero-allocation property is a design contract (fixed-size typed-array ring buffers, no per-frame object creation); the suite verifies the capture window never grows past capacity, that buffers are reused across updates, and that absolute phase timestamps keep sub-millisecond precision at long-uptime performance.now() values.
License
MIT (c) 2026 Zahary Shinikchiev. See LICENSE.txt and CHANGELOG.md.