@silyze/html-prompt-utils v1.0.0
HTML Prompt Utils
HTML Prompt Utils is a lightweight toolkit for turning raw HTML into a prompt‑friendly, compressed JSON representation – perfect for LLM prompts, diffing, or storing a minimal DOM snapshot. It consists of a streaming HTML serializer and a selective compression algorithm that keeps only the semantic parts of the DOM (text, ids, classes, and a curated list of attributes).
Features
| Feature | Description |
|---|---|
| Streaming HTML → AST | Parse HTML incrementally with HTMLSerializer, producing a simple DocumentNode tree. |
| Lossy compression | compressNode collapses the tree into a terser CompressedNode, merging selectors and discarding irrelevant markup. |
| Attribute whitelisting | Only meaningful attributes (e.g. href, src, value, …) are preserved, keeping output compact and deterministic. |
| Ignore tags | Configure tags (default: head, script, iframe, …) that should be excluded entirely during parsing. |
| Tree‑agnostic | Works with full HTML strings, server‑sent chunks, or any ReadableStream you wrap in HTMLTextStream. |
Installation
npm install @silyze/html-prompt-utils
Quick start
import {
  HTMLTextStream,
  HTMLSerializer,
  compressNode,
} from "@silyze/html-prompt-utils";
// 1) Wrap your HTML (string | Promise<string>) in a stream helper
const html = new HTMLTextStream(
  `<div id="app"><p>Hello <strong>world</strong></p></div>`
);
// 2) Parse it → DocumentNode (AST)
const doc = await HTMLSerializer.parse(html);
// 3) Compress the AST for prompt usage
const compressed = compressNode(doc);
console.log(JSON.stringify(compressed, null, 2));
/*
{
  "div#app": {
    "p": {
      "text": [
        "Hello",
        "world"
      ]
    }
  }
}
*/
API Reference
All exports live off the package root:
import {
  compressNode,
  HTMLTextStream,
  HTMLSerializer,
  DocumentNode,
  CompressedNode,
  HTMLPipeTarget,
  HTMLStream,
} from "@silyze/html-prompt-utils";
Types
| Type | Description |
|---|---|
| DocumentNode | A minimal DOM‑like interface { name, attributes?, children? }. |
| CompressedNode | A recursively compressed representation (see Compression format below). |
| HTMLStream | Object with pipeTo(target) for pumping data into a consumer. |
| HTMLPipeTarget | { write(chunk), end(chunk?) }: anything that accepts streamed chunks (e.g. htmlparser2.Parser). |
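To make the HTMLStream / HTMLPipeTarget contract concrete, here is a minimal sketch that declares local stand-ins for both interfaces instead of importing them. The chunk type (string), the void return types, and the ChunkedStringStream and CollectingTarget classes are assumptions made for illustration; they are not part of the package.

```typescript
// Local stand-ins for the package's types. The string chunk type and the
// void return types are assumptions; the package may declare them differently.
interface HTMLPipeTarget {
  write(chunk: string): void;
  end(chunk?: string): void;
}

interface HTMLStream {
  pipeTo(target: HTMLPipeTarget): void;
}

// Hypothetical stream: feeds a string to the target in fixed-size chunks,
// then signals completion with end().
class ChunkedStringStream implements HTMLStream {
  private readonly source: string;
  private readonly chunkSize: number;

  constructor(source: string, chunkSize: number = 16) {
    this.source = source;
    this.chunkSize = chunkSize;
  }

  pipeTo(target: HTMLPipeTarget): void {
    for (let i = 0; i < this.source.length; i += this.chunkSize) {
      target.write(this.source.slice(i, i + this.chunkSize));
    }
    target.end();
  }
}

// Hypothetical target: records every chunk it receives.
class CollectingTarget implements HTMLPipeTarget {
  readonly chunks: string[] = [];
  ended = false;

  write(chunk: string): void {
    this.chunks.push(chunk);
  }

  end(chunk?: string): void {
    if (chunk !== undefined) this.chunks.push(chunk);
    this.ended = true;
  }
}

const target = new CollectingTarget();
new ChunkedStringStream('<div id="app"><p>Hello</p></div>', 10).pipeTo(target);
console.log(target.chunks, target.ended);
```

Any object with the same two methods can stand in for CollectingTarget, which is why a parser such as htmlparser2.Parser can be used as a pipe target.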
HTMLTextStream (class)
Wraps a string or Promise<string> as an HTMLStream so it can be consumed by the parser.
new HTMLTextStream(src: string | Promise<string>): HTMLTextStream
HTMLSerializer (class)
| Member | Signature | Notes |
|---|---|---|
| constructor | constructor(ignoreTags?: string[]) | Creates a serializer instance. |
| static defaultIgnoreTags | readonly string[] | Defaults to ["head","script","iframe","meta","style","link"]. |
| static parse | parse(html: HTMLStream, ignoreTags?, options?) | Convenience that builds a Parser (from htmlparser2), feeds it, and resolves to a DocumentNode. |
| root | Promise<DocumentNode> | Promise of the final tree (same as the return value of parse). |
| currentRoot | DocumentNode | Synchronous access while streaming (advanced). |
Under the hood it implements the htmlparser2.Handler interface (onopentag, ontext, …) so you can wire it manually when needed.
compressNode(root: DocumentNode): CompressedNode | undefined
Traverses a DocumentNode and returns a compressed version, or undefined if the node is entirely ignorable (e.g. whitespace only).
Compression format
- Outermost keys are CSS‑like selectors: tag#id.class1.class2.
- Special key text holds raw text content.
- If a selector or text contains a single child, the array wrapper is stripped.
- Attributes are expressed as additional selectors like [href], [value], [placeholder].
- Only attributes in the preservation whitelist are ever kept:
const preserveAttributes = [
"type",
"placeholder",
"value",
"min",
"max",
"name",
"src",
"alt",
"href",
"target",
"action",
"for",
"selected",
"checked",
"multiple",
"list",
];
- contenteditable="true" is also kept.
- Deep single‑child chains collapse: <div><span><a>… ⇒ selector div span a.
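The selector and whitelist rules above can be sketched as two small helpers. This is an illustration of the described format, not the package's actual implementation: ElementLike, toSelector, and pickPreserved are hypothetical names, and the attribute map type (Record<string, string>) is an assumption.

```typescript
// Assumed element shape, based on the { name, attributes? } part of DocumentNode.
interface ElementLike {
  name: string;
  attributes?: Record<string, string>;
}

// Whitelist copied from the list above.
const preserveAttributes: string[] = [
  "type", "placeholder", "value", "min", "max", "name", "src", "alt",
  "href", "target", "action", "for", "selected", "checked", "multiple", "list",
];

// Hypothetical helper: builds a tag#id.class1.class2 selector.
function toSelector(node: ElementLike): string {
  const attrs = node.attributes ?? {};
  const id = attrs.id ? `#${attrs.id}` : "";
  const classes = (attrs.class ?? "")
    .split(/\s+/)
    .filter(Boolean) // drop empty strings left by repeated whitespace
    .map((c) => `.${c}`)
    .join("");
  return `${node.name}${id}${classes}`;
}

// Hypothetical helper: keeps whitelisted attributes plus contenteditable="true".
function pickPreserved(attrs: Record<string, string>): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    const editable = key === "contenteditable" && value === "true";
    if (preserveAttributes.includes(key) || editable) {
      kept[key] = value;
    }
  }
  return kept;
}

console.log(toSelector({ name: "div", attributes: { id: "app", class: "card  wide" } }));
console.log(pickPreserved({ href: "/docs", style: "color:red", contenteditable: "true" }));
```

How the package combines the kept attributes with the selector key (for example, whether values are included) is not specified here, so the sketch stops at filtering.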
Advanced usage
Custom streaming source
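import { PassThrough } from "node:stream";
import { Parser } from "htmlparser2";
import { HTMLSerializer } from "@silyze/html-prompt-utils";

const pass = new PassThrough();
// Wrap the callbacks so write/end keep their `this` binding on the target,
// and decode Buffer chunks to strings before handing them to the parser.
const stream = {
  pipeTo: (target) =>
    pass
      .on("data", (chunk) => target.write(chunk.toString()))
      .on("end", () => target.end()),
};
const serializer = new HTMLSerializer();
const parser = new Parser(serializer);
stream.pipeTo(parser);
pass.write('<p streaming="yes">');
pass.write("Hello");
pass.end("</p>");
const doc = await serializer.root;
Changing ignored tags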
const doc = await HTMLSerializer.parse(htmlStream, [
/* your tags */
]);
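This README does not say whether a custom list replaces or extends the defaults. Assuming it replaces them, you may want to merge your tags with the defaults yourself. The sketch below uses a local copy of the documented default list and a hypothetical helper, withExtraIgnoredTags; in real code you would start from HTMLSerializer.defaultIgnoreTags and pass the result as the second argument to HTMLSerializer.parse.

```typescript
// Local copy of the documented HTMLSerializer.defaultIgnoreTags value.
const defaultIgnoreTags: readonly string[] = [
  "head", "script", "iframe", "meta", "style", "link",
];

// Hypothetical helper: defaults plus extra tags, lower-cased and de-duplicated.
function withExtraIgnoredTags(extra: string[]): string[] {
  const merged = [...defaultIgnoreTags, ...extra.map((tag) => tag.toLowerCase())];
  return Array.from(new Set(merged));
}

const ignoreTags = withExtraIgnoredTags(["svg", "NOSCRIPT", "script"]);
console.log(ignoreTags);
```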