1.0.0 • Published 5 months ago
llm-html-compressor
A specialized HTML compressor designed to optimize HTML content for use as context with Large Language Models (LLMs). It removes unnecessary whitespace, comments, and other "noise" from HTML documents, making them better suited to LLM processing while preserving semantically important content.
Installation
npm install llm-html-compressor
# or
yarn add llm-html-compressor
Why use llm-html-compressor?
When HTML is used as context for LLMs, unnecessary content such as whitespace, comments, and certain attributes can:
- Consume token quota without adding value
- Add noise that makes it harder for the LLM to focus on important content
- Increase the chance of context truncation
This library provides optimizations targeted at LLM context usage, unlike traditional HTML minifiers, which focus on network transmission size.
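To make the token-cost argument concrete, here is a minimal sketch that estimates the savings from removing comments and collapsing whitespace. It does not use the library: `stripNoise` and `estimateTokens` are hypothetical helpers written for this illustration, and the "about 4 characters per token" figure is only a rough heuristic that varies by model and tokenizer.

```typescript
// Rough heuristic: about 4 characters per token for English-like text.
// Real tokenizers differ by model; treat the result as an estimate only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Naive stand-in for comment removal and whitespace collapsing.
// Illustration only; this is NOT the library's implementation.
function stripNoise(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // drop HTML comments
    .replace(/\s+/g, " ")            // collapse whitespace runs to one space
    .replace(/>\s+</g, "><")         // drop whitespace between adjacent tags
    .trim();
}

const rawHtml = `
<div>
  <!-- navigation placeholder -->
  <p>   Hello,   World!   </p>
</div>
`;

const stripped = stripNoise(rawHtml);
console.log(stripped); // <div><p> Hello, World! </p></div>
console.log(estimateTokens(rawHtml), "->", estimateTokens(stripped));
```

A regex pass like this ignores whitespace-sensitive elements such as `<pre>`, so it is only useful for a quick before/after comparison.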
Usage
Basic Usage
import { compress } from 'llm-html-compressor';
const html = `
<!DOCTYPE html>
<html>
<!-- This is a comment -->
<head>
<title>Example</title>
<style>
body { font-family: Arial, sans-serif; }
</style>
</head>
<body>
<div class="container" id="main">
<h1>Hello, World!</h1>
<p style="color: blue; font-size: 16px;">This is an example.</p>
</div>
<script>
console.log('Hello');
</script>
</body>
</html>
`;
const compressed = compress(html);
console.log(compressed);
// Output: <!DOCTYPE html><html><head><title>Example</title><style>body { font-family: Arial, sans-serif; }</style></head><body><div class="container" id="main"><h1>Hello, World!</h1><p style="color:blue;font-size:16px;">This is an example.</p></div><script>console.log('Hello');</script></body></html>
Advanced Usage with Custom Options
import { createCompressor } from 'llm-html-compressor';
const compressor = createCompressor({
removeComments: true,
collapseWhitespace: true,
removeEmptyAttributes: true,
removeStyleTags: true,
removeScriptTags: true,
preserveLineBreaks: false,
removeDataAttributes: true,
removeHiddenElements: true,
minifyInlineCSS: true,
removeClassAttributes: true,
removeIdAttributes: false
});
const html = `... your HTML here ...`;
const compressed = compressor.compress(html);
API
Functions
compress(html: string): string
Compresses HTML using default options.
createCompressor(options?: Partial<CompressionOptions>): HtmlCompressor
Creates a compressor instance with custom options.
Classes
HtmlCompressor
The main compressor class that can be instantiated directly.
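import { HtmlCompressor } from 'llm-html-compressor';
// Example options; assumed to take the same shape as createCompressor's argument.
const options = { removeScriptTags: true, removeStyleTags: true };
const html = `... your HTML here ...`;
const compressor = new HtmlCompressor(options);
const result = compressor.compress(html);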
CompressionOptions
| Option | Type | Default | Description |
|---|---|---|---|
| removeComments | boolean | true | Removes HTML comments |
| collapseWhitespace | boolean | true | Collapses multiple whitespace characters into a single space |
| removeEmptyAttributes | boolean | true | Removes attributes with empty values |
| removeStyleTags | boolean | false | Removes <style> tags and their content |
| removeScriptTags | boolean | false | Removes <script> tags and their content |
| preserveLineBreaks | boolean | false | Preserves line breaks when collapsing whitespace |
| removeDataAttributes | boolean | false | Removes data-* attributes |
| removeHiddenElements | boolean | false | Removes elements with display:none or hidden attribute |
| minifyInlineCSS | boolean | false | Minifies inline CSS in style attributes |
| removeClassAttributes | boolean | false | Removes class attributes |
| removeIdAttributes | boolean | false | Removes id attributes |
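The table can be read as a TypeScript type plus a set of defaults. The sketch below restates it that way and shows how a `Partial<CompressionOptions>` could be layered over the defaults. The field names and default values come from the table; `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names, and whether the library merges options exactly like this is an assumption.

```typescript
// Shape and defaults transcribed from the CompressionOptions table.
interface CompressionOptions {
  removeComments: boolean;
  collapseWhitespace: boolean;
  removeEmptyAttributes: boolean;
  removeStyleTags: boolean;
  removeScriptTags: boolean;
  preserveLineBreaks: boolean;
  removeDataAttributes: boolean;
  removeHiddenElements: boolean;
  minifyInlineCSS: boolean;
  removeClassAttributes: boolean;
  removeIdAttributes: boolean;
}

// Hypothetical constant: the documented default for each option.
const DEFAULT_OPTIONS: CompressionOptions = {
  removeComments: true,
  collapseWhitespace: true,
  removeEmptyAttributes: true,
  removeStyleTags: false,
  removeScriptTags: false,
  preserveLineBreaks: false,
  removeDataAttributes: false,
  removeHiddenElements: false,
  minifyInlineCSS: false,
  removeClassAttributes: false,
  removeIdAttributes: false,
};

// Hypothetical helper: caller-supplied overrides win over the defaults.
function resolveOptions(
  overrides: Partial<CompressionOptions> = {},
): CompressionOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

const resolved = resolveOptions({ removeScriptTags: true, removeStyleTags: true });
console.log(resolved.removeComments);   // true (default kept)
console.log(resolved.removeScriptTags); // true (overridden)
```

Under this reading, a caller only needs to list the options that differ from the defaults, for example turning on `removeScriptTags` and `removeStyleTags` for pages where scripts and stylesheets carry no useful context.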
Use Cases
- Preprocessing HTML for RAG (Retrieval-Augmented Generation) systems
- Optimizing web page content for chatbots and assistants
- Reducing token usage when working with HTML documentation
- Making HTML content more digestible for code analysis with LLMs
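As one illustration of the RAG preprocessing use case, the sketch below compresses each document before indexing and flags any that still exceed a token budget. `prepareDocuments`, `SourceDoc`, `PreparedDoc`, and the budget check are hypothetical and not part of the library; the compressor is passed in as a function so that `compress` or `compressor.compress` could be supplied, and a whitespace-only stand-in is used here so the example runs on its own. The token count uses a rough 4-characters-per-token heuristic.

```typescript
// Matches the (html: string) => string signature listed in the API section.
type Compress = (html: string) => string;

interface SourceDoc {
  id: string;
  html: string;
}

interface PreparedDoc {
  id: string;
  text: string;
  approxTokens: number; // heuristic estimate, not a real tokenizer count
  fitsBudget: boolean;
}

// Hypothetical helper: compress every document and check it against a budget.
function prepareDocuments(
  docs: SourceDoc[],
  compress: Compress,
  tokenBudget: number,
): PreparedDoc[] {
  return docs.map((doc) => {
    const text = compress(doc.html);
    const approxTokens = Math.ceil(text.length / 4);
    return { id: doc.id, text, approxTokens, fitsBudget: approxTokens <= tokenBudget };
  });
}

// Stand-in compressor (whitespace collapsing only) so the sketch is runnable
// without the package installed; swap in the library's compress in real use.
const standInCompress: Compress = (html) => html.replace(/\s+/g, " ").trim();

const prepared = prepareDocuments(
  [{ id: "doc-1", html: "<p>\n  Hello,\n  World!\n</p>" }],
  standInCompress,
  8,
);
console.log(prepared);
```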
License
MIT