1.0.0 • Published 5 months ago

llm-html-compressor

A specialized HTML compressor that optimizes HTML content for use as context with Large Language Models (LLMs). It removes unnecessary whitespace, comments, and other "noise" from HTML documents while preserving semantically important content.

Installation

npm install llm-html-compressor
# or
yarn add llm-html-compressor

Why use llm-html-compressor?

When using HTML content as context for LLMs, unnecessary elements like whitespace, comments, and certain attributes can:

  1. Consume token quota without adding value
  2. Add noise that makes it harder for the LLM to focus on important content
  3. Increase the chance of context truncation

This library applies optimizations aimed specifically at LLM context usage, unlike traditional HTML minifiers, which focus on reducing network transfer size.
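To make the token argument concrete, here is a minimal sketch that estimates how many tokens a string costs before and after compression. The figure of roughly 4 characters per token is an assumed rule of thumb, not a real tokenizer, and `estimateTokens` and `estimateSavings` are hypothetical helpers that are not part of this package.

```typescript
// Assumption: ~4 characters per token. Real tokenizers vary by model and content.
const CHARS_PER_TOKEN = 4;

// Rough token count for a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Fraction of estimated tokens saved by going from `before` to `after`.
function estimateSavings(before: string, after: string): number {
  const beforeTokens = estimateTokens(before);
  if (beforeTokens === 0) return 0;
  return (beforeTokens - estimateTokens(after)) / beforeTokens;
}

const rawHtml = "<div>\n    <!-- note -->\n    <p>Hi</p>\n</div>";
const compactHtml = "<div><p>Hi</p></div>";

console.log(estimateTokens(rawHtml), estimateTokens(compactHtml));
console.log(`${(estimateSavings(rawHtml, compactHtml) * 100).toFixed(0)}% fewer estimated tokens`);
```

For exact numbers, replace the heuristic with the tokenizer of the model you are targeting.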

Usage

Basic Usage

import { compress } from 'llm-html-compressor';

const html = `
<!DOCTYPE html>
<html>
  <!-- This is a comment -->
  <head>
    <title>Example</title>
    <style>
      body { font-family: Arial, sans-serif; }
    </style>
  </head>
  <body>
    <div class="container" id="main">
      <h1>Hello, World!</h1>
      <p style="color: blue; font-size: 16px;">This is an example.</p>
    </div>
    <script>
      console.log('Hello');
    </script>
  </body>
</html>
`;

const compressed = compress(html);
console.log(compressed);
// Output: <!DOCTYPE html><html><head><title>Example</title><style>body { font-family: Arial, sans-serif; }</style></head><body><div class="container" id="main"><h1>Hello, World!</h1><p style="color:blue;font-size:16px;">This is an example.</p></div><script>console.log('Hello');</script></body></html>

Advanced Usage with Custom Options

import { createCompressor } from 'llm-html-compressor';

const compressor = createCompressor({
  removeComments: true,
  collapseWhitespace: true,
  removeEmptyAttributes: true,
  removeStyleTags: true,
  removeScriptTags: true,
  preserveLineBreaks: false,
  removeDataAttributes: true,
  removeHiddenElements: true,
  minifyInlineCSS: true,
  removeClassAttributes: true,
  removeIdAttributes: false
});

const html = `... your HTML here ...`;
const compressed = compressor.compress(html);

API

Functions

compress(html: string): string

Compresses HTML using default options.

createCompressor(options?: Partial<CompressionOptions>): HtmlCompressor

Creates a compressor instance with custom options.
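Because the parameter is `Partial<CompressionOptions>`, any subset of options can be supplied. The sketch below shows one plausible way such overrides could combine with the defaults listed in the CompressionOptions table. The interface is a local mirror of the documented option names, and `resolveOptions` is a hypothetical helper; whether the library merges options exactly this way is an assumption.

```typescript
// Local mirror of the documented option names, so the sketch is self-contained.
interface CompressionOptions {
  removeComments: boolean;
  collapseWhitespace: boolean;
  removeEmptyAttributes: boolean;
  removeStyleTags: boolean;
  removeScriptTags: boolean;
  preserveLineBreaks: boolean;
  removeDataAttributes: boolean;
  removeHiddenElements: boolean;
  minifyInlineCSS: boolean;
  removeClassAttributes: boolean;
  removeIdAttributes: boolean;
}

// Defaults copied from the CompressionOptions table in this README.
const DEFAULT_OPTIONS: CompressionOptions = {
  removeComments: true,
  collapseWhitespace: true,
  removeEmptyAttributes: true,
  removeStyleTags: false,
  removeScriptTags: false,
  preserveLineBreaks: false,
  removeDataAttributes: false,
  removeHiddenElements: false,
  minifyInlineCSS: false,
  removeClassAttributes: false,
  removeIdAttributes: false,
};

// Hypothetical helper: overrides win, everything else falls back to the defaults.
function resolveOptions(overrides: Partial<CompressionOptions> = {}): CompressionOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

const resolved = resolveOptions({ removeScriptTags: true, removeStyleTags: true });
console.log(resolved.removeScriptTags, resolved.removeComments);
```

With this pattern a caller only names the options that differ from the defaults, which keeps configuration short.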

Classes

HtmlCompressor

The main compressor class that can be instantiated directly.

import { HtmlCompressor } from 'llm-html-compressor';

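// Example values; see CompressionOptions for the available options
const options = { removeScriptTags: true, removeStyleTags: true };
const html = '<div><script>track();</script><p>Hello</p></div>';

const compressor = new HtmlCompressor(options);
const result = compressor.compress(html);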

CompressionOptions

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| removeComments | boolean | true | Removes HTML comments |
| collapseWhitespace | boolean | true | Collapses multiple whitespace characters into a single space |
| removeEmptyAttributes | boolean | true | Removes attributes with empty values |
| removeStyleTags | boolean | false | Removes <style> tags and their content |
| removeScriptTags | boolean | false | Removes <script> tags and their content |
| preserveLineBreaks | boolean | false | Preserves line breaks when collapsing whitespace |
| removeDataAttributes | boolean | false | Removes data-* attributes |
| removeHiddenElements | boolean | false | Removes elements with display:none or hidden attribute |
| minifyInlineCSS | boolean | false | Minifies inline CSS in style attributes |
| removeClassAttributes | boolean | false | Removes class attributes |
| removeIdAttributes | boolean | false | Removes id attributes |
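To give a feel for what two of these options do, the following sketch applies simplified, regex-based stand-ins for `removeComments` and `collapseWhitespace`. These functions are illustrations only and are not the library's implementation; a real compressor has to treat `<pre>`, `<script>`, and significant inline whitespace more carefully than this.

```typescript
// Simplified stand-in for removeComments: drop every <!-- ... --> block.
function stripComments(html: string): string {
  return html.replace(/<!--[\s\S]*?-->/g, "");
}

// Simplified stand-in for collapseWhitespace.
function collapseWhitespace(html: string): string {
  return html
    .replace(/\s+/g, " ")    // runs of whitespace become a single space
    .replace(/>\s+</g, "><") // whitespace between adjacent tags is dropped
    .trim();
}

const sample = "<div>\n  <!-- banner -->\n  <p>Hello,   World!</p>\n</div>";
console.log(collapseWhitespace(stripComments(sample)));
// prints: <div><p>Hello, World!</p></div>
```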

Use Cases

  • Preprocessing HTML for RAG (Retrieval-Augmented Generation) systems
  • Optimizing web page content for chatbots and assistants
  • Reducing token usage when working with HTML documentation
  • Making HTML content more digestible for code analysis with LLMs
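
As an illustration of the RAG preprocessing use case, the sketch below compresses each page and then splits the result into fixed-size chunks. `prepareChunks` and `stubCompress` are hypothetical names introduced here; the stub exists only so the example runs without the package installed, and in practice you would pass the library's `compress` function instead.

```typescript
// Same shape as the documented compress(html: string): string.
type CompressFn = (html: string) => string;

// Hypothetical helper: compress each page, then cut it into chunks of at most maxChars.
function prepareChunks(pages: string[], compressFn: CompressFn, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const result: string[] = [];
  for (const page of pages) {
    const compressed = compressFn(page);
    for (let start = 0; start < compressed.length; start += maxChars) {
      result.push(compressed.slice(start, start + maxChars));
    }
  }
  return result;
}

// Stand-in compressor (whitespace collapsing only), used in place of the real one.
const stubCompress: CompressFn = (html) => html.replace(/\s+/g, " ").trim();

const chunks = prepareChunks(["<p>one</p>\n\n<p>two</p>"], stubCompress, 12);
console.log(chunks);
```

Fixed-size character chunks can cut through a tag, so a production pipeline would more likely split on element boundaries; the point here is only that compression happens before chunking, so each chunk carries more content.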

License

MIT