1.0.3 • Published 15 days ago

html-torch v1.0.3


html-torch 🔥

html-torch is a library that cleans up HTML by removing everything except the tags that are meaningful to Large Language Models (LLMs). It strips out scripts, styles, attributes, and other content an LLM does not need, leaving tidier HTML.

Getting Started

Installation

npm install html-torch

Usage Examples

Here's a basic example of how to use html-torch to clean up an HTML document held in a string:

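import htmlTorch from 'html-torch';

// Top-level await requires an ES module; otherwise wrap the call in an async function.
const html = '<html>....</html>'; // placeholder for the HTML to clean
const { torchedHTML, summaryJSON } = await htmlTorch(html);
const { elements, selectors } = summaryJSON;

// Example sizes:
// html (original) -> 1.4MB
// torchedHTML (torched) -> 179KB
// elements (summary JSON) -> 43KB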
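
The size comments above can be reproduced for your own input with a few small helpers. This is a minimal sketch: byteSize, formatBytes, and reductionPercent are hypothetical names introduced here for illustration and are not part of html-torch. It assumes sizes are measured as UTF-8 bytes and that 1KB means 1024 bytes.

```typescript
// Hypothetical helpers for illustration only; none of these are exported by html-torch.

/** Size of a string in bytes when encoded as UTF-8. */
function byteSize(text: string): number {
  return new TextEncoder().encode(text).length;
}

/** Format a byte count as B, KB, or MB (assumes 1KB = 1024 bytes). */
function formatBytes(bytes: number): string {
  if (bytes < 1024) return `${bytes}B`;
  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(0)}KB`;
  return `${(bytes / (1024 * 1024)).toFixed(1)}MB`;
}

/** Percentage by which `cleaned` is smaller than `original` (0 for empty input). */
function reductionPercent(original: string, cleaned: string): number {
  const before = byteSize(original);
  if (before === 0) return 0;
  return (1 - byteSize(cleaned) / before) * 100;
}
```

After calling htmlTorch as shown above, formatBytes(byteSize(torchedHTML)) gives the torched size and reductionPercent(html, torchedHTML) gives the saving. For the sizes quoted in the example (1.4MB down to 179KB), that is a reduction of roughly 87%.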

For more options and detailed usage, refer to the html-torch.ts file.

Node Version Management

Before running this project locally, set up the Node.js version and install the necessary packages using the following commands:

nvm install
nvm use
npm install

Running Tests

To ensure everything is working correctly, you can run the tests using the following command:

npm test

License

This project is licensed under the MIT License - see the LICENSE file for details.

Versions

1.0.3: 15 days ago
1.0.2: 21 days ago
1.0.1: 21 days ago
1.0.0: 22 days ago
0.0.2: 22 days ago