1.0.1 • Published 3 years ago


License: MIT
Repository: GitHub

fast-wasm-scraper


A fast alternative to JavaScript-based scraping tools, intended for both frontend and backend use. fast-wasm-scraper is essentially a wrapper around scraper, a Rust crate for parsing HTML and querying it with CSS selectors, compiled to WebAssembly.

Installation

$ yarn add fast-wasm-scraper

Examples

Loading

const { Document } = require('fast-wasm-scraper');
const doc = new Document('<html>Hello world!</html>');

doc.root.inner_html;
// => <html>Hello world!</html>

Querying

const { Document } = require('fast-wasm-scraper');
const html = `
<html>
  <body>
    <div>
      <ul>
        <li>One</li>
        <li>Two</li>
        <li>Three</li>
      </ul>
    </div>
  </body>
</html>
`;
const doc = new Document(html);

doc.root.query('li');
// => [
//      Element { name: 'li', inner_html: 'One',   ... },
//      Element { name: 'li', inner_html: 'Two',   ... },
//      Element { name: 'li', inner_html: 'Three', ... },
//    ]
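Query results are ordinary arrays, so standard array methods apply to them. The sketch below pulls the inner HTML out of each match. It assumes only the `inner_html` property listed under Types; `innerHtmlOf` is a hypothetical helper, and the plain objects stand in for real `Element` instances so the sketch runs without the package installed.

```javascript
// Hypothetical helper (not part of fast-wasm-scraper): collect the inner HTML
// of every element in a query result, using only the documented `inner_html`.
function innerHtmlOf(elements) {
  return elements.map((el) => el.inner_html);
}

// Plain-object stand-ins shaped like the Elements returned by doc.root.query('li').
const items = [
  { name: 'li', inner_html: 'One' },
  { name: 'li', inner_html: 'Two' },
  { name: 'li', inner_html: 'Three' },
];

console.log(innerHtmlOf(items)); // [ 'One', 'Two', 'Three' ]
```

With the real package, the equivalent call would be `innerHtmlOf(doc.root.query('li'))`.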

Types

Document

| Property | Type | Description |
| --- | --- | --- |
| constructor | (html: string) => Document | Takes the raw HTML as a string and returns a new Document object |
| root | Element | Returns the root element of the Document |

Element

| Property | Type | Description |
| --- | --- | --- |
| name | string | Returns the name of the element as a string, e.g. 'div' |
| html | string | Returns a string representation of this Element and its descendants |
| inner_html | string | Returns the inner content of this Element as a string |
| attributes | Map<string, string> | Returns the attributes as a Map<string, string> |
| query | (query_str: string) => Array<Element> | Returns an array of the Elements matching the CSS selector query_str |
| text | () => Array<string> | Returns an array of strings from the descendant text nodes |
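Because `attributes` is a `Map<string, string>` and `text()` returns an array of strings, extracting links needs nothing beyond built-in JavaScript. This is a hypothetical sketch: `collectLinks` is not part of fast-wasm-scraper, and the stub objects only mimic the documented Element shape so the code runs without the package.

```javascript
// Hypothetical helper (not part of fast-wasm-scraper): turn <a> elements into
// { href, label } records using the documented `attributes` Map and `text()` method.
function collectLinks(anchors) {
  return anchors
    .filter((el) => el.attributes.has('href')) // skip anchors without an href
    .map((el) => ({
      href: el.attributes.get('href'),
      // text() yields the descendant text nodes; join them into one label
      label: el.text().join('').trim(),
    }));
}

// Stub objects shaped like the Elements that doc.root.query('a') would return.
const anchors = [
  { name: 'a', attributes: new Map([['href', '/docs']]), text: () => ['Read ', 'the docs'] },
  { name: 'a', attributes: new Map([['name', 'top']]), text: () => ['Top'] },
];

console.log(collectLinks(anchors)); // [ { href: '/docs', label: 'Read the docs' } ]
```

With the real package, the input would come from a call such as `doc.root.query('a')`.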

Benchmark

Benchmark task: parse a document with 100 list items and query it with 'li'.

| | fast-wasm-scraper | cheerio | JsDOM |
| --- | --- | --- | --- |
| Runtime | WebAssembly (from Rust) | JavaScript | JavaScript |
| Sample size (#) | 87 | 74 | 52 |
| Speed (ops/s) | 539 (± 1.37%) | 318 (± 4.75%) | 38.2 (± 11.25%) |
| Speedup | 1.69x vs. cheerio, 14x vs. JsDOM | - | - |
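The speedup row is the ratio of throughputs. The snippet below reproduces it from the ops/s figures in the table; it only checks the arithmetic and is not part of the project's benchmark suite.

```javascript
// Throughput figures from the benchmark table (operations per second).
const opsPerSec = { 'fast-wasm-scraper': 539, cheerio: 318, JsDOM: 38.2 };

// Speedup of `a` over `b` is the ratio of their throughputs.
function speedup(a, b) {
  return opsPerSec[a] / opsPerSec[b];
}

console.log(speedup('fast-wasm-scraper', 'cheerio').toFixed(2)); // 1.69
console.log(speedup('fast-wasm-scraper', 'JsDOM').toFixed(1)); // 14.1
```

The table rounds the second ratio down to 14x.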

This benchmark was run on a modest dual-core CPU with Node.js v12.20.0. You can run the benchmarks yourself by cloning the GitHub repository.
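For a rough local comparison, throughput can also be estimated by timing repeated calls. This is a simplified sketch, not the repository's benchmark harness: `opsPerSecond` is a hypothetical helper built on Node's `process.hrtime.bigint()`, and unlike the figures above it reports no margin of error.

```javascript
// Hypothetical helper: call `fn` repeatedly for at least `minMs` milliseconds
// (an integer) and return the observed throughput in operations per second.
function opsPerSecond(fn, minMs = 500) {
  const minNs = BigInt(minMs) * 1000000n;
  const start = process.hrtime.bigint();
  let ops = 0;
  let elapsedNs = 0n;
  while (elapsedNs < minNs) {
    fn();
    ops += 1;
    elapsedNs = process.hrtime.bigint() - start;
  }
  return ops / (Number(elapsedNs) / 1e9);
}

// Placeholder workload on a 100-item list; with the package installed it could be
// () => new Document(html).root.query('li')
const html = '<ul>' + '<li>item</li>'.repeat(100) + '</ul>';
const rate = opsPerSecond(() => html.match(/<li>/g).length, 200);
console.log(`${rate.toFixed(0)} ops/s`);
```

A single timed loop like this is sensitive to JIT warm-up and background load, which is why dedicated harnesses take many samples and report a relative margin of error.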