1.0.1 • Published 10 years ago

vacuumjs v1.0.1

License
MIT

vacuumjs

A low-level node.js web page content extractor based on parse5.

Usage

var parse5 = require('parse5')
var extract = require('vacuumjs')

// the page to extract content from
var targetDOM = parse5.parse('some page content')
// the reference DOM (required, not optional)
var refDOM = parse5.parse('reference page content')
console.log(extract(targetDOM, refDOM))

Principles

  • Layout similarity
  • Text density
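
As a rough illustration of the text-density idea, the sketch below counts text characters per element in a subtree. It is not taken from vacuumjs: `textDensity` is a hypothetical helper, and the only assumption it makes about the input is that nodes follow the shape of parse5's default tree adapter (`nodeName`, `childNodes`, and `value` on `#text` nodes). The trees are built by hand so the example runs without parse5 installed.

```javascript
// Hypothetical helper, not part of the vacuumjs API.
// Returns the number of non-whitespace-trimmed text characters per
// non-text node in the subtree rooted at `node`.
function textDensity(node) {
  var chars = 0
  var tags = 0

  function walk(n) {
    if (n.nodeName === '#text') {
      chars += n.value.trim().length
      return
    }
    tags += 1
    var children = n.childNodes || []
    children.forEach(walk)
  }

  walk(node)
  // Guard against a bare text node, which has no enclosing tag.
  return chars / Math.max(tags, 1)
}

// A paragraph: a lot of text under a single tag.
var article = {
  nodeName: 'p',
  childNodes: [{ nodeName: '#text', value: 'Hello world, this is content.' }]
}

// A navigation list: little text spread over many tags.
var nav = {
  nodeName: 'ul',
  childNodes: [
    { nodeName: 'li', childNodes: [
      { nodeName: 'a', childNodes: [{ nodeName: '#text', value: 'Home' }] }
    ] },
    { nodeName: 'li', childNodes: [
      { nodeName: 'a', childNodes: [{ nodeName: '#text', value: 'About' }] }
    ] }
  ]
}

console.log(textDensity(article) > textDensity(nav))
```

Under this measure, content blocks tend to score higher than menus and other link-heavy boilerplate. How vacuumjs actually weighs text density against layout similarity is not stated here.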