1.1.0 • Published 4 years ago
readweb
Uses the Pareto principle to extract the main content of a web page, with no need to analyze the markup.
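To give a feel for what a Pareto-style cutoff could mean here, the sketch below picks the container that holds at least a given share of a page's text. It is an illustration only and is not taken from readweb's source: the function name `pickDominantContainer`, the container ids, and the character counts are all hypothetical.

```javascript
// Illustration only: NOT readweb's actual implementation.
// textLengths maps a hypothetical container id to the number of text
// characters found under it.
function pickDominantContainer(textLengths, paretoRatio = 0.6) {
  const entries = Object.entries(textLengths);
  const total = entries.reduce((sum, [, length]) => sum + length, 0);
  if (total === 0) return null; // no text at all, nothing to pick

  // Find the container holding the most text.
  const [id, length] = entries.reduce((best, cur) => (cur[1] > best[1] ? cur : best));

  // Accept it only if it holds at least the requested share of all text.
  return length / total >= paretoRatio ? id : null;
}

// Made-up numbers for a page with one large content area.
const lengths = { '#content': 8000, '#sidebar': 1500, '#footer': 500 };
console.log(pickDominantContainer(lengths, 0.7)); // '#content' (8000 / 10000 = 0.8)
console.log(pickDominantContainer(lengths, 0.9)); // null
```

In this sketch, a ratio above 0.5 means at most one container can ever qualify, which keeps the result unambiguous.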
Install
npm i readweb
Usage
const readweb = require('readweb');
readweb('https://en.wikipedia.org/wiki/Wikipedia', {
  tags: ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'],
  paretoRatio: 0.7,
  fetchOptions: {
    highWaterMark: 1024 * 1024
  },
  toTextOptions: {
    selectors: [{ selector: 'img', format: 'skip' }]
  }
})
  .then(console.log)
  .catch(console.error);
Options:
- selector: a cheerio selector; if specified, the Pareto algorithm is skipped
- tags: an array of HTML tags used to filter elements, e.g. ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']
- paretoRatio: should be less than 1.0 but greater than 0.5. Default: 0.6
- toText: whether to convert the content to plain text. Default: true
- fetchOptions: options passed to fetch. See node-fetch
- toTextOptions: options passed to html-to-text. See html-to-text
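The documented defaults and the paretoRatio range can be checked before calling readweb. The sketch below is one way to do that; `buildReadwebOptions` is a hypothetical helper, not part of the package, and the `'article'` selector is only an example value.

```javascript
// Hypothetical helper, not part of readweb: fills in the documented defaults
// (paretoRatio 0.6, toText true) and rejects a paretoRatio outside the
// documented range (greater than 0.5, less than 1.0).
function buildReadwebOptions(overrides = {}) {
  const options = { paretoRatio: 0.6, toText: true, ...overrides };
  const ratio = options.paretoRatio;
  if (typeof ratio !== 'number' || !(ratio > 0.5 && ratio < 1.0)) {
    throw new RangeError('paretoRatio should be greater than 0.5 and less than 1.0');
  }
  return options;
}

// Example: pass a cheerio selector (which skips the Pareto algorithm) and
// turn off plain-text conversion.
const options = buildReadwebOptions({ selector: 'article', toText: false });

// The result can then be passed to readweb as in the usage example:
// require('readweb')('https://en.wikipedia.org/wiki/Wikipedia', options)
//   .then(console.log)
//   .catch(console.error);
```

With toText set to false, the content is presumably returned without the html-to-text conversion; the exact return format is not described here.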
Major Changes
- Use node-fetch instead of make-fetch-happen;
- Use fetch-cookie to deal with cookies.