Html-aggregator NPM

html-aggregator

Aggregate html snippets from other pages.

Usage

Install with npm install -g html-aggregator.

Run with html-aggregator --templateDir=<directory> --output=<file> --maxLen=<number> input files....

templateDir contains json files that define how to extract data from HTML files:

{
    "selectors": {
        "title": "header.post-header h1",
        "content": "article.post-content"
    },
    "static": {
        "name": "My Name"
    }
}

The values in selectors are CSS selectors that are applied to the input HTML files. static contains static strings.

output is a file defining how to render the scraped data:

<h1>%title%</h1>
<div>By %name%</div>
<div>%content%</div>

The variables defined in a template are referenced by the expression%var%.

For every occurrence of <aggregate url="..." template="..."></aggregate> in every input file

the contents of the given URL is fetched
the contents is parsed with the given template
if the input file name has the form <name>.html.<ext>
a new file <name>.html is created where all <aggregate>s are replaced by the output file having its variables replaced.
Otherwise, <aggregate>'s child nodes are replaced with the output file having its variables replaced.