html-aggregator v0.0.12
Aggregate HTML snippets from other pages.
Usage
Install with npm install -g html-aggregator.
Run with: html-aggregator --templateDir=<directory> --output=<file> --maxLen=<number> <input files...>
templateDir contains JSON files that define how to extract data from HTML files:
{
  "selectors": {
    "title": "header.post-header h1",
    "content": "article.post-content"
  },
  "static": {
    "name": "My Name"
  }
}
The values in selectors are CSS selectors that are applied to the input HTML files; static contains static strings.
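As a rough illustration of how the two sections of a template definition could end up in one set of variables, here is a minimal sketch. The function name buildVariables and the query callback are hypothetical and not part of html-aggregator; the precedence of static over selectors on a name clash is also an assumption.

```javascript
// Sketch: combine "selectors" and "static" from a template definition into
// one map of variables. `query` is a hypothetical callback that returns the
// HTML matched by a CSS selector (for example, backed by a DOM library).
function buildVariables(definition, query) {
  const variables = {};
  for (const [name, selector] of Object.entries(definition.selectors || {})) {
    variables[name] = query(selector);
  }
  // Assumption: static values are copied as-is and win on a name clash.
  for (const [name, value] of Object.entries(definition.static || {})) {
    variables[name] = value;
  }
  return variables;
}

const definition = {
  selectors: { title: 'header.post-header h1', content: 'article.post-content' },
  static: { name: 'My Name' },
};

// Stand-in for a real selector engine, used only to keep the sketch runnable.
const fakePage = {
  'header.post-header h1': 'Hello',
  'article.post-content': '<p>First post</p>',
};
const extracted = buildVariables(definition, (selector) => fakePage[selector] || '');
console.log(extracted);
```

Passing the selector lookup in as a callback keeps the sketch independent of whichever HTML parser is actually used.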
output is a file defining how to render the scraped data:
<h1>%title%</h1>
<div>By %name%</div>
<div>%content%</div>
The variables defined in a template are referenced by the expression %var%.
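The %var% substitution can be sketched in a few lines. The function name render is hypothetical, and leaving unknown variables untouched is an assumption; the actual tool may handle missing variables differently.

```javascript
// Minimal sketch: substitute %var% placeholders in an output template.
// Assumption: placeholders without a matching variable are left unchanged.
function render(template, variables) {
  return template.replace(/%([A-Za-z0-9_]+)%/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : match
  );
}

const outputTemplate = '<h1>%title%</h1>\n<div>By %name%</div>\n<div>%content%</div>';
const rendered = render(outputTemplate, {
  title: 'Hello',
  name: 'My Name',
  content: '<p>First post</p>',
});
console.log(rendered);
```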
For every occurrence of <aggregate url="..." template="..."></aggregate> in every input file:
- the contents of the given URL are fetched
- the contents are parsed with the given template
If the input file name has the form <name>.html.<ext>, a new file <name>.html is created in which every <aggregate> element is replaced by the contents of the output file with its variables substituted. Otherwise, the child nodes of each <aggregate> element are replaced by the contents of the output file with its variables substituted.
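The file-naming rule and the two replacement modes described above can be sketched as follows. The names outputPathFor, replaceAggregates, and renderFor are hypothetical, and the regex assumes the attributes appear as url="..." template="..." in that order; a real implementation would more likely use an HTML parser.

```javascript
const path = require('path');

// Sketch of the file-naming rule: "<name>.html.<ext>" maps to "<name>.html".
// Any other name yields null, assumed here to mean that the input file
// itself is updated.
function outputPathFor(inputPath) {
  const match = /^(.+\.html)\.[^./\\]+$/.exec(path.basename(inputPath));
  return match ? path.join(path.dirname(inputPath), match[1]) : null;
}

// Sketch of the replacement step on an HTML string. `renderFor` is a
// hypothetical callback returning the rendered output for (url, template).
function replaceAggregates(html, renderFor, replaceWholeElement) {
  const pattern =
    /<aggregate\s+url="([^"]*)"\s+template="([^"]*)"\s*>[\s\S]*?<\/aggregate>/g;
  return html.replace(pattern, (element, url, template) => {
    const rendered = renderFor(url, template);
    // Either drop the <aggregate> element entirely or keep it and
    // replace only its child nodes.
    return replaceWholeElement
      ? rendered
      : `<aggregate url="${url}" template="${template}">${rendered}</aggregate>`;
  });
}

const page = '<p>x</p><aggregate url="http://example.com/a" template="post">old</aggregate>';
const fakeRender = (url, template) => `[${template}:${url}]`;

console.log(outputPathFor('index.html.tpl'));
console.log(replaceAggregates(page, fakeRender, outputPathFor('index.html.tpl') !== null));
console.log(replaceAggregates(page, fakeRender, outputPathFor('index.html') !== null));
```

Keeping the <aggregate> element in the second mode means the same file can be processed again later, since the url and template attributes are still present.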