rehype-extract-article v1.0.1
Rehype Extract Article
Extracts the clean article content from an HTML page, removing classes and IDs and flattening nested children.
Installation
npm install rehype-extract-article
Usage
In your script:
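// axios is a separate dependency (npm install axios)
import axios from 'axios'
import { unified } from 'unified'
import rehypeRemark from 'rehype-remark'
import rehypeParse from 'rehype-parse'
import remarkStringify from 'remark-stringify'
import rehypeExtractArticle from 'rehype-extract-article'
// Parse the HTML, extract the article, convert it to Markdown, and serialize it
const processor = unified()
  .use(rehypeParse)
  .use(rehypeExtractArticle)
  .use(rehypeRemark)
  .use(remarkStringify)
// axios.get returns a Promise; the HTML string is in the response's data field
const { data: htmlString } = await axios.get('http://some-blog.com/article')
const result = processor.processSync(htmlString)
console.log(result.value)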
Running the above code with a valid htmlString prints clean Markdown containing the content extracted from the original page.
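To give a rough idea of the kind of cleanup described at the top of this README (dropping classes and IDs, flattening nested children), here is a minimal, dependency-free sketch that operates on a hast-style tree. It is not the plugin's actual implementation: the cleanNode helper, the WRAPPERS set, and the choice to unwrap div, span, and section elements are all assumptions made for illustration.

```javascript
// Hypothetical helper, not part of rehype-extract-article's API.
// Works on hast-style nodes: { type, tagName, properties, children, value }.

// Assumption: these generic wrappers carry no meaning and can be unwrapped.
const WRAPPERS = new Set(['div', 'span', 'section'])

// Returns an array of nodes, because unwrapping a wrapper can yield
// zero, one, or many nodes in its place.
function cleanNode(node) {
  // Text and other non-element nodes pass through unchanged.
  if (node.type !== 'element') return [node]

  // Clean the children first, lifting the contents of unwrapped wrappers.
  const children = (node.children || []).flatMap(cleanNode)

  // Flatten: replace a wrapper element with its (already cleaned) children.
  if (WRAPPERS.has(node.tagName)) return children

  // Drop className and id; keep every other property (href, src, ...).
  const { className, id, ...properties } = node.properties || {}
  return [{ ...node, properties, children }]
}

// A small sample tree: <article id="post" class="main">
//   <div class="wrap"><p class="lead">Hello</p></div></article>
const tree = {
  type: 'element',
  tagName: 'article',
  properties: { id: 'post', className: ['main'] },
  children: [
    {
      type: 'element',
      tagName: 'div',
      properties: { className: ['wrap'] },
      children: [
        {
          type: 'element',
          tagName: 'p',
          properties: { className: ['lead'] },
          children: [{ type: 'text', value: 'Hello' }]
        }
      ]
    }
  ]
}

const [cleaned] = cleanNode(tree)
console.log(JSON.stringify(cleaned, null, 2))
```

In this sketch the div wrapper disappears and the p element becomes a direct child of article, with no class or id left on either element. The real plugin may choose different elements to keep or unwrap.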
Tests
Run npm test to run tests.
Run npm run coverage to produce a test coverage report.