Rehype Extract Article

Extracts the clean article content from an HTML page: removes classes and IDs, and flattens nested children.

Installation

npm install rehype-extract-article

Usage

In your script:

import axios from 'axios'
import { unified } from 'unified'
import rehypeRemark from 'rehype-remark'
import rehypeParse from 'rehype-parse'
import remarkStringify from 'remark-stringify'
import rehypeExtractArticle from 'rehype-extract-article'

// Build the pipeline: parse HTML, extract the article, convert to Markdown.
const processor = unified()
  .use(rehypeParse)
  .use(rehypeExtractArticle)
  .use(rehypeRemark)
  .use(remarkStringify)

// axios.get returns a promise; the HTML string is in the response's `data` field.
const { data: htmlString } = await axios.get('http://some-blog.com/article')
const result = processor.processSync(htmlString)
console.log(result.value)

Running the above code with a valid htmlString returns clean Markdown containing the content extracted from the original page.
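
To make "remove classes, IDs, and flatten nested children" concrete, here is a minimal sketch on a hand-built hast-like tree. It is an illustration only and not the plugin's actual implementation: the helper names stripAttributes and flattenWrappers are hypothetical, and treating only div and span as wrappers is an assumption.

```javascript
// Hypothetical illustration only: NOT the plugin's actual implementation.
// Nodes follow the hast shape: { type, tagName, properties, children }.

// Drop `className` and `id` from every element in the tree.
function stripAttributes(node) {
  if (node.type === 'element' && node.properties) {
    const { className, id, ...rest } = node.properties
    node.properties = rest
  }
  for (const child of node.children || []) stripAttributes(child)
  return node
}

// Replace wrapper elements (assumed here: div and span) with their children.
function flattenWrappers(node) {
  if (!node.children) return node
  node.children = node.children.flatMap((child) => {
    flattenWrappers(child)
    const isWrapper =
      child.type === 'element' && ['div', 'span'].includes(child.tagName)
    return isWrapper ? child.children : [child]
  })
  return node
}

// Roughly: <div class="post" id="main"><div><p class="lead">Hello</p></div></div>
const tree = {
  type: 'root',
  children: [{
    type: 'element',
    tagName: 'div',
    properties: { className: ['post'], id: 'main' },
    children: [{
      type: 'element',
      tagName: 'div',
      properties: {},
      children: [{
        type: 'element',
        tagName: 'p',
        properties: { className: ['lead'] },
        children: [{ type: 'text', value: 'Hello' }]
      }]
    }]
  }]
}

const cleaned = flattenWrappers(stripAttributes(tree))
console.log(JSON.stringify(cleaned, null, 2))
```

In this sketch the two nested div wrappers disappear and the paragraph ends up directly under the root with no class or ID. The real plugin may choose which elements to keep or flatten differently.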

Tests

Run npm test to run tests.

Run npm run coverage to produce a test coverage report.

License

MIT © Goran Spasojevic
