1.5.0 • Published 3 years ago

article-archiver v1.5.0

License
MIT
Repository
github

Article Archiver

This library converts online articles and blog posts into local markdown, preserving only:

  • article content
  • media assets
  • metadata

The heavy lifting around scraping is done with Cypress and the content is enhanced with Mozilla Readability.
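
As a rough illustration of the three things the library preserves, the sketch below models an archived article and rewrites remote media URLs to local copies. The names `MediaAsset`, `ArchivedArticle`, and `localizeMedia` are hypothetical and do not come from article-archiver; the actual internals may look quite different.

```typescript
// Hypothetical shapes; none of these names come from article-archiver itself.
interface MediaAsset {
  sourceUrl: string; // where the asset was originally hosted
  localPath: string; // where the archived copy is stored
}

interface ArchivedArticle {
  content: string; // article body as markdown
  media: MediaAsset[]; // images and other assets referenced by the body
  meta: Record<string, string>; // e.g. title, author
}

// Return a copy of the article whose markdown points at local assets
// instead of the original remote URLs.
function localizeMedia(article: ArchivedArticle): ArchivedArticle {
  let content = article.content;
  for (const asset of article.media) {
    // split/join replaces every occurrence without regex escaping concerns
    content = content.split(asset.sourceUrl).join(asset.localPath);
  }
  return { ...article, content };
}
```

Keeping the article as one plain value makes each later step (cleaning, asset download, markdown conversion) a function from article to article, which is one possible way to structure the work.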


Getting Started

⚠️ This library is under development and is not expected to work until the TODOs are completed ⚠️

Installation

npm install -g article-archiver

Usage

npx article-archiver <urls>
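
The README does not say how `<urls>` is interpreted. Assuming it is a space-separated list of article URLs, a command-line entry point might validate its arguments roughly as follows. `parseUrlArgs` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: split raw CLI arguments into usable http(s) URLs
// and arguments that should be reported back to the user.
function parseUrlArgs(args: string[]): { urls: string[]; rejected: string[] } {
  const urls: string[] = [];
  const rejected: string[] = [];
  for (const arg of args) {
    try {
      const parsed = new URL(arg); // throws on malformed input
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        urls.push(parsed.href);
      } else {
        rejected.push(arg); // e.g. ftp:// or file:// links
      }
    } catch {
      rejected.push(arg);
    }
  }
  return { urls, rejected };
}
```

In a Node.js script, the arguments would typically come from `process.argv.slice(2)`.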

Architecture

(Architecture diagram)

TODO

  • set up Cypress
  • configure Cypress to scrape URLs
  • implement code cleaner and enhancer
  • implement Readability
  • wire up scraper to enhancer
  • set up HTTP server for tmp files
  • set up website-scraper
  • wire up archiver to save local assets to tmp folder
  • set up UTF-8 and Turndown transformers
  • wire up transformer to merge metadata and write to output
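
The last item above, merging metadata into the written output, could be sketched as follows. This assumes the metadata is emitted as YAML front matter at the top of the markdown file, which the README does not confirm; `withFrontMatter` is a hypothetical name, and the sketch assumes metadata keys are simple identifiers such as `title` or `author`.

```typescript
// Hypothetical helper; YAML front matter is an assumed output format.
function withFrontMatter(meta: Record<string, string>, markdown: string): string {
  const lines = Object.keys(meta)
    .sort() // fixed key order keeps repeated runs diff-friendly
    // JSON string literals are also valid YAML double-quoted scalars
    .map((key) => `${key}: ${JSON.stringify(meta[key])}`);
  // Front matter block, one blank line, the trimmed body, then a final newline.
  return ["---", ...lines, "---", "", markdown.trim(), ""].join("\n");
}
```

Quoting every value through `JSON.stringify` avoids having to special-case colons, quotes, or line breaks in titles.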