1.5.0 • Published 3 years ago
article-archiver v1.5.0
Article Archiver
This library converts online articles and blog posts into local Markdown files, preserving only:
- article content
- media assets
- metadata
Scraping is handled by Cypress, and the scraped content is cleaned up with Mozilla Readability.
Getting Started
⚠️ This library is under development and is not expected to work until the TODOs below are completed ⚠️
Installation
npm install -g article-archiver
Usage
npx article-archiver <urls>
Architecture
TODO
- set up Cypress
- configure Cypress to scrape URLs
- implement code cleaner and enhancer
- implement Readability
- wire up scraper to enhancer
- set up HTTP server for tmp files
- set up website-scraper
- wire up archiver to save local assets to tmp folder
- set up UTF-8 and Turndown transformers
- wire up transformer to merge metadata and write to output
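The last step, merging metadata into the Markdown output, could look roughly like the sketch below. Nothing here is part of the library yet: the `ArticleMeta` shape, its field names, and the `toFrontMatter` and `mergeMetaAndBody` helpers are all hypothetical, and using YAML front matter to carry the metadata is an assumption rather than a decided format.

```typescript
// Hypothetical metadata shape; the real fields are not decided yet.
interface ArticleMeta {
  title: string;
  url: string;
  author?: string;
  archivedAt: string; // ISO 8601 timestamp
}

// Render metadata as a YAML front matter block.
// JSON.stringify yields a double-quoted string, which is also a valid
// YAML scalar, so colons and quotes inside values stay safe.
function toFrontMatter(meta: ArticleMeta): string {
  const fields: Array<[string, string | undefined]> = [
    ["title", meta.title],
    ["url", meta.url],
    ["author", meta.author],
    ["archivedAt", meta.archivedAt],
  ];
  const lines: string[] = ["---"];
  for (const [key, value] of fields) {
    // Skip optional fields that were not scraped.
    if (value !== undefined) {
      lines.push(key + ": " + JSON.stringify(value));
    }
  }
  lines.push("---");
  return lines.join("\n");
}

// Merge the front matter with an already-converted Markdown body
// (assumed here to come from the Turndown step).
function mergeMetaAndBody(meta: ArticleMeta, markdownBody: string): string {
  return toFrontMatter(meta) + "\n\n" + markdownBody.trim() + "\n";
}
```

The returned string can then be written to the output folder as a single `.md` file per article. Keeping the merge as a pure function makes it easy to test without running the scraper.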