Blog-backup NPM

blog-backup [https://github.com/jiacai2050/blog-backup/actions/workflows/ci.yml] [https://www.npmjs.com/package/blog-backup]

Backup blog posts to PDF for offline backup / reading in Kindle, built with Puppeteer and ClojureScript.

Setup * Npm #+begin_src bash npm i blog-backup -g #+end_src Before you start, you need a config. Copy [./config.edn] to one of following positions: 1. =${HOME}/.blogbackup.edn= 2. =${XDG_CONF_HOME}/blogbackup/config.edn=

That's it. *** Manual #+begin_src bash

1. install dependencies

npm i

2. compile Node.js script to bin/main.js

lein release

or download already compiled main.js from release page

https://github.com/jiacai2050/blog-backup/releases

3. init config

ln -sf $(pwd)/config.edn ~/.blogbackup.edn #+end_src

Usage #+begin_src bash blog-backup -h -o, --out-dir Dir /tmp/blog-backup Output dir -w, --who Who Whose blog to backup -u, --url URL Save URL as PDF -c, --conf Config ~/.blogbackup.edn Config file -m, --merge-dir input-dir Merge PDFs in dir as one -p, --proxy Proxy HTTP Proxy -P, --puppeteer-opts OPTS Options to set on the browser. format: a=b;c=d -M, --media Media print Media type -u, --user-agent UserAgent UserAgent -v, --verbose -V, --version -h, --help #+end_src * Backup All posts of a blog Demo, backup nathan's posts to /tmp/nathan #+begin_src blog-backup -w nathan -o /tmp/nathan -v #+end_src **** Supported Blog

| who | blog | |--------+-------------------------------------------------| | ljc | https://liujiacai.net/ | | yw | http://www.yinwang.org/ | | yw-wp | https://yinwang0.wordpress.com/author/yinwang0/ | | nathan | http://nathanmarz.com/ | | TBD | ... |

**** Extension In order to backup any posts of a blog, you can try add a new item in [file:config.edn], such as #+begin_src clojure {:id "grab" :base-url "https://engineering.grab.com/" :posts-selector "h2 > a.post-title" :page-tmpl "{{base-url}}{{^first-page}}blog/{{page-num}}/{{/first-page}}" :total-page 18} #+end_src

=base-url=, archives page of a blog
=posts-selector=, selector passed to =document.querySelectorAll= to select all posts. (usually end with =a= tag)
=page-tmpl=, [https://github.com/fotoetienne/cljstache] template used for render url of different archive pages, it's rendered with three keys:
- =base-url= string
- =first-page= boolean
- =page-num= number
=total-page= total num of archives page

Backup single page #+begin_src bash blog-backup -u https://news.ycombinator.com/item?id=23947157 #+end_src Backup all PDFs to one #+begin_src bash blog-backup -m /input-pdfs-dir -o /tmp/output-dir #+end_src *** Hack You can also set http proxy with =-p= or puppeteer.launch [https://pptr.dev/#?product=Puppeteer&version=v5.2.1&show=api-puppeteerlaunchoptions] with =-P= #+begin_src bash blog-backup -p socks5://127.0.0.1:13659 -a https://news.ycombinator.com/item?id=23947157 blog-backup -p socks5://127.0.0.1:13659 -P 'timeout=60;devtools=true' -a https://news.ycombinator.com/item?id=23947157 #+end_src

ChangeLog ** 0.4.1 (2021-06-14)

Config support [https://wiki.archlinux.org/title/XDG_Base_Directory]. ** 0.5.0 (2021-06-20)
Merge all posts to one =all-posts-merged.pdf= file after iterate. ** 0.5.1 (2021-06-20)
Support Merge all posts in command line options directly.