0.0.3 • Published 9 months ago

content-to-reader v0.0.3

License
ISC

content-to-reader

Extract meaningful content from any website and turn it into an EPUB file. Optionally, send it to your device using your Gmail account.

How to install and use

  1. Install it globally using npm (or any other package manager):
npm i content-to-reader -g
  2. Generate a configuration file template by running:
content-to-reader get-config config.yaml
  3. Edit the configuration file, then run this command to create an EPUB and/or send it to your Kindle:
content-to-reader create -c ./config.yaml
  4. Enjoy your articles.

If you run into any issues, refer to the FAQ section below.

Use cases

Here are a few use cases and ideas to get you started.

EPUB from a single URL

content-to-reader create https://welldone.com/@user/10_easy_steps_to_whatever

I want to choose what to extract

Sometimes you want to pick elements from a target website yourself, or the default extraction didn't work well for you. In that case, use selectors.

output: "./news.epub"
pages:
  - "https://clickbaitnews.com/article/some_article_12msad1"
  - url: "https://welldone.com/@user/10_easy_steps_to_whatever"
    selectors:
      - name: "Header"
        first: ".page-content header"
      - all:
          ".page-content .contents":
            [
              "h1",
              "h2",
              "h3",
              "h4",
              "h5",
              "p",
              "code",
              { ".custom-tip": ["p", "div", ".some-class": ["a", "p"]] },
            ]
      - first: ".page-content .comment-section"

Selectors let you pick elements from a target website using CSS selectors. For each selector you can include either the first matching element (first) or every matching element (all).

The final EPUB will contain all of the elements found by the selectors.

You can generate longer CSS Selectors without repetition using YAML's dictionaries and arrays, for example:

- all:
    ".page-content .contents": ["h1", "h2", "h3"]

equals

- all: ".page-content .contents h1, .page-content .contents h2, .page-content .contents h3"

You can nest dictionaries in arrays recursively.
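
The expansion of nested selectors can be sketched as a small recursive function. This is only an illustration, not the package's actual code: the names `SelectorTree`, `flatten` and `toCssSelector` are hypothetical, and it assumes that each nested entry is appended to its parent with a descendant combinator (a space).

```typescript
// Hypothetical type: a selector tree is a string, an array of trees,
// or a dictionary mapping a parent selector to a nested tree.
type SelectorTree = string | SelectorTree[] | { [parent: string]: SelectorTree };

// Recursively collect every full selector, prefixing children with their parents.
function flatten(tree: SelectorTree, prefix: string = ""): string[] {
  const join = (part: string): string => (prefix ? prefix + " " + part : part);
  if (typeof tree === "string") {
    return [join(tree)];
  }
  const result: string[] = [];
  if (Array.isArray(tree)) {
    // An array lists siblings that share the same prefix.
    for (const child of tree) {
      result.push(...flatten(child, prefix));
    }
    return result;
  }
  // A dictionary key becomes part of the prefix for everything nested under it.
  for (const parent of Object.keys(tree)) {
    result.push(...flatten(tree[parent], join(parent)));
  }
  return result;
}

// Join the flattened selectors into one CSS selector list.
function toCssSelector(tree: SelectorTree): string {
  return flatten(tree).join(", ");
}

const tree: SelectorTree = { ".page-content .contents": ["h1", "h2", "h3"] };
console.log(toCssSelector(tree));
// prints ".page-content .contents h1, .page-content .contents h2, .page-content .contents h3"
```

Because dictionaries and arrays are handled by the same function, a dictionary placed inside an array (like the ".custom-tip" entry in the configuration above) simply adds one more level to the prefix.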

Send to Kindle

content-to-reader allows you to use services like Amazon's "Send To Kindle":

toDevice:
  deviceEmail: your_kindle_A3BcD2@kindle.com
  senderEmail: your_email@gmail.com
  senderPassword: "your password"
pages:
  - https://welldone.com/@user/10_easy_steps_to_whatever

If you've never sent anything to your Kindle by email before, there are a few steps to follow to make this work.

First, whitelist your email address in Amazon. Then create an application password for your Gmail account so you can use it in the .yaml config file. That should do it.

Currently only Gmail's SMTP server is supported.
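
To show how the `toDevice` fields could map onto an email sent through Gmail, here is a rough sketch. It is not taken from the package: `ToDeviceSettings`, `MailPlan` and `planSendToKindle` are hypothetical names, and the object layout only mimics the option names commonly used by Node.js mail libraries (`host`, `port`, `secure`, `auth`, `attachments`). The function builds a plain object and does not send anything. The host `smtp.gmail.com` and port 465 (implicit TLS) are Gmail's public SMTP settings.

```typescript
// Hypothetical mirror of the `toDevice` section of the config file.
interface ToDeviceSettings {
  deviceEmail: string;
  senderEmail: string;
  senderPassword: string; // a Gmail application password, not the regular one
}

// Hypothetical description of what would be handed to an SMTP client.
interface MailPlan {
  transport: {
    host: string;
    port: number;
    secure: boolean;
    auth: { user: string; pass: string };
  };
  message: {
    from: string;
    to: string;
    subject: string;
    attachments: { filename: string; path: string }[];
  };
}

// Build the SMTP settings and the message; nothing is sent here.
function planSendToKindle(settings: ToDeviceSettings, epubPath: string): MailPlan {
  // Use the last path segment as the attachment name.
  const filename = epubPath.split(/[\\/]/).pop() || epubPath;
  return {
    transport: {
      host: "smtp.gmail.com",
      port: 465,
      secure: true, // port 465 uses TLS from the start of the connection
      auth: { user: settings.senderEmail, pass: settings.senderPassword },
    },
    message: {
      from: settings.senderEmail,
      to: settings.deviceEmail,
      subject: filename,
      attachments: [{ filename: filename, path: epubPath }],
    },
  };
}
```

The sender address appears both as the SMTP login and as the `from` field, which is why that same address has to be whitelisted in Amazon.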

Configuration file template and documentation

# Filename or output path of a result EPUB file. Not required if `toDevice` present.
output: "news.epub"
# In this section you configure automatic sending of a result EPUB file to your device using your Gmail account. Your credentials aren't stored in any way and are used solely for sending a result file to your device. Currently only Gmail is supported. Not required if `output` present.
toDevice:
  # This is the email address of your reader device (e.g. a Kindle).
  deviceEmail: ""
  # This is the email address of your Gmail account.
  senderEmail: ""
  # This is an application password for your Gmail account. Read how to generate one: https://support.google.com/mail/answer/185833?hl=en
  senderPassword: ""
# In this section you configure content present in the result EPUB file.
pages:
    # You can extract content automatically by passing URL only.
  - "https://page.com"
    # Or use selectors to pick what you want.
  - url: "https://page.com"
    selectors:
        # You can select first element encountered...
      - name: "Header" # Name is not required but it may help debugging
        first: ".page-content header"
        # ... or all of them.
      - name: "Content"
        all:
          # Use nested selectors to create verbose element queries
          ".page-content .contents":
            [
              "h1",
              "h2",
              "h3",
              "h4",
              "h5",
              "p",
              "code",
              { ".custom-tip": ["p", "div", ".some-class": ["a", "p"]] },
            ]
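
The rules stated in the template comments can be summarised as a small check. This is a sketch under assumptions, not the package's own validation: the type names and `checkConfig` are hypothetical, the config is assumed to be already parsed from YAML into a plain object, and the rules that `pages` must not be empty and that every selector needs `first` or `all` are my reading of the template rather than documented behaviour.

```typescript
// Hypothetical types mirroring the template above.
interface ToDevice {
  deviceEmail: string;
  senderEmail: string;
  senderPassword: string;
}

interface Selector {
  name?: string; // optional, only helps debugging
  first?: unknown;
  all?: unknown;
}

type Page = string | { url: string; selectors?: Selector[] };

interface Config {
  output?: string;
  toDevice?: ToDevice;
  pages: Page[];
}

// Return a list of problems; an empty list means the config looks usable.
function checkConfig(config: Config): string[] {
  const problems: string[] = [];
  // `output` is optional only when `toDevice` is present, and vice versa.
  // An empty string counts as missing.
  if (!config.output && !config.toDevice) {
    problems.push("Set `output`, `toDevice`, or both.");
  }
  // Assumption: at least one page is needed to produce an EPUB.
  if (config.pages.length === 0) {
    problems.push("`pages` must list at least one URL.");
  }
  for (const page of config.pages) {
    if (typeof page === "string") continue; // plain URL, automatic extraction
    for (const selector of page.selectors ?? []) {
      // Assumption: a selector entry is only meaningful with `first` or `all`.
      if (selector.first === undefined && selector.all === undefined) {
        problems.push("A selector for " + page.url + " needs `first` or `all`.");
      }
    }
  }
  return problems;
}
```

For example, a config with only `pages` would be reported as missing both `output` and `toDevice`, while adding `output: "news.epub"` makes the list of problems empty.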

FAQ

Is your email address known by Amazon? If not, whitelist your email address in Amazon.

Isn't your file too big? Remember that "Send to Kindle" imposes a 50 MB limit.

Sometimes Amazon just rejects a file for whatever reason. As a last resort, you can run the file through Calibre and let it do its magic so that Amazon accepts it. There's a ton of material about this on the Internet.

You can't use your regular Gmail password. Create an application password for your Gmail account here: https://support.google.com/mail/answer/185833?hl=en. You can then use it in the .yaml config file.

Currently there is no way to change this behaviour.

License

Licensed under The Prosperity Public License 3.0.0.

Contributions

Any contributions are welcome. If you have an idea or you've spotted a bug, feel free to open an issue or a pull request.