1.1.0 • Published 5 years ago

License
ISC

scrape-feed

Reads the contents of JSON, RSS, and Atom feeds from a URL.

Installation

npm install scrape-feed

Usage

Simple use

const { scrapeFeed } = require("scrape-feed")

// Run this inside an async function: CommonJS modules do not allow top-level await.
const feed = await scrapeFeed("https://www.mattmoriarity.com/feed.json")

feed contains the information pulled from the feed. See ScrapedFeed in src/index.ts for its structure.

scrape-feed supports JSON Feed directly, and Atom and RSS through feedparser. All feed types produce the same structure, so the result is somewhat lossy: feed information that does not fit that structure is not captured.

Using caching headers

If you are polling feeds regularly and would like to avoid extra work, you can hang on to feed.cachingHeaders and provide it again when you next poll the feed. The caching headers include the ETag and Last-Modified response headers if the response included them. If they are provided when scraping, they are used to set the If-None-Match and If-Modified-Since request headers, respectively.
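To make the header mapping concrete, here is a minimal sketch of how saved caching headers could be turned into conditional request headers. It is not scrape-feed's actual implementation: the helper name conditionalRequestHeaders and the property names etag and lastModified are assumptions made for illustration, so check ScrapedFeed in src/index.ts for the real shape of cachingHeaders.

```javascript
// Hypothetical helper, not part of scrape-feed.
// Assumes cachingHeaders looks like { etag, lastModified }; the real
// property names may differ.
function conditionalRequestHeaders(cachingHeaders) {
  const headers = {}
  if (cachingHeaders && cachingHeaders.etag) {
    // The saved ETag is sent back as If-None-Match.
    headers["If-None-Match"] = cachingHeaders.etag
  }
  if (cachingHeaders && cachingHeaders.lastModified) {
    // The saved Last-Modified value is sent back as If-Modified-Since.
    headers["If-Modified-Since"] = cachingHeaders.lastModified
  }
  return headers
}

console.log(
  conditionalRequestHeaders({
    etag: '"abc123"',
    lastModified: "Wed, 21 Oct 2015 07:28:00 GMT",
  })
)
```

Headers the earlier response did not include are simply left out, so a feed that only sent an ETag produces a request with only If-None-Match.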

A well-behaved server, when given these headers, will return a 304 Not Modified response with no body as long as the content hasn't changed, in which case scrapeFeed will just return null. If you get a null, you can go along your merry way and be happy you didn't waste that bandwidth and those CPU cycles.

const { cachingHeaders } = feed
const feedAgain = await scrapeFeed(
  "https://www.mattmoriarity.com/feed.json",
  cachingHeaders
)
// => null when the server answers 304 Not Modified
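
For regular polling, the null check can be wrapped in a small helper. The following is a sketch under stated assumptions, not something the package provides: pollFeed is a hypothetical name, and scrapeFeed is passed in as an argument so the helper does not depend on how the package is loaded. It relies only on the behavior described above: scrapeFeed(url, cachingHeaders) resolves to null when the feed is unchanged, and a scraped feed exposes cachingHeaders.

```javascript
// Hypothetical polling helper, not part of scrape-feed.
// `previous` is the last non-null result, or undefined on the first poll.
async function pollFeed(scrapeFeed, url, previous) {
  // Reuse the caching headers saved from the previous successful scrape.
  const cachingHeaders = previous ? previous.cachingHeaders : undefined
  const feed = await scrapeFeed(url, cachingHeaders)

  if (feed === null) {
    // Not modified: keep the earlier result, including its caching headers.
    return { feed: previous, changed: false }
  }

  // New content: the fresh result carries the headers for the next poll.
  return { feed, changed: true }
}
```

A caller would keep the returned feed between polls and pass it back as previous, for example from a timer, and only do downstream work when changed is true.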