Cheers

Scrape a website efficiently, block by block, page by page.

Motivations

Cheers is a Cheerio-based scraper that extracts data from a website using CSS selectors. It aims to stay simple: it divides a page into blocks and transforms each block into a JSON object, with one CSS selector per field.

Built on top of the excellent:

https://github.com/cheeriojs/cheerio
https://github.com/chriso/curlrequest
https://github.com/kriskowal/q

CSS mapping syntax inspired by:

https://github.com/dharmafly/noodle

Getting Started

Install the module with: npm install cheers

Usage

Configuration options:

config.url: the URL to scrape.
config.blockSelector: the CSS selector applied to the page to divide it into scraping blocks. This field is optional and defaults to "body".
config.scrape: the definition of what you want to extract from each block. Each key has two mandatory attributes: selector (a CSS selector, or "." to stay on the current node) and extract. The possible values for extract are "text", "html", "outerHTML", a RegExp, or the name of an attribute of the HTML element (e.g. "href").
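
As an illustration of the options above, here is a minimal sketch of a configuration object that uses each kind of extract value. The URL, the selectors and the field names (title, link, age, ...) are placeholders chosen for the example. checkConfig is a hypothetical helper, not part of cheers: it only verifies the rule that every key carries a selector and an extract.

```javascript
// Example configuration built from the options described above.
// The URL and selectors are placeholders, not taken from a real site.
const config = {
  url: "https://example.com/news",
  blockSelector: "article", // optional; "body" is used when omitted
  scrape: {
    title: { selector: "h2 a", extract: "text" },
    link: { selector: "h2 a", extract: "href" }, // attribute name
    innerHtml: { selector: ".", extract: "html" }, // "." stays on the block
    outerHtml: { selector: ".", extract: "outerHTML" },
    age: { selector: "p", extract: /\d+ (?:hours?|days?) ago/ } // RegExp
  }
};

// Hypothetical helper (not part of cheers): returns a list of problems
// found in the scrape definition, or an empty array when it looks valid.
function checkConfig(cfg) {
  const problems = [];
  for (const [key, rule] of Object.entries(cfg.scrape || {})) {
    if (!rule || typeof rule.selector !== "string") {
      problems.push(`${key}: selector is required`);
    }
    const extract = rule ? rule.extract : undefined;
    if (typeof extract !== "string" && !(extract instanceof RegExp)) {
      problems.push(`${key}: extract must be a string or a RegExp`);
    }
  }
  return problems;
}

console.log(checkConfig(config)); // prints []

// Assumption: the module exposes a promise-returning scrape(config).
// This section does not document the call, so it is left commented out.
// const cheers = require("cheers");
// cheers.scrape(config).then((results) => console.log(results));
```

With this configuration, each article element on the page would become one JSON object with the keys title, link, innerHtml, outerHtml and age.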

Roadmap

Option to use request instead of curl
Option to change the user agent
Command line tool
Website pagination
Option to use a headless browser
Unit tests

Contributors

Cheers!

License
