sentinel-scraper v1.0.3

License
ISC

Scraper is a tool for scraping web pages by URL and selectors.

v. 1.0.3

Usage

Require the scraping tool:

const Scraper = require('sentinel-scraper');

Create an instance for the URL you want to scrape:

const scraping = new Scraper('The URL that needs scraping');

Methods

1. SELECTOR

Scrapes sections of a page through its selectors.

scraping.select(selector, expression); // Both parameters are required.

Parameters:

  1. selector: behaves like '.querySelectorAll()'.
  2. expression (callback): receives currentValue and index (optional).
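
The callback contract described above can be sketched without the library. The stand-in below is hypothetical and is not sentinel-scraper's implementation: `selectLike` and `fakeElement` are names invented for illustration, it takes already-matched elements instead of a selector so it runs without a DOM, and it assumes a zero-based index.

```javascript
// Hypothetical stand-in for scraping.select(); NOT the library's code.
// It receives already-matched elements instead of a selector.
function selectLike(elements, expression) {
  // Call the expression with (currentValue, index) and collect
  // whatever each call returns (index is assumed to be zero-based).
  return elements.map((element, index) => expression(element, index));
}

// Minimal fake element exposing only what the callbacks below read.
const fakeElement = (href, text) => ({
  children: { item: () => ({ href, textContent: text }) },
});

const matched = [
  fakeElement('http://www.example.com/a', 'first'),
  fakeElement('http://www.example.com/b', 'second'),
];

// Callback using only currentValue.
const links = selectLike(matched, item => item.children.item(0).href);

// Callback using currentValue and the optional index.
const labelled = selectLike(
  matched,
  (item, index) => `${index}: ${item.children.item(0).textContent}`
);

console.log(links);    // → ['http://www.example.com/a', 'http://www.example.com/b']
console.log(labelled); // → ['0: first', '1: second']
```

Whatever the callback returns becomes one entry of the result, which is why the examples in this section can return a string, an object, or nothing at all.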

Call the method to scrape a page:

const data = scraping.select('#selector', item => {
  return item.children.item(0).href;
});

/* Output:
  data = [
    'http://www.example.com',
    'http://www.example.com',
    'http://www.example.com',
    'http://www.example.com',
    'http://www.example.com',
    'http://www.example.com',
  ]
*/

// Return an array in the format you need. For example:

const data = scraping.select('#selector', item => {
  return {
    title: item.children.item(0).textContent,
    image: item.children.item(0).src,
    url: item.children.item(0).href,
  };
});

/* Output:
  data = [
    {
      title: 'lorem ipsum',
      image: 'http://www.example.com/image/image.png',
      url: 'http://www.example.com',
    },
    {
      title: 'lorem ipsum',
      image: 'http://www.example.com/image/image.png',
      url: 'http://www.example.com',
    },
    {
      title: 'lorem ipsum',
      image: 'http://www.example.com/image/image.png',
      url: 'http://www.example.com',
    }
  ]
*/

// Or build the data in the format you need without returning anything. For example:

const data = {};
scraping.select('#selector', (item, index) => {
  data[index] = [
    item.children.item(0).textContent,
    item.children.item(0).src,
    item.children.item(0).href,
  ];
});

/* Output:
  data = {
    1: [
      'lorem ipsum',
      'http://www.example.com/image/image.png',
      'http://www.example.com',
    ],
    2: [
      'lorem ipsum',
      'http://www.example.com/image/image.png',
      'http://www.example.com',
    ],
    3: [
      'lorem ipsum',
      'http://www.example.com/image/image.png',
      'http://www.example.com',
    ]
  }
*/
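
If the index-keyed object above is easier to consume as a list of named records, it can be reshaped with plain JavaScript. This is a minimal sketch that uses only standard language features; the sample `data` mirrors the output shape shown above, and the `records` name and its `index` field are chosen here for illustration.

```javascript
// Sample data in the shape produced by the example above.
const data = {
  1: ['lorem ipsum', 'http://www.example.com/image/image.png', 'http://www.example.com'],
  2: ['lorem ipsum', 'http://www.example.com/image/image.png', 'http://www.example.com'],
};

// Turn each [title, image, url] entry into a named record, keeping its key.
// Object.entries yields keys as strings, so convert them back to numbers.
const records = Object.entries(data).map(([key, [title, image, url]]) => ({
  index: Number(key),
  title,
  image,
  url,
}));

console.log(records.length);   // → 2
console.log(records[0].index); // → 1
console.log(records[0].url);   // → 'http://www.example.com'
```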

2. FOR

A static method for scraping an array of URLs. It acts as a factory that creates a new Scraper() for each URL.

Scraper.for(urls, expression, completeUrl); // The first two parameters are required.

Parameters:

  1. urls: array of URLs.
  2. expression (callback): currentValue (instance of Scraper for the URL), index, url.
  3. completeUrl (optional): completes the elements of the array.

const urls = [
  'http://www.example.com/product/1',
  'http://www.example.com/product/2',
  'http://www.example.com/product/3',
  'http://www.example.com/product/4',
  'http://www.example.com/product/5'
];

Scraper.for(urls, (scrape, index, url) => {
  // url is the URL currently being scraped
  // scrape is the Scraper instance for that URL, for example: scrape.select();
});
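
The completeUrl parameter is not described in detail here. If the goal is to turn partial paths into full URLs, that can also be done before calling Scraper.for. The sketch below uses the standard URL class available in Node.js; `completeUrls` is a hypothetical helper, not part of sentinel-scraper, and it may not match what completeUrl actually does.

```javascript
// Hypothetical helper, not part of sentinel-scraper.
function completeUrls(paths, baseUrl) {
  // new URL(input, base) resolves a relative path against the base
  // and leaves an already absolute URL unchanged.
  return paths.map(path => new URL(path, baseUrl).href);
}

const urls = completeUrls(
  ['/product/1', 'product/2', 'http://www.example.com/product/3'],
  'http://www.example.com/'
);

console.log(urls);
// → ['http://www.example.com/product/1',
//    'http://www.example.com/product/2',
//    'http://www.example.com/product/3']
```

The resulting array can then be passed as the first argument of Scraper.for.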

Use the static method Scraper.for when you want to scrape at different depths.
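
Scraping at two depths could look like the sketch below: collect links from a listing page, then visit each link. To keep it runnable without network access it uses `StubScraper`, a hypothetical in-memory stub that only imitates the shape of the API described in this README. The `pages` object, the selectors, and the URLs are invented for illustration, and the stub passes plain strings to the callback instead of DOM elements.

```javascript
// Hypothetical in-memory pages: URL -> selector -> matched values.
const pages = {
  'http://www.example.com/list': {
    '.product a': [
      'http://www.example.com/product/1',
      'http://www.example.com/product/2',
    ],
  },
  'http://www.example.com/product/1': { h1: ['Product one'] },
  'http://www.example.com/product/2': { h1: ['Product two'] },
};

// Stub with the same method names as the README; it is NOT sentinel-scraper
// and fetches nothing.
class StubScraper {
  constructor(url) {
    this.url = url;
  }

  select(selector, expression) {
    const matches = (pages[this.url] || {})[selector] || [];
    return matches.map((value, index) => expression(value, index));
  }

  static for(urls, expression) {
    // Create one instance per URL and hand it to the callback.
    urls.forEach((url, index) => expression(new StubScraper(url), index, url));
  }
}

// Depth 1: collect the product links from the listing page.
const productUrls = new StubScraper('http://www.example.com/list')
  .select('.product a', href => href);

// Depth 2: visit each collected URL and read its title.
const titles = {};
StubScraper.for(productUrls, (scrape, index, url) => {
  titles[url] = scrape.select('h1', text => text)[0];
});

console.log(titles['http://www.example.com/product/1']); // → 'Product one'
console.log(titles['http://www.example.com/product/2']); // → 'Product two'
```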
