sentinel-scraper v1.0.3
Scraper is a tool for scraping web pages through a URL and CSS selectors.
Usage
Require the scraping tool:
const Scraper = require('sentinel-scraper');
Create an instance to scrape a URL:
const scraping = new Scraper('The URL you need to scrape');
Methods
1. SELECTOR
Scrapes sections of a URL through its selectors.
scraping.select(selector, expression); // Both parameters are required.
Parameters:
- selector: behaves like .querySelectorAll().
- expression (callback): receives currentValue and, optionally, index.
Run the method to scrape a page:
const data = scraping.select('#selector', item => {
  return item.children.item(0).href;
});
/* Output:
data = [
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
  'http://www.example.com',
]
*/
// Return an array in the format you need. For example:
const data = scraping.select('#selector', item => {
  return {
    title: item.children.item(0).textContent,
    image: item.children.item(0).src,
    url: item.children.item(0).href,
  };
});
/* Output:
data = [
  {
    title: 'lorem ipsum',
    image: 'http://www.example.com/image/image.png',
    url: 'http://www.example.com',
  },
  {
    title: 'lorem ipsum',
    image: 'http://www.example.com/image/image.png',
    url: 'http://www.example.com',
  },
  {
    title: 'lorem ipsum',
    image: 'http://www.example.com/image/image.png',
    url: 'http://www.example.com',
  }
]
*/
// Or build the data in the format you need without returning anything. For example:
const data = {};
scraping.select('#selector', (item, index) => {
  data[index] = [
    item.children.item(0).textContent,
    item.children.item(0).src,
    item.children.item(0).href,
  ];
});
/* Output:
data = {
  1: [
    'lorem ipsum',
    'http://www.example.com/image/image.png',
    'http://www.example.com',
  ],
  2: [
    'lorem ipsum',
    'http://www.example.com/image/image.png',
    'http://www.example.com',
  ],
  3: [
    'lorem ipsum',
    'http://www.example.com/image/image.png',
    'http://www.example.com',
  ]
}
*/
2. FOR
A static method for scraping an array of URLs. It is a factory of new Scraper() instances.
Scraper.for(urls, expression, completeUrl); // urls and expression are required; completeUrl is optional.
Parameters:
- urls: array of URLs.
- expression (callback): receives a Scraper instance for each URL, plus its index and the url itself.
- completeUrl (optional): used to complete the elements of the urls array.
const urls = [
  'http://www.example.com/product/1',
  'http://www.example.com/product/2',
  'http://www.example.com/product/3',
  'http://www.example.com/product/4',
  'http://www.example.com/product/5'
];
Scraper.for(urls, (scrape, index, url) => {
  // url is the current element of the urls array
  // for example: scrape.select();
});
Use the static for method when you want to scrape at different depths.
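A typical multi-depth flow is: collect URLs at the first depth, then hand them to Scraper.for to scrape each page at the second depth. The sketch below illustrates the call pattern only; since sentinel-scraper cannot be assumed to be installed here, a minimal stub stands in for the real Scraper class, and the product URLs, selector, and '/detail' suffix are hypothetical.

```javascript
// Stub standing in for require('sentinel-scraper'); with the real
// package installed, drop this class and require the module instead.
class Scraper {
  constructor(url) { this.url = url; }
  // Stubbed select(): the real method would parse the page at this.url
  // and run the callback over the elements matching the selector.
  select(selector, expression) {
    return [expression({ href: this.url + '/detail' }, 0)];
  }
  static for(urls, expression) {
    urls.forEach((url, index) => expression(new Scraper(url), index, url));
  }
}

// First depth: URLs collected from a listing page (hard-coded here).
const productUrls = [
  'http://www.example.com/product/1',
  'http://www.example.com/product/2',
];

// Second depth: scrape each product page with its own Scraper instance.
const results = {};
Scraper.for(productUrls, (scrape, index, url) => {
  results[url] = scrape.select('#selector', item => item.href);
});

console.log(results);
```

With the real package, each scrape instance fetches its own URL, so the callback can call scrape.select() exactly as in the single-page examples above.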