2.0.0 • Published 6 years ago

spiders v2.0.0

License: MIT
Repository: github
Last release: 6 years ago

SPIDERS

Crawl web pages efficiently

Features

  • Persistence
  • Optimization
  • Lightweight

Installation

npm install spiders

or

yarn add spiders

Simple Usage Demo

ES6 syntax:

const Spider = require('spiders');
const spidy = new Spider();
// Crawl a page; the promise resolves with a jQuery-style $
spidy.crawl('http://urltoscrape')
	.then($ => {
		const title = $("title").text(); // jQuery-style selectors
		console.log(title);
	})
	.catch(err => console.error(err));

Options

Options can be passed as an argument when the object is initialized.

The following options are supported:

{
	persist : './fileToStore',
	toStore : (params, url) => {
	},
	fromStore : (obj, params, url) => {
	}
}
  • persist - path of the file used for persistence. See Persistence below.

  • toStore - returns an object that tells the spider how to store the given url and params

  • fromStore - returns a match condition (true or false) for the given stored object, url and params
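To make the roles of `toStore` and `fromStore` more concrete, here is a minimal in-memory sketch of how such a pair of callbacks could decide whether a url has already been crawled. It is not the library's implementation: `createStore`, `add`, `has` and `size` are hypothetical names, and the sketch assumes the url-first argument order used in the Demo below.

```javascript
// Hypothetical sketch, NOT the spiders implementation.
// toStore(url, params) builds the record kept for a crawled page;
// fromStore(obj, url, params) returns true when a stored record matches.
function createStore(toStore, fromStore) {
  const records = [];
  return {
    // Remember a crawled url/params pair in whatever shape toStore returns.
    add(url, params) {
      records.push(toStore(url, params));
    },
    // True if any stored record matches the requested url/params.
    has(url, params) {
      return records.some(obj => fromStore(obj, url, params));
    },
    // Number of stored records.
    size() {
      return records.length;
    }
  };
}

// Example: store only the url and match on it, as in the Demo.
const store = createStore(
  (url, params) => ({ url }),
  (obj, url, params) => obj.url === url
);
store.add('https://example.com/song/1', { lng: 'en' });
console.log(store.has('https://example.com/song/1', { lng: 'en' })); // true
console.log(store.has('https://example.com/song/2', { lng: 'en' })); // false
```

Because matching is delegated entirely to `fromStore`, the same store could match on the url alone (as here) or on the url plus selected params.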

Persistence

const Spider = require('spiders');
const spider = new Spider({ persist: './songs' });

spider.persist().then(() => {
	// The spider is now loaded with previously scraped details
	// Call your scrape function here
});
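The README does not document the on-disk format behind the `persist` option. As a hedged sketch only, assuming records are kept as a JSON array in the file named by `persist`, loading and saving could look like this. `loadRecords` and `saveRecords` are hypothetical helpers, and synchronous `fs` calls are used for brevity.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read previously stored records from the persist file.
// A missing file is treated as a fresh start with no records.
function loadRecords(file) {
  try {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return [];
    throw err;
  }
}

// Hypothetical helper: write all records back as JSON,
// creating parent folders (such as ./persist) if they do not exist yet.
function saveRecords(file, records) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(records));
}

// Usage: load, add one record if missing, save.
// A temp path stands in for a real persist path such as './persist/song'.
const file = path.join(os.tmpdir(), 'spiders-sketch', 'song.json');
const records = loadRecords(file);
if (!records.some(r => r.url === 'https://example.com/song/1')) {
  records.push({ url: 'https://example.com/song/1' });
}
saveRecords(file, records);
console.log(loadRecords(file));
```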

Methods

crawl(url, params) - returns a promise that resolves with a jQuery-style $ for the crawled page
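How `crawl` combines fetching with the persisted store is not spelled out here. The sketch below shows one plausible flow under stated assumptions: the page fetcher is injected (a stand-in for the real HTTP request and the `$` wrapper), records use the `toStore`/`fromStore` shapes from the Demo, and a url that already matches a stored record is skipped. `crawlOnce` and `fakeFetch` are hypothetical names, not part of the spiders API.

```javascript
// Hypothetical flow, NOT the spiders API: skip urls that already match a
// stored record, otherwise fetch the page and record it via toStore.
async function crawlOnce(url, params, { records, toStore, fromStore, fetchPage }) {
  if (records.some(obj => fromStore(obj, url, params))) {
    return null; // already crawled in this or a previous (persisted) run
  }
  const page = await fetchPage(url, params);
  records.push(toStore(url, params));
  return page;
}

// Stand-in fetcher so the example runs offline.
const fakeFetch = async (url) => `<title>${url}</title>`;

const options = {
  records: [],
  toStore: (url, params) => ({ url }),
  fromStore: (obj, url, params) => obj.url === url,
  fetchPage: fakeFetch
};

(async () => {
  console.log(await crawlOnce('song-1', { lng: 'en' }, options)); // <title>song-1</title>
  console.log(await crawlOnce('song-1', { lng: 'en' }, options)); // null
})();
```

Two concurrent calls for the same url could both fetch before either records it; a real implementation might track in-flight urls as well.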

Demo

	const Spider = require('spiders');
	const songSpidy = new Spider({
		persist: "./persist/song",
		// Store only the url of each crawled page
		toStore: (url, params) => {
			return { url };
		},
		// A stored entry matches when its url equals the requested url
		fromStore: (obj, url, params) => {
			return obj.url === url;
		}
	});
	songSpidy.persist().then(scrape);

	function scrape() {
		songSpidy.crawl('pathtoSong', { lng: 'en' }).then($ => {
			const title = $("title").text();
			console.log(title);
		});
	}

Note

For more details, see my blog post on Medium.

Versions

2.0.0 (6 years ago)
1.0.5 (6 years ago)
1.0.4 (6 years ago)
1.0.3 (6 years ago)
1.0.2 (6 years ago)
1.0.1 (6 years ago)
1.0.0 (6 years ago)