0.2.3 • Published 9 years ago

spoder v0.2.3

License
ISC

#Spoder

Spoder is a simple Node.js crawler.

Constructor:

Spoder objects, or as I like to call them, spuds, can be created like so:

var Spoder = require('spoder');

var spud = new Spoder();

// or
var spud = Spoder(); // the constructor will handle the absence of the `new` keyword.
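
One common way for a constructor to tolerate a missing `new` is to check `this` and re-invoke itself. The sketch below shows that pattern with a hypothetical `Crawler` function. It is an illustration of the idea, not spoder's actual source.

```javascript
// Hypothetical constructor, for illustration only (not spoder's code).
function Crawler(options) {
	// When called without `new`, `this` is not a Crawler instance,
	// so re-invoke the function as a constructor.
	if (!(this instanceof Crawler)) {
		return new Crawler(options);
	}
	this.settings = options || {};
}

var a = new Crawler({ root: 'http://example.com' });
var b = Crawler({ root: 'http://example.com' });

console.log(a instanceof Crawler); // true
console.log(b instanceof Crawler); // true
```

Either call style ends up with a properly constructed instance, which is the behaviour described above.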

When you instantiate the object, you can optionally pass an options object.

The default options are:

{
	'agent': Spoder.defaultAgent,
	'headers': {
		'accept': "text/html,application/xml,application/xhtml+xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
		'accept-language': 'en-US,en;q=0.8',
		'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
	},
	'jar': require('request').Jar(),
	'root': null,
	'gzip': true,
	'pool': null
};

##Options:

###username:

  • HTTP username.

###password:

  • HTTP password.

###agent:

  • User-Agent for the upcoming requests.

###headers:

  • HTTP headers for the upcoming requests.

###jar:

  • Passed along to request as the jar parameter. It holds cookies, which lets you log in to sites and do all kinds of cool stuff.
  • Set it to null to disable the cookiejar.

###root:

  • This is the root URL that you will be working with. If it's a string, each request's uri is resolved against root with url.resolve before the request is made.
  • Set it to null to stop using it.
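
To make the effect of root concrete, here is how Node's built-in `url.resolve` treats a few kinds of uri. The root value and paths are made-up examples, and the sketch only shows `url.resolve` itself, not spoder's internals.

```javascript
var url = require('url');

// Example root, chosen for illustration.
var root = 'http://example.com/docs/';

// A relative path is resolved against the root's directory.
console.log(url.resolve(root, 'page.html')); // http://example.com/docs/page.html

// A path starting with "/" replaces the root's path entirely.
console.log(url.resolve(root, '/about')); // http://example.com/about

// An absolute URL ignores the root.
console.log(url.resolve(root, 'http://other.example/x')); // http://other.example/x
```

Note the trailing slash on the root: without it, the last path segment of the root is replaced rather than extended.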

###gzip:

  • Either true or false. When true, gzip-encoded responses are accepted and decoded.

###pool:

spud.setup

spud.setup('key', 'value');
spud.setup({key: 'value'});

This method is used to configure the spud.

You can either pass a single key and a value for that option, or pass an object containing the keys and values.

spud.setup('agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36'); // set user agent.

spud.setup({
	'headers': {'X-SomeHeader':'SomeValue'}
});
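
For readers curious how a method can accept both call forms, the following is a minimal sketch. The standalone `setup(settings, key, value)` helper is hypothetical and exists only to illustrate the argument handling; spoder's real implementation may differ.

```javascript
// Hypothetical helper, for illustration only (not spoder's code).
function setup(settings, key, value) {
	// Normalize the ('key', 'value') form into a {key: value} object.
	var changes = {};
	if (typeof key === 'string') {
		changes[key] = value;
	} else {
		changes = key || {};
	}
	// Copy every changed option onto the settings object.
	Object.keys(changes).forEach(function (name) {
		settings[name] = changes[name];
	});
	return settings;
}

var settings = { gzip: true, root: null };
setup(settings, 'root', 'http://example.com/');
setup(settings, { gzip: false });
console.log(settings); // { gzip: false, root: 'http://example.com/' }
```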

###Extra-option(s):

The following options are also accepted by the setup method.

####mh:

  • Short for merge-headers, which was too long to type every single time.
  • The name pretty much says it all. mh allows the headers that you pass to be merged with the headers that are already present in the spud. Defaults to true.

var spud = new Spoder({headers: {'X-Key': '1234567890'}});

spud.setup('headers', {'X-Another-Key': '101010101010'}); // will be merged as `mh` defaults to `true`.

console.log(spud.settings.headers);

spud.setup({
	'headers': {'X-Main': 'Alpha'},
	'mh': false
}); // will replace the original headers. Now only `{'X-Main': 'Alpha'}` remains as the headers.

console.log(spud.settings.headers);
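
The difference between merging and replacing can be sketched in plain JavaScript. The `applyHeaders` helper below is hypothetical and only mirrors the behaviour described for mh. It also treats header names as case-sensitive, which is an assumption of the sketch.

```javascript
// Hypothetical helper, for illustration only (not spoder's code).
function applyHeaders(current, incoming, mh) {
	// `mh` defaults to true when it is not given.
	if (mh === undefined) mh = true;
	// Merge: keep the current headers and let incoming ones override them.
	// Replace: keep only the incoming headers.
	return mh ? Object.assign({}, current, incoming) : Object.assign({}, incoming);
}

var headers = { 'X-Key': '1234567890' };

headers = applyHeaders(headers, { 'X-Another-Key': '101010101010' });
console.log(headers); // both X-Key and X-Another-Key are present

headers = applyHeaders(headers, { 'X-Main': 'Alpha' }, false);
console.log(headers); // only X-Main remains
```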

spud.req(options)

Makes a request. options should be an object.

The req method accepts all the parameters that the setup method accepts. The parameters will be used ONLY for the request. They will not affect the spud's settings.

The extra parameters that are available are:

uri:

The URI or a parsed url object from url.parse. URI can also be a path that is to be resolved from root.

method:

The HTTP method to use.

qs:

The querystring that will be appended to the uri.

form:

The form data. This parameter is directly passed to the request module. Quote from request module readme:

When passed an object or a querystring, this sets body to a querystring representation of value, and adds Content-type: application/x-www-form-urlencoded header.

body:

Payload for requests.

json:

If true, then body must be a JSON-serializable object.

This method returns a promise that will resolve to a Response object.
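
To show what a "querystring representation" of an object looks like for the qs and form parameters, this sketch uses Node's built-in `querystring` module on a made-up flat object. It assumes the encoding that spoder and request produce matches this for simple flat objects; nested values may be serialized differently.

```javascript
var querystring = require('querystring');

// A flat object, as might be passed for `qs` or `form` (example values).
var params = { q: 'node crawler', page: 2 };

var encoded = querystring.stringify(params);
console.log(encoded); // q=node%20crawler&page=2

// As `qs`, the string is appended to the uri after a "?".
console.log('http://example.com/search?' + encoded);

// As `form`, the same string would be sent as the request body with
// Content-type: application/x-www-form-urlencoded.
```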

spud.get(uri, querystring)

Shorthand for:

spud.req({
	method: 'GET',
	uri: uri,
	qs: querystring
})

spud.head(uri, querystring)

Shorthand for:

spud.req({
	method: 'HEAD',
	uri: uri,
	qs: querystring
})

spud.limit(n)

Limits the number of parallel requests to n. The default is 1.
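
A limit on parallel work can be implemented with a small promise queue. The `createLimiter` function below is a hypothetical, self-contained sketch of that idea using timers in place of real requests. It is not how spoder is known to implement limit.

```javascript
// Hypothetical helper, for illustration only (not spoder's code).
function createLimiter(n) {
	var active = 0; // tasks currently running
	var queue = [];  // tasks waiting for a free slot

	function next() {
		if (active >= n || queue.length === 0) return;
		var job = queue.shift();
		active++;
		var result;
		try {
			result = Promise.resolve(job.task());
		} catch (err) {
			result = Promise.reject(err);
		}
		// Settle the caller's promise, then free the slot and start the next task.
		result.then(job.resolve, job.reject).then(function () {
			active--;
			next();
		});
	}

	// Returns a promise for the task's result; the task starts once a slot is free.
	return function run(task) {
		return new Promise(function (resolve, reject) {
			queue.push({ task: task, resolve: resolve, reject: reject });
			next();
		});
	};
}

// With a limit of 1, each task starts only after the previous one has finished.
var run = createLimiter(1);
['a', 'b', 'c'].forEach(function (name) {
	run(function () {
		console.log('start', name);
		return new Promise(function (resolve) { setTimeout(resolve, 10); });
	});
});
```

Raising the limit to 2 would let two of the simulated requests overlap while the third waits for a free slot.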

spud.then(onComplete, onError, onProgress)

For now, it's used to insert jobs between two requests. These jobs can be chained and will be run one after the other. Any attempt to make a request will be delayed until the job is completed (that is, until its promise is resolved or rejected).

var spud = require('spoder')();
var Q = require('q');
spud.get('http://somedomain.com').then(function(res){
	console.log('response received');
});

spud.get('http://somedomain.com').then(function(res){
	console.log('response received');
});

spud.then(function(){
	// will be run after all the requests above have been completed
	console.log('all the requests have been completed');
	return Q.delay(2000).thenResolve('hi!');
}).then(function(value){
	// runs after a delay of 2000ms
	console.log(value, 'sorry for the delay!'); // hi! sorry for the delay!
});

// will be run after the above job is completed.
spud.get('http://somedomain.com').then(function(res){
	console.log('response received');
});