
httprip v0.0.5

License: MIT
Repository: github
Last release: 7 years ago

httprip - Manages queues of requests for data processing and collection.


Usage

npm install httprip

var httprip = require("httprip");

var ripper = httprip()
    .processor(function(error, res, body, resolve) {
        // Skip failed requests, but still resolve so the ripper knows
        // this request has finished.
        if (error) {
            return resolve();
        }

        // Parse the body here.

        // Yield each item parsed from the body.
        ripper.yield("item" + Math.floor(Math.random() * 999));
        ripper.yield("item" + Math.floor(Math.random() * 999));

        // Resolve once processing has finished.
        resolve();
    })
    .data(function(output) {
        console.log("Retrieved item:", output);
    });


// Queue requests.
ripper.enqueueRip({url: "http://google.com"});
ripper.enqueueRip({url: "http://yahoo.com"});
ripper.enqueueRip({url: "http://bing.com"});

// Wait for finish.
ripper.lastQueued().then(function() {
    console.log("done");
});

Explanation

This project provides a shorthand API for processing multiple web requests that share a common format, with the end goal of extracting data from them. For example, my original goal for this project was to extract a JSON object listing all the fonts on 1001fonts.com. The code for doing that is contained in example.js.

Processing behavior is declared with the processor and data methods.

Functions supplied to processor are called with the usual request callback parameters (error, response and body) plus an additional resolve argument. resolve is the resolve function of a Promise and must be called at the end of processing: processing may be asynchronous, so httprip needs to be told when it has been completed. During processing, call yield on the ripper object once for each element of data you want to collect.
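The need for resolve can be illustrated with a small self-contained sketch. The harness runProcessor below is hypothetical and is not httprip's source code; it only shows the general idea of wrapping a processor in a Promise that settles when the processor calls resolve. The collected array stands in for ripper.yield, and setTimeout simulates asynchronous parsing.

```javascript
// Hypothetical harness: wraps a processor in a Promise so the caller knows
// when processing has finished. Illustrative only, not httprip internals.
function runProcessor(processor, error, response, body) {
  return new Promise(function (resolve) {
    processor(error, response, body, resolve);
  });
}

// Stand-in for ripper.yield: items are pushed onto this array instead.
var collected = [];

// Example processor that finishes asynchronously.
function lineProcessor(error, response, body, resolve) {
  if (error) {
    return resolve(); // nothing to parse, but still signal completion
  }
  setTimeout(function () {
    body.split("\n").forEach(function (line) {
      collected.push(line.trim());
    });
    resolve(); // without this call the Promise would never settle
  }, 10);
}

runProcessor(lineProcessor, null, { statusCode: 200 }, "a\nb\nc").then(function () {
  console.log("Processed:", collected); // Processed: [ 'a', 'b', 'c' ]
});
```

If lineProcessor returned without calling resolve, the .then callback would never run, which is why the processor must call resolve on every path, including the error path.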

Whilst you could handle each yielded item directly inside the processor function, the processor and data methods let you register multiple functions: every registered processor runs for each response, and every registered data collector runs for each yielded item.
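The chain idea can be sketched without the library. The names createChains, addProcessor, addCollector, yieldItem and processResponse are hypothetical stand-ins for processor, data and yield. The text above does not say whether httprip runs its processors in sequence or concurrently; this sketch assumes concurrent execution and uses Promise.all.

```javascript
// Hypothetical sketch of processor and data collector chains.
// None of these names are part of httprip.
function createChains() {
  var processors = [];
  var collectors = [];
  return {
    addProcessor: function (fn) { processors.push(fn); },
    addCollector: function (fn) { collectors.push(fn); },
    // Forward a single item to every registered data collector.
    yieldItem: function (item) {
      collectors.forEach(function (collect) { collect(item); });
    },
    // Run every registered processor on one response; the returned
    // Promise resolves once all of them have called resolve.
    processResponse: function (error, response, body) {
      return Promise.all(processors.map(function (processor) {
        return new Promise(function (resolve) {
          processor(error, response, body, resolve);
        });
      }));
    }
  };
}

// Usage: one processor, two independent data collectors.
var chains = createChains();
var titles = [];
var count = 0;
chains.addProcessor(function (error, res, body, resolve) {
  chains.yieldItem(body.toUpperCase());
  resolve();
});
chains.addCollector(function (item) { titles.push(item); });
chains.addCollector(function () { count++; });
chains.processResponse(null, {}, "hello").then(function () {
  console.log(titles, count); // [ 'HELLO' ] 1
});
```

Keeping collectors separate from processors means one collector can store items while another counts or logs them, without either processor knowing about it.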

To send a request to the ripper, use the enqueueRip method. It accepts a single options parameter, the same kind of object you would pass to the request module, so you can tailor each request (URL, method, headers, query string and so on) however you need.
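Because enqueueRip takes ordinary request options, a paginated listing can be described as a list of option objects. The helper buildPageRequests, the example URL, the "page" query parameter and the User-Agent value below are all hypothetical placeholders; url, method, qs, headers and timeout are standard option names of the request module.

```javascript
// Hypothetical helper: builds one request-style options object per listing
// page. The URL, query parameter and header values are placeholders.
function buildPageRequests(baseUrl, pageCount) {
  var requests = [];
  for (var page = 1; page <= pageCount; page++) {
    requests.push({
      url: baseUrl,
      method: "GET",
      qs: { page: page },  // request appends qs to the URL as a query string
      headers: { "User-Agent": "httprip-example" },
      timeout: 10000       // milliseconds
    });
  }
  return requests;
}

var pages = buildPageRequests("http://example.com/listing", 3);
console.log(pages.length); // 3

// Each object could then be queued on a ripper, e.g.:
// pages.forEach(function (options) { ripper.enqueueRip(options); });
```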

To wait for the last queued request to finish, use the lastQueued method, which returns the Promise for the last request in the queue.
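Whether waiting on the last request also covers every earlier one depends on how httprip schedules its queue, which the text above does not spell out. The sketch below assumes a strictly sequential queue, where each job starts only after the previous one settles; under that assumption the last Promise settles after all earlier jobs. createQueue, enqueue and lastQueued are illustrative names, and the worker simulates requests with timers.

```javascript
// Hypothetical sequential queue: each job is chained onto the previous one,
// so the Promise of the last queued job settles after all earlier jobs.
function createQueue(worker) {
  var last = Promise.resolve();
  return {
    enqueue: function (job) {
      // Swallow an earlier failure here so it does not block later jobs;
      // the caller can still handle it via the Promise returned for that job.
      last = last.catch(function () {}).then(function () {
        return worker(job);
      });
      return last;
    },
    lastQueued: function () {
      return last;
    }
  };
}

// Usage with a simulated worker that takes a varying amount of time.
var finished = [];
var queue = createQueue(function (job) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      finished.push(job.name);
      resolve();
    }, job.delay);
  });
});

queue.enqueue({ name: "first", delay: 30 });
queue.enqueue({ name: "second", delay: 10 });
queue.lastQueued().then(function () {
  console.log(finished); // [ 'first', 'second' ]
});
```

Even though the second job is faster, it only starts after the first finishes, so "first" is always recorded before "second".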

Methods

enqueueRip - Submits a request to the queue. Parameter: options (request options).
lastQueued - Returns the Promise for the last request in the queue; it resolves when that request has finished processing.
processor - Adds a function to the ripper's processor chain. The function is called for each queued request.
data - Adds a function to the ripper's data collector chain. The function is called for each yielded item.
yield - Yields a single item to all data collector functions. Parameter: item (the item to forward to the data collectors).
setRequester - Sets the internal request instance. Useful for injecting a pre-configured request object with defaults. Parameter: requester (the new request instance).
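One possible use of setRequester is injecting a stub so processors can be exercised without network access. This sketch assumes httprip invokes the requester the way the request module is called, as requester(options, callback(error, response, body)); that calling convention is an assumption, so check the httprip source before relying on it. createStubRequester is a hypothetical helper that answers from an in-memory map.

```javascript
// Hypothetical stub with the same call shape as the request module:
// requester(options, callback(error, response, body)).
function createStubRequester(pages) {
  return function stubRequester(options, callback) {
    var url = typeof options === "string" ? options : options.url;
    // Defer the callback so the stub behaves asynchronously, like a real request.
    setImmediate(function () {
      if (Object.prototype.hasOwnProperty.call(pages, url)) {
        callback(null, { statusCode: 200 }, pages[url]);
      } else {
        callback(new Error("No stubbed page for " + url));
      }
    });
  };
}

var stub = createStubRequester({ "http://example.com/a": "<p>hello</p>" });
stub({ url: "http://example.com/a" }, function (error, response, body) {
  console.log(error, response.statusCode, body); // null 200 <p>hello</p>
});

// Assumed wiring (not verified against httprip): ripper.setRequester(stub);
// With the real request module, a pre-configured instance could be created with
// request.defaults({ headers: { "User-Agent": "httprip-example" } }).
```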