0.1.3 • Published 10 years ago

html-scraper v0.1.3


#HTML Scraper

The scraper has three components, `http`, `split` and `extract`, executed in that order. You make an HTTP request to fetch a page, split the page into sections, and finally extract the data from each section using a custom parser.

The best part about this scraper is that you can chain together as many of these actions as you need.

###Order of execution

`http` → `split` → `extract` → `http` → `split` and so on…
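
The chaining idea can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not html-scraper's actual implementation: `Pipeline`, `fakeFetch` and `pages` are hypothetical names, the network is replaced by an in-memory map, and regular expressions stand in for CSS selectors so the example needs no dependencies.

```javascript
// Hypothetical stand-in for the network: maps a URL to an HTML string.
const pages = {
  '/index': '<a href="/a">A</a><a href="/b">B</a>',
  '/a': '<h1>Page A</h1>',
  '/b': '<h1>Page B</h1>',
};
const fakeFetch = (url) => Promise.resolve(pages[url]);

// Each stage takes a list of records and returns a promise of a new list.
class Pipeline {
  constructor(fetchPage) {
    this.fetchPage = fetchPage;
    this.stages = [];
  }

  // http: read a URL from `key` on each record and attach the fetched body.
  http(key) {
    this.stages.push((records) =>
      Promise.all(records.map(async (r) => ({ ...r, body: await this.fetchPage(r[key]) }))));
    return this;
  }

  // split: cut each body into sections. `pattern` is a global RegExp here,
  // standing in for a CSS selector.
  split(pattern) {
    this.stages.push(async (records) =>
      records.flatMap((r) => (r.body.match(pattern) || []).map((section) => ({ ...r, body: section }))));
    return this;
  }

  // extract: turn each section into a plain data record with a custom parser.
  extract(parser) {
    this.stages.push(async (records) => records.map((r) => parser(r.body)));
    return this;
  }

  // launch: feed the base params through every stage, in the order added.
  launch(params) {
    return this.stages.reduce((acc, stage) => acc.then(stage), Promise.resolve([params]));
  }
}

// http -> split -> extract -> http -> extract
const result = new Pipeline(fakeFetch)
  .http('url')
  .split(/<a [^>]*>.*?<\/a>/g)
  .extract((html) => ({ href: html.match(/href="([^"]*)"/)[1] }))
  .http('href')
  .extract((html) => ({ title: html.replace(/<[^>]+>/g, '') }))
  .launch({ url: '/index' });

result.then((records) => console.log(records));
// logs [ { title: 'Page A' }, { title: 'Page B' } ]
```

Each `extract` step returns records whose keys the next `http` step can read, which is what lets the chain repeat.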

###Example

Say I want to extract all the information about students who were admitted to the University of Southern California, Los Angeles. I would do it as follows:

  1. Make an HTTP request to this page: http://edulix.com/universityfinder/university_of_southern_california.
  2. Page consists of multiple anchor tags containing links of each

    # Standard require
    Scraper = require 'HTML-Scraper'

    # Specify the key to read URLs from
    Scraper().http 'url'
    .split '.archive a'
    .extract (doc) ->
        href: "http://tusharm.com" + doc.attr 'href'
        text: doc.html()
    .http 'href'
    .extract ($) ->
        http: $('a:nth-child(2)').attr('href')

    # Launch with base params
    .$launch url: 'http://tusharm.com/projects.html'

    # Returns a promise
    .then (val) -> console.log val
    .done()

0.1.3

10 years ago

0.1.2

10 years ago

0.0.6

11 years ago

0.0.5

11 years ago

0.0.4

11 years ago

0.0.3

11 years ago

0.0.2

11 years ago

0.0.1

11 years ago