0.0.4 • Published 9 years ago

meta-scrape v0.0.4

Weekly downloads
4
License
MIT
Repository
-
Last release
9 years ago

status

Meta-Scrape

Meta-Scrape is an npm package for web scraping purposes. You give it an URL, and it lets you easily get its title, links, images, description, keywords, meta tags....

It's forked from Metainspector. The only difference is, that you can define an content-type before scraping it. So you should receive correct text instead of strange one.

Metainspector is inspired by the Metainspector gem by jaimeiniesta

Scraped data

client.url                	# URL of the page
client.scheme             	# Scheme of the page (http, https)
client.host               	# Hostname of the page (like, markupvalidator.com, without the scheme)
client.rootUrl 			  	# Root url (scheme + host, i.e http://simple.com/)
client.title              	# title of the page, as string
client.links              	# array of strings, with every link found on the page as an absolute URL
client.author               # page author, as string
client.keywords             # keywords from meta tag, as array
client.charset              # page charset from meta tag, as string
client.description        	# returns the meta description, or the first long paragraph if no meta description is found
client.image              	# Most relevant image, if defined with og:image
client.images              	# array of strings, with every img found on the page as an absolute URL
client.feeds            	# Get rss or atom links in meta data fields as array
client.oGtitle      		# opengraph title

Usage

var client = new MetaInspector("http://www.google.com", {});

client.on("fetch", function(){
    console.log("Description: " + client.description;

    console.log("Links: " + client.links.join(","));
});

client.on("error", function(err){
	console.log(error);
});

client.fetch();

TO DO

Finish implementation of the properties below:

Add absolutify url function to return all urls as an absolute url

client.internal_links     	# array of strings, with every internal link found on the page as an absolute URL
client.external_links     	# array of strings, with every external link found on the page as an absolute URL

ZOMG Fork! Thank you!

You're welcome to fork this project and send pull requests. Just remember to include tests.

Copyright (c) 2009-2012 Gabriel Cebrian, released under the MIT license

Bitdeli Badge

0.0.4

9 years ago

0.0.3

9 years ago

0.0.2

9 years ago

0.0.1

9 years ago