1.0.0 • Published 5 years ago

webthief v1.0.0


WebThief


A promise- and callback-based website-info getter that uses the metadata of websites.

Features

  • Get the source code of any web page with webthief
  • Get any website's logo, title and description
  • Supports modern meta tag scraping
  • Fully promise- and callback-based
  • Supports ES6 async/await
  • Supports scraping multiple meta tags at once

Support

ES5 | ES6 | Callback | Promise | async/await

Installing

$ npm install webthief

Some Basic Meta Tags in HTML

<meta name="description" content="Website info api"/>
<meta name="keywords" content="webthief, api, nodejs, python"/>
<meta name="subject" content="website subject">
<meta name="copyright" content="nepsho">
<meta name="language" content="en">
<meta name="robots" content="index,follow" />
<meta name="revised" content="Saturday, May 9th, 2019, 0:00 am" />
<meta name="abstract" content="any abstract">
<meta name="topic" content="any topic">
<meta name="summary" content="any summary">
<meta name="author" content="bcrazydreamer, bcrazydreamer@gmail.com">
<meta name="designer" content="bcrazydreamer">
<meta name="reply-to" content="bcrazydreamer@gmail.com">
<meta name="url" content="https://nepsho.github.io/">
<meta name="category" content="any category">

Some OpenGraph Meta Tags in HTML

<meta name="og:title" content="webthief"/>
<meta name="og:type" content="API"/>
<meta name="og:url" content="https://nepsho.github.io/"/>
<meta name="og:image" content="https://nepsho.github.io/lib/img/logo.png"/>
<meta name="og:email" content="bcrazydreamer@gmail.com"/>
<meta name="og:phone_number" content="123-456-7890"/>
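Both groups of tags above are simple key/content pairs, so collecting them is mostly string matching. The sketch below assumes regex-based parsing is good enough for simple pages; extractMetaTags is a hypothetical helper, not part of webthief, and webthief's own scraping logic may differ. Since the Open Graph protocol specifies the property attribute for og: tags, the sketch accepts either name or property as the key. A real HTML parser would be more robust (for example, a ">" inside an attribute value breaks the regex).

```javascript
// Hypothetical helper, NOT part of webthief's API: a minimal sketch of
// collecting <meta> key/content pairs from an HTML string with regexes.
// Open Graph tags are normally written with property="og:...", so both
// the name and property attributes are accepted as the key.
function extractMetaTags(html) {
  const result = {};
  // Match each <meta ...> tag, self-closing or not.
  const tagRe = /<meta\b[^>]*>/gi;
  // Match key="value" or key='value' attributes inside one tag.
  const attrRe = /([\w:-]+)\s*=\s*("([^"]*)"|'([^']*)')/g;
  for (const tag of html.match(tagRe) || []) {
    const attrs = {};
    let m;
    attrRe.lastIndex = 0;
    while ((m = attrRe.exec(tag)) !== null) {
      // Group 3 holds a double-quoted value, group 4 a single-quoted one.
      attrs[m[1].toLowerCase()] = m[3] !== undefined ? m[3] : m[4];
    }
    const key = attrs.name || attrs.property;
    // Tags without a key or content (e.g. <meta charset="utf-8">) are skipped.
    if (key && attrs.content !== undefined) {
      result[key.toLowerCase()] = attrs.content;
    }
  }
  return result;
}

const sample =
  '<meta name="description" content="Website info api"/>' +
  '<meta property="og:title" content="webthief"/>';
console.log(extractMetaTags(sample));
```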

Supported meta fields by webthief

logo, description, title, keywords, subject, copyright, language, robots, revised, abstract, reply-to, topic, summary, author, designer, country-name, url, category, site_name, email, phone_number

Examples

const webthief = require("webthief");

To get the HTML of any web page:

/* Callback method */
webthief.getHtml("https://nepsho.github.io/example/meta_tags.html",(data)=>{
    console.log(data);
})

/* Promise method */
webthief.getHtml("https://nepsho.github.io/example/meta_tags.html").then(function(data) {
	console.log(data);
}).catch(function(error) {
	console.log(error);
});

/* async/await method */
async function demo(){
    try {
        var result = await webthief.getHtml("https://nepsho.github.io/example/meta_tags.html");
        console.log(result);
    } catch (error) {
        console.log(error);
    }
}
demo();

/* Sample output
    {
        url : 'https://nepsho.github.io/example/meta_tags.html',
        status : 200,
        success : true,
        html : "<html></html>"
    }
*/
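For readers curious how a result shaped like the sample output above might be produced, here is a minimal sketch. fetchHtml is a hypothetical function, not webthief's implementation; it assumes Node.js 18+ (where fetch() is global), treats success as mirroring res.ok, and maps failures onto the { success, error, detail } error shape this README documents. Accepting the fetch function as a parameter is a design choice that lets the logic be tested without network access.

```javascript
// Hypothetical sketch, NOT webthief's implementation: fetch a page and
// wrap the result in a { url, status, success, html } object, or in a
// { success: false, error, detail } object if the request throws.
// Assumes Node.js 18+, where fetch() is available globally; another
// fetch-compatible function can be passed in (useful for tests).
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(url);
    const html = await res.text();
    // Assumption: success is true only for 2xx responses (res.ok).
    return { url, status: res.status, success: res.ok, html };
  } catch (err) {
    // Network errors, invalid URLs, etc. become the error object shape.
    return { success: false, error: err.name, detail: err.message };
  }
}

// Usage (performs a real network request):
// fetchHtml("https://nepsho.github.io/example/meta_tags.html").then(console.log);
```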

To get the meta data of any web page: a meta request takes an option object that controls and specifies the desired output.

var option = {
    fields: ["logo","description","title"] /*fields you want*/
};

or

var option = {
    fields: ["*"] /*for all supported field*/
};
/* Callback method */
webthief.getMeta("https://nepsho.github.io/example/meta_tags.html",option,(data)=>{
    console.log(data);
})

/* Promise method */
webthief.getMeta("https://nepsho.github.io/example/meta_tags.html",option).then(function(data){
    console.log(data)
}).catch(function(error) {
	console.log(error);
});

/* async/await method */
async function demo(){
    try {
        var result = await webthief.getMeta("https://nepsho.github.io/example/meta_tags.html",option);
        console.log(result);
    } catch (error) {
        console.log(error);
    }
}
demo();

/* Sample output
    {
        success: true,
        response: {
            logo : "https://nepsho.github.io/lib/img/logo.png",
            title : "NepSho",
            description : "Promise and callback based website-info getter using metadata of websites..."
        }
    }
*/
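To show how the fields option above could map onto the response object, here is a minimal sketch. selectFields, SUPPORTED_FIELDS and DEFAULT_FIELDS are hypothetical names, not webthief's API; the field list is copied from the supported-fields list in this README, and the fallback to logo, title and description follows the README's statement about an empty option. The sketch assumes the page's meta values have already been extracted into a plain object.

```javascript
// Hypothetical sketch, NOT webthief's implementation: select values from an
// already-extracted meta object according to an option such as
// { fields: ["logo", "description", "title"] } or { fields: ["*"] }.
const SUPPORTED_FIELDS = [
  "logo", "description", "title", "keywords", "subject",
  "copyright", "language", "robots", "revised", "abstract",
  "reply-to", "topic", "summary", "author", "designer",
  "country-name", "url", "category", "site_name", "email",
  "phone_number",
];
// The README states that an empty option defaults to logo, title and description.
const DEFAULT_FIELDS = ["logo", "title", "description"];

function selectFields(meta, option = {}) {
  let fields = Array.isArray(option.fields) && option.fields.length > 0
    ? option.fields
    : DEFAULT_FIELDS;
  // "*" expands to every supported field.
  if (fields.includes("*")) fields = SUPPORTED_FIELDS;
  const response = {};
  for (const field of fields) {
    // Skip unsupported names and fields the page does not define.
    if (SUPPORTED_FIELDS.includes(field) && meta[field] !== undefined) {
      response[field] = meta[field];
    }
  }
  return { success: true, response };
}

const pageMeta = { logo: "logo.png", title: "NepSho", author: "bcrazydreamer" };
console.log(selectFields(pageMeta, { fields: ["title", "author"] }));
```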

To get the images from a web page:

/* Callback method */
webthief.getSiteImages("https://nepsho.github.io/example/meta_tags.html",(data)=>{
    console.log(data);
})

/* Promise method */
webthief.getSiteImages("https://nepsho.github.io/example/meta_tags.html").then(function(data) {
	console.log(data);
}).catch(function(error) {
	console.log(error);
});

/* async/await method */
async function demo(){
    try {
        var result = await webthief.getSiteImages("https://nepsho.github.io/example/meta_tags.html");
        console.log(result);
    } catch (error) {
        console.log(error);
    }
}
demo();

/* Sample output
    {
        success: true,
        response: [ArrayOfImages]
    }
*/
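An image list like the sample output above usually needs relative src paths resolved against the page URL. The sketch below is one way to do that, assuming regex matching of img tags is acceptable; extractImageUrls is a hypothetical helper, not webthief's implementation. It uses the standard WHATWG URL class for resolution and a Set to drop duplicate images.

```javascript
// Hypothetical sketch, NOT webthief's implementation: collect the src of
// every <img> tag and resolve relative paths against the page URL with
// the WHATWG URL class (global in modern Node.js).
function extractImageUrls(html, pageUrl) {
  const urls = new Set(); // a Set drops duplicate image URLs
  // Requiring whitespace before "src" avoids matching data-src and similar.
  const imgRe = /<img\b[^>]*?\ssrc\s*=\s*("([^"]*)"|'([^']*)')/gi;
  let m;
  while ((m = imgRe.exec(html)) !== null) {
    const src = m[2] !== undefined ? m[2] : m[3];
    if (!src) continue; // skip empty src attributes
    try {
      urls.add(new URL(src, pageUrl).href);
    } catch {
      // Ignore values that cannot be parsed as a URL.
    }
  }
  return { success: true, response: [...urls] };
}

console.log(extractImageUrls(
  '<img src="/lib/img/logo.png">',
  "https://nepsho.github.io/example/meta_tags.html"
));
```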

Error callback data (returned when an error occurs):

//Error return object type
{
    success: false,
    error: "ErrorType",
    detail: "detail message of error"
}

If the option is empty, a default option containing logo, title and description is set automatically. The core functions of this API are designed so that each can be used either with a promise or with a callback.
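The dual promise/callback design described above is a common Node.js pattern. Here is a minimal sketch of one way to build it; callbackOrPromise and getTitle are hypothetical names, not webthief's API. The assumptions: if the last argument is a function it is treated as the callback and receives a single argument (the result or the error object), otherwise a promise is returned that rejects with the { success: false, error, detail } shape on failure.

```javascript
// Hypothetical sketch of a callback-or-promise wrapper; NOT webthief's API.
function callbackOrPromise(asyncFn) {
  return function (...args) {
    // Treat a trailing function argument as the callback.
    const callback = typeof args[args.length - 1] === "function" ? args.pop() : null;
    // Normalise any thrown error into the { success: false, error, detail } shape.
    const promise = Promise.resolve()
      .then(() => asyncFn(...args))
      .catch((err) => {
        throw { success: false, error: err.name, detail: err.message };
      });
    if (callback) {
      // Callback style: one argument, either the result or the error object.
      promise.then(callback, callback);
      return undefined;
    }
    // Promise style: resolves with the result, rejects with the error object.
    return promise;
  };
}

// Hypothetical async function standing in for a core method.
const getTitle = callbackOrPromise(async (url) => ({ success: true, response: { title: url } }));
getTitle("https://example.com", (data) => console.log(data));       // callback style
getTitle("https://example.com").then((data) => console.log(data));  // promise style
```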

CLI Usage

$ npm install webthief -g

Valid methods: meta|getmeta, html|gethtml, images|getsiteimages (these names are used by the CLI)

$ webthief [-method-] [-input-] [-option-]

method:

  • Get HTML
    • html | gethtml
  • Get Meta
    • meta | getmeta
  • Get Images
    • images | getsiteimages

input: A valid URL.

option: The -d flag, which downloads the HTML file or the images.
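The method/input/option format above maps naturally onto a small argument parser. This is a minimal sketch of how the documented aliases and the -d flag could be interpreted; parseCliArgs and METHOD_ALIASES are hypothetical names, not webthief's actual CLI code. The method names on the right-hand side are the library functions shown in the examples, and URL validation via new URL() is an assumption about what counts as "a valid URL".

```javascript
// Hypothetical sketch, NOT webthief's actual CLI code: map the documented
// aliases and the -d flag onto a { method, input, download } object.
const METHOD_ALIASES = {
  html: "getHtml", gethtml: "getHtml",
  meta: "getMeta", getmeta: "getMeta",
  images: "getSiteImages", getsiteimages: "getSiteImages",
};

function parseCliArgs(argv) {
  const [methodArg = "", input, ...options] = argv;
  const method = METHOD_ALIASES[methodArg.toLowerCase()];
  if (!method) throw new Error(`Unknown method: ${methodArg}`);
  // new URL() throws a TypeError for anything that is not an absolute URL.
  const url = new URL(input).href;
  return { method, input: url, download: options.includes("-d") };
}

// In a real CLI entry point this would be parseCliArgs(process.argv.slice(2)).
console.log(parseCliArgs(["images", "https://nepsho.github.io/example/meta_tags.html", "-d"]));
```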

CLI Examples

$ webthief html https://nepsho.github.io/example/meta_tags.html
or, to also download the page:
$ webthief html https://nepsho.github.io/example/meta_tags.html -d
$ webthief meta https://nepsho.github.io/example/meta_tags.html
$ webthief images https://nepsho.github.io/example/meta_tags.html
or, to also download the images:
$ webthief images https://nepsho.github.io/example/meta_tags.html -d

License

MIT License

Author

@BCrazyDreamer