1.0.7 • Published 5 years ago

@pdisney1/htmlanalyzer v1.0.7

License: MIT

HtmlAnalyzer - HTML Analyzer and Tag Extractor

Module that performs HTML Tag Extraction for an HTML input.

Super simple to use

HtmlAnalyzer was designed to extract tags from any HTML input. It offers a set of convenience methods for common tags, plus generic methods for any project that needs HTML tag analysis.

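    const HtmlAnalyzer = require('@pdisney1/htmlanalyzer/HtmlAnalyzer');
    const htmlAnalyzer = new HtmlAnalyzer();

    // url: address the HTML came from; html: the page source as a string.
    // Run inside an async function: CommonJS does not allow top-level await.
    const alltags = await htmlAnalyzer.getAllTags(url, html);

    console.log(alltags);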
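
Because the call is awaited, it has to live inside an `async` function when the file is a CommonJS module. A minimal sketch of one way to wrap it follows. `StubAnalyzer` and `analyzePage` are hypothetical names introduced only so the sketch runs without the package installed, and the object the stub returns is invented; it says nothing about the real shape of `getAllTags` results.

```javascript
// Hypothetical stand-in for HtmlAnalyzer so this sketch runs on its own.
// The returned object is made up; the real result shape may differ.
class StubAnalyzer {
  async getAllTags(url, html) {
    return { url, htmlLength: html.length };
  }
}

// Wrap the awaited call in an async function and add context to any failure.
async function analyzePage(analyzer, url, html) {
  try {
    return await analyzer.getAllTags(url, html);
  } catch (err) {
    throw new Error(`Tag extraction failed for ${url}: ${err.message}`);
  }
}

// Usage: in a real project, pass `new HtmlAnalyzer()` instead of the stub.
analyzePage(new StubAnalyzer(), 'http://test.com/', '<html><body></body></html>')
  .then((alltags) => console.log(alltags));
```

Passing the analyzer in as an argument keeps the wrapper easy to test with a stub like the one above.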


Convenience Methods

There are 20 methods allowing access to common tags. Below is an example of a convenience method that returns tags used for page navigation:

    const HtmlAnalyzer = require('@pdisney1/htmlanalyzer/HtmlAnalyzer');
    const htmlAnalyzer = new HtmlAnalyzer();

    var navigation_tags = await htmlAnalyzer.getNavigationTags(source_url, html);

    console.log(navigation_tags.letter_anchors);
    console.log(navigation_tags.pagination_nav);
    console.log(navigation_tags.group_nav);

Review the HtmlAnalyzer.js file for a list of all the convenience methods.
In addition, this module lets you look up a specific set of tags with a selector, as in this example:

    const HtmlAnalyzer = require('@pdisney1/htmlanalyzer/HtmlAnalyzer');
    const htmlAnalyzer = new HtmlAnalyzer();

    var tags = await htmlAnalyzer.getTags(data.url, data.html, 'a[href="http://test.com/product-pills-reviews.html"]');

    console.log(tags);
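
Selector strings like the one above are easy to mistype when they are assembled by hand. The sketch below shows one way to build them from a tag name and an attribute map. `buildSelector` is a hypothetical helper, not part of HtmlAnalyzer, and it assumes the module accepts standard CSS attribute-selector syntax, as the examples in this README suggest.

```javascript
// Hypothetical helper: build an attribute selector such as a[href="..."][id="..."].
function buildSelector(tagName, attributes = {}) {
  const parts = Object.entries(attributes).map(([name, value]) => {
    // Escape backslashes and double quotes so the value stays inside its quotes.
    const escaped = String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
    return `[${name}="${escaped}"]`;
  });
  return tagName + parts.join('');
}

const selector = buildSelector('a', {
  href: 'http://test.com/product-pills-reviews.html',
});
console.log(selector); // a[href="http://test.com/product-pills-reviews.html"]
```

The resulting string can then be passed as the third argument to `getTags`.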


Extra Utilities

This module also provides two additional utilities. The first, getTextSample, returns the text of an HTML page with the tag markup removed, truncated to the character count you pass in:

    const HtmlAnalyzer = require('@pdisney1/htmlanalyzer/HtmlAnalyzer');
    const htmlAnalyzer = new HtmlAnalyzer();


    var sample_text = await htmlAnalyzer.getTextSample(html, length);

    console.log(sample_text);
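
To make the idea of a length-limited text sample concrete, here is a rough sketch of the general technique: strip markup, collapse whitespace, truncate. It is not the library's implementation. `roughTextSample` is a hypothetical name, and the regex approach is deliberately simple: it does not decode entities such as `&amp;` and can be fooled by unusual markup.

```javascript
// Sketch only: not the library's implementation of getTextSample.
function roughTextSample(html, length) {
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop script/style bodies
    .replace(/<[^>]*>/g, ' ') // drop the remaining tags
    .replace(/\s+/g, ' ') // collapse runs of whitespace
    .trim();
  // Keep at most `length` characters of the cleaned text.
  return text.slice(0, length);
}

console.log(roughTextSample('<h1>Hello</h1><p>world of <b>tags</b></p>', 11)); // Hello world
```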

The second utility, getLanguages, infers the language of an HTML input:

    const HtmlAnalyzer = require('@pdisney1/htmlanalyzer/HtmlAnalyzer');
    const htmlAnalyzer = new HtmlAnalyzer();


    var languages = await htmlAnalyzer.getLanguages(html);
    console.log(languages);
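
This README does not say how `getLanguages` reaches its answer. For comparison, the sketch below reads the language a page declares about itself (the `lang` attribute on `<html>` and a `content-language` meta tag). This is a different and much simpler technique than inferring language from the text, and `declaredLanguages` is a hypothetical helper rather than anything the module exposes.

```javascript
// Sketch only: reads declared language codes from markup; it does not analyse
// the page text and is not based on the library's getLanguages method.
function declaredLanguages(html) {
  const found = [];

  // lang attribute on the root element, e.g. <html lang="en-US">
  const htmlLang = /<html\b[^>]*?\blang\s*=\s*["']?([A-Za-z0-9-]+)/i.exec(html);
  if (htmlLang) found.push(htmlLang[1].toLowerCase());

  // e.g. <meta http-equiv="content-language" content="en, fr">
  const meta = /<meta\b[^>]*http-equiv\s*=\s*["']?content-language["']?[^>]*>/i.exec(html);
  if (meta) {
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(meta[0]);
    if (content) {
      for (const code of content[1].split(',')) {
        const trimmed = code.trim().toLowerCase();
        if (trimmed) found.push(trimmed);
      }
    }
  }

  // Remove duplicates while keeping first-seen order.
  return [...new Set(found)];
}

console.log(declaredLanguages('<html lang="en-US"><head></head><body></body></html>'));
```

Declared values are only a hint: many pages omit them or set them incorrectly, which is one reason a text-based inference method is useful.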


Full API

Below is a list of all convenience methods for HTML analysis and class definitions.

getTags

Allows the selection of any HTML tag within a body of HTML.

Inputs:
URL - URL of the HTML source.
HTML - HTML source data.
Selector - Tag selection string used to select a specific set of tags or a single tag. Example: input[type="text"].classname
Tag Limit - Limits the number of possible tags selected.

    const htmlAnalyzer = new HtmlAnalyzer();

    var selector = "a[href='http://test.com/tester'][id='mainanchor']";
    var textLimit = 1000;

    var tags = await htmlAnalyzer.getTags(url, html, selector, textLimit);
    console.log(tags);

getSubmitAnchors

getSearchInputs

getTextInputs

getTextImages

getPasswordInputs

getSubmitInputs

getSubmitButtons

getSubmitForms

getAllForms

getAllTextAreas

getAllInputs

getAllButtons

getAllAnchors

getAllSpans

getAllSelects

getAllImages

getAllFileTags

getLoginTags

getNavigationTags

getSearchInputAndSubmitLinks

getSearchTags

getAllTags
