0.0.6 • Published 6 months ago

@axeptio/links-classifier v0.0.6

License
ISC

Links Classifier

Use Case

We want to filter the links found on a given webpage and classify them by document type, such as Privacy Policy or Terms of Service.

Approach

We expose two functions: filterLinks, which removes external, invalid and duplicate links, and classifyLinks, which sorts the remaining links into document types.
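To make the filtering step concrete, here is a minimal sketch of what "removing external, invalid and duplicate links" could look like using the standard URL API. It is not the package's filterLinks implementation: the function name sketchFilterLinks, its two-argument signature, and the choice to ignore URL fragments when detecting duplicates are all assumptions made for illustration. Locale and subdomain handling are left out.

```javascript
// Illustrative only: NOT the package's filterLinks implementation.
// sketchFilterLinks is a hypothetical helper; it takes raw href strings
// and the URL of the page they were found on.
function sketchFilterLinks(hrefs, pageUrl) {
  const page = new URL(pageUrl);
  const seen = new Set();
  const kept = [];

  for (const href of hrefs) {
    let url;
    try {
      // Resolves relative hrefs against the page URL.
      url = new URL(href, page);
    } catch {
      continue; // invalid link: cannot be parsed
    }

    // Skip non-web schemes such as mailto: or tel:.
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;

    // Skip external links (assumption: exact hostname match only).
    if (url.hostname !== page.hostname) continue;

    // Assumption: two links that differ only by fragment are duplicates.
    url.hash = '';
    if (seen.has(url.href)) continue;

    seen.add(url.href);
    kept.push(url.href);
  }

  return kept;
}
```

Called with a mix of relative, external, malformed and repeated hrefs, this sketch keeps one absolute URL per distinct same-host page.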

Usage

const { filterLinks, classifyLinks, keywords } = require('@axeptio/links-classifier');

const links = document.querySelectorAll('a');

const filteredLinks = filterLinks(
  links, // the links to filter
  window.location, // the context
  ['en', 'fr', 'it'], // valid locales (other languages will be ignored)
  false, // follow subdomains
  console.log // logger function
);

const classifiedLinks = classifyLinks(filteredLinks, keywords, 'fr');

console.log(classifiedLinks);

/*
{
 'privacy_policy': Array(2),
 'terms_of_service': Array(1),
}
 */
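Since the result is an object keyed by document type, with an array of links under each key, it can be summarized with plain object helpers. The sketch below uses a stand-in object instead of a real classifyLinks result; countByType and exampleResult are hypothetical names, and the array items are placeholder strings because the actual item type is not documented here.

```javascript
// Stand-in for a classifyLinks result; the real array items may be
// link elements or objects rather than strings.
const exampleResult = {
  privacy_policy: ['/privacy', '/fr/confidentialite'],
  terms_of_service: ['/terms'],
};

// Hypothetical helper: maps each document type to its number of links.
function countByType(classified) {
  return Object.fromEntries(
    Object.entries(classified).map(([docType, links]) => [docType, links.length])
  );
}

const counts = countByType(exampleResult);
```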

Data

This module ships with its own dataset, located in data/keywords.js, which contains keyword variations for each document type. It is exported from the index as keywords, but you are free to pass your own dataset to classifyLinks instead.
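As an illustration of the idea of keyword variations per document type, the sketch below defines a small custom dataset and a toy classifier that matches it against link text and href. The dataset layout (document type, then locale, then a list of variations) is an assumption; the real structure of data/keywords.js may differ, so check that file before writing your own. sketchClassifyLinks is a hypothetical stand-in, not the package's classifyLinks.

```javascript
// Hypothetical dataset shape: document type -> locale -> keyword variations.
// The real layout of data/keywords.js may differ.
const customKeywords = {
  privacy_policy: {
    en: ['privacy policy', 'privacy'],
    fr: ['politique de confidentialité', 'confidentialité'],
  },
  terms_of_service: {
    en: ['terms of service', 'terms'],
    fr: ['conditions générales', 'cgu'],
  },
};

// Illustrative stand-in for classifyLinks, NOT the package's implementation.
// Each link is assumed to be a plain object with `text` and `href` strings.
function sketchClassifyLinks(links, dataset, locale) {
  const result = {};

  for (const link of links) {
    const haystack = `${link.text} ${link.href}`.toLowerCase();

    for (const [docType, byLocale] of Object.entries(dataset)) {
      const variations = byLocale[locale] || [];
      if (variations.some((keyword) => haystack.includes(keyword))) {
        (result[docType] = result[docType] || []).push(link);
        break; // first matching document type wins
      }
    }
  }

  return result;
}
```

Links that match no variation for the requested locale are simply left out of the result, which mirrors the shape of the output shown in the Usage section.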
