1.2.5 • Published 2 years ago

get-array-of-links v1.2.5

License
ISC

get-array-of-links

Get an array of link objects with the text and href from every anchor tag on a webpage.

Installation

To install the package, run:

npm install get-array-of-links

Or, if you prefer using Yarn:

yarn add get-array-of-links

Usage

Getting All Links

import { getArrayOfLinks } from 'get-array-of-links';

// Top-level await requires an ES module; otherwise call this inside an async function.
const links = await getArrayOfLinks('https://www.example.com');

// links is an array of objects with the following properties:
// {
//   text: 'Example Page',
//   href: 'https://www.example.com/path-to-page'
// }

Options

getArrayOfLinks takes an optional options object as its second argument.

const links = await getArrayOfLinks('https://www.example.com', {
  limit: 10,
  useFilters: false,
  customFilters: myCustomFiltersFunction,
  useFormatting: false,
  customFormatting: myCustomFormattingFunction,
});

limit

The maximum number of links to return.

useFilters

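Defaults to true. When enabled, the filterLinks function below is applied. It keeps only links whose text is longer than 30 and shorter than 250 characters, drops links whose text contains '<img' or 'Paid Program' or whose href contains '#' or 'sponsored', and then removes links with a duplicate href.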

export function filterLinks(links: LinkObject[]): LinkObject[] {
  // Filter out unwanted links
  links = links.filter(link => {
    return (
      link.text.length > 30 &&
      link.text.length < 250 &&
      !link.text.includes('<img') &&
      !link.text.includes('Paid Program') &&
      !link.href.includes('#') &&
      !link.href.includes('sponsored')
    );
  });

  // Filter out duplicate links
  links = links.filter((link, index, array) => {
    return array.findIndex(l => l.href === link.href) === index;
  });

  return links;
};
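
To make the effect of the first filter step concrete, the sketch below restates its conditions as a single-link predicate and runs it on a few made-up links. The name passesDefaultFilter, the local LinkObject type, and the sample data are assumptions for illustration only; they are not exported by the package.

```typescript
// Shape of a link object as described in this README: { text, href }.
type LinkObject = { text: string; href: string };

// Hypothetical helper: the same conditions as the first filter step above,
// applied to one link at a time.
function passesDefaultFilter(link: LinkObject): boolean {
  return (
    link.text.length > 30 &&
    link.text.length < 250 &&
    !link.text.includes('<img') &&
    !link.text.includes('Paid Program') &&
    !link.href.includes('#') &&
    !link.href.includes('sponsored')
  );
}

// Made-up sample links.
const candidates: LinkObject[] = [
  // Rejected: text is 30 characters or fewer.
  { text: 'Home', href: 'https://www.example.com/' },
  // Accepted: long enough text, clean href.
  {
    text: 'Researchers publish a detailed survey of link extraction',
    href: 'https://www.example.com/news/survey',
  },
  // Rejected: href contains '#'.
  {
    text: 'Researchers publish a detailed survey of link extraction',
    href: 'https://www.example.com/news/survey#comments',
  },
  // Rejected: text contains 'Paid Program'.
  {
    text: 'Paid Program: a very long promotional headline goes here',
    href: 'https://www.example.com/promo',
  },
];

const verdicts: boolean[] = candidates.map(passesDefaultFilter);
console.log(verdicts);
```

Note that short navigation links such as "Home" or "About" do not survive the default filter, so pages made up mostly of short link texts may return few or no links unless useFilters is set to false.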

customFilters

A user-defined function that takes an array of link objects and returns an array of link objects filtered as desired.
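
As a minimal sketch of what such a function might look like, the example below keeps only links under one site prefix and drops duplicates by href. The function name keepInternalLinks, the hard-coded prefix, and the sample data are hypothetical; only the input and output shape (an array of link objects in, an array of link objects out) comes from the description above.

```typescript
// Shape of a link object as described in this README: { text, href }.
type LinkObject = { text: string; href: string };

// Hypothetical custom filter: keep links under one site prefix,
// keeping only the first occurrence of each href.
function keepInternalLinks(links: LinkObject[]): LinkObject[] {
  const prefix = 'https://www.example.com/'; // assumed site prefix
  const seen = new Set<string>();
  return links.filter(link => {
    if (!link.href.startsWith(prefix) || seen.has(link.href)) {
      return false;
    }
    seen.add(link.href);
    return true;
  });
}

// Made-up sample links.
const sampleLinks: LinkObject[] = [
  { text: 'About us', href: 'https://www.example.com/about' },
  { text: 'Partner site', href: 'https://partner.example.org/' },
  { text: 'About (footer)', href: 'https://www.example.com/about' },
  { text: 'Contact', href: 'https://www.example.com/contact' },
];

const internalLinks = keepInternalLinks(sampleLinks);
console.log(internalLinks.map(link => link.href));
```

A function like this would be passed as the customFilters option, in the same way as myCustomFiltersFunction in the options example.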

useFormatting

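Defaults to true. When enabled, the formatLinks function below is applied. It prepends the page URL to any href that contains '/' but not 'http', and cleans up the link text by removing newlines and '...' and collapsing runs of whitespace into a single space.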

function addBaseUrlIfNeeded(link: LinkObject, url: string): LinkObject {
  if (link.href.includes('/') && !link.href.includes('http')) {
    link.href = `${url}${link.href}`;
  };

  return link;
};

export function formatLinks(links: LinkObject[], url: string): LinkObject[] {
  return links.map(link => {
    addBaseUrlIfNeeded(link, url);

    // Remove \n
    link.text = link.text.replace(/\n/g, '');

    // Remove '...'
    link.text = link.text.replace(/\.\.\./g, '');

    // Replace excess whitespace with single space
    link.text = link.text.replace(/\s\s+/g, ' ');

    return link;
  });
};

customFormatting

A user-defined function that takes an array of link objects and an optional url argument and returns an array of link objects formatted as desired.
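
The sketch below shows one possible custom formatter. Instead of the string concatenation used by addBaseUrlIfNeeded, it resolves each href against the page URL with the standard URL constructor, and it also normalizes whitespace in the link text. The name resolveAndTrim and the sample data are hypothetical, and this is an alternative technique for illustration, not the package's own formatting.

```typescript
// Shape of a link object as described in this README: { text, href }.
type LinkObject = { text: string; href: string };

// Hypothetical custom formatter: resolve hrefs relative to the page URL
// and collapse whitespace in the link text. Returns new objects rather
// than mutating the input.
function resolveAndTrim(links: LinkObject[], url?: string): LinkObject[] {
  return links.map(link => {
    let href = link.href;
    if (url !== undefined) {
      try {
        // new URL(relative, base) resolves relative hrefs and leaves
        // absolute URLs unchanged.
        href = new URL(link.href, url).href;
      } catch {
        // Leave hrefs that cannot be parsed untouched.
      }
    }
    const text = link.text.replace(/\s+/g, ' ').trim();
    return { text, href };
  });
}

// Made-up sample links and page URL.
const rawLinks: LinkObject[] = [
  { text: '  About \n  us ', href: '/about' },
  { text: 'Story', href: 'story.html' },
  { text: 'Elsewhere', href: 'https://other.example.org/x' },
];

const formattedLinks = resolveAndTrim(rawLinks, 'https://www.example.com/news/today');
console.log(formattedLinks);
```

A function like this would be passed as the customFormatting option, in the same way as myCustomFormattingFunction in the options example. Resolving with the URL constructor handles page URLs that include a path, where plain concatenation would append the href to the full page URL.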
