get-array-of-links v1.2.5
Get an array of link objects with the text and href from every anchor tag on a webpage.
Installation
To install the package, run:
npm install get-array-of-links
Or, if you prefer using Yarn:
yarn add get-array-of-links
Usage
Getting All Links
import { getArrayOfLinks } from 'get-array-of-links';

const links = await getArrayOfLinks('https://www.example.com');

// links is an array of objects, each with the following properties:
// {
//   text: 'Example Page',
//   href: 'https://www.example.com/path-to-page'
// }
Options
getArrayOfLinks takes an optional options object as its second argument.
const links = await getArrayOfLinks('https://www.example.com', {
  limit: 10,
  useFilters: false,
  customFilters: myCustomFiltersFunction,
  useFormatting: false,
  customFormatting: myCustomFormattingFunction,
});
limit
The maximum number of links to return.
useFilters
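Defaults to true. When enabled, the results are passed through the following filterLinks function, which keeps only links whose text is between 31 and 249 characters long, drops links whose text contains '<img' or 'Paid Program' or whose href contains '#' or 'sponsored', and removes links with duplicate hrefs.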
export function filterLinks(links: LinkObject[]): LinkObject[] {
  // Filter out unwanted links
  links = links.filter(link => {
    return (
      link.text.length > 30 &&
      link.text.length < 250 &&
      !link.text.includes('<img') &&
      !link.text.includes('Paid Program') &&
      !link.href.includes('#') &&
      !link.href.includes('sponsored')
    );
  });

  // Filter out duplicate links (keep the first occurrence of each href)
  links = links.filter((link, index, array) => {
    return array.findIndex(l => l.href === link.href) === index;
  });

  return links;
}
customFilters
A user-defined function that takes an array of link objects and returns an array of link objects filtered as desired.
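As an illustration, a custom filter might look like the sketch below. The LinkObject interface is an assumption based on the text and href properties shown under Usage, and keepUniqueHttpLinks is a hypothetical name, not something the package exports.

```typescript
// Assumed shape of a link object, based on the text/href properties shown under Usage.
interface LinkObject {
  text: string;
  href: string;
}

// Hypothetical custom filter: keep links that have non-empty text and an
// http(s) href, and drop any link whose href has already been seen.
function keepUniqueHttpLinks(links: LinkObject[]): LinkObject[] {
  const seen = new Set<string>();
  return links.filter((link) => {
    const isHttp = /^https?:\/\//.test(link.href);
    const hasText = link.text.trim().length > 0;
    if (!isHttp || !hasText || seen.has(link.href)) return false;
    seen.add(link.href);
    return true;
  });
}

// Sample data for illustration only.
const sampleLinks: LinkObject[] = [
  { text: 'Docs', href: 'https://www.example.com/docs' },
  { text: 'Docs again', href: 'https://www.example.com/docs' },
  { text: '  ', href: 'https://www.example.com/blank' },
  { text: 'Email us', href: 'mailto:hi@example.com' },
  { text: 'Blog', href: 'http://www.example.com/blog' },
];

const filteredLinks = keepUniqueHttpLinks(sampleLinks);
console.log(filteredLinks.map((link) => link.text));
```

A function like this would presumably be passed as customFilters, with useFilters set to false if the default filtering is not wanted, as in the Options example above.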
useFormatting
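Defaults to true. When enabled, the results are passed through the following formatLinks function, which prefixes the page URL to hrefs that contain '/' but not 'http', then cleans up the link text by removing newlines and '...' and collapsing runs of whitespace into a single space.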
function addBaseUrlIfNeeded(link: LinkObject, url: string): LinkObject {
  // Prefix the page URL to hrefs that look relative
  if (link.href.includes('/') && !link.href.includes('http')) {
    link.href = `${url}${link.href}`;
  }

  return link;
}

export function formatLinks(links: LinkObject[], url: string): LinkObject[] {
  return links.map(link => {
    addBaseUrlIfNeeded(link, url);

    // Remove \n
    link.text = link.text.replace(/\n/g, '');

    // Remove '...'
    link.text = link.text.replace(/\.\.\./g, '');

    // Replace excess whitespace with single space
    link.text = link.text.replace(/\s\s+/g, ' ');

    return link;
  });
}
customFormatting
A user-defined function that takes an array of link objects and an optional url argument and returns an array of link objects formatted as desired.
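A custom formatter could, for example, resolve relative hrefs with the standard URL constructor instead of string concatenation. The sketch below assumes the same LinkObject shape as the Usage example; formatWithUrlApi is a hypothetical name, and returning new objects rather than mutating the input is a design choice of this sketch, not a requirement stated by the package.

```typescript
// Assumed shape of a link object, based on the text/href properties shown under Usage.
interface LinkObject {
  text: string;
  href: string;
}

// Hypothetical custom formatter: collapse whitespace in the link text and,
// when a page URL is supplied, resolve each href against it.
function formatWithUrlApi(links: LinkObject[], url?: string): LinkObject[] {
  return links.map((link) => {
    const text = link.text.replace(/\s+/g, ' ').trim();
    let href = link.href;
    if (url !== undefined) {
      try {
        // new URL(relative, base) resolves relative paths and leaves absolute URLs as they are.
        href = new URL(link.href, url).href;
      } catch {
        // Leave hrefs that cannot be parsed unchanged.
      }
    }
    return { text, href };
  });
}

// Sample data for illustration only.
const rawLinks: LinkObject[] = [
  { text: '  About\n  us ', href: '/about' },
  { text: 'Next', href: 'page-2' },
  { text: 'External', href: 'https://other.example.org/x' },
];

const formattedLinks = formatWithUrlApi(rawLinks, 'https://www.example.com/news/');
console.log(formattedLinks);
```

A function like this would presumably be passed as customFormatting, with useFormatting set to false if the default formatting is not wanted, as in the Options example above.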
Author
- Andy McGunagle - andymcgunagle