Meta-scrapper

This library parses the meta tags of websites. To extract information from a site, pass its link to the methods described below.

The library was created for the linkmarker application, but was split out into a separate package because NPM currently has few comparable alternatives.

Warning
Browsers block cross-origin requests to most sites (CORS), so parsing will not work on the client side. Use this library on the backend.
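
Since the parsing has to happen on the server, a browser app would typically call a small backend endpoint instead. The sketch below shows one possible shape for such an endpoint. The `/meta?url=...` route and the `readTargetUrl`, `sendJson` and `createMetaHandler` names are hypothetical and not part of the library; the scraper is passed in as a parameter, so nothing is assumed about how `js-meta-parser` exports it.

```javascript
// Hypothetical helper: pull the target link out of a request path such as
// "/meta?url=https://medium.com". Returns null if the route or parameter is missing.
function readTargetUrl(requestUrl) {
  // The base is only needed so that the relative request path can be parsed.
  const parsed = new URL(requestUrl, 'http://localhost');
  if (parsed.pathname !== '/meta') return null;
  return parsed.searchParams.get('url');
}

// Write a JSON response with the given status code.
function sendJson(res, status, payload) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

// Hypothetical factory: `scrapMeta` is the function taken from 'js-meta-parser'
// and is expected to return a promise. The result is a request listener that
// can be handed to Node's http.createServer.
function createMetaHandler(scrapMeta) {
  return async function handle(req, res) {
    const target = readTargetUrl(req.url);
    if (!target) {
      sendJson(res, 400, { error: 'Expected /meta?url=<link>' });
      return;
    }
    try {
      // Assumption: the parsed result can be serialized with JSON.stringify.
      sendJson(res, 200, await scrapMeta(target));
    } catch (err) {
      sendJson(res, 502, { error: String(err) });
    }
  };
}

// Usage on the server (not executed here):
//   const http = require('http');
//   http.createServer(createMetaHandler(scrapMeta)).listen(3000);
```

Keeping the handler separate from the server setup makes it easy to try with a stub scraper before wiring in the real one.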

How to use?

Installation is very simple:

pnpm add     js-meta-parser  # For PNPM
npm  install js-meta-parser  # For NPM
yarn add     js-meta-parser  # For YARN

The module is available for both CommonJS (CJS) and ES modules (ESM).

import {scrapMeta} from 'js-meta-parser';

// OR

const scrapMeta = require('js-meta-parser');

All modules are defined with TypeScript declarations 😌

Examples

Medium

Let's imagine that you want to parse all the meta information from Medium:

import scrapMeta from 'js-meta-parser';

const mediumMeta = scrapMeta('medium.com');

mediumMeta.then(meta => {
  console.log(meta.info);
})

// Output
{
  title: 'Medium – Where good ideas find you.',
  url: URL {
    href: 'https://medium.com/',
    origin: 'https://medium.com',
  },
  descriptionList: [
    'Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.',
    'Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.'
  ],
  iconList: [
    'https://miro.medium.com/1*m-R_BkNf1Qjr1YbyOIJY2w.png',
    'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png'
  ],
  preview: 'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png',
  themeColor: '#000000',
  locale: 'en_US',
  siteName: 'Medium',
  appId: null,
  type: 'website'
}
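
In the example above, the bare host `medium.com` comes back as a full `https://medium.com/` URL. How the library normalizes its input is not documented here; the sketch below shows one plausible way to do it with the standard `URL` class. The `toAbsoluteUrl` helper and the choice of `https://` as the default scheme are assumptions made for illustration.

```javascript
// Hypothetical helper: accept a URL object, a full link, or a bare host,
// and always return a URL object.
function toAbsoluteUrl(link) {
  if (link instanceof URL) return link;
  // Assumption: links without a scheme are treated as HTTPS.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(link);
  return new URL(hasScheme ? link : `https://${link}`);
}

console.log(toAbsoluteUrl('medium.com').href);   // https://medium.com/
console.log(toAbsoluteUrl('medium.com').origin); // https://medium.com
```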

Telegram

As with Medium, everything is just as easy here:

const tgMeta = scrapMeta(new URL('https://telegram.org/'));

tgMeta.then(meta => {
  
  // We can also read individual fields one by one
  console.log(
    meta.title,
    meta.type,
    meta.locale,
    meta.descriptionList,
    meta.iconList,
    // ...
  );
})

// Output
Telegram Messenger, null, 'en_US', [ 'Fast. Secure. Powerful.' ],
[
  'https://telegram.org/img/website_icon.svg?4',
  'https://telegram.org/img/apple-touch-icon.png'
]
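
Both outputs show that the list fields can hold several values, and the Medium example even contains the same description twice. A small post-processing step can collapse those lists into single display values. The `summarize` helper below is hypothetical and not part of the library; the field names follow the outputs above, and the fallback order is an assumption.

```javascript
// Hypothetical helper: reduce the parsed meta to one value per field.
function summarize(meta) {
  // A Set drops exact duplicates while keeping the original order.
  const descriptions = [...new Set(meta.descriptionList ?? [])];
  const icons = meta.iconList ?? [];
  return {
    title: meta.title ?? null,
    description: descriptions[0] ?? null,
    // Assumption: prefer the preview image, then fall back to the first icon.
    image: meta.preview ?? icons[0] ?? null,
  };
}

// Sample input shaped like the Telegram output above
// (the preview is assumed to be missing for the sake of the example).
const card = summarize({
  title: 'Telegram Messenger',
  descriptionList: ['Fast. Secure. Powerful.'],
  iconList: ['https://telegram.org/img/website_icon.svg?4'],
  preview: null,
});
console.log(card.image); // https://telegram.org/img/website_icon.svg?4
```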

Available tags

The library is still under active development, but all the main tags are covered by tests and confirmed to work. The following tags are available:

Title
Description (Default + OG)
Icon (Default + OG)
Type (OG)
Site Name (OG)
Preview (Default)
Theme (Meta)
Full URL
Manifest parsing
App ID (FB)
Locale (OG)
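
To make the list above more concrete, the sketch below pairs several of those fields with the HTML tags they are conventionally read from: `og:*` properties from the Open Graph protocol, `fb:app_id`, and the standard `description` and `theme-color` meta names. This is not the library's implementation. The `TAGS` table and the `readMetaTags` function are hypothetical, and a simple regular expression stands in for a real HTML parser.

```javascript
// Hypothetical mapping from result fields to meta tag keys.
const TAGS = {
  description: 'description',
  themeColor: 'theme-color',
  type: 'og:type',
  siteName: 'og:site_name',
  locale: 'og:locale',
  appId: 'fb:app_id',
};

// Hypothetical extractor. Assumes double-quoted attributes with the key
// (name or property) written before content; real pages vary more than this.
function readMetaTags(html) {
  const found = {};
  const pattern = /<meta\s+(?:name|property)="([^"]+)"\s+content="([^"]*)"/gi;
  for (const [, key, content] of html.matchAll(pattern)) {
    found[key] = content;
  }
  const result = {};
  for (const [field, key] of Object.entries(TAGS)) {
    result[field] = found[key] ?? null; // null when the tag is absent
  }
  return result;
}

// Sample markup written for this example.
const sampleHtml = `
  <meta name="theme-color" content="#000000">
  <meta property="og:type" content="website">
  <meta property="og:site_name" content="Medium">
`;
console.log(readMetaTags(sampleHtml));
```

Fields whose tag is absent come back as `null`, which matches how `appId` appears in the Medium output.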
