1.0.1 • Published 1 year ago

meta-scrapper v1.0.1

License
MIT

This library parses the meta tags of websites. To extract information from a site, call the methods described below and pass them a link.

The library was created for the linkmarker application but was split out into its own package, since there are currently few comparable libraries on NPM.

Warning

Data cannot be parsed on the client side because of CORS, so use this library on the backend.
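One way to work within this constraint is to expose a small backend endpoint that the browser calls instead of fetching the target site itself. The following is a minimal sketch, not part of the library: `createMetaHandler` and the `?url=` query parameter are hypothetical names, and the scraper is passed in as an argument so the sketch only assumes it returns a promise of JSON-serializable data.

```javascript
// Hypothetical helper: builds a Node request handler around any function
// that takes a link and returns a promise of meta information.
function createMetaHandler(scrape) {
  return async function handle(req, res) {
    // req.url holds only the path and query, so a base is needed to parse it.
    const { searchParams } = new URL(req.url, 'http://localhost');
    const target = searchParams.get('url');

    if (!target) {
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Missing "url" query parameter' }));
      return;
    }

    try {
      const meta = await scrape(target);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(meta));
    } catch (err) {
      // Assumption: a failed scrape throws or rejects its promise.
      const message = err instanceof Error ? err.message : String(err);
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: message }));
    }
  };
}

// Possible wiring (port 3000 is arbitrary):
// const http = require('node:http');
// const scrapMeta = require('js-meta-parser');
// http.createServer(createMetaHandler(scrapMeta)).listen(3000);
```

The browser would then request something like `/meta?url=https%3A%2F%2Fmedium.com` from your own server, which is not subject to the cross-origin restriction.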

How to use?

Installation is very simple:

pnpm add js-meta-parser  # For PNPM
npm i js-meta-parser     # For NPM
yarn add js-meta-parser  # For YARN

The module is available for both CJS and ESM.

import scrapMeta from 'js-meta-parser';

// OR

const scrapMeta = require('js-meta-parser');

All modules are defined with TypeScript declarations 😌

Examples

Medium

Let's imagine that you want to parse all the meta information from Medium:

import scrapMeta from 'js-meta-parser';

const mediumMeta = scrapMeta('medium.com');

mediumMeta.then(meta => {
  console.log(meta.info);
})
// Output
{
  title: 'Medium – Where good ideas find you.',
  url: URL {
    href: 'https://medium.com/',
    origin: 'https://medium.com',
  },
  descriptionList: [
    'Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.',
    'Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.'
  ],
  iconList: [
    'https://miro.medium.com/1*m-R_BkNf1Qjr1YbyOIJY2w.png',
    'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png'
  ],
  preview: 'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png',
  themeColor: '#000000',
  locale: 'en_US',
  siteName: 'Medium',
  appId: null,
  type: 'website'
}
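A result shaped like the output above can be condensed into the few fields a link-preview card needs. The sketch below is illustrative only: `toLinkPreview` is a hypothetical helper, and it assumes just the field names shown in the output (`title`, `siteName`, `url`, `descriptionList`, `iconList`, `preview`).

```javascript
// Hypothetical helper: reduces a meta-information object to a preview card.
function toLinkPreview(info) {
  // The output above lists the same description twice (presumably once per
  // source tag), so duplicates are dropped before picking one.
  const descriptions = [...new Set(info.descriptionList || [])];
  const icons = info.iconList || [];

  return {
    title: info.title || info.siteName || '',
    description: descriptions[0] || '',
    // Prefer the preview image, then fall back to the first icon.
    image: info.preview || icons[0] || null,
    // A URL object stringifies to its href.
    link: info.url ? String(info.url) : null,
  };
}

const card = toLinkPreview({
  title: 'Medium',
  url: new URL('https://medium.com/'),
  descriptionList: ['An open platform.', 'An open platform.'],
  iconList: ['https://example.com/icon.png'],
  preview: null,
});
console.log(card.link);  // https://medium.com/
console.log(card.image); // https://example.com/icon.png
```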

Telegram

As with the previous site, this one is just as easy:

const tgMeta = scrapMeta(new URL('https://telegram.org/'));

tgMeta.then(meta => {
  
  // Individual fields can also be read one by one
  console.log(
    meta.title,
    meta.type,
    meta.locale,
    meta.descriptionList,
    meta.iconList,
    // ...
  );
})
// Output
'Telegram Messenger', null, 'en_US', [ 'Fast. Secure. Powerful.' ],
[
  'https://telegram.org/img/website_icon.svg?4',
  'https://telegram.org/img/apple-touch-icon.png'
]
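Since some fields can come back as `null` (`type` does for telegram.org above), it may be convenient to read them with fallbacks. Here is a small sketch using `async`/`await`: `describeSite` and its default values are hypothetical, and the scraper is passed in as an argument so that the only assumption is the field names shown above.

```javascript
// Hypothetical wrapper: `scrape` is any function that returns a promise of
// meta information, for example the library's scrapMeta.
async function describeSite(scrape, link) {
  const meta = await scrape(link);

  return {
    title: meta.title ?? '(untitled)',
    // Arbitrary defaults, applied only when the field is null or undefined.
    type: meta.type ?? 'website',
    locale: meta.locale ?? 'en_US',
    description: (meta.descriptionList ?? [])[0] ?? '',
  };
}

// Possible usage:
// describeSite(scrapMeta, 'https://telegram.org/').then(console.log);
```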

Available tags

The library is under active development, but all the main tags have been checked to work and are covered by tests. The following tags are available:

  • Title
  • Description (Default + OG)
  • Icon (Default + OG)
  • Type (OG)
  • Site Name (OG)
  • Preview (Default)
  • Theme (Meta)
  • Full URL
  • Manifest parsing
  • App ID (FB)
  • Locale (OG)
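To give a rough idea of where such values live in a page, the sketch below pulls `<title>` and `<meta>` tags out of an HTML string. It is not the library's implementation: `extractMetaTags` is a hypothetical name, and the regex-based approach is only good enough for a demonstration (a real HTML parser is more robust).

```javascript
// Illustrative only: collects <title> and <meta> values from an HTML string.
function extractMetaTags(html) {
  const tags = {};

  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  if (titleMatch) tags.title = titleMatch[1].trim();

  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    // Read every quoted attribute of the tag, in whatever order it appears.
    const attrs = {};
    const attrPattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    for (const [, name, doubleQuoted, singleQuoted] of tag.matchAll(attrPattern)) {
      attrs[name.toLowerCase()] = doubleQuoted ?? singleQuoted;
    }
    // Open Graph tags use `property`, classic meta tags use `name`.
    const key = attrs.property || attrs.name;
    if (key && attrs.content !== undefined) tags[key] = attrs.content;
  }

  return tags;
}

const sample = `
<title>Example</title>
<meta name="description" content="Plain description">
<meta property="og:type" content="website">
<meta content="#000000" name="theme-color">
`;
console.log(extractMetaTags(sample));
// {
//   title: 'Example',
//   description: 'Plain description',
//   'og:type': 'website',
//   'theme-color': '#000000'
// }
```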