url-inspector v8.7.0 • Published 9 months ago • License: MIT

url-inspector

Synopsis

npx url-inspector <url>

Description

Get normalized metadata about what a URL mainly represents.

This is a Node.js module.

Sources of information:

  • HTTP response headers
  • embedded tags in binary formats (using exiftool)
  • OpenGraph, Twitter Cards, schema.org, JSON-LD, title and meta tags in HTML pages
  • oEmbed endpoints
  • if a URL mainly wraps a media resource, that resource may be inspected too

Inspection stops when enough information has been gathered, or when a maximum number of bytes (depending on media type) have been downloaded.
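
The byte cap described above can be pictured with a small sketch. This is not url-inspector's own code: `readUpTo`, `fakeBody`, and the limit passed in are hypothetical names and values chosen for illustration. The input is assumed to be any async iterable of Buffers, such as a Node.js response stream.

```javascript
// Hypothetical helper: collect chunks until `maxBytes` is reached, then stop.
async function readUpTo(source, maxBytes) {
  const chunks = [];
  let total = 0;
  for await (const chunk of source) {
    const remaining = maxBytes - total;
    if (chunk.length >= remaining) {
      // Keep only the part that fits under the cap.
      chunks.push(chunk.subarray(0, remaining));
      total += remaining;
      break; // leaving the loop also closes the async iterator
    }
    chunks.push(chunk);
    total += chunk.length;
  }
  return Buffer.concat(chunks, total);
}

// Stand-in for a network response: three chunks of 10 bytes each.
async function* fakeBody() {
  for (let i = 0; i < 3; i++) yield Buffer.alloc(10, 97 + i);
}
```

Calling `await readUpTo(fakeBody(), 16)` consumes the first chunk, takes 6 bytes of the second, and never requests the third.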

Format

  • url: url of the inspected resource
  • title: title of the resource, or filename, or last component of pathname with query
  • description: optional longer description, without title in it, and only the first line.
  • site: the name of the site, or the domain name
  • mime: RFC 7231 MIME type of the resource (defaults to Content-Type). The inspected MIME type can be more accurate than the HTTP header.
  • ext: the file extension, derived only from the MIME type. Safe to use as a file extension.
  • what: what the resource represents: page, image, video, audio, file
  • type: how the resource is used: link, image, video, audio, embed. Example: if what:image and mime:text/html, and no html snippet is found, type will be 'link'.
  • html: the html representation of the resource, according to type and use.
  • script: url of a script that must be installed along with the html representation.
  • date: creation or modification date (YYYY-MM-DD format)
  • author: optional credit, author (without the @ prefix and with _ replaced by spaces)
  • keywords: optional array of collected keywords (lowercased words that are not in title words).
  • size: optional Content-Length as integer; discarded when type is embed
  • icon: optional link to the favicon of the site
  • width, height: optional dimensions as integers
  • duration: optional hh:mm:ss string
  • thumbnail: optional URL to a thumbnail; it can be a data URI for embedded images
  • source: optional URL that can go in a 'src' attribute. For example, a resource can be an HTML page representing an image type; the URL of the image itself would be stored here. The same applies to audio, video, and embed types.
  • error: optional HTTP error code, or string
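
To make the field list more concrete, here is an invented result object together with two hypothetical helpers that consume it. None of the values come from a real inspection, and `render`, `escapeHtml`, and `durationToSeconds` are illustrative names, not part of url-inspector.

```javascript
// Invented example of a result object; real values depend on the URL.
const example = {
  url: 'https://example.com/photos/sunset',
  title: 'Sunset',
  site: 'example.com',
  mime: 'text/html',
  ext: 'html',
  what: 'image',
  type: 'image',
  date: '2020-05-17',
  width: 1024,
  height: 768,
  source: 'https://example.com/files/sunset.jpg'
};

// Escape text before placing it in HTML attributes or content.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: prefer the ready-made html, then source, else a link.
function render(obj) {
  if (obj.html) return obj.html;
  if (obj.type === 'image' && obj.source) {
    return `<img src="${escapeHtml(obj.source)}" alt="${escapeHtml(obj.title)}">`;
  }
  return `<a href="${escapeHtml(obj.url)}">${escapeHtml(obj.title)}</a>`;
}

// Convert an hh:mm:ss duration string to a number of seconds.
function durationToSeconds(duration) {
  const [h, m, s] = duration.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}
```

Here the page is HTML but represents an image, so `source` holds the image URL and `render(example)` produces an `<img>` tag rather than a link.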

Install

url-inspector currently requires these external libraries/tools:

  • exiftool
  • libcurl (and libcurl-dev if node-libcurl needs to be rebuilt)

Both are well maintained and available in most Linux distributions.

Usage

import Inspector from 'url-inspector';

const opts = {
 ua: "Mozilla/5.0", // override the user agent; defaults to a somewhat modern browser
 nofavicon: false, // when true, disables additional requests to get a favicon
 nosource: false, // when true, disables sub-inspection of the main embedded media
 file: true, // local file inspection is only enabled by default when using the CLI
 meta: {}, // user-entered metadata, to be merged and normalized
 providers: null // custom providers (module path or array)
};

const inspector = new Inspector(opts);

const obj = await inspector.look(url);

On failure, the inspector throws http-errors instances.
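
Since errors are http-errors instances, they carry the HTTP code in `status` and `statusCode`. A minimal sketch of one way a caller might handle them follows; `errorSummary` and `safeLook` are hypothetical helpers, and returning an object with an `error` field is a choice made for this example, not documented library behavior.

```javascript
// Hypothetical helper: reduce a thrown error to an HTTP code or a string.
function errorSummary(err) {
  // http-errors instances expose the HTTP code as `status` and `statusCode`.
  const code = err && (err.status || err.statusCode);
  if (Number.isInteger(code)) return code;
  return err && err.message ? err.message : String(err);
}

// Wrap any async lookup, for example (url) => inspector.look(url).
async function safeLook(look, url) {
  try {
    return await look(url);
  } catch (err) {
    // Assumption: the caller wants a plain object instead of an exception.
    return { url, error: errorSummary(err) };
  }
}
```

With an inspector instance, this could be called as `await safeLook((u) => inspector.look(u), url)`.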

By default, oEmbed providers are:

  • found from a curated list of providers
  • found from a custom list, required from opts.providers
  • discovered in the inspected web pages

It is possible to add custom providers in the options, by passing an array or a path to a module exporting an array.

See src/custom-oembed-providers.js for examples.
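
As a rough illustration of what a provider list does, the sketch below uses the entry shape of the public oEmbed provider list (`provider_name`, `endpoints`, `schemes`, `url`). Whether url-inspector's custom providers use exactly this shape is an assumption; the file mentioned above is the authority. The provider itself is made up, and `schemeToRegExp` and `findEndpoint` are hypothetical helpers, not url-inspector's matching code.

```javascript
// Made-up provider entry in the shape used by the public oEmbed provider list.
const providers = [{
  provider_name: 'Example Video',
  provider_url: 'https://video.example.com/',
  endpoints: [{
    schemes: ['https://video.example.com/watch/*'],
    url: 'https://video.example.com/oembed'
  }]
}];

// Turn an oEmbed scheme with "*" wildcards into an anchored RegExp.
function schemeToRegExp(scheme) {
  const escaped = scheme.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Return the oEmbed request URL for `url`, or null when no scheme matches.
function findEndpoint(list, url) {
  for (const provider of list) {
    for (const endpoint of provider.endpoints) {
      const schemes = endpoint.schemes || [];
      if (schemes.some((s) => schemeToRegExp(s).test(url))) {
        const api = new URL(endpoint.url);
        api.searchParams.set('url', url); // required oEmbed parameter
        api.searchParams.set('format', 'json');
        return api.href;
      }
    }
  }
  return null;
}
```

For a URL such as `https://video.example.com/watch/42`, `findEndpoint(providers, url)` returns the endpoint URL with `url` and `format` query parameters; for any other URL it returns `null`.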

To normalize an already existing metadata object, including url rewriting done by providers, and other changes in fields, do:

await inspector.norm(obj);

url-inspector uses node-libcurl to make http requests, and exposes it as:

const req = await Inspector.get(urlObj);

Here req.abort() stops the request, req.res is the response stream, and res.statusCode and res.headers are available on that response.

Proxy support

url-inspector configures http(s) proxies through the proxy-from-env package and environment variables (http_proxy, https_proxy, all_proxy, no_proxy).

See the proxy-from-env documentation for details.
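
To give a feel for how these variables interact, here is a deliberately simplified sketch of environment-based proxy selection. It is not proxy-from-env's code: `pickProxy` is a hypothetical function, and it ignores ports in no_proxy entries and other details that the real package handles.

```javascript
// Simplified sketch of env-based proxy selection; not proxy-from-env's code.
function pickProxy(target, env) {
  const { protocol, hostname } = new URL(target);
  const proto = protocol.replace(':', ''); // "http" or "https"
  // Accept both lowercase and uppercase variable names.
  const get = (name) => env[name] || env[name.toUpperCase()] || '';

  // no_proxy: comma-separated hosts; "*" disables proxying entirely.
  const rules = get('no_proxy').split(',').map((s) => s.trim()).filter(Boolean);
  const bypass = rules.some((rule) => {
    if (rule === '*') return true;
    // A plain host must match exactly.
    if (!/^[.*]/.test(rule)) return hostname === rule;
    // ".example.com" or "*.example.com" match by suffix.
    return hostname.endsWith(rule.replace(/^\*/, ''));
  });
  if (bypass) return '';

  // Protocol-specific variable first, then the generic fallback.
  return get(proto + '_proxy') || get('all_proxy');
}
```

In practice the second argument would be `process.env`; an empty string means the request goes out directly.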

License

Open Source, see ./LICENSE.
