1.0.0-rc.2 • Published 3 years ago

reurl v1.0.0-rc.2

ReURL

ReUrl is a library for parsing and manipulating URLs. It supports relative and non-normalized URLs and a number of operations on them. It can parse, resolve, normalize and serialize URLs in separate phases, in a way that conforms to the WHATWG URL Standard.

Motivation

I wrote this library because I needed a library that supported non-normalized and relative URLs but I also wanted to be certain that it followed the specification completely.

The WHATWG URL Standard defines URLs in terms of a parser algorithm that resolves URLs, normalizes URLs and serializes URL components in one pass. Thus, to implement a library that follows the standard but also supports a versatile set of operations on relative and non-normalized URLs, I had to disentangle these phases from the specification and, to some extent, rephrase the specification in more elementary terms.

Eventually I came up with a small 'theory' of URLs that I found very helpful and I based the library on that. Over time, this theory has become thoroughly documented in this new URL Specification.

API

Overview

The ReUrl library exposes an Url class and a RawUrl class with an identical API. Their only difference is in their handling of percent escape sequences.

For Url objects the URL parser decodes percent escape sequences, getters report percent-decoded values and the set method assumes that its input is percent-decoded unless explicitly specified otherwise.

var url = new Url ('//host/%61bc')
url.file // => 'abc'
url = url.set ({ query:'%def' })
url.query // => '%def'
url.toString () // => '//host/abc?%25def'

For RawUrl objects the parser preserves percent escape sequences, getters report values with percent escape sequences preserved, and the set method expects values in which % signs start a percent escape sequence.

var url = new RawUrl ('//host/%61bc')
url.file // => '%61bc'
url = url.set ({ query:'%25%64ef' })
url.query // => '%25%64ef'
url.toString () // => '//host/%61bc?%25%64ef'
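
The difference between the two views can be reproduced with JavaScript's built-in functions. The following sketch does not use ReUrl; it only illustrates what 'percent-decoded' and 'preserved' mean for the values in the examples above. All variable names are illustrative.

```javascript
// Plain JavaScript, not ReUrl: two views of the same path segment.
var rawFile = '%61bc'                           // escape sequence preserved
var decodedFile = decodeURIComponent (rawFile)  // escape sequence decoded
console.log (decodedFile) // => 'abc'

// A literal percent sign has to be escaped as %25 when it is serialized.
var literalQuery = '%def'
var escapedQuery = literalQuery.replace (/%/g, '%25')
console.log (escapedQuery) // => '%25def'
```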

Url and RawUrl objects are immutable. Modifying URLs is accomplished through methods that return new Url and/or RawUrl objects, such as the url.set (patch) method described below.

Constructors

Construct a new Url object from an URL-string. The optional conf argument, if present, must be a configuration object as described below.

var url = new Url ('sc:/foo/bar')
console.log (url)
// => Url { scheme: 'sc', root: '/', dirs: [ 'foo' ], file: 'bar' }

Construct a new Url object from any object, possibly an Url object itself. The optional conf argument, if present, must be a configuration object as described below. Throws an error if the object cannot be coerced into a valid URL.

var url = new Url ({ scheme:'file', dirs:['foo', 'buzz'], file:'abc' })
console.log (url.toString ())
// => 'file:foo/buzz/abc'

You can pass a configuration object with a parser property to the Url constructor to trigger scheme-specific parsing behaviour for relative, scheme-less URL-strings.

The scheme determines support for Windows drive-letters and backslash separators. Drive-letters are only supported in file URL-strings, and backslash separators are limited to file, http, https, ws, wss and ftp URL-strings.

var url = new Url ('/c:/foo\\bar', { parser:'file' })
console.log (url)
// => Url { drive: 'c:', root: '/', dirs: [ 'foo' ], file: 'bar' }
var url = new Url ('/c:/foo\\bar', { parser:'http' })
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
var url = new Url ('/c:/foo\\bar')
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
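
For comparison, the built-in WHATWG URL class (available in browsers and in Node.js) shows the same scheme-dependent treatment of backslashes and drive letters, although it only accepts absolute URL-strings and always normalizes them. This sketch does not use ReUrl, and the variable names are illustrative.

```javascript
// Built-in WHATWG URL class, shown for comparison only (not ReUrl).
// In http URLs a backslash acts as a path separator.
var httpUrl = new URL ('http://host/c:/foo\\bar')
console.log (httpUrl.pathname) // => '/c:/foo/bar'

// In file URLs a drive letter written as 'c|' is recognised and rewritten.
var fileUrl = new URL ('file:///c|/foo\\bar')
console.log (fileUrl.pathname) // => '/c:/foo/bar'
```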

Properties

Url and RawUrl objects have the following optional properties.

The scheme of an URL as a string. This property is absent if no scheme part is present, e.g. in scheme-relative URLs.

new Url ('http://foo?search#baz') .scheme
// => 'http'
new Url ('/abc/?') .scheme
// => undefined

The username of an URL as a string. This property is absent if the URL does not have an authority or does not have credentials.

new Url ('http://joe@localhost') .user
// => 'joe'
new Url ('//host/abc') .user
// => undefined

A property for the password of an URL as a string. This property is absent if the URL does not have an authority, credentials or password.

new Url ('http://joe@localhost') .pass
// => undefined
new Url ('http://host') .pass
// => undefined
new Url ('http://joe:pass@localhost') .pass
// => 'pass'
new Url ('http://joe:@localhost') .pass
// => ''

A property for the hostname of an URL as a string. This property is absent if the URL does not have an authority.

new Url ('http://localhost') .host
// => 'localhost'
new Url ('http:foo') .host
// => undefined
new Url ('/foo') .host
// => undefined

The port of (the authority part of) an URL, if present, being either a number or the empty string. The property is absent if the URL does not have an authority or a port.

new Url ('http://localhost:8080') .port
// => 8080
new Url ('foo://host:/foo') .port
// => ''
new Url ('foo://host/foo') .port
// => undefined

A property for the path-root of an URL. Its value is '/' if the URL has an absolute path. The property is absent otherwise.

new Url ('foo://localhost?q') .root
// => undefined
new Url ('foo://localhost/') .root
// => '/'
new Url ('foo/bar')
// => Url { dirs: [ 'foo' ], file: 'bar' }
new Url ('/foo/bar')
// => Url { root: '/', dirs: [ 'foo' ], file: 'bar' }

It is possible for file URLs to have a drive, but not a root.

new Url ('file:/c:')
// => Url { scheme: 'file', drive: 'c:' }
new Url ('file:/c:/')
// => Url { scheme: 'file', drive: 'c:', root: '/' }

A property for the drive of an URL as a string, if present. Note that the presence of drives depends on the parser settings and/or URL scheme.

new Url ('file://c:') .drive
// => 'c:'
new Url ('http://c:') .drive
// => undefined
new Url ('/c:/foo/bar', { parser:'file' }) .drive
// => 'c:'
new Url ('/c:/foo/bar') .drive
// => undefined

The dirs property, if present, is a nonempty array of strings. Note that the trailing slash determines whether a component is part of dirs or is set as the file property.

new Url ('/foo/bar/baz/').dirs
// => [ 'foo', 'bar', 'baz' ]
new Url ('/foo/bar/baz').dirs
// => [ 'foo', 'bar' ]

The file property, if present, is a non-empty string.

new Url ('/foo/bar/baz') .file
// => 'baz'
new Url ('/foo/bar/baz/') .file
// => undefined
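
The relation between root, dirs and file can be sketched with a small stand-alone function. This is a simplification for illustration only: splitPath is a hypothetical helper, not part of ReUrl, and it ignores drives, backslashes and percent coding.

```javascript
// Hypothetical helper, not part of ReUrl: split a path string into
// root, dirs and file in the way the examples above suggest.
function splitPath (path) {
  var result = {}
  var segments = path.split ('/')
  if (path[0] === '/') {
    result.root = '/'
    segments.shift ()         // drop the empty string before the root
  }
  var last = segments.pop ()  // whatever follows the final slash
  if (segments.length) result.dirs = segments
  if (last !== '') result.file = last
  return result
}

console.log (splitPath ('/foo/bar/baz/'))
// => { root: '/', dirs: [ 'foo', 'bar', 'baz' ] }
console.log (splitPath ('/foo/bar/baz'))
// => { root: '/', dirs: [ 'foo', 'bar' ], file: 'baz' }
```

Absent components are left out of the result rather than being set to an empty value, which mirrors how the properties above are reported as undefined.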

A property for the query part of url as a string, if present.

new Url ('http://foo?search#baz') .query
// => 'search'
new Url ('/abc/?') .query
// => ''
new Url ('/abc/') .query
// => undefined

A property for the hash part of url as a string, if present.

new Url ('http://foo#baz') .hash
// => 'baz'
new Url ('/abc/#') .hash
// => ''
new Url ('/abc/') .hash
// => undefined
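
The distinction between an absent component and an empty one is easy to lose. As a point of comparison, the built-in WHATWG URL class (not ReUrl) reports the empty string in both cases, even though the two URLs serialize differently. The variable names below are illustrative.

```javascript
// Built-in WHATWG URL class, shown for comparison only (not ReUrl).
var withEmptyQuery = new URL ('http://host/abc/?')
var withoutQuery = new URL ('http://host/abc/')

// Both getters return the empty string ...
console.log (withEmptyQuery.search) // => ''
console.log (withoutQuery.search)   // => ''

// ... although the two URLs serialize differently.
console.log (withEmptyQuery.href) // => 'http://host/abc/?'
console.log (withoutQuery.href)   // => 'http://host/abc/'
```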

Setting Properties

Url and RawUrl objects are immutable; therefore, setting and removing components is achieved via a set method that takes a patch object.

The patch object may contain one or more of the keys scheme, user, pass, host, port, drive, root, dirs, file, query and/or hash. To remove a component, set its value in the patch to null.

If present:

  • port must be null, a string, or a number;
  • dirs must be an array of strings;
  • root may be anything; it is converted to '/' if truthy and is interpreted as null otherwise;
  • all others must be null or a string.

new Url ('//host/dir/file')
  .set ({ host:null, query:'q', hash:'h' })
  .toString ()
// => '/dir/file?q#h'

Resets

For security reasons, setting the user will remove pass, unless a value is supplied for it as well. Setting the host will remove user, pass and port, unless values are supplied for them as well.

new Url ('http://joe:secret@example.com')
  .set ({ user:'jane' })
  .toString ()
// => 'http://jane@example.com'
new Url ('http://joe:secret@localhost:8080')
  .set ({ host:'example.com' })
  .toString ()
// => 'http://example.com'
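
The patch semantics described in this section (null removes a component, setting user resets pass, setting host resets user, pass and port) can be sketched on plain records. This is an illustration under simplifying assumptions, not ReUrl's implementation: applyPatch is a hypothetical name, and the sketch does no validation or percent coding.

```javascript
// Hypothetical sketch, not ReUrl's implementation: apply a patch to a
// plain record of components without mutating the original record.
function applyPatch (record, patch) {
  var result = Object.assign ({}, record, patch)

  // Reset rules: credentials and port do not silently carry over.
  if ('user' in patch && !('pass' in patch)) delete result.pass
  if ('host' in patch) {
    for (var key of ['user', 'pass', 'port'])
      if (!(key in patch)) delete result[key]
  }

  // A null value removes the component.
  for (var name in result)
    if (result[name] === null) delete result[name]

  return Object.freeze (result)
}

var before = Object.freeze ({ scheme:'http', user:'joe', pass:'secret', host:'localhost', port:8080 })
var after = applyPatch (before, { host:'example.com' })
console.log (after)       // => { scheme: 'http', host: 'example.com' }
console.log (before.port) // => 8080
```

Returning a fresh frozen object keeps the original record usable after the update, which is the property that an immutable API provides.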

The patch may have an additional key percentCoded with a boolean value to indicate that strings in the patch contain percent-encoded sequences.

This means that you can pass percent-encoded values to Url.set by explicitly setting percentCoded to true. The values will then be decoded.

var url = new Url ('//host/')
url = url.set ({ file:'%61bc-%25-sign', percentCoded:true })
url.file // => 'abc-%-sign'
url.toString () // => '//host/abc-%25-sign'

You can pass percent-decoded values to RawUrl.set by explicitly setting percentCoded to false. Percent characters in values will then be encoded; specifically, they will be replaced with %25.

var rawUrl = new RawUrl ('//host/')
rawUrl = rawUrl.set ({ file:'abc-%-sign', percentCoded:false })
rawUrl.file // => 'abc-%25-sign'
rawUrl.toString () // => '//host/abc-%25-sign'

Note that if no percentCoded value is specified, then Url.set assumes percentCoded to be false whilst RawUrl.set assumes percentCoded to be true.

var url = new Url ('//host/') .set ({ file:'%61bc' })
url.file // => '%61bc'
url.toString () // => '//host/%2561bc'
var rawUrl = new RawUrl ('//host/') .set ({ file:'%61bc' })
rawUrl.file // => '%61bc'
rawUrl.toString () // => '//host/%61bc'

Conversions

Converts an Url object to a string. Percent encodes only a minimal set of codepoints. The resulting string may contain non-ASCII codepoints.

var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toString ()
// => 'http://🌿🌿🌿/%7Bbraces%7D/hʌɪ'

Converts an Url object to a string that contains only ASCII code points. Non-ASCII code points in components will be percent-encoded and/or punycoded.

var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toASCII ()
// => 'http://xn--8h8haa/%7Bbraces%7D/h%CA%8C%C9%AA'
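
For comparison, the built-in WHATWG URL class (not ReUrl) always produces an ASCII serialization. The sketch below uses an ASCII host name, which is an assumption made to keep the example about path encoding only.

```javascript
// Built-in WHATWG URL class, shown for comparison only (not ReUrl).
// Braces and non-ASCII code points in the path are percent-encoded as UTF-8.
var asciiUrl = new URL ('http://example.com/{braces}/hʌɪ')
console.log (asciiUrl.pathname) // => '/%7Bbraces%7D/h%CA%8C%C9%AA'
```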

Uses url.toASCII () to convert url to an RFC3986 URI. Throws an error if url does not have a scheme, because URIs must always have a scheme.

Normalisation

Returns a new Url object by normalizing url. Among other things, this interprets . and .. segments within the path and removes default ports and trivial usernames/passwords from the authority of url.

new Url ('http://foo/bar/baz/./../bee') .normalize () .toString ()
// => 'http://foo/bar/bee'
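
The built-in WHATWG URL class (not ReUrl) performs this kind of normalization unconditionally while parsing, so it can serve as a reference point for the expected result. The port in the input below is an addition to the example above, included to show default-port removal.

```javascript
// Built-in WHATWG URL class, shown for comparison only (not ReUrl).
// Dot segments are interpreted and the default http port is dropped.
var normalized = new URL ('http://foo:80/bar/baz/./../bee')
console.log (normalized.href) // => 'http://foo/bar/bee'
```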

Percent Coding

Returns a RawUrl object by percent-encoding the properties of url according to the Standard. Prevents double escaping of percent-encoded-bytes in the case of RawUrl objects.

Returns an Url object by percent-decoding the properties of url if it is a RawUrl, and leaving them as-is otherwise.

Goto

Returns a new Url object by 'extending' url with url2, where url2 may be a string, an Url or a RawUrl object.

new Url ('/foo/bar') .goto ('baz/index.html') .toString ()
// => '/foo/baz/index.html'
new Url ('/foo/bar') .goto ('//host/path') .toString ()
// => '//host/path'
new Url ('http://foo/bar/baz/') .goto ('./../bee') .toString ()
// => 'http://foo/bar/baz/./../bee'

If url2 is a string, it will be parsed with the scheme of url as a fallback scheme. TODO: if url has no scheme then …

new Url ('file://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'file://host/c|/dir2/'
new Url ('http://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'http://host/dir/c|/dir2/'
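
One way to think about 'extending' is that the result keeps the leading components of url and takes everything from the first component of url2 onwards, with a relative path continuing from the directories of url. The sketch below illustrates that idea on plain records with a reduced set of components. It is an assumption-laden illustration, not ReUrl's implementation: gotoSketch and order are hypothetical names, and user, pass, port, drive and fallback-scheme parsing are left out.

```javascript
// Hypothetical sketch of the idea, not ReUrl's implementation.
// Records use a reduced set of components, in serialization order.
var order = ['scheme', 'host', 'root', 'dirs', 'file', 'query', 'hash']

function gotoSketch (url1, url2) {
  // Position of the first component that url2 brings along.
  var start = order.findIndex (key => key in url2)
  if (start < 0) return Object.assign ({}, url1)

  var result = {}
  order.forEach ((key, index) => {
    var source = index < start ? url1 : url2
    if (key in source) result[key] = source[key]
  })

  // A relative path continues from the directories of url1.
  var first = order[start]
  if ((first === 'dirs' || first === 'file') && url1.dirs)
    result.dirs = url1.dirs.concat (url2.dirs || [])

  return result
}

console.log (gotoSketch (
  { root:'/', dirs:['foo'], file:'bar' },
  { dirs:['baz'], file:'index.html' }))
// => { root: '/', dirs: [ 'foo', 'baz' ], file: 'index.html' }
```

Note that nothing is normalized here: dot segments in url2 would be appended as-is, in line with the third example above.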

Base URLs

Returns a boolean indicating whether url is a base-URL. What is and is not a base-URL depends on the scheme of an URL. For example, http- and file-URLs that do not have a host are not base-URLs.

Forcibly convert an Url to a base-URL according to this URL Specification, in accordance with the WHATWG Standard.

  • In file URLs without hostname, the hostname will be set to ''.
  • For URLs that have a scheme being one of http, https, ws, wss or ftp and an absent or empty authority, the authority will be 'stolen from the first nonempty path segment'.
  • In the latter case, an error is thrown if url cannot be forced. This happens if it has no scheme, or if it has an empty host and no non-empty path segment.

new Url ('http:foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http:/foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http://foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http:///foo/bar') .force () .toString ()
// => 'http://foo/bar'
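
The same error-correcting behaviour can be observed in the built-in WHATWG URL class (not ReUrl), which applies it implicitly while parsing an absolute URL-string. The variable names are illustrative.

```javascript
// Built-in WHATWG URL class, shown for comparison only (not ReUrl).
// For http URLs the host is taken from the first non-empty segment,
// regardless of how many slashes follow the scheme.
var inputs = ['http:foo/bar', 'http:/foo/bar', 'http://foo/bar', 'http:///foo/bar']
var forced = inputs.map (input => new URL (input) .href)
console.log (forced.every (href => href === 'http://foo/bar')) // => true
```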

Reference Resolution

Resolve an Url object url against a base URL base according to the strict reference resolution algorithm as defined in RFC3986.

Resolve an Url object url against a base URL base according to the non-strict reference resolution algorithm as defined in RFC3986.

Resolve an Url object url against a base URL base in a way that is compatible with the error-correcting, forcing reference resolution algorithm as defined in the WHATWG Standard.
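
The difference between the strict and non-strict algorithms shows up for references that repeat the scheme of the base. RFC 3986 (section 5.4.2) gives the example "http:g", which a strict parser leaves as "http:g" and a backward-compatible parser resolves to "http://a/b/c/g". The built-in WHATWG URL class (not ReUrl) follows the backward-compatible reading, as this sketch shows; the variable names are illustrative.

```javascript
// Built-in WHATWG URL class, shown for comparison only (not ReUrl).
var base = 'http://a/b/c/d;p?q'

// An ordinary relative reference, from the examples in RFC 3986.
var ordinary = new URL ('../g', base) .href
console.log (ordinary) // => 'http://a/b/g'

// A reference that repeats the scheme of the base is treated as relative.
var sameScheme = new URL ('http:g', base) .href
console.log (sameScheme) // => 'http://a/b/c/g'
```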

Changelog

Version 1.0.0-rc.2

  • Converted the project from a CommonJS Module to an ES Module.
  • Updated the core to use spec-url version 2.0.0-dev.1
  • Changes to the API for reference resolution.

ReUrl now exposes three methods for reference resolution:

  • url.genericResolve (base)
  • url.legacyResolve (base)
  • url.WHATWGResolve (base), also known as url.resolve (base)

License

MIT.

Enjoy!