tall v8.0.0
tall
Promise-based, No-dependency URL unshortner (expander) module for Node.js 12+.
Note: This library is written in TypeScript and type definitions are provided.
Install
Using npm
npm install --save tallor with yarn
yarn add tallUsage
ES6+ usage:
import { tall } from 'tall'
tall('http://www.loige.link/codemotion-rome-2017')
.then(unshortenedUrl => console.log('Tall url', unshortenedUrl))
.catch(err => console.error('AAAW 👻', err))With Async await:
import { tall } from 'tall';
async function someFunction() {
try {
const unshortenedUrl = await tall('http://www.loige.link/codemotion-rome-2017');
console.log('Tall url', unshortenedUrl);
} catch (err) {
console.error('AAAW 👻', err);
}
}
someFunction();ES5:
var { tall } = require('tall')
tall('http://www.loige.link/codemotion-rome-2017')
.then(function(unshortenedUrl) {
console.log('Tall url', unshortenedUrl)
})
.catch(function(err) {
console.error('AAAW 👻', err)
})Options
It is possible to specify some options as second parameter to the tall function.
Available options are the following:
method(default"GET"): any available HTTP methodmaxRedirects(default3): the number of maximum redirects that will be followed in case of multiple redirects.headers(default{}): change request headers - e.g.{'User-Agent': 'your-custom-user-agent'}timeout: (default:120000): timeout in milliseconds after which the request will be cancelledplugins: (default:[locationHeaderPlugin]): a list of plugins for adding advanced behaviours
In addition, any other options available on http.request() or https.request() are accepted. This for example includes rejectUnauthorized to disable certificate checks.
Example:
import { tall } from 'tall'
tall('http://www.loige.link/codemotion-rome-2017', {
method: 'HEAD',
maxRedirect: 10
})
.then(unshortenedUrl => console.log('Tall url', unshortenedUrl))
.catch(err => console.error('AAAW 👻', err))Plugins
Since tall v5, a plugin system for extending the default behaviour of tall is available.
By default tall comes with 1 single plugin, the locationHeaderPlugin which is enabled by default. This plugin follows redirects by looking at the location header in the HTTP response received from the source URL.
You might want to write your own plugins to have more sophisticated behaviours.
Some example?
- Normalise the final URL if the final page has a
<link rel="canonical" href="http://example.com/page/" />tag in the<head>of the document - Follow HTML meta refresh redirects (
<meta http-equiv="refresh" content="0;URL='http://example.com/'" />)
How to write a plugin
A plugin is simply a function with a specific signature:
export interface TallPlugin {
(url: URL, response: IncomingMessage, previous: Follow | Stop): Promise<Follow | Stop>
}So the only thing you need to do is to write your custom behaviour following this interface. But let's discuss briefly what the different elements mean here:
url: Is the current URL being crawledresponse: is the actual HTTP response object representing the currentprevious: the decision from the previous plugin execution (continue following a given URL or stop at a given URL)
Every plugin is executed asynchronously, so a plugin returns a Promise that needs to resolve to a Follow or a Stop decision.
Let's deep dive on these two concepts. Follow and Stop are defined as follows (touché):
export class Follow {
follow: URL
constructor (follow: URL) {
this.follow = follow
}
}
export class Stop {
stop: URL
constructor (stop: URL) {
this.stop = stop
}
}Follow and Stop are effectively simple classes to express an intent: should we follow the follow URL or should we stop at the stop URL?
Plugins are executed following the middleware pattern (or chain of responsability): they are executed in order and the information is propagated from one to the other.
For example if we initialise tall with { plugins: [plugin1, plugin2] }, for every URL, plugin1 will be executed before plugin2 and the decision of plugin1 will passed over onto plugin2 using the previous) parameter.
How to write and enable a plugin
Let's say we want to add a plugin that allows us to follow HTML meta refresh redirects, the code could look like this:
// metarefresh-plugin.ts
import { IncomingMessage } from 'http'
import { Follow, Stop } from 'tall'
export async function metaRefreshPlugin (url: URL, response: IncomingMessage, previous: Follow | Stop): Promise<Follow | Stop> {
let html = ''
for await (const chunk of response) {
html += chunk.toString()
}
// note: this is actually not a great idea, it's best to use something like `cheerio` to properly parse HTML
// but for the sake of illustrating how to use plugins this is good enough here...
const metaHttpEquivUrl = html.match(/meta +http-equiv="refresh" +content="\d;url=(http[^"]+)"/)?.[1]
if (metaHttpEquivUrl) {
return new Follow(new URL(metaHttpEquivUrl))
}
return previous
}Then, this is how you would use your shiny new plugin:
import {tall, locationHeaderPlugin} from 'tall'
import { metaRefreshPlugin } from './metarefresh-plugin'
const finalUrl = await tall('https://loige.link/senior', { plugins: [locationHeaderPlugin, metaRefreshPlugin] })
console.log(finalUrl)Note that we have to explicitly pass the locationHeaderPlugin if we want to retain tall original behaviour.
Contributing
Everyone is very welcome to contribute to this project. You can contribute just by submitting bugs or suggesting improvements by opening an issue on GitHub.
License
Licensed under MIT License. © Luciano Mammino.
3 years ago
3 years ago
3 years ago
3 years ago
4 years ago
4 years ago
4 years ago
5 years ago
5 years ago
5 years ago
6 years ago
7 years ago
7 years ago
7 years ago
7 years ago
7 years ago
7 years ago
7 years ago
9 years ago
9 years ago
9 years ago