3.0.0 • Published 12 months ago

@wabarc/cairn v3.0.0

Cairn

    //   ) )                              
   //         ___     ( )  __       __    
  //        //   ) ) / / //  ) ) //   ) ) 
 //        //   / / / / //      //   / /  
((____/ / ((___( ( / / //      //   / /   

Cairn is an npm package and CLI tool for saving a web page as a single HTML file. It is a TypeScript implementation of Obelisk.

Features

Usage

As CLI tool

npm install -g @wabarc/cairn
$ cairn -h

Usage: cairn [options] url1 [url2]...[urlN]

CLI tool for saving web page as single HTML file

Options:
  -v, --version                         output the current version
  -o, --output <string>                 path to save archival result
  -u, --user-agent <string>             set custom user agent
  -p, --proxy [protocol://]host[:port]  use this proxy
  -t, --timeout <number>                maximum time (in second) request timeout
  --no-js                               disable JavaScript
  --no-css                              disable CSS styling
  --no-embeds                           remove embedded elements (e.g iframe)
  --no-medias                           remove media elements (e.g img, audio)
  -h, --help                            display help for command
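
When the CLI is launched from a script, the flags above can be assembled from a plain object. The following is a minimal sketch under stated assumptions: `buildCairnArgs` and the `CliOptions` shape are hypothetical names that are not part of Cairn, and the helper only maps fields onto the flags listed in the help output.

```typescript
// Hypothetical option bag; the field names are illustrative, not part of Cairn.
interface CliOptions {
  output?: string;
  userAgent?: string;
  proxy?: string;
  timeout?: number; // seconds, per the help text
  noJs?: boolean;
  noCss?: boolean;
  noEmbeds?: boolean;
  noMedias?: boolean;
}

// Translate the option bag into the argument list documented by `cairn -h`.
function buildCairnArgs(urls: string[], opts: CliOptions = {}): string[] {
  if (urls.length === 0) {
    throw new Error('at least one URL is required');
  }
  const flags: string[] = [];
  if (opts.output !== undefined) flags.push('--output', opts.output);
  if (opts.userAgent !== undefined) flags.push('--user-agent', opts.userAgent);
  if (opts.proxy !== undefined) flags.push('--proxy', opts.proxy);
  if (opts.timeout !== undefined) flags.push('--timeout', String(opts.timeout));
  if (opts.noJs) flags.push('--no-js');
  if (opts.noCss) flags.push('--no-css');
  if (opts.noEmbeds) flags.push('--no-embeds');
  if (opts.noMedias) flags.push('--no-medias');
  // URLs are positional and come last: cairn [options] url1 [url2]...[urlN]
  return [...flags, ...urls];
}

const cliArgs = buildCairnArgs(['https://example.com'], { timeout: 30, noJs: true });
console.log(['cairn', ...cliArgs].join(' '));
// prints "cairn --timeout 30 --no-js https://example.com"
```

The resulting array can be handed to a process launcher such as Node's `child_process.spawn('cairn', cliArgs)`.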

As npm package

npm install @wabarc/cairn

import { Cairn } from '@wabarc/cairn';
// CommonJS: const { Cairn } = require('@wabarc/cairn');

// Page to archive.
const url = 'https://www.github.com';

const cairn = new Cairn();

cairn
  .request({ url: url })
  .options({ userAgent: 'Cairn/2.0.0', proxy: 'socks5://127.0.0.1:1080' })
  .archive()
  .then((archived) => {
    // `webpage` is a cheerio root; html() serializes it to a string.
    console.log(archived.url, archived.webpage.html());
  })
  .catch((err) => console.warn(`${url} => ${JSON.stringify(err)}`));
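
When several pages need saving, the same chain can be wrapped in `async`/`await`. The sketch below is not Cairn's own code: `Archiver`, `ArchivedPage`, `archiveAll`, and `StubArchiver` are hypothetical names, and the interface only mirrors the `request`/`options`/`archive` chain from the usage example. With the real package, `() => new Cairn()` would presumably be passed as the factory.

```typescript
// Minimal structural view of the fluent chain (an assumption, reduced to
// the members this sketch needs).
interface ArchivedPage {
  url: string;
  webpage: { html(): string };
}
interface Archiver {
  request(r: { url: string }): this;
  options(o: { userAgent?: string; timeout?: number }): this;
  archive(): Promise<ArchivedPage>;
}

type Outcome =
  | { url: string; ok: true; html: string }
  | { url: string; ok: false; error: string };

// Archive URLs one at a time, recording failures instead of aborting.
async function archiveAll(urls: string[], make: () => Archiver): Promise<Outcome[]> {
  const outcomes: Outcome[] = [];
  for (const url of urls) {
    try {
      const archived = await make().request({ url }).options({ timeout: 30 }).archive();
      outcomes.push({ url, ok: true, html: archived.webpage.html() });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      outcomes.push({ url, ok: false, error });
    }
  }
  return outcomes;
}

// Stand-in archiver so the sketch runs without network access.
class StubArchiver implements Archiver {
  private target = '';
  request(r: { url: string }): this {
    this.target = r.url;
    return this;
  }
  options(_o: { userAgent?: string; timeout?: number }): this {
    return this;
  }
  archive(): Promise<ArchivedPage> {
    if (!this.target.startsWith('https://')) {
      return Promise.reject(new Error('unsupported URL'));
    }
    const url = this.target;
    return Promise.resolve({ url, webpage: { html: () => `<html><!-- ${url} --></html>` } });
  }
}

archiveAll(['https://example.com', 'ftp://example.com'], () => new StubArchiver())
  .then((outcomes) => console.log(outcomes.map((o) => `${o.url}: ${o.ok}`)));
```

Processing the URLs sequentially keeps the sketch simple. Whether a single `Cairn` instance can be reused across requests is not stated here, so the factory creates a fresh one per URL.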

Instance methods

cairn#request({ url: string }): this
cairn#options({}): this
  • proxy?: string;
  • userAgent?: string;
  • disableJS?: boolean;
  • disableCSS?: boolean;
  • disableEmbeds?: boolean;
  • disableMedias?: boolean;
  • timeout?: number;
cairn#archive(): Promise<Archived>
cairn#Archived
  • url: string;
  • webpage: cheerio.Root;
  • status: 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
  • contentType: 'text/html' | 'text/plain' | 'text/*';
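
The method list above can also be written out as TypeScript declarations. These are a sketch transcribed from this README, not the package's shipped typings: `CairnOptions`, `ArchivedResult`, `ArchiveStatus`, `HtmlRoot`, and `isSuccessful` are hypothetical names, and `HtmlRoot` stands in for `cheerio.Root` so the block compiles without cheerio installed.

```typescript
// Stand-in for cheerio.Root, reduced to the one method used in this README.
interface HtmlRoot {
  html(): string;
}

// Options accepted by cairn#options, as listed above; all are optional.
interface CairnOptions {
  proxy?: string;
  userAgent?: string;
  disableJS?: boolean;
  disableCSS?: boolean;
  disableEmbeds?: boolean;
  disableMedias?: boolean;
  timeout?: number;
}

type ArchiveStatus = 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;

// Shape of the Archived value, as the usage example suggests archive() resolves with.
interface ArchivedResult {
  url: string;
  webpage: HtmlRoot;
  status: ArchiveStatus;
  contentType: 'text/html' | 'text/plain' | 'text/*';
}

// Within the documented status union, only 200 signals success.
function isSuccessful(archived: ArchivedResult): boolean {
  return archived.status === 200;
}

const exampleOptions: CairnOptions = { disableJS: true, timeout: 30 };
const sample: ArchivedResult = {
  url: 'https://github.com/',
  webpage: { html: () => '<html></html>' },
  status: 200,
  contentType: 'text/html',
};
console.log(exampleOptions, isSuccessful(sample));
```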

Request Params

request
{
  // `url` is the archival target.
  url: 'https://www.github.com'
}
options
{
  proxy: 'socks5://127.0.0.1:1080',
  userAgent: 'Cairn/2.0.0',

  disableJS: true,
  disableCSS: false,
  disableEmbeds: false,
  disableMedias: true,

  timeout: 30
}
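
Before passing an options object along, it can be useful to sanity-check the two free-form fields. The check below is a hedged sketch: `checkOptions`, `OptionsInput`, and `PROXY_PATTERN` are hypothetical names, the proxy pattern follows the `[protocol://]host[:port]` form from the CLI help rather than Cairn's actual parsing, and treating `timeout` as a positive number of seconds is an assumption based on the same help text.

```typescript
// Subset of the options object shown above.
interface OptionsInput {
  proxy?: string;
  timeout?: number;
}

// Simplified match for [protocol://]host[:port]; IPv6 literals are not handled.
// This is an assumption about the accepted form, not Cairn's own rule.
const PROXY_PATTERN = /^(?:[a-z][a-z0-9+.-]*:\/\/)?[^\s/:]+(?::\d{1,5})?$/i;

// Return a list of human-readable problems; an empty list means no issue found.
function checkOptions(opts: OptionsInput): string[] {
  const problems: string[] = [];
  if (opts.proxy !== undefined && !PROXY_PATTERN.test(opts.proxy)) {
    problems.push(`proxy "${opts.proxy}" is not in [protocol://]host[:port] form`);
  }
  if (opts.timeout !== undefined && !(Number.isFinite(opts.timeout) && opts.timeout > 0)) {
    problems.push('timeout must be a positive number of seconds');
  }
  return problems;
}

console.log(checkOptions({ proxy: 'socks5://127.0.0.1:1080', timeout: 30 })); // []
console.log(checkOptions({ proxy: 'not a proxy', timeout: 0 }).length); // 2
```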

Response Schema

for v1.x:

The archive method returns the webpage body as a string.

for v2.x:

{
  url: 'https://github.com/',
  webpage: cheerio.Root,
  status: 200,
  contentType: 'text/html'
}
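
Code that has to run against both major versions can normalize the two return shapes. The helper below is a hedged sketch: `toHtml` and `ArchivedV2` are hypothetical names, and `webpage` is typed with only the `html()` method used in the usage example rather than the real cheerio type.

```typescript
// v2.x result, reduced to the fields this helper touches.
interface ArchivedV2 {
  url: string;
  webpage: { html(): string };
}

// v1.x resolves with the page body as a string; v2.x resolves with an object.
function toHtml(result: string | ArchivedV2): string {
  return typeof result === 'string' ? result : result.webpage.html();
}

const v1Result = '<html><body>v1</body></html>';
const v2Result: ArchivedV2 = {
  url: 'https://github.com/',
  webpage: { html: () => '<html><body>v2</body></html>' },
};
console.log(toHtml(v1Result));
console.log(toHtml(v2Result));
```

Callers then work with a plain string regardless of which version produced the result.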

License

Cairn has been licensed under the MIT License since version 3.0.0. Versions 1.x and 2.x are licensed under GPL 3.0.

This software is released under the terms of the MIT License. See the LICENSE file for details.

Versions

3.0.0   12 months ago
2.3.0   12 months ago
2.2.1   2 years ago
2.2.0   3 years ago
2.1.2   4 years ago
2.1.1   4 years ago
2.1.0   4 years ago
2.0.1   4 years ago
2.0.0   4 years ago
1.3.0   4 years ago
1.2.1   4 years ago
1.2.0   4 years ago
1.1.2   4 years ago
1.1.1   4 years ago
1.1.0   4 years ago
1.0.1   4 years ago
1.0.0   4 years ago