0.1.2 • Published 8 years ago
tokio
Web scraping made simple.
Features
- Built on top of jsdom.
- It runs inline and external scripts on the page.
- You can add a resource filter to skip loading certain external resources.
- Simple and fast, only 100 SLOC and it does not require Electron or Chromium.
Install
yarn add tokio
Usage
const Tokio = require('tokio')

const tokio = new Tokio({
  url: 'https://some-website.com'
})

tokio.fetch().then(html => {
  console.log(html) //=> string
  // Query HTML with cheerio (server-side jQuery)
  // https://github.com/cheeriojs/cheerio
  const $ = tokio.query(html)
})
API
new Tokio(options)
options
options.url
- Type: string
- Required: required
The URL to fetch.
options.wait
- Type: number | string
- Default: 50
Wait for a certain time (in milliseconds) or for a DOM element to show up.
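To make the two forms of options.wait concrete, here is a minimal sketch of how a number-or-string wait could be resolved. It is not tokio's actual implementation: the waitFor helper, the polling interval, and treating the string as a CSS selector passed to document.querySelector are all assumptions made for illustration.

```javascript
// Hypothetical helper, not part of tokio's API.
// `wait` is either a delay in milliseconds or (assumed) a CSS selector.
function waitFor(document, wait = 50, pollInterval = 10) {
  if (typeof wait === 'number') {
    // Numeric form: resolve after the given delay.
    return new Promise(resolve => setTimeout(resolve, wait))
  }
  // String form: poll until a matching element exists.
  return new Promise(resolve => {
    const check = () => {
      if (document.querySelector(wait)) return resolve()
      setTimeout(check, pollInterval)
    }
    check()
  })
}
```

A real implementation would probably also want an upper time limit, so that a selector that never matches does not keep the fetch pending forever.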
options.manually
- Type: boolean | string
Instead of using options.wait, you can manually call window.__tokio_ready__() in your website to tell us that it's ready to be captured.
It can also be a string like i_am_ready so that you can call window.i_am_ready() instead.
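On the page side, the ready signal might be sent as in the sketch below. The signalReady helper is hypothetical. The guard is there on the assumption that the hook only exists while tokio is loading the page, in which case calling it unconditionally would throw for ordinary visitors.

```javascript
// Hypothetical page-side helper; `hookName` matches options.manually
// when it is a string, otherwise the default __tokio_ready__ is used.
function signalReady(win, hookName = '__tokio_ready__') {
  if (typeof win[hookName] === 'function') {
    win[hookName]() // tell the scraper the page can be captured
    return true
  }
  return false // hook absent: assume the page is not being scraped
}

// In the browser, after the app has rendered:
// signalReady(window)               // with manually: true
// signalReady(window, 'i_am_ready') // with manually: 'i_am_ready'
```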
options.resourceFilter
- Type: resource => boolean
Decides whether to load a given resource. Check out the resource type.
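A resource filter could look like the following sketch. The shape of resource is an assumption here (a url object with a hostname property, in the style of jsdom's resource loader), as is the convention that returning true loads the resource. The blocked hostnames are placeholders. Check the resource type for the actual fields.

```javascript
// Hostnames to skip; placeholder values for illustration.
const blockedHosts = new Set(['www.google-analytics.com', 'ads.example.com'])

// Return true to load the resource, false to skip it (assumed convention).
// Assumes resource.url.hostname exists (not confirmed by this README).
const resourceFilter = resource => !blockedHosts.has(resource.url.hostname)

// Would be passed as: new Tokio({ url: 'https://some-website.com', resourceFilter })
```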
options.requestOptions
- proxy: string. A URL for an HTTP proxy to use for the requests.
- agent: http(s).Agent instance to use.
- agentOptions: The agent options; defaults to { keepAlive: true, keepAliveMsecs: 115000 }, see http api for more details.
- strictSSL: If true, requires SSL certificates be valid; defaults to true, see request module for more details.
- userAgent: The user agent string used in requests; defaults to Node.js (#process.platform#; U; rv:#process.version#)
- headers: An object giving any headers that will be used while loading the HTML from options.url, if applicable.
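As an illustration, an options object using a few of these fields might look like the sketch below. The proxy address, user agent string, and header values are made-up placeholders, not defaults of the library.

```javascript
// Example options; all values under requestOptions are placeholders.
const options = {
  url: 'https://some-website.com',
  requestOptions: {
    proxy: 'http://127.0.0.1:8080',          // hypothetical local HTTP proxy
    strictSSL: true,                         // reject invalid certificates
    userAgent: 'my-scraper/1.0',             // hypothetical user agent string
    headers: { 'Accept-Language': 'en-US' }  // sent when loading options.url
  }
}

// With the package installed: new Tokio(options).fetch()
```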
tokio.fetch()
- Type: () => Promise<string>
Fetch URL and return corresponding HTML. (JavaScript on this page will be evaluated.)
tokio.query(html, opts)
This is basically cheerio.load(html, opts).
Contributing
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request :D
Author
tokio © egoist, Released under the MIT License. Authored and maintained by egoist with help from contributors (list).
github.com/egoist · GitHub @egoist · Twitter @_egoistlily