downode v0.1.6
downode
downode is an easy-to-use scraper for general usage. Simple but powerful.
Installation
npm i -S downode
Features
- Composable: downode supports nested Rules, so you can reuse and compose your Page Rules and Rules arbitrarily.
- Concurrent control: Control all network requests with a simple config option.
- Reference mechanism: You can reference other scraped data easily and asynchronously.
Documentation
Examples
There is an example that scrapes the Douban Top Rated 250 Movies.
API
downode(entryURL, pageRule, globalOptions?)
Scrapes the page at the given URL using the given Page Rule.
NOTE: if you're using CommonJS modules, you'll need to use require('downode').default to get this main function.
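The note above can be sketched as follows. The `pickMain` helper and the `fakeModule` object are hypothetical and not part of downode; they only illustrate why `.default` is needed when a module that exposes its main function as a default export is loaded through `require`.

```javascript
// Hypothetical helper, not part of downode: returns the main function from a
// loaded module object, whether it is exposed directly or under `default`.
function pickMain(loadedModule) {
  return typeof loadedModule === 'function' ? loadedModule : loadedModule.default;
}

// In a real project (assumes downode is installed), the note above amounts to:
//   const downode = require('downode').default;

// Stand-in object shaped like a transpiled ES module, for illustration only.
const fakeModule = { __esModule: true, default: function downode() {} };
console.log(typeof pickMain(fakeModule)); // prints "function"
```

With ES module syntax the default export is picked up by the `import` statement itself, so no `.default` access is needed there.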
Params
- string entryURL - The target URL you want to start with.
- Object pageRule - The Page Rule for the entry page, a set of Rules.
  - Rule (Object | String | RefVarWaiter) - Specifies what and how to scrape. See Rule's Options Guide.
- Object globalOptions - Global config options.
  - totalConcurrent (number? = 50) - Max concurrent number for the global task priority queue. See Concurrent Control.
  - mode ('default' | 'df' | 'bf') - Global task priority queue mode. See Concurrent Control.
  - entryCookie (string) - cookie for the entry request.
  - rate (number? = 0) - Default rate option for Rules.
  - concurrent (number? = 5) - Default concurrent option for Rules.
  - request (Object? = 0) - Default request option for Rules.
  - userAgents ((string[] | string)? = MOST_COMMON_USER_AGENTS) - Default userAgents option for Rules.
  - retry (number? = 3) - Default retry option for Rules.
  - retryTimeout (number? = 2000) - Default retryTimeout option for Rules.
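To see how overrides interact with the documented defaults, here is a minimal sketch. The `DOCUMENTED_DEFAULTS` constant and the `effectiveOptions` helper are hypothetical names; downode applies its defaults internally, and this code only mirrors the numeric values listed above.

```javascript
// Numeric defaults copied by hand from the parameter list above.
// `mode`, `entryCookie`, `request`, and `userAgents` are omitted to keep the sketch short.
const DOCUMENTED_DEFAULTS = {
  totalConcurrent: 50,
  rate: 0,
  concurrent: 5,
  retry: 3,
  retryTimeout: 2000,
};

// Hypothetical helper (not part of downode): overrides win over the defaults.
function effectiveOptions(overrides = {}) {
  return { ...DOCUMENTED_DEFAULTS, ...overrides };
}

// Only the keys you pass change; everything else keeps its documented default.
const globalOptions = effectiveOptions({ totalConcurrent: 10, retry: 5 });
console.log(globalOptions.totalConcurrent, globalOptions.concurrent); // prints 10 5
```

In practice you would pass only the overrides object as the third argument and let downode fill in the rest.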
Return
- Promise - resolves to a result Object with the same structure as your Page Rule
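The call shape and return contract described above might be exercised like this. The sketch makes several assumptions: the main function is passed in as `scrape` so the code runs without the package installed, the Page Rule keys (`title`, `links`) are made up, and treating a string Rule as a CSS selector is a guess that this section does not confirm. `stubScrape` imitates only the documented contract, namely a Promise resolving to an object shaped like the Page Rule.

```javascript
// `scrape` stands for the downode main function (entryURL, pageRule, globalOptions).
async function scrapeEntry(scrape, entryURL) {
  // Hypothetical Page Rule: key names and selector-style strings are assumptions.
  const pageRule = { title: 'h1', links: 'a' };
  const globalOptions = { totalConcurrent: 10, retry: 2 };
  // The returned Promise resolves to an object with the same keys as pageRule.
  return scrape(entryURL, pageRule, globalOptions);
}

// Stub that mimics only the documented return shape, for illustration.
const stubScrape = async (url, rule) =>
  Object.fromEntries(Object.keys(rule).map((key) => [key, `<${key} from ${url}>`]));

scrapeEntry(stubScrape, 'https://example.com').then((result) => {
  console.log(Object.keys(result)); // prints [ 'title', 'links' ]
});
```

In a real project you would pass the function obtained from the downode package in place of `stubScrape`.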
waitFor(...refPaths, callback)
Function overloading:
- waitFor(refPathsArray, callback)
- waitFor(refPathsObject, callback)
Create a Reference Variable Waiter. Invoke the callback when all Reference Variables are available.
To learn more about reference mechanism, please head to reference-mechanism
Params
- string[] refPaths: Reference Paths passed one by one.
  - or string[] refPathsArray: An array containing all Reference Paths
  - or object refPathsObject: An object whose key-value pairs map to Reference Paths
- Function callback
Return
- any - Returns whatever the callback returns.
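The three call shapes listed above differ only in how the Reference Paths are passed. The following sketch shows one way such overloads could be told apart. It is an illustration only and not downode's implementation; `describeWaitForCall` is a hypothetical name, and the path strings `'pathA'` and `'pathB'` are placeholders, since the Reference Path syntax is not described in this section.

```javascript
// Illustration only, NOT downode's code: classifies the arguments of a
// waitFor-style call into one of the three documented forms.
function describeWaitForCall(...args) {
  const callback = args[args.length - 1];
  const rest = args.slice(0, -1);
  if (typeof callback !== 'function') {
    throw new TypeError('the last argument must be a callback');
  }
  if (rest.length === 1 && Array.isArray(rest[0])) {
    return { form: 'array', refPaths: rest[0] };
  }
  if (rest.length === 1 && typeof rest[0] === 'object' && rest[0] !== null) {
    return { form: 'object', refPaths: Object.values(rest[0]) };
  }
  return { form: 'variadic', refPaths: rest };
}

const noop = () => {};
console.log(describeWaitForCall('pathA', 'pathB', noop).form);          // prints "variadic"
console.log(describeWaitForCall(['pathA', 'pathB'], noop).form);        // prints "array"
console.log(describeWaitForCall({ a: 'pathA', b: 'pathB' }, noop).form); // prints "object"
```

All three forms carry the same set of paths; the choice is a matter of which is most convenient at the call site.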
Debug
# set environment variable
export DEBUG=downode:*
# `downode:info` - basic information, such as requests and downloads.
# `downode:warn` - retried requests, useless rules
# `downode:error` - error information, including request errors, download errors, etc.
Related
downode is inspired by these projects:
Roadmap
- Proxy Rule Option
- Post Rule Option
- Authorization/Cookie propagation
- CLI support
- Incremental scrape
- Support for scraping dynamically generated websites
License
MIT