scra v1.4.5
scra
An HTTP client designed mainly for scraping websites. It is not as full-featured as the well-known request, but it is a really simple tool for getting HTML pages to parse, or JSON responses from a site's internal API. Sometimes "less is more".
Features
- GET and POST requests via HTTP/HTTPS
- Supports both promises and callbacks
- Proxy support via tunnel-agent
- Non-utf8 charset decoding via iconv-lite
- Automatic gzip/deflate decompression
- Easy JSON-API support
- Cookie parsing/serialization
- Useful request fields
- Informative custom errors via c-e
- Nothing superfluous
Install
(node '>=10' required)
npm install scra
Usage
scra takes a URL string or an options object as its first parameter and an optional callback function as its second. If no callback is given, scra returns a promise. Either way, scra produces a response object of the same kind that http.ClientRequest yields, with a few extra fields.
Synopsis
const scra = require('scra');

// promise way
scra('http://httpbin.org/get')
    .then((res) => console.log(res.body))
    .catch((err) => console.error(err.message));

// callback way
scra('http://httpbin.org/get', (err, res) => {
    if(err) console.error(err.message);
    if(res) console.log(res.body);
});
See more examples in test.
Available options:
- url (required) - address for the request. String. If the protocol is omitted, it is set to http:, so example.com means http://example.com. If url is the only field in the options object, the first parameter may be this string itself, so scra('example.com') is equal to scra({url: 'example.com'}).
- headers - HTTP headers to send with the request. Object with string fields. By default there are three predefined headers: {'connection': 'close', 'user-agent': 'astur/scra', 'accept': '*/*'}. You may set any headers manually. The 'Host' header is set by the node http module, and some headers may be set depending on other options (see below). Such headers take priority over values in the headers option (be careful, this is not "as usual"). In short: manually set headers replace the defaults, but are themselves replaced by headers derived from other options.
- data - data for a POST request. If data is a non-empty string, it is sent as the request body without any conversion (if the content-type header is not set, it will be application/x-www-form-urlencoded). If data is an object, it is stringified to JSON and sent as the request body (the content-type header will be application/json).
- cookies - cookies to be sent with the request. Key-value object. It is stringified and placed in the cookie header.
- compressed - Boolean. If true, the accept-encoding header is set to 'gzip, deflate'. Defaults to false.
- timeout - Number of milliseconds. Time limit for the request to complete (if it does not, an error is thrown). A timeout of 0 means no time limit. Defaults to 5000.
- proxy - address of a proxy server. It may be either http or https; if the protocol is omitted it will be 'http'. scra supports proxies via tunnel-agent, so you can use a proxy with https sites.
- reverseProxy - object or string describing how to change the target url into a reverse-proxy url. If reverseProxy is an object, the target url part is in the field to and the reverse-proxy url part is in the field from. These are simply the parameters for the replace method of the url string, so it is possible to use a regexp in the field to and replacement patterns in the field from. If reverseProxy is a string, scra expects it to be the url part before the path. So the string 'http://reverse-proxy.my.org' is equal to an object like {to: /^https?:\/\/[^/]+(\/)?/i, from: 'http://reverse-proxy.my.org$1'}.
- agent - custom http/https agent.
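The header priority and the reverseProxy string form described above can be sketched in plain JavaScript. This is only an illustration of the documented rules, not scra's actual source: buildHeaders and applyReverseProxy are hypothetical helper names, and lower-casing manual header names is an assumption made to keep the merge simple.

```javascript
// Default headers as listed in the headers option above.
const DEFAULT_HEADERS = {
    'connection': 'close',
    'user-agent': 'astur/scra',
    'accept': '*/*',
};

// Hypothetical helper: merges headers in the documented priority order
// (defaults < manually set headers < headers derived from other options).
function buildHeaders(options) {
    const headers = {...DEFAULT_HEADERS};
    // Manual headers replace defaults. Lower-casing names is an assumption.
    for (const [name, value] of Object.entries(options.headers || {})) {
        headers[name.toLowerCase()] = value;
    }
    // compressed: true sets accept-encoding regardless of manual headers.
    if (options.compressed) headers['accept-encoding'] = 'gzip, deflate';
    if (typeof options.data === 'string' && options.data.length > 0) {
        // String body: content-type is filled in only when not already set.
        if (!('content-type' in headers)) {
            headers['content-type'] = 'application/x-www-form-urlencoded';
        }
    } else if (options.data && typeof options.data === 'object') {
        // Object body: sent as JSON, so content-type is application/json.
        headers['content-type'] = 'application/json';
    }
    return headers;
}

// Hypothetical helper: expands a reverseProxy string into the {to, from}
// form quoted above and applies it with String.prototype.replace.
function applyReverseProxy(url, reverseProxy) {
    const rule = typeof reverseProxy === 'string'
        ? {to: /^https?:\/\/[^/]+(\/)?/i, from: `${reverseProxy}$1`}
        : reverseProxy;
    return url.replace(rule.to, rule.from);
}
```

Under these assumptions, applyReverseProxy('https://example.com/path?q=1', 'http://reverse-proxy.my.org') rewrites only the part before the path, and buildHeaders({compressed: true, headers: {'accept-encoding': 'identity'}}) ends up with the option-derived accept-encoding value.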
Response object extra fields:
- rawBody - buffer with the response body (decompressed if necessary).
- charset - charset part of the content-type response header.
- body - response body converted to a string (using iconv-lite if charset is defined). If the content-type response header is application/json and body is a valid JSON string, it is parsed into an object.
- cookies - key-value object with cookies from the cookies option and from the set-cookie response header.
- url - same as the url field in options.
- requestHeaders - object with the headers sent with the request.
- requestTime - request duration (number of milliseconds).
- timings - (milliseconds) detailed timings (timestamps of all request phases):
  - start - time when the request starts. Just the moment before calling http(s).request.
  - socket - time when the socket has been created (when the http(s) module's socket event fires).
  - lookup - time when DNS has been resolved (when the net module's lookup event fires).
  - connect - time when the server acknowledges the TCP connection (when the net module's connect event fires).
  - secureConnect - (https only) time when the TLS handshake has been completed (when the tls module's secureConnect event fires).
  - responce - time when the server delivers the first byte of the response (when the http(s) module's response event fires).
  - end - time when all response data has been received (when the http(s) module's end event fires).
- timingPhases - (milliseconds) relative durations of each request phase:
  - wait - time spent waiting for the socket (timings.socket - timings.start).
  - dns - time spent performing the DNS lookup (timings.lookup - timings.socket).
  - tcp - time it took to establish the TCP connection between the source host and the destination host (timings.connect - timings.lookup).
  - tls - time spent completing the TLS handshake (timings.secureConnect - timings.connect).
  - responce - time spent waiting for the initial response (timings.responce - (timings.secureConnect || timings.connect)).
  - read - time spent receiving the response data (timings.end - timings.responce).
  - total - time spent performing all phases of the request (timings.end - timings.start).
- bytes - number of bytes sent and received by this request (bytes.sent and bytes.received).
- options - raw scra options parameter as a string or object.
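The relationship between timings and timingPhases can be illustrated with a small function that applies the formulas listed above. This is a sketch, not scra's own code: computeTimingPhases is a hypothetical name, the field name responce is spelled as in the field list, and omitting tls for plain http requests is an assumption.

```javascript
// Hypothetical helper: derives phase durations from a timings object
// using the formulas given for timingPhases above.
function computeTimingPhases(timings) {
    // For https the response wait starts after the TLS handshake,
    // for plain http it starts after the TCP connect.
    const connected = timings.secureConnect || timings.connect;
    const phases = {
        wait: timings.socket - timings.start,
        dns: timings.lookup - timings.socket,
        tcp: timings.connect - timings.lookup,
        responce: timings.responce - connected,
        read: timings.end - timings.responce,
        total: timings.end - timings.start,
    };
    // Assumption: tls is only reported when a TLS handshake happened.
    if (timings.secureConnect !== undefined) {
        phases.tls = timings.secureConnect - timings.connect;
    }
    return phases;
}

// Made-up timestamps for an https request, for illustration only.
const sample = computeTimingPhases({
    start: 1000, socket: 1002, lookup: 1010, connect: 1030,
    secureConnect: 1070, responce: 1150, end: 1180,
});
console.log(sample.total); // 180
```

With complete timings, the phases wait, dns, tcp, tls, responce and read add up to total, which makes a quick sanity check when reading these fields.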
Custom errors
scra provides two custom error classes:
const scra = require('scra');
const {TimeoutError, NetworkError} = scra;
These errors contain several useful additional properties:
- url - url used in the current request.
- errorTime - timestamp of the moment when the error was thrown.
- timings - same as in the response field (but some timings may be missing because the error occurred before the corresponding phases).
- timingPhases - same as in the response field (but some timingPhases may be missing because the error occurred before the corresponding phases).
- timeout - (only in TimeoutError) value of the timeout option.
- cause - (only in NetworkError) error object thrown by the core http module.
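One way to use these classes is to branch on the error type and read the extra properties. The sketch below takes the two classes as a parameter so it can run without a network; describeScraError and the message formats are hypothetical, and only the properties listed above are read.

```javascript
// Hypothetical helper: turns an error into a short log line.
// The second argument is any object exposing TimeoutError and NetworkError.
function describeScraError(err, {TimeoutError, NetworkError}) {
    if (err instanceof TimeoutError) {
        // timeout holds the value of the timeout option.
        return `timeout after ${err.timeout} ms: ${err.url}`;
    }
    if (err instanceof NetworkError) {
        // cause holds the error thrown by the core http module.
        const reason = err.cause ? err.cause.message : err.message;
        return `network error (${reason}): ${err.url}`;
    }
    return `unexpected error: ${err.message}`;
}
```

With scra installed, this could be wired up as scra(url).catch((err) => console.error(describeScraError(err, scra))), since both classes are exposed as properties of scra.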
License
MIT