pardosa v0.2.3
Pardosa
A spider framework with Koa-like APIs, written in TypeScript.
PS: The repository is still under development, and the APIs may change a lot before version 1.0. Suggestions are welcome.
Features
- Koa-like APIs: page processing is configured with middlewares.
- Supports scheduled requests, based on node-schedule.
- Built-in middlewares:
  - guard: Prints the request and its processing time.
  - fetch: Uses node-fetch to request the page.
    - ctx.res: node-fetch's Response.
    - ctx.response: Pardosa's Response.
      - Exposes 3 interfaces of xSelector: .css(), .xpath(), .re();
      - .$: Equivalent to Cheerio.load(ctx.response.body).
  - Router: Koa Router-like APIs.
  - schema: Uses XPath to extract data into ctx.state.
  - storage
    - file(): Use after fetch and before router.
      - If ctx.state.file exists, saves ctx.response.body to the path ctx.req.file.
      - If ctx.state.files exists, saves every ctx.state.files[].content into ctx.state.files[].file.
  - inspect: Prints a field of ctx by JSON Path, like state.file.
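The two rules listed for file() can be sketched as a pure function that only plans the writes and never touches the disk. This is a minimal illustration, not Pardosa's implementation: the field names come from the list above, while SketchContext, PlannedWrite, planWrites, and the extra check that ctx.req.file is set are assumptions made for the example.

```typescript
// Hypothetical context shape; only the field names are taken from the feature list.
interface FileEntry { file: string; content: string }
interface SketchContext {
  req: { file?: string };
  state: { file?: string; files?: FileEntry[] };
  response: { body: string };
}

// Hypothetical description of one pending write.
interface PlannedWrite { path: string; data: string }

// Returns the writes a file()-style middleware would perform for this context.
function planWrites(ctx: SketchContext): PlannedWrite[] {
  const writes: PlannedWrite[] = [];

  // Rule 1: if ctx.state.file exists, the response body goes to ctx.req.file.
  // Skipping the write when ctx.req.file is missing is an assumption of this sketch.
  if (ctx.state.file !== undefined && ctx.req.file !== undefined) {
    writes.push({ path: ctx.req.file, data: ctx.response.body });
  }

  // Rule 2: every entry of ctx.state.files is written to its own path.
  for (const entry of ctx.state.files || []) {
    writes.push({ path: entry.file, data: entry.content });
  }

  return writes;
}
```

Keeping the planning step separate from the actual file I/O makes the rules easy to check in isolation; a real middleware would then pass each planned write to the file system.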
Usage
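import * as Pardosa from "pardosa";
import * as fetch from "pardosa/middlewares/fetch";

const spider = new Pardosa({ exitOnIdle: true })
    // Download each queued page with the built-in fetch middleware.
    .use(fetch())
    // Print the HTML matched by the XPath //article.
    .use(async function (ctx, next) {
        console.log(ctx.response.xpath('//article').html());
    });

// Queue the first URL, then start the spider.
spider.source.enqueue('https://github.com/plylrnsdy/pardosa');
spider.start();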
More examples.
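The chained .use() calls above follow the Koa middleware pattern, where each middleware may run code before and after the rest of the chain by awaiting next(). The following is a minimal, self-contained sketch of that pattern and does not use or reproduce Pardosa's own code: compose, DemoContext, timing, fakeFetch, and handler are all hypothetical names introduced for illustration.

```typescript
type Next = () => Promise<void>;
type Middleware<C> = (ctx: C, next: Next) => Promise<void>;

// Koa-style composition: each middleware receives a `next` that runs the rest of the chain.
function compose<C>(middlewares: Middleware<C>[]): (ctx: C) => Promise<void> {
  return function run(ctx: C): Promise<void> {
    let lastIndex = -1;
    function dispatch(i: number): Promise<void> {
      // Guard against a middleware calling next() more than once.
      if (i <= lastIndex) {
        return Promise.reject(new Error("next() called multiple times"));
      }
      lastIndex = i;
      const fn = middlewares[i];
      if (fn === undefined) return Promise.resolve(); // end of the chain
      try {
        return Promise.resolve(fn(ctx, () => dispatch(i + 1)));
      } catch (err) {
        return Promise.reject(err);
      }
    }
    return dispatch(0);
  };
}

// Hypothetical context used only by this sketch.
interface DemoContext { url: string; body?: string; log: string[] }

// Wraps the downstream middlewares, in the spirit of a guard-like timer.
const timing: Middleware<DemoContext> = async (ctx, next) => {
  ctx.log.push("timing:before");
  await next();
  ctx.log.push("timing:after");
};

// Stands in for a real fetch: fills ctx.body without any network access.
const fakeFetch: Middleware<DemoContext> = async (ctx, next) => {
  ctx.body = `<article>${ctx.url}</article>`;
  ctx.log.push("fetch");
  await next();
};

// Last middleware: consumes the body and does not call next().
const handler: Middleware<DemoContext> = async (ctx) => {
  ctx.log.push("handler");
};

async function demo(): Promise<DemoContext> {
  const ctx: DemoContext = { url: "https://example.com", log: [] };
  await compose([timing, fakeFetch, handler])(ctx);
  return ctx;
}
```

Calling demo() records the order timing:before, fetch, handler, timing:after in ctx.log, which shows why a middleware registered first can measure everything that runs after it.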
Install
npm i -P pardosa
If you build a spider using Pardosa with TypeScript, also install these type declaration dependencies:
npm i -D @types/node-schedule @types/node-fetch @types/cheerio
Contribution
Submit an issue if you find a bug or have a suggestion,
or fork the repo and submit a pull request.
About
Author: plylrnsdy
GitHub: pardosa