0.2.3 • Published 6 years ago
pardosa v0.2.3
Pardosa
A spider framework with a Koa-like API, written in TypeScript.
PS: The repository is still under development, and the APIs may change a lot before version 1.0. Suggestions are welcome.
Features
- Koa-like APIs: page processing is configured with middlewares.
- Supports scheduled requests, based on node-schedule.
- Built-in middlewares:
  - guard: Prints each request and its processing time.
  - fetch: Uses node-fetch to request the page.
    - ctx.res: node-fetch's Response.
    - ctx.response: Pardosa's Response.
      - Exposes 3 interfaces of xSelector: .css(), .xpath(), .re().
      - .$: Equivalent to Cheerio.load(ctx.response.body).
  - Router: Koa Router like APIs.
  - schema: Uses XPath to extract data to ctx.state.
  - storage
    - file(): Use after fetch and before router.
      - If ctx.state.file exists, saves ctx.response.body to the path ctx.req.file.
      - If ctx.state.files exists, saves every ctx.state.files[].content into ctx.state.files[].file.
  - inspect: Prints a field of ctx by JSON Path, like state.file.
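To illustrate what looking up a field such as state.file by path can mean, here is a minimal sketch of resolving a dotted path against a nested object. It is not Pardosa's implementation: the helper getByPath and the sample object exampleCtx are hypothetical names, and real JSON Path supports much more (wildcards, array filters) than this dotted-key lookup.

```typescript
// Hypothetical helper: walk a dotted path such as "state.file" through nested objects.
// Returns undefined as soon as a segment cannot be followed.
function getByPath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Hypothetical context-like object, used only for this illustration.
const exampleCtx = { state: { file: "out/page.html" } };

console.log(getByPath(exampleCtx, "state.file")); // prints "out/page.html"
```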
Usage
import * as Pardosa from "pardosa";
import * as fetch from "pardosa/middlewares/fetch";

// Create a spider that exits once there is nothing left to process.
const spider = new Pardosa({ exitOnIdle: true })
    // Request each page with node-fetch.
    .use(fetch())
    // Print the HTML of the page's <article> element.
    .use(async function (ctx, next) {
        console.log(ctx.response.xpath('//article').html());
    });

spider.source.enqueue('https://github.com/plylrnsdy/pardosa');
spider.start();
More examples.
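The chain of .use() calls follows the Koa middleware pattern, where each middleware receives the context and a next function that runs the rest of the chain. The sketch below shows that general pattern in isolation. It is not Pardosa's source code: Context, compose, timer, and handler are hypothetical names, and the context here is reduced to two fields for the illustration.

```typescript
// Minimal context for this sketch; Pardosa's real context carries more fields.
interface Context {
  url: string;
  log: string[];
}

type Next = () => Promise<void>;
type Middleware = (ctx: Context, next: Next) => Promise<void>;

// Chain middlewares so that calling next() runs the remaining ones.
function compose(middlewares: Middleware[]): (ctx: Context) => Promise<void> {
  return (ctx) => {
    const dispatch = (i: number): Promise<void> => {
      const fn = middlewares[i];
      if (!fn) return Promise.resolve(); // end of the chain
      return fn(ctx, () => dispatch(i + 1));
    };
    return dispatch(0);
  };
}

// Runs code both before and after the downstream middlewares.
const timer: Middleware = async (ctx, next) => {
  ctx.log.push("before");
  await next();
  ctx.log.push("after");
};

// Last middleware in the chain; it does not need to call next().
const handler: Middleware = async (ctx) => {
  ctx.log.push("handle " + ctx.url);
};

const run = compose([timer, handler]);
```

Calling run({ url: "https://example.com", log: [] }) fills log in the order before, handle https://example.com, after, which is why a timing middleware placed first can measure everything downstream of it.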
Install
npm i -P pardosa
If you build a spider using Pardosa with TypeScript, also install these type declarations:
npm i -D @types/node-schedule @types/node-fetch @types/cheerio
Contribution
Submit an issue if you find a bug or have a suggestion.
Or fork the repo and submit a pull request.
About
Author: plylrnsdy
GitHub: pardosa