0.2.3 • Published 6 years ago
pardosa v0.2.3
Pardosa
A spider framework with Koa-like APIs, written in TypeScript.
PS: The repository is still under development, and the APIs may change a lot before version 1.0. Suggestions are welcome.
Features
- Koa-like APIs: page processing is configured with middlewares.
- Supports scheduled requests, based on node-schedule.
- Built-in middlewares:
  - guard: Prints the request and its processing time.
  - fetch: Uses node-fetch to request the page.
    - ctx.res: node-fetch's Response.
    - ctx.response: Pardosa's Response.
      - Exposes 3 interfaces of xSelector: .css(), .xpath(), .re().
      - .$: Equivalent to Cheerio.load(ctx.response.body).
  - Router: Koa Router-like APIs.
  - schema: Uses XPath to extract data into ctx.state.
  - storage
    - file(): Use after fetch and before router.
      - If ctx.state.file exists, saves ctx.response.body to the path ctx.req.file.
      - If ctx.state.files exists, saves every ctx.state.files[].content into ctx.state.files[].file.
  - inspect: Prints a field of ctx by JSON Path, like state.file.
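The two file() rules in the list above can be read as a small decision procedure. The following is a minimal sketch of that reading, not Pardosa's actual implementation: the names SketchContext, FileEntry, PlannedWrite and planWrites are hypothetical, and the context shape is only inferred from the field names mentioned in the list. It computes which (path, content) pairs would be written, without touching the file system.

```typescript
// Hypothetical shapes for illustration only; these are not Pardosa's real types.
interface FileEntry {
  file: string;    // target path
  content: string; // data to store at that path
}

interface SketchContext {
  req: { file?: string };
  response: { body: string };
  state: { file?: unknown; files?: FileEntry[] };
}

interface PlannedWrite {
  path: string;
  content: string;
}

// Returns the writes implied by the two rules, in order.
function planWrites(ctx: SketchContext): PlannedWrite[] {
  const writes: PlannedWrite[] = [];

  // Rule 1: if ctx.state.file exists, the response body goes to ctx.req.file.
  // Assumption: when ctx.req.file is missing, nothing is planned for this rule.
  if (ctx.state.file && ctx.req.file) {
    writes.push({ path: ctx.req.file, content: ctx.response.body });
  }

  // Rule 2: every entry of ctx.state.files is stored at its own path.
  for (const entry of ctx.state.files ?? []) {
    writes.push({ path: entry.file, content: entry.content });
  }

  return writes;
}
```

Each planned pair could then be handed to a file-writing call such as Node's fs.promises.writeFile; keeping the planning step pure makes the rules easy to check in isolation.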
 
Usage
import * as Pardosa from "pardosa";
import * as fetch from "pardosa/middlewares/fetch";
// exitOnIdle: true lets the spider exit once it is idle.
const spider = new Pardosa({ exitOnIdle: true })
    // Request each enqueued page with node-fetch.
    .use(fetch())
    // Print the HTML of the page's <article> element.
    .use(async function (ctx, next) {
        console.log(ctx.response.xpath('//article').html());
    });
spider.source.enqueue('https://github.com/plylrnsdy/pardosa');
spider.start();
More examples.
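The (ctx, next) signature used above follows the Koa middleware pattern, in which each middleware can run code before and after the rest of the chain. As an illustration of that pattern only, here is a minimal self-contained sketch of how such a chain can be composed. It is not Pardosa's source code; compose, Middleware, DemoContext and demo are hypothetical names, and the three demo middlewares merely stand in for steps such as guard, fetch and a page handler.

```typescript
type Next = () => Promise<void>;
type Middleware<C> = (ctx: C, next: Next) => Promise<void>;

// Chains middlewares so that each one decides when the rest of the chain runs.
function compose<C>(middlewares: Middleware<C>[]): (ctx: C) => Promise<void> {
  return function run(ctx: C): Promise<void> {
    let lastIndex = -1;

    function dispatch(i: number): Promise<void> {
      // Guard against a middleware calling next() more than once.
      if (i <= lastIndex) {
        return Promise.reject(new Error("next() called multiple times"));
      }
      lastIndex = i;

      const fn = middlewares[i];
      if (!fn) return Promise.resolve(); // end of the chain

      try {
        return Promise.resolve(fn(ctx, () => dispatch(i + 1)));
      } catch (err) {
        return Promise.reject(err);
      }
    }

    return dispatch(0);
  };
}

// Demo context that records the order in which the steps run.
interface DemoContext {
  log: string[];
}

const demo = compose<DemoContext>([
  async (ctx, next) => {
    ctx.log.push("guard:start");
    await next(); // everything downstream runs here
    ctx.log.push("guard:end");
  },
  async (ctx, next) => {
    ctx.log.push("fetch");
    await next();
  },
  async (ctx) => {
    ctx.log.push("handler"); // last step, does not call next()
  },
]);
```

Running demo({ log: [] }) records guard:start, fetch, handler, guard:end, which is why a timing middleware like guard is typically registered first: it wraps everything that follows.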
Install
npm i -P pardosa
If you build a spider using Pardosa with TypeScript, also install these type declaration dependencies:
npm i -D @types/node-schedule @types/node-fetch @types/cheerio
Contribution
Submit an issue if you find a bug or have a suggestion.
Or fork the repo and submit a pull request.
About
Author: plylrnsdy
GitHub: pardosa