@easyscrape/core v0.0.2 • Published 1 year ago

License: MIT
Repository: GitHub

EasyScrapeCore

PROJECT IN DEVELOPMENT!

DO NOT USE IT IN PRODUCTION.

EasyScrape is a mediator that makes web scraping easier in JavaScript and TypeScript. With EasyScrape you can extract data from a website as if it were an API. This package is the core on which all EasyScrape middlewares are built, and this page explains how to make your own middleware on top of it. If you only want to use EasyScrape, this isn't the documentation you need: read the documentation of whichever EasyScrape implementation in the following list fits your requirements.

EasyScrape for:

  • Cheerio: scrapes HTML documents.
  • Puppeteer: scrapes or controls a browser such as Chromium, or any other browser supported by Puppeteer (coming soon).

Documentation Links

How can EasyScrape help you?

EasyScrape scrapes a page and returns exactly the information you ask for, in the shape you describe.

Installation

Use one of these commands to install the EasyScrape core module in your project.

# if you use npm
npm install @easyscrape/core

# or yarn
yarn add @easyscrape/core

How do I use it?

It's very easy! Just import the Node.js module that implements EasyScrapeCore for your favorite scraping library, like this:

// Example using EasyScrape for Cheerio
const EasyScrape = require('@easyscrape/cheerio');

Then load your HTML code with the module you chose. Suppose you have HTML like this:

<nav id="ShoppingList">
    <ul id="fruits">
        <li class="apple">Apple</li>
        <li class="orange">Orange</li>
        <li class="pear">Pear</li>
    </ul>
    <ul id="meats">
        <li class="pork">Pork meat</li>
        <li class="beef">Beef</li>
        <li class="chicken">Chicken</li>
    </ul>
</nav>

If you are using Cheerio, you can do this.

const $ = EasyScrape.load('<nav id="ShoppingList">...</nav>');

const data = $({
    fruits: {
        _each_: '#fruits li', // for every "li" inside the element with id "fruits", apply the queries below
        _text: true // take its inner text
    },
    meats: {
        _each_: '#meats li', // for every "li" inside the element with id "meats", apply the queries below
        _text: true // take its inner text
    }
});

The variable "data" contains:

{
    fruits: [
        'Apple',
        'Orange',
        'Pear'
    ],
    meats: [
        'Pork meat',
        'Beef',
        'Chicken'
    ]
}
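The query above reads like a description of the result: each key becomes a key of `data`, `_each_` turns its value into an array, and `_text` picks the inner text. The toy evaluator below is written from scratch only to make those semantics concrete; it is not EasyScrape's implementation. It assumes a hypothetical in-memory node tree (`ToyNode`) instead of real HTML, supports only selectors of the form `#id tag`, and handles only `_each_` and `_text`, assuming they behave as in the example.

```typescript
// Hypothetical, simplified node tree standing in for a parsed HTML document.
interface ToyNode {
  tag: string;
  id?: string;
  text?: string;
  children: ToyNode[];
}

// Only the two queries used in the example above.
interface ToyQuery {
  _each_?: string;
  _text?: boolean;
}

// Flattens the tree depth-first into an array (the node itself comes first).
function flatten(node: ToyNode, out: ToyNode[] = []): ToyNode[] {
  out.push(node);
  node.children.forEach((child) => flatten(child, out));
  return out;
}

// Supports only selectors shaped like "#someId tagName".
function select(root: ToyNode, selector: string): ToyNode[] {
  const parts = selector.split(" ");
  const id = parts[0].slice(1); // drop the leading "#"
  const tag = parts[1];
  const container = flatten(root).filter((n) => n.id === id)[0];
  if (!container) return [];
  // Descendants of the container with the requested tag, in document order.
  return flatten(container).filter((n) => n !== container && n.tag === tag);
}

// Evaluates one query against one node.
function evalQuery(node: ToyNode, query: ToyQuery): unknown {
  const { _each_, ...rest } = query;
  if (_each_ !== undefined) {
    // Apply the remaining queries to every matched node and collect an array.
    return select(node, _each_).map((n) => evalQuery(n, rest));
  }
  if (rest._text) return node.text !== undefined ? node.text : "";
  return null;
}

// Every key of the query object becomes a key of the result.
function scrape(root: ToyNode, queries: { [key: string]: ToyQuery }): { [key: string]: unknown } {
  const result: { [key: string]: unknown } = {};
  Object.keys(queries).forEach((key) => {
    result[key] = evalQuery(root, queries[key]);
  });
  return result;
}

// The shopping list from the example as a toy tree (the li class attributes are omitted).
const li = (text: string): ToyNode => ({ tag: "li", text, children: [] });
const doc: ToyNode = {
  tag: "nav",
  id: "ShoppingList",
  children: [
    { tag: "ul", id: "fruits", children: [li("Apple"), li("Orange"), li("Pear")] },
    { tag: "ul", id: "meats", children: [li("Pork meat"), li("Beef"), li("Chicken")] },
  ],
};

const data = scrape(doc, {
  fruits: { _each_: "#fruits li", _text: true },
  meats: { _each_: "#meats li", _text: true },
});
console.log(data);
```

Run against the toy tree, `scrape` produces the same shape as the `data` object shown above.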

How can I create my own EasyScrape middleware?

It's easy to create your own implementation: follow the steps below.

Step 0: Preparations

  • THE DOCUMENTATION IS UNDER DEVELOPMENT RIGHT NOW!

  • Remember that the documentation website (coming soon) is your friend! If you don't know what a method is for, or you need an example, check the technical documentation there. This page is only a quickstart guide.
  • If you use Visual Studio Code, you can see the technical documentation for each method by hovering over its name.
  • You can also read the JSDoc comments on each method and class.
  • Another way to help yourself is to read the Cheerio implementation.
  • Let's start!

Step 1: Installation

Install EasyScrapeCore (@easyscrape/core) in your project.

Step 2: Main File

Create a main file for your implementation with the following structure:

// File: ./MyFirstMiddlewareEasyScrape.ts
import MyMiddlewareESQueriesManager from './MyMiddlewareESQueriesManager';
import {AbstractEasyScrapeMiddleware, IESObject, IESQuery} from '@easyscrape/core';

class MyFirstMiddlewareEasyScrape extends AbstractEasyScrapeMiddleware {
    /**
     * Middleware information
     */
    SupportFor = {
        LibraryName: 'Cheerio', // name of the library your middleware supports
        PackageName: 'cheerio' // its npm package name
    };

    /**
     * Your middleware's queries manager
     */
    protected QueriesManager: MyMiddlewareESQueriesManager = new MyMiddlewareESQueriesManager(this);

    /**
     * Tells EasyScrape whether this middleware can handle the given data
     */
    canICollect($: any): boolean {
        // Placeholder: replace with a check that returns true only when
        // your middleware can scrape the current node.
        return $ != null;
    }

    /**
     * Your middleware's load method.
     * It is expected to return a function that takes one parameter of the accepted query types.
     */
    load($: any) {
        return (query: IESObject | IESQuery | string) => this.collect($, query);
    }
}
// Export a single instance: this avoids creating the same middleware more than once.
export default new MyFirstMiddlewareEasyScrape();
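The `load` method above returns a query function bound to the loaded document, and the file exports one shared instance. Because the real signatures of `AbstractEasyScrapeMiddleware` and its `collect` method aren't shown here, the self-contained sketch below reproduces only that pattern with hypothetical stand-ins: `ToyAbstractMiddleware`, a `ToyDoc` that maps selectors to text, and a trivial lookup in place of real scraping.

```typescript
// Hypothetical stand-in for a loaded document: a map from selectors to text.
type ToyDoc = { [selector: string]: string };

// Hypothetical stand-in for AbstractEasyScrapeMiddleware (not the real class).
abstract class ToyAbstractMiddleware {
  abstract canICollect($: unknown): boolean;

  // Trivial collector: looks the query up in the toy document.
  collect($: ToyDoc, query: string): string | null {
    if (!this.canICollect($)) return null;
    return $[query] !== undefined ? $[query] : null;
  }

  // Same shape as load() in Step 2: bind the document once and
  // return a function that only needs the query.
  load($: ToyDoc): (query: string) => string | null {
    return (query: string) => this.collect($, query);
  }
}

class ToyMiddleware extends ToyAbstractMiddleware {
  canICollect($: unknown): boolean {
    // In this toy, any non-null object counts as a document.
    return typeof $ === "object" && $ !== null;
  }
}

// One shared instance, mirroring `export default new ...` in Step 2.
const toyMiddleware = new ToyMiddleware();

const $ = toyMiddleware.load({ "#fruits li": "Apple" });
console.log($("#fruits li")); // Apple
console.log($("#meats li")); // null
```

Binding the document in a closure lets callers run many queries against one loaded page without passing it again. Exporting one instance means every `import` of the module receives the same object, because Node.js caches a module after its first load.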

Step 3: Middleware's Queries Manager

The queries manager is a class that contains the instructions for handling every query the user can write. The interface "IESQueriesManager" gives you the basic queries and their information, but you can create any query you need as long as you follow these requirements:

  • All query names must use the prefix "_" at the beginning.
  • Use "$" as a wildcard in the query name to let the user customize the query.
  • You can use as many wildcards as you need.

For example: "_select$" handles statements such as "_selectFood", "_selectAllListsElements" or "_select".
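The wildcard rule can be checked mechanically. The helper below is a hypothetical sketch, not part of `@easyscrape/core`: it assumes `$` matches any run of characters, including none (as the `_select` example implies), and turns a query-name pattern into an anchored regular expression.

```typescript
// Hypothetical helper: does a query key such as "_selectFood" match a
// query-name pattern such as "_select$"? "$" stands for zero or more characters.
function matchesQueryPattern(pattern: string, key: string): boolean {
  // Query names must start with "_" (first rule above).
  if (pattern.charAt(0) !== "_" || key.charAt(0) !== "_") return false;
  // Split on the wildcard, escape regex metacharacters in the literal parts,
  // then join the parts with ".*" so each "$" matches any run of characters.
  const source = pattern
    .split("$")
    .map((part) => part.replace(/[.*+?^{}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + "$").test(key);
}

console.log(matchesQueryPattern("_select$", "_selectFood")); // true
console.log(matchesQueryPattern("_select$", "_selectAllListsElements")); // true
console.log(matchesQueryPattern("_select$", "_select")); // true
console.log(matchesQueryPattern("_select$", "_each_")); // false
```

One likely benefit of the wildcard is that a single query object can hold several queries of the same kind, such as `_selectFood` and `_selectDrinks`, since a JavaScript object literal cannot keep two properties with the same key.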

// File: ./MyMiddlewareESQueriesManager.ts
import {
    AbstractESQueriesManager,
    ESQueriesManagerUtils,
    IESQueriesManager,
    ESFilterHandle
} from '@easyscrape/core';

class MyMiddlewareESQueriesManager
extends AbstractESQueriesManager // provides the default EasyScrape queries; override them if you need to
implements IESQueriesManager // tells you which methods you still need to create
{
    // Your methods here
}
export default MyMiddlewareESQueriesManager;
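The `extends`/`implements` pair in the class declaration splits the work: the abstract class provides default queries you may override, and the interface makes the TypeScript compiler report any required method you have not written yet. The sketch below shows that mechanism in isolation with hypothetical toy types (`IToyQueriesManager`, `ToyAbstractQueriesManager`, and the `_text`/`_count` methods); the real interfaces in `@easyscrape/core` define their own methods and signatures.

```typescript
// Hypothetical stand-ins: NOT the real IESQueriesManager / AbstractESQueriesManager.
interface IToyQueriesManager {
  _text(nodeText: string): string;
  _count(items: string[]): number;
}

// The abstract base supplies a default for some queries...
abstract class ToyAbstractQueriesManager {
  _text(nodeText: string): string {
    return nodeText.trim();
  }
}

// ...while `implements` makes the compiler flag whatever is still missing:
// deleting _count below causes an "incorrectly implements interface" error.
class MyToyQueriesManager extends ToyAbstractQueriesManager implements IToyQueriesManager {
  // Override a default inherited from the base class.
  _text(nodeText: string): string {
    return super._text(nodeText).toUpperCase();
  }

  // Required by the interface and not provided by the base class.
  _count(items: string[]): number {
    return items.length;
  }
}

const qm = new MyToyQueriesManager();
console.log(qm._text("  Apple ")); // APPLE
console.log(qm._count(["Apple", "Orange", "Pear"])); // 3
```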

Step 4: Build and Share your Middleware

Share your middleware by publishing it as an npm package. Please name it "@easyscrape/" followed by your middleware's name, and set it up in package.json like this:

{
    "name": "@easyscrape/mymiddleware",
    "version": "1.0.0",
    "main": "./MyFirstMiddlewareEasyScrape.js"
}

Remember to add your middleware's name to the list in this repository so everyone can find and use it.