1.0.2 • Published 5 years ago

scraperscript v1.0.2

ScraperScript

ScraperScript is a query language for web scraping.

Installation

The module is available through the npm registry and can be installed with the npm or yarn command-line tools.

# NPM
npm install scraperscript --global
# Or Using Yarn
yarn global add scraperscript

Documentation

Run the scraperscript command followed by either the name of a script file (myfile) or server.

Example file:

@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string

This returns a JSON result:

{
	"error": false,
	"errorsMsg": [],
	"names": [
		{
			"number": 0,
			"text": "Tiago"
		},
		{
			"number": 0,
			"text": "James"
		}
	],
	"hasTitle": true,
	"title": "my string"
}

Syntax

Place the URL on the first line, prefixed with @: @http://myurl.com

Each field line follows the format: - key: query :type

Note: Spaces are significant, so keep them exactly as shown in the formats above.
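
The line formats above can be illustrated with a small sketch in plain JavaScript. It is not the parser that ships with the package: the function name parseScript and the shape of the returned object are assumptions made for this illustration, and the sketch only checks the documented layout (@url, !! comments, and - key: query :type).

```javascript
// Hypothetical helper, not part of the scraperscript package.
// It splits a script into its URL and its "- key: query :type" field lines.
function parseScript(source) {
  const result = { url: null, fields: [] };
  for (const rawLine of source.split('\n')) {
    const line = rawLine.trim();
    // Skip blank lines and "!!" comments.
    if (line === '' || line.startsWith('!!')) continue;
    // "@https://..." -> "https://..." (simplification: any "@" line is taken as the URL).
    if (line.startsWith('@')) {
      result.url = line.slice(1);
      continue;
    }
    // The spaces after "-", after "key:" and before ":type" are required.
    const match = /^- ([^:\s]+): (.+) :(array|object|boolean|string|number)$/.exec(line);
    if (!match) throw new SyntaxError(`Unrecognized line: ${line}`);
    const [, key, query, type] = match;
    result.fields.push({ key, query, type });
  }
  return result;
}

// Usage: parse a two-line script and print the fields it declares.
const exampleScript = parseScript('@http://myurl.com\n- title: html >> head >> title :string');
console.log(exampleScript.fields);
```

A line with missing spaces, such as -title:html :string, is rejected by this sketch, which mirrors the note that spaces matter.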

Key

The name under which the result appears in the output JSON.

Rules:

  • Use at the beginning of the line
  • Format: - key:

Example: - name:

Type

The type of the returned value.

Rules:

  • Use at the end of the line
  • Format: :type

Types:

  • array
  • object
  • boolean
  • string
  • number

Example: :string
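
When consuming the JSON result, it can be useful to check a value against the type declared for its key. The sketch below assumes the five types correspond to the usual JSON/JavaScript value kinds; the names TYPES and matchesType are hypothetical and do not come from the package.

```javascript
// The five type names listed above.
const TYPES = ['array', 'object', 'boolean', 'string', 'number'];

// Hypothetical helper: does a JavaScript value match a declared type name?
function matchesType(value, type) {
  if (!TYPES.includes(type)) throw new RangeError(`Unknown type: ${type}`);
  if (type === 'array') return Array.isArray(value);
  // Plain objects only: arrays and null are excluded.
  if (type === 'object') return typeof value === 'object' && value !== null && !Array.isArray(value);
  // boolean, string and number map directly onto typeof.
  return typeof value === type;
}

// Usage: check two values against their declared types.
console.log(matchesType('my string', 'string'), matchesType([], 'object'));
```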

Query

String

" my string "

NOTE: "my string" is invalid, because it lacks the space after the opening quote and before the closing quote.
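
A minimal sketch of the padding rule, in plain JavaScript. The helper name readStringLiteral is hypothetical, and treating the padding spaces as delimiters rather than as part of the value is an assumption, based on the example output where " my string " matches the title "my string".

```javascript
// Hypothetical helper: returns the inner text of a padded string literal,
// or null when the required spaces inside the quotes are missing.
function readStringLiteral(token) {
  const match = /^" (.*) "$/.exec(token);
  return match ? match[1] : null;
}

// Usage: the padded form is accepted, the unpadded form is not.
console.log(readStringLiteral('" my string "'), readStringLiteral('"my string"'));
```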

Comment

!! my comment in ScraperScript

Elements

nameOfHtmlElementOne >> nameOfHtmlElementTwo
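
One way to picture an element chain is as a CSS selector, which is the kind of selector cheerio accepts. The sketch below is an illustration only: the helper toCssSelector is hypothetical, and reading >> as "descendant of" is an assumption that the documentation does not confirm (a child-only reading would join the names with " > " instead).

```javascript
// Hypothetical helper: turns an element chain into a descendant CSS selector.
// Assumes ">>" means "descendant of"; this is not confirmed by the package docs.
function toCssSelector(path) {
  return path
    .split(' >> ')
    .map((name) => name.trim())
    .join(' ');
}

// Usage: convert the chain used in the example file.
console.log(toCssSelector('html >> body >> div >> h2'));
```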

Map elements to a String

nameOfHtmlElementOne @> nameOfSubHtmlElement

Map elements to an Array

nameOfHtmlElementOne @> [nameOfSubHtmlElement]

Map elements to an Object

nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}

Addition

nameOfHtmlElementOne ++ nameOfHtmlElementTwo

Replace

nameOfHtmlElementOne -- nameOfHtmlElementTwo

Equality (==) or Inequality (~=) comparison

nameOfHtmlElementOne == nameOfHtmlElementTwo

nameOfHtmlElementOne ~= nameOfHtmlElementTwo

OR

nameOfHtmlElementOne || nameOfHtmlElementTwo

Tests

To run the test suite, first install the dependencies, then run the test command:

# NPM
npm install
npm test
# Or Using Yarn
yarn install
yarn test

Dependencies

  • axios: Promise based HTTP client for the browser and node.js
  • cheerio: Tiny, fast, and elegant implementation of core jQuery designed specifically for the server

Dev Dependencies

  • body-parser: Node.js body parsing middleware
  • express: Fast, unopinionated, minimalist web framework
  • mocha: simple, flexible, fun test framework
  • xo: JavaScript happiness style linter ❤️

Contributors

Pull requests and stars are always welcome. For bugs and feature requests, please create an issue. List of all contributors.

License

MIT © Tiago Danin