1.1.0 • Published 19 days ago

upluscrawler v1.1.0

Weekly downloads
-
License
ISC
Repository
-
Last release
19 days ago

uplus-webcrawler

A web crawler/scraping script that can be configured to crawl many websites with different layouts.

Install the dependancies

  npm install

Documentation

class Crawler

The crawler class can be configured to search for different elements on a webpage.

stored

a list that holds the data scraped by the crawler

selectors

key value pairs of the desired value, and the css selector of the element on the webpage

const crawler = new Crawler({
    "club": '#react-app > div > div > div > div:nth-child(1) > div > div:nth-child(2) > div > div:nth-child(1) > h1',
    "email": '#react-app > div > div > div > div:nth-child(1) > div > div:nth-child(2) > div > div:nth-child(4) > div > div:nth-child(2) > div:nth-child(2)' // 'a[href*="@"]'
});

crawl(page, link)

Used to scrape the website of the given link. Scraped data is automatically stored in the stored variable

Parameters

page: Puppeteer page object

link: string - the url of the page to scrape

1.1.0

19 days ago

1.0.0

20 days ago