actor-jomashop-crawler v0.0.1
Apify Actor - Jomashop
Overview
We use the Apify SDK to build and run our scrapers. Within this template, we're using Puppeteer as the crawler of choice.
Setup
First, be sure you have Node v12 installed.
- Generate a GitHub Access Token.
- Set a new shell env variable, GH_TOKEN, with your new GitHub Access Token.
- Copy .npmrc-dist to .npmrc, then run npm install.
You'll need a login to our Apify account.
- Log into the Apify platform: npm run apify:login
Proxies
Out of the box, this actor template has Apify proxy usage enabled.
You will need to go to your Apify Proxy dashboard, copy the password, and set it as the environment variable APIFY_PROXY_PASSWORD.
If you choose not to use a proxy, you'll need to comment out proxyConfiguration in the instantiation parameters of PuppeteerCrawler in the file src/main.ts.
To use custom proxies, check out the documentation.
Note: If you use a trial account, you'll get an error when attempting to use Apify Proxy. Access to Apify Proxy during the trial period is allowed only from actors running on the Apify platform.
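As an alternative to commenting proxyConfiguration in and out by hand, the decision could be made at startup from the environment. The following is a minimal sketch, not code from this template: decideProxyUsage, ProxyDecision and ProxyEnv are hypothetical names, and the only thing taken from the text above is the APIFY_PROXY_PASSWORD variable.

```typescript
// Hypothetical helper (not part of the template): decides whether the Apify
// proxy should be enabled, based on whether APIFY_PROXY_PASSWORD is set.
type ProxyEnv = Record<string, string | undefined>;

interface ProxyDecision {
    useApifyProxy: boolean;
    reason: string;
}

function decideProxyUsage(env: ProxyEnv): ProxyDecision {
    // Treat a missing or whitespace-only password as "no proxy configured".
    const password = (env['APIFY_PROXY_PASSWORD'] ?? '').trim();
    if (password.length === 0) {
        return { useApifyProxy: false, reason: 'APIFY_PROXY_PASSWORD is not set' };
    }
    return { useApifyProxy: true, reason: 'APIFY_PROXY_PASSWORD is set' };
}
```

In src/main.ts, the result of decideProxyUsage(process.env) could then gate whether proxyConfiguration is passed to PuppeteerCrawler at all. How that option is built in the template is not shown here, so treat the wiring as an assumption.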
Running
You can run the actor locally via: npm run dev
Note: To run locally, you'll need to copy .env-dist to .env and add the appropriate User ID and API token, both of which can be found on the Apify account page.
Or you can push the actor to the Apify Cloud Platform: npm run apify:push
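A local run fails in confusing ways when .env is incomplete, so a small pre-flight check can help. This is a sketch under assumptions: the variable names APIFY_USER_ID and APIFY_TOKEN are guesses at what .env-dist contains (check the file for the real names), and findMissingVars is a hypothetical helper, not part of the template.

```typescript
// Hypothetical pre-flight check for a local run.
type LocalEnv = Record<string, string | undefined>;

// ASSUMED names for the User ID and API token; confirm against .env-dist.
const REQUIRED_VARS: readonly string[] = ['APIFY_USER_ID', 'APIFY_TOKEN'];

// Returns the names of required variables that are unset or blank.
function findMissingVars(
    env: LocalEnv,
    required: readonly string[] = REQUIRED_VARS,
): string[] {
    return required.filter((name) => {
        const value = env[name];
        return value === undefined || value.trim() === '';
    });
}
```

Calling findMissingVars(process.env) before starting the crawler and exiting with a clear message when the result is non-empty is one way to use it.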
Scheduling
The Apify SDK does not currently handle scheduling; however, scheduling is available as a CLI tool within this template.
Getting started
You must first build this script by running:
npm run build
In order to use it from any path you must then run:
npm link
This will add apify-schedule as a command to your command line and make it available to use.
Usage
You can view the usage help menu by running:
apify-schedule
The following commands are available via this tool:
Usage: apify-schedule [options]

Options:
  -l, --list [number]        list schedules (default 100)
  -g, --get <scheduleId>     get schedule using schedule id
  -c, --create               run schedule creation wizard
  -u, --update <scheduleId>  update specified schedule
  -d, --delete <scheduleId>  delete specified schedule
  --asc                      sort by createdAt field in ascending order
  -h, --help                 display help for command
Currently, assigning Actors and Actor Tasks to a schedule needs to be done manually via the Apify Cloud Platform.
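If apify-schedule is driven from a script rather than typed by hand, the flags listed above can be assembled from a typed object. The sketch below is illustrative only: ScheduleQuery and buildScheduleArgs are hypothetical names, and which flags the tool accepts in combination is an assumption, not something the help output confirms.

```typescript
// Hypothetical option shape mirroring the flags in the help output.
interface ScheduleQuery {
    list?: number | true; // --list [number]; `true` means "use the default"
    get?: string;         // --get <scheduleId>
    create?: boolean;     // --create
    update?: string;      // --update <scheduleId>
    delete?: string;      // --delete <scheduleId>
    asc?: boolean;        // --asc
}

// Builds the argument array for the apify-schedule command.
function buildScheduleArgs(query: ScheduleQuery): string[] {
    const args: string[] = [];
    if (query.list !== undefined) {
        args.push('--list');
        if (query.list !== true) args.push(String(query.list));
    }
    if (query.get !== undefined) args.push('--get', query.get);
    if (query.create) args.push('--create');
    if (query.update !== undefined) args.push('--update', query.update);
    if (query.delete !== undefined) args.push('--delete', query.delete);
    if (query.asc) args.push('--asc');
    return args;
}
```

The resulting array could be handed to Node's child_process.execFile with 'apify-schedule' as the command, which avoids shell quoting issues with schedule ids.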
Puppeteer Extras (plugins)
puppeteer-extra has been added for plugin support. For example, to add a plugin that can solve captchas, use puppeteer-extra-plugin-recaptcha, which natively supports 2captcha.
PuppeteerCrawler Template
This template is a production-ready boilerplate for developing with PuppeteerCrawler.
Use this to bootstrap your projects using the most up-to-date code.
If you're looking for examples or want to learn more, visit:
Documentation reference