0.35.2 • Published 4 years ago

@watchedcom/puppeteer v0.35.2

WATCHED.com puppeteer support

This module provides easy access to puppeteer to help with scraping websites.

It includes a built-in request router that gives fine-grained control over whether and how resources are loaded.
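
To illustrate the kind of decision such a router has to make for every request, here is a minimal sketch of rule matching. It is not the module's implementation: the function name findRule and the matching semantics (url patterns are plain substrings, an optional resourceType must match exactly, and the first matching rule wins) are assumptions modelled on the rule objects used in the examples of this README.

```javascript
// Hypothetical sketch of rule matching, not the @watchedcom/puppeteer implementation.
// Assumptions: url patterns are plain substrings, resourceType (if given)
// must match exactly, and the first matching rule wins.
function findRule(rules, url, resourceType) {
  return rules.find((rule) => {
    // A rule restricted to one resource type ignores all other types
    if (rule.resourceType && rule.resourceType !== resourceType) return false;
    // A single pattern and a list of patterns are treated the same way
    const patterns = [].concat(rule.url ?? []);
    return patterns.some((pattern) => url.includes(pattern));
  });
}

// Rules modelled on the ones used in the examples of this README
const rules = [
  { url: ["https://example.com/page", "example.com/api"], action: "allow" },
  { url: "example.com/js", action: "allow", cache: true },
  { resourceType: "media", url: "example.com/mediapath/", action: (request) => {} },
];

console.log(findRule(rules, "https://example.com/js/app.js", "script").action); // allow
console.log(findRule(rules, "https://ads.example.net/banner.png", "image")); // undefined
```

Keeping the matcher a pure function of the rule list and the request's URL and type makes it easy to test without launching a browser.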

Setup

The recommended way to set up puppeteer is with a few puppeteer-extra plugins enabled.

npm i --save @watchedcom/puppeteer puppeteer-core puppeteer-extra puppeteer-extra-plugin-anonymize-ua puppeteer-extra-plugin-stealth
# To also install Chromium (the full puppeteer package downloads it)
npm i --save puppeteer

Add the following code to your addon. This example enables two puppeteer-extra plugins: stealth and user-agent anonymization.

import { setupPageRules } from "@watchedcom/puppeteer";
import puppeteer from "puppeteer-extra";
import AnonymizeUserAgentPlugin from "puppeteer-extra-plugin-anonymize-ua";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin({}));
puppeteer.use(AnonymizeUserAgentPlugin());

Usage

The module provides a few utility functions that make working with puppeteer a little easier. The following action handler sets up page rules, loads a page and returns its HTML.

addon.registerActionHandler("item", async (input, ctx) => {
  const ruleOptions = {
    ctx,
    rules: [
      { url: [input.url, "example.com/api"], action: "allow" },
      { url: "example.com/js", action: "allow", cache: true }
    ],
    blockPopups: true
  };

  // Get a browser instance
  const browser = await puppeteer.launch();
  try {
    const page = (await browser.pages())[0];

    // Set up the page rules before navigating
    await setupPageRules(page, ruleOptions);

    // Open the website and return its content
    await page.goto(input.url);
    return await page.content();
  } finally {
    // Close the browser
    await browser.close();
  }
});
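
Under the hood, a router like this can be built on puppeteer's request interception. The sketch below is hypothetical and is not the module's code: attachRouter is an illustrative name, and its semantics (first matching rule wins, unmatched requests are aborted, callback actions run and then abort the request, blockPopups closes every popup page) are assumptions. It relies only on real puppeteer calls: page.setRequestInterception, the page's "request" and "popup" events, and request.url(), resourceType(), continue() and abort(). Caching (cache: true) is not covered.

```javascript
// Hypothetical sketch, not the @watchedcom/puppeteer implementation.
// Assumed semantics: first matching rule wins, unmatched requests are aborted,
// callback actions run and the request is then aborted, and blockPopups
// closes every popup window. Caching (cache: true) is not handled here.
async function attachRouter(page, { rules, blockPopups = false }) {
  // Every request now waits until we call continue(), abort() or respond()
  await page.setRequestInterception(true);

  page.on("request", async (request) => {
    const rule = rules.find(
      (r) =>
        (!r.resourceType || r.resourceType === request.resourceType()) &&
        [].concat(r.url ?? []).some((pattern) => request.url().includes(pattern))
    );

    if (rule && rule.action === "allow") {
      await request.continue();
    } else if (rule && typeof rule.action === "function") {
      // Let the caller inspect the request (e.g. capture its URL) first
      try {
        await rule.action(request);
      } catch (err) {
        console.error("page rule callback failed:", err);
      }
      await request.abort();
    } else {
      // Default deny: anything not explicitly allowed is blocked
      await request.abort();
    }
  });

  if (blockPopups) {
    // "popup" is emitted with the newly opened page (null in some versions)
    page.on("popup", (popup) => {
      if (popup) popup.close().catch(() => {});
    });
  }
}
```

For a quick experiment without this module, you would call await attachRouter(page, { rules, blockPopups: true }) on a fresh page before navigating. Blocking by default keeps pages light, at the cost of having to allow every script or API endpoint the site actually needs.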

Callbacks inside page rules

To capture one specific URL during page load and return it from an action handler, the following recipe may help:

addon.registerActionHandler("resolve", async (input, ctx) => {
  // outerPromise is a helper for exactly this kind of situation.
  // See the documentation of this function for more information.
  const p = outerPromise(5000);

  const pageRules = [
    { url: [input.url, "example.com/api"], action: "allow" },
    { url: "example.com/js", action: "allow", cache: true },
    {
      resourceType: "media",
      url: "example.com/mediapath/",
      action: async request => {
        // This action handler will be called during page load
        const url = await request.url();
        p.resolve(url);
      }
    }
  ];

  // Get a browser instance
  const browser = await puppeteer.launch();
  try {
    const page = (await browser.pages())[0];
    await setupPageRules(page, { ctx, rules: pageRules });

    // During navigation, the action function is called for matching requests
    await page.goto(input.url);

    // If the page loaded without the action function being called,
    // reject the promise
    p.reject(new Error("Action handler was not called"));
  } finally {
    await browser.close();
  }

  // Wait for the promise
  return await p.promise;
});
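
The recipe depends on outerPromise exposing a promise together with functions to settle it from outside. The following is a minimal stand-in written under that assumption, not the module's actual helper: createDeferred is a hypothetical name, and treating the numeric argument as a timeout in milliseconds is a guess based on the outerPromise(5000) call above.

```javascript
// Hypothetical stand-in for an outerPromise-style helper, not the module's code.
// Assumption: the numeric argument is a timeout in milliseconds.
function createDeferred(timeoutMs) {
  let resolve;
  let reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });

  // Reject automatically if nobody settles the promise in time
  const timer = setTimeout(
    () => reject(new Error(`Timed out after ${timeoutMs} ms`)),
    timeoutMs
  );
  // Once the promise is settled either way, the timer is no longer needed
  const clear = () => clearTimeout(timer);
  promise.then(clear, clear);

  return { promise, resolve, reject };
}

// Usage: the first settlement wins, later calls are ignored
const deferred = createDeferred(5000);
deferred.resolve("https://example.com/mediapath/video.mp4");
deferred.reject(new Error("Action handler was not called")); // no effect
deferred.promise.then((url) => console.log(url)); // https://example.com/mediapath/video.mp4
```

Because a promise settles only once, the reject call made after the page has loaded has no effect when the action function already resolved the promise; it only matters when the matching media request never happened.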