1.7.3 • Published 7 days ago

@xapp/arachne v1.7.3

License
Apache-2.0

@xapp/arachne

An extremely simple web crawler based on Puppeteer.

Usage in a Lambda

Puppeteer requires Chromium, and Chromium's size is typically the limiting factor when running Puppeteer in a Lambda. A Lambda Layer works around this, specifically the community-maintained chrome-aws-lambda layer.

You can include this layer directly in your Serverless Framework (SLS) file or SAM template.

A Serverless Framework example:

functions:
  eventReceiver:
    handler: dist/index.receiver
    layers:
      - "arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:31"
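
The layer ARN in the example above encodes a region, an account ID, a layer name, and a layer version. The following is a minimal sketch that splits such an ARN into those parts; `parseLayerArn` and `LayerArn` are hypothetical names used for illustration and are not part of @xapp/arachne or any AWS SDK.

```typescript
// Shape of a Lambda layer version ARN:
// arn:<partition>:lambda:<region>:<account-id>:layer:<layer-name>:<version>
interface LayerArn {
    region: string;
    accountId: string;
    layerName: string;
    version: number;
}

// Hypothetical helper (illustration only): split a layer version ARN into its parts.
function parseLayerArn(arn: string): LayerArn {
    const parts = arn.split(":");
    const [prefix, , service, region, accountId, kind, layerName, versionText] = parts;
    if (
        parts.length !== 8 ||
        prefix !== "arn" ||
        service !== "lambda" ||
        kind !== "layer" ||
        region === undefined ||
        accountId === undefined ||
        layerName === undefined ||
        versionText === undefined ||
        !/^\d+$/.test(versionText)
    ) {
        throw new Error("Not a Lambda layer version ARN: " + arn);
    }
    return { region, accountId, layerName, version: parseInt(versionText, 10) };
}

// The ARN from the Serverless Framework example above.
const layer = parseLayerArn(
    "arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:31"
);
```

Here `layer.region` is `us-east-1` and `layer.version` is `31`. A layer can only be attached to functions in the same region, so a function deployed elsewhere needs the ARN published for its own region.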

In your Lambda source:

import { Browser, LaunchOptions, BrowserConnectOptions, BrowserLaunchArgumentOptions } from "puppeteer";
import { Arachne, ArachnePage, ArachneRequest, MemoryRequestQueue } from "@xapp/arachne";

// Other imports and code (including the definitions of `launchOptions` and `queue`)
// The important part:

        let browser: Pick<Browser, "close" | "newPage">;
        // The try/catch lets you still run this locally, assuming you
        // have Chromium installed on your machine
        try {
            log().debug('Looking for chrome-aws-lambda');

            // eslint-disable-next-line @typescript-eslint/no-var-requires
            const chromium = require('@sparticuz/chrome-aws-lambda');

            browser = await chromium.puppeteer.launch({
                args: chromium.args,
                defaultViewport: chromium.defaultViewport,
                executablePath: await chromium.executablePath,
                headless: chromium.headless,
                ignoreHTTPSErrors: true,
            });
        } catch (e) {
            log().debug("Could not find chrome-aws-lambda layer");
            console.error(e);
        }

        const crawler = Arachne.crawler({
            stealth: true,
            launchOptions, // timeout set to 5 seconds; the default of 30 is too long
            queue,
            browser,
            pageHandler: async (page: ArachnePage, request: ArachneRequest) => {
                // ... handle page load
            }
        });
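
The try/catch fallback in the snippet above can be isolated into a small helper. This is a minimal sketch rather than anything provided by @xapp/arachne: `loadOptional` is a hypothetical name, and the two loaders below are stand-ins for `require('@sparticuz/chrome-aws-lambda')` succeeding or failing.

```typescript
// Hypothetical helper (not part of @xapp/arachne): run a loader and return
// undefined instead of throwing when the optional dependency is missing.
function loadOptional<T>(loader: () => T): T | undefined {
    try {
        return loader();
    } catch (_err) {
        return undefined;
    }
}

// Stand-in for the layer being present (for example, inside the Lambda).
const present = loadOptional(() => ({ headless: true }));

// Stand-in for the layer being absent (for example, on a local machine).
const missing = loadOptional<{ headless: boolean }>(() => {
    throw new Error("Cannot find module");
});
```

In a Lambda, the loader would be `() => require("@sparticuz/chrome-aws-lambda")`. When the result is `undefined`, the code can skip the layer-based launch, which is the path the original try/catch takes for local runs.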

Lambda Layer Resources
