@xapp/arachne v1.9.0

Weekly downloads: 86
License: Apache-2.0
Repository: -
Last release: 6 months ago

@xapp/arachne

An extremely simple web crawler, based on puppeteer.

Usage in a Lambda

puppeteer requires Chromium, whose size is typically the limiting factor when trying to run it in a Lambda. A Lambda Layer works around this, specifically this community-maintained layer.

You can include this layer directly in your Serverless Framework (SLS) configuration file or your SAM template.

A Serverless Framework example:

functions:
  eventReceiver:
    handler: dist/index.receiver
    layers:
      - "arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:31"
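The layer is referenced by a versioned ARN of the form `arn:<partition>:lambda:<region>:<account>:layer:<name>:<version>`, and a layer has to live in the same region as the function that uses it. As a rough sketch, the ARN can be split and checked before deploying. The names `parseLayerArn`, `assertLayerRegion`, and `LayerVersionArn` are hypothetical helpers for illustration, not part of @xapp/arachne or the Serverless Framework.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface LayerVersionArn {
    partition: string;
    region: string;
    accountId: string;
    layerName: string;
    version: number;
}

// Split a Lambda layer version ARN into its parts:
// arn:<partition>:lambda:<region>:<account>:layer:<name>:<version>
function parseLayerArn(arn: string): LayerVersionArn {
    const parts = arn.split(":");
    if (parts.length !== 8 || parts[0] !== "arn" || parts[2] !== "lambda" || parts[5] !== "layer") {
        throw new Error(`Not a Lambda layer version ARN: ${arn}`);
    }
    const version = Number(parts[7]);
    if (!Number.isInteger(version) || version < 1) {
        throw new Error(`Invalid layer version in ARN: ${arn}`);
    }
    return {
        partition: parts[1],
        region: parts[3],
        accountId: parts[4],
        layerName: parts[6],
        version,
    };
}

// Throw if the layer is published in a different region than the function.
function assertLayerRegion(arn: string, functionRegion: string): void {
    const { region } = parseLayerArn(arn);
    if (region !== functionRegion) {
        throw new Error(`Layer is in ${region} but the function is in ${functionRegion}`);
    }
}

const layer = parseLayerArn("arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:31");
console.log(layer.region, layer.layerName, layer.version);
```

With the ARN from the example above, a function deployed to any region other than `us-east-1` would fail the `assertLayerRegion` check, which is a hint to pick the ARN published for that region instead.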

In your Lambda source:

import { Browser, LaunchOptions, BrowserConnectOptions, BrowserLaunchArgumentOptions } from "puppeteer";
import { Arachne, ArachnePage, ArachneRequest, MemoryRequestQueue } from "@xapp/arachne";

// Other imports and code
// The important part..

        // Stays undefined when the layer is not available (for example, when running locally)
        let browser: Pick<Browser, "close" | "newPage"> | undefined;
        // The try/catch lets you still run this locally, assuming you
        // have Chromium installed on your machine
        try {
            log().debug('Looking for chrome-aws-lambda');

            // eslint-disable-next-line @typescript-eslint/no-var-requires
            const chromium = require('@sparticuz/chrome-aws-lambda');

            browser = await chromium.puppeteer.launch({
                args: chromium.args,
                defaultViewport: chromium.defaultViewport,
                executablePath: await chromium.executablePath,
                headless: chromium.headless,
                ignoreHTTPSErrors: true,
            });
        } catch (e) {
            log().debug("Could not find chrome-aws-lambda layer");
            console.error(e);
        }

        const crawler = Arachne.crawler({
            stealth: true,
            launchOptions, /* timeout set to 5 seconds; the default of 30 seconds is too long */
            queue,
            browser,
            pageHandler: async (page: ArachnePage, request: ArachneRequest) => {
                // ... handle page load
            }
        });
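The snippet above does not show how `launchOptions` is built, and its try/catch only logs when the layer is missing. One possible way to make the fallback explicit is sketched below, under stated assumptions: `BrowserLike`, `Launcher`, `launchWithFallback`, and `buildLaunchOptions` are hypothetical names that are not part of @xapp/arachne or puppeteer, and the 5-second value simply mirrors the comment in the snippet (puppeteer's `timeout` launch option is expressed in milliseconds).

```typescript
// Hypothetical minimal shape of what the crawler needs from a browser;
// it mirrors Pick<Browser, "close" | "newPage"> from the snippet above.
interface BrowserLike {
    close(): Promise<void>;
    newPage(): Promise<unknown>;
}

// A launcher is any async function that resolves to a browser.
type Launcher = () => Promise<BrowserLike>;

// Try each launcher in order and return the first browser that starts.
// Throws only if every launcher fails.
async function launchWithFallback(launchers: Launcher[]): Promise<BrowserLike> {
    const errors: unknown[] = [];
    for (const launch of launchers) {
        try {
            return await launch();
        } catch (e) {
            errors.push(e);
        }
    }
    throw new Error(`No browser could be launched (${errors.length} attempt(s) failed)`);
}

// Build launch options with a startup timeout given in seconds,
// converted to the milliseconds that puppeteer expects.
function buildLaunchOptions(timeoutSeconds = 5): { timeout: number } {
    return { timeout: timeoutSeconds * 1000 };
}

const launchOptions = buildLaunchOptions();
console.log(launchOptions.timeout);
```

In a Lambda, the first launcher would wrap the `chromium.puppeteer.launch({...})` call from the layer and a second one could wrap a plain local `puppeteer.launch(launchOptions)`, so the layer is preferred and the local install is only used when the layer cannot be loaded.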

Lambda Layer Resources
