1.1.6 • Published 2 years ago

apify-click-events v1.1.6

License: ISC
Repository: GitHub
Last release: 2 years ago

ClickManager

Installation

npm i apify-click-events

Importing

ES6+

import { ClickManager } from 'apify-click-events';

CommonJS (require)

const { ClickManager } = require('apify-click-events');

About

When scraping a website, you sometimes have no choice but to automate clicks on the page, even though it's not ideal. The problem is that many sites set click traps that open ads or new tabs, or perform other unwanted actions when an element is clicked. These can come from an event listener on the window object, or from the click event propagating up from the target element.

With this package, you can whitelist or blacklist elements matching certain selectors to make your actor's clicks reliable, and to avoid errors and retries caused by being redirected to another page, triggering an unwanted event, or being unable to click an element.

Usage

new ClickManager(options)

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| mode | Mode | 'WHITELIST' | The mode of the ClickManager: either 'WHITELIST' or 'BLACKLIST'. |
| whitelist | string[] | [] | Selectors to whitelist on every page where ClickManager is used. To whitelist selectors for a single page load only, use the addToWhiteList() method. |
| blacklist | string[] | [] | Selectors to blacklist on every page. To blacklist selectors on a single page only, use addToBlackList(). |
| blockWindowClickListeners | number (0, 1, or 2) | 2 | How aggressively click listeners on the window are blocked. 0: no blocking. 1: window click listeners fire only when a whitelisted (WHITELIST mode) or non-blacklisted (BLACKLIST mode) element is clicked. 2: click-related listeners are never added to the window at all. |
| blockWindowOpenMethod | boolean | false | Whether to block calls to the window.open method. |
| allowDebugger | boolean | true | Sets window.debugger to null, which is usually enough to bypass DevTools blocks. |
| enableOnPagesIncluding | string[] | [] | Required. Any page whose URL matches any of these strings gets the ClickManager script injected. The blockCommonAds and optimize options still apply to every page the crawler visits. |
| blockCommonAds | boolean | false | Automatically block any browser request that matches a built-in list of common ad providers. |
| optimize | boolean | false | Automatically block requests for unnecessary resources such as CSS, images, and GIFs. |
| stopClickPropogation | boolean | true | Stop whitelisted clicks from propagating to other elements. Some sites need this set to false. |
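To make the propagation problem concrete, here is a small, self-contained model of event bubbling in plain JavaScript. It is not ClickManager's code: the node objects, dispatchClick(), and makeButton() are hypothetical stand-ins for DOM elements. It shows why stopping propagation on a whitelisted click (what stopClickPropogation: true is described as doing) keeps a window-level click trap from firing.

```javascript
// Hypothetical model of event bubbling: a click is delivered to the target first,
// then to each ancestor, and finally to the window, unless a listener calls
// stopPropagation(). This only illustrates the mechanism; it is not ClickManager's code.
function dispatchClick(path) {
  const log = [];
  let stopped = false;
  const event = {
    stopPropagation() {
      stopped = true;
    },
  };
  for (const node of path) {
    // Every listener on the current node still runs, as in the DOM.
    for (const listener of node.listeners) listener(event, log);
    // After stopPropagation(), remaining ancestors and the window never see the click.
    if (stopped) break;
  }
  return log;
}

// A "click trap": a window-level listener that would open an ad.
const windowNode = { listeners: [(e, log) => log.push('window: open ad')] };
const bodyNode = { listeners: [(e, log) => log.push('body')] };

// A whitelisted button whose listener optionally stops propagation.
function makeButton(stopAtTarget) {
  return {
    listeners: [
      (e, log) => {
        log.push('button');
        if (stopAtTarget) e.stopPropagation();
      },
    ],
  };
}

console.log(dispatchClick([makeButton(true), bodyNode, windowNode])); // [ 'button' ]
console.log(dispatchClick([makeButton(false), bodyNode, windowNode])); // [ 'button', 'body', 'window: open ad' ]
```

The same model also hints at why stopClickPropogation sometimes needs to be false: a site's own parent-element handler (here, body) may be what makes the clicked element work.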

The whitelist and blacklist options expect standard CSS selectors. Selector syntax that only Playwright supports will not work.

Usage:

const clickManager = new ClickManager({
    whitelist: ['div.button-red'],
    blockWindowClickListeners: 1,
    enableOnPagesIncluding: ['*'],
    stopClickPropogation: false,
});

Note: If you want to use ClickManager on all pages, just use '*' within your enableOnPagesIncluding array. It will match everything.
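The sketch below shows one way enableOnPagesIncluding could be matched against a page URL. The substring rule and the shouldInject() helper are assumptions for illustration; apart from '*' matching everything, the package's actual matching logic may differ.

```javascript
// Hypothetical sketch of matching enableOnPagesIncluding against a URL.
// ASSUMPTION: a pattern matches when it is '*' or appears as a substring of the URL.
function shouldInject(url, patterns) {
  return patterns.some((pattern) => pattern === '*' || url.includes(pattern));
}

console.log(shouldInject('https://w3schools.com', ['w3schools'])); // true
console.log(shouldInject('https://example.com/page', ['w3schools'])); // false
console.log(shouldInject('https://example.com/page', ['*'])); // true
console.log(shouldInject('https://example.com/page', [])); // false
```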

Injecting the script using injectScripts()

Within your main file, first instantiate an instance of the ClickManager class with your custom options, then spread the return value of its injectScripts() method into your crawler's configuration.

const Apify = require('apify');
const { firefox } = require('playwright');
// Import "modes" as well to avoid typos
const { ClickManager, modes } = require('apify-click-events');

Apify.main(async () => {
    // Instantiate the class with your custom options
    const clickManager = new ClickManager({
        mode: modes.BLACKLIST,
        blacklist: ['#accept-choices'],
        blockWindowOpenMethod: true,
        enableOnPagesIncluding: ['w3schools'],
    });

    const requestList = await Apify.openRequestList('start-urls', [
        { url: 'https://w3schools.com' },
    ]);

    const crawler = new Apify.PlaywrightCrawler({
        // injectScripts returns crawler options. Spread it out
        ...clickManager.injectScripts(),
        requestList,
        launchContext: {
            launcher: firefox,
        },
        handlePageFunction: async ({ page }) => {
            // Wait until the ClickManager script is injected before clicking anything
            await ClickManager.waitForInject(page);

            // ...your scraping logic goes here
        },
    });

    await crawler.run();
});

await ClickManager.waitForInject(page, removeElement)

(page: Page, removeElement?: boolean) => Promise<void>

Waits for ClickManager's script to be injected, then logs a confirmation once it has loaded. Throws an error if the page's URL doesn't match any of the enableOnPagesIncluding strings.

removeElement defaults to false and usually doesn't need to be set. However, ClickManager's status element can occasionally cause issues on a page; if it does, pass true as the second argument to remove it.

await ClickManager.addToWhiteList(page, selectors)

(page: Page, selectors: string[]) => Promise<void>

Add page-specific selectors to the whitelist. This has no effect in 'BLACKLIST' mode.

await ClickManager.addToBlackList(page, selectors)

(page: Page, selectors: string[]) => Promise<void>

Add page-specific selectors to the blacklist. This has no effect in 'WHITELIST' mode.

Note: A selector added with addToWhiteList or addToBlackList applies only to that page; it is not whitelisted/blacklisted on other pages. The only selectors applied to every page are the static ones you define in ClickManagerOptions.
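The following hypothetical model illustrates the difference between static and page-specific lists. SelectorLists, addToPage(), and listsFor() are invented names, page IDs stand in for Playwright Page objects, and the { whitelist, blacklist } shape is only an assumption about what a Record<string, string[]> like the one returned by checkLists() might contain.

```javascript
// Hypothetical model of static vs. page-specific selector lists.
// Static selectors (from the constructor options) apply to every page;
// selectors added for one page are stored under that page only.
class SelectorLists {
  constructor({ whitelist = [], blacklist = [] } = {}) {
    this.staticLists = { whitelist, blacklist };
    this.pageLists = new Map(); // pageId -> { whitelist: [], blacklist: [] }
  }

  // listName is 'whitelist' or 'blacklist'
  addToPage(pageId, listName, selectors) {
    if (!this.pageLists.has(pageId)) {
      this.pageLists.set(pageId, { whitelist: [], blacklist: [] });
    }
    this.pageLists.get(pageId)[listName].push(...selectors);
  }

  // Effective lists for one page: static entries plus that page's own additions.
  listsFor(pageId) {
    const own = this.pageLists.get(pageId) ?? { whitelist: [], blacklist: [] };
    return {
      whitelist: [...this.staticLists.whitelist, ...own.whitelist],
      blacklist: [...this.staticLists.blacklist, ...own.blacklist],
    };
  }
}

const lists = new SelectorLists({ whitelist: ['div.button-red'] });
lists.addToPage('page-1', 'whitelist', ['a.next']);

console.log(lists.listsFor('page-1').whitelist); // [ 'div.button-red', 'a.next' ]
console.log(lists.listsFor('page-2').whitelist); // [ 'div.button-red' ]
```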

await ClickManager.checkLists(page)

(page: Page) => Promise<Record<string, string[]>>

Returns the selectors currently whitelisted/blacklisted on the given page.

Utilities

await ClickManager.mapClick(page, selector, callback)

(page: Page, selector: string, callback: MapClickCallback) => Promise<unknown[]>

Useful when you need to click multiple elements that match the same selector and collect some data after each click (for example, because content on the page changes dynamically).

The callback receives the Page (after the click) and should return a value. Once every element matching the selector has been clicked, the callback's results are returned as an array.

export type MapClickCallback = (page: Page) => Promise<unknown>;

Usage:

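const arr = await ClickManager.mapClick(page, 'a.button', async (pg) => {
    // Read the tab title shown after each click
    const tabTitle = await pg.$('div#tab_title');
    // $() resolves to null when nothing matches, so guard before reading the text
    return tabTitle ? tabTitle.textContent() : null;
});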

console.log(arr); // array of all the tab titles
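Conceptually, mapClick is a sequential loop: click, let the callback read the page, store the result, repeat. The sketch below models that with a fake page object; mapClickSketch() and clickNth() are hypothetical and take an element count instead of a selector. The clicks are awaited one at a time rather than with Promise.all(), because each click may change the page that the next callback reads.

```javascript
// Hypothetical sketch of the mapClick pattern (not the package's implementation).
// clickNth and the fake page are stand-ins for Playwright objects.
async function mapClickSketch(page, count, clickNth, callback) {
  const results = [];
  for (let i = 0; i < count; i++) {
    await clickNth(page, i); // each click may change the page...
    results.push(await callback(page)); // ...so read it before the next click
  }
  return results;
}

// Fake page whose "tab title" changes when a tab is clicked.
const fakePage = { tabTitle: '' };
const tabs = ['Home', 'Docs', 'About'];
const clickNth = async (page, i) => {
  page.tabTitle = tabs[i];
};

mapClickSketch(fakePage, tabs.length, clickNth, async (pg) => pg.tabTitle)
  .then((titles) => console.log(titles)); // [ 'Home', 'Docs', 'About' ]
```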

await ClickManager.whiteListAndClick(page, selector, option)

(page: Page, selector: string, option?: 'PLAYWRIGHT' | 'BROWSER') => Promise<void>

Whitelist a selector on the page, then click it. The option parameter ('PLAYWRIGHT' or 'BROWSER') defines how the click is performed.

await ClickManager.click(page, selector)

(page: Page, selector: string) => Promise<void>

Different from Playwright's page.click() function. Checks the whitelist/blacklist to determine whether the selector may be clicked; if so, clicks it, otherwise throws an error.
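A minimal sketch of the check-then-click-or-throw behavior follows. canClick() and guardedClick() are hypothetical helpers, and exact string comparison stands in for real CSS matching (in a browser, element.matches(selector) would do that job).

```javascript
// Hypothetical guard: decide from the mode and lists whether a selector may be clicked.
// Exact string comparison is a simplification of real CSS selector matching.
function canClick(mode, { whitelist, blacklist }, selector) {
  if (mode === 'WHITELIST') return whitelist.includes(selector);
  if (mode === 'BLACKLIST') return !blacklist.includes(selector);
  throw new Error(`Unknown mode: ${mode}`);
}

// Click only if allowed; otherwise reject with an error, as the description above says.
async function guardedClick(mode, lists, selector, doClick) {
  if (!canClick(mode, lists, selector)) {
    throw new Error(`Selector "${selector}" is not clickable in ${mode} mode`);
  }
  await doClick(selector);
}

const lists = { whitelist: ['div.button-red'], blacklist: ['#accept-choices'] };
console.log(canClick('WHITELIST', lists, 'div.button-red')); // true
console.log(canClick('WHITELIST', lists, 'a.ad')); // false
console.log(canClick('BLACKLIST', lists, '#accept-choices')); // false
console.log(canClick('BLACKLIST', lists, 'a.ad')); // true

guardedClick('WHITELIST', lists, 'a.ad', async () => {})
  .catch((err) => console.log(err.message)); // Selector "a.ad" is not clickable in WHITELIST mode
```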

await ClickManager.blockWindowOpenMethod(page)

(page: Page) => Promise<void>

Rather than blocking window.open on every page, you can leave blockWindowOpenMethod set to false in ClickManagerOptions and call this method on a specific page before any clicks that might result in window.open being called.
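A common generic way to block window.open is to overwrite it with a function that returns null, which is what window.open returns when a popup is blocked. The sketch below applies that to a stand-in window object and records the URLs that were attempted; blockWindowOpen() is a hypothetical name, and this is not necessarily how ClickManager implements the method.

```javascript
// Generic technique: replace window.open with a no-op that returns null.
// Returns the list of URLs that page scripts tried to open, for debugging.
function blockWindowOpen(win) {
  const attempted = [];
  win.open = (url) => {
    attempted.push(url);
    return null; // same value window.open returns when a popup is blocked
  };
  return attempted;
}

// Stand-in for the browser's window object. In a real page this code would have to
// run in the page context (for example via Playwright's page.addInitScript).
const fakeWindow = { open: (url) => ({ opened: url }) };
const attempted = blockWindowOpen(fakeWindow);

console.log(fakeWindow.open('https://ads.example.com')); // null
console.log(attempted); // [ 'https://ads.example.com' ]
```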

await ClickManager.displayHiddenElement(page, selector, classNames)

(page: Page, selector: string, classNames?: string) => Promise<void>

Sometimes, instead of actually clicking an element to make it visible on the page, all you need to do is add the "active", "open", or "show" class names, set its "visibility" to "visible", or set its "display" to "block". Use this utility to do all of these at once to the first element on the page matching the specified selector. Add extra class names to the element with classNames (usually not necessary).
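The DOM changes this utility is described as making can be sketched as follows. revealElement() and the fake element are hypothetical, and treating classNames as a space-separated string is an assumption. In a real page you would pass the result of document.querySelector(selector) instead of the fake element.

```javascript
// Hypothetical sketch of the changes displayHiddenElement is described as making.
// ASSUMPTION: extra classNames are given as one space-separated string.
function revealElement(el, classNames = '') {
  const extra = classNames.split(/\s+/).filter(Boolean);
  el.classList.add('active', 'open', 'show', ...extra);
  el.style.visibility = 'visible';
  el.style.display = 'block';
}

// Minimal stand-in for a DOM element so the sketch runs outside a browser.
function makeFakeElement() {
  const classes = new Set();
  return {
    classList: {
      add: (...names) => names.forEach((n) => classes.add(n)),
      contains: (name) => classes.has(name),
    },
    style: { visibility: 'hidden', display: 'none' },
  };
}

const el = makeFakeElement();
revealElement(el, 'expanded');
console.log(el.classList.contains('show'), el.classList.contains('expanded')); // true true
console.log(el.style.display, el.style.visibility); // block visible
```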

Versions: 1.1.6, 1.1.5, 1.1.4, 1.1.3, 1.1.2, 1.1.1, 1.1.0, 1.0.9, 1.0.8, 1.0.7, 1.0.6, 1.0.5, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0 (all published 2 years ago)