Analyzer-ts NPM

Page analyzer

Page Analyzer is an actor that helps its users find data sources on a website. Its main purpose is to let a user quickly assess their options for extracting data from that website.

When to use page analyzer

Page analyzer can be used as a first step in web scraper development. Its goal is to automate the work of analyzing a website manually with tools like the browser's developer tools or Postman, in order to:

1. Analyze the structure of the website
2. Find the CSS selectors of HTML elements containing a keyword
3. Find keywords in additional sources that might not be visible on the screen, like JSON-LD, metadata, and schema.org data
4. Observe and replicate XHR requests that might contain the data a user wants to scrape
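Steps 3 and 4 both come down to searching structured data (a JSON-LD block or a JSON XHR response) for a keyword. The following is a minimal sketch of that idea, not Page Analyzer's actual implementation: the `findKeywordPaths` helper, the `Json` type, and the sample `jsonLd` object are all hypothetical names introduced for illustration.

```typescript
// Hypothetical helper for illustration only; not part of Page Analyzer's API.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively walks a parsed JSON value and returns the path of every
// primitive whose string form contains the keyword (case-insensitive).
function findKeywordPaths(data: Json, keyword: string, path = "$"): string[] {
    if (data === null) return [];
    if (Array.isArray(data)) {
        return data.flatMap((item, i) => findKeywordPaths(item, keyword, `${path}[${i}]`));
    }
    if (typeof data === "object") {
        return Object.entries(data).flatMap(([key, value]) =>
            findKeywordPaths(value, keyword, `${path}.${key}`));
    }
    // Primitive value: numbers and booleans are compared as strings,
    // which is why keywords such as "125" are passed as strings.
    return String(data).toLowerCase().includes(keyword.toLowerCase()) ? [path] : [];
}

// Made-up JSON-LD data, as it might look after JSON.parse.
const jsonLd: Json = {
    "@type": "Product",
    name: "Example chair",
    offers: [{ price: 125, priceCurrency: "USD" }],
};

console.log(findKeywordPaths(jsonLd, "125")); // [ '$.offers[0].price' ]
```

The returned path tells the scraper author where in the structure the value lives, much as a CSS selector does for an HTML element.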

Input

The input consists of:

1. URL of a website to be analyzed.
2. Keywords - an array of strings the analyzer will try to find in the source code of the website.

Input can be set using the visual input UI in the Apify Console, or using an INPUT.json file inside the actor's default key-value store.

{
    // URL of the website to be analyzed
    "url": "http://example.com",
    // array of strings to look for on the website
    "keywords": [
        "About us",
        // numbers are also passed as strings
        "125"
    ],
    // proxy configuration
    "proxyConfig": {
        "useApifyProxy": true
    }
}
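The shape of this input can be written down as a TypeScript type together with a small validation step. This is a sketch under assumptions: the field names come from the example above, but the `AnalyzerInput` interface, the `parseInput` function, and the specific validation rules are hypothetical and not taken from the analyzer's source. Note that `JSON.parse` rejects comments, so the explanatory comments in the example would have to be removed from a real INPUT.json file.

```typescript
// Hypothetical type mirroring the example input; treating proxyConfig
// as optional is an assumption.
interface AnalyzerInput {
    url: string;
    keywords: string[];
    proxyConfig?: { useApifyProxy: boolean };
}

// Parses the text of an INPUT.json file and checks the two required fields.
function parseInput(raw: string): AnalyzerInput {
    const data = JSON.parse(raw) as Partial<AnalyzerInput>;
    if (typeof data.url !== "string" || !/^https?:\/\//.test(data.url)) {
        throw new Error("url must be an http(s) URL string");
    }
    if (!Array.isArray(data.keywords) || data.keywords.some((k) => typeof k !== "string")) {
        throw new Error("keywords must be an array of strings");
    }
    return { url: data.url, keywords: data.keywords, proxyConfig: data.proxyConfig };
}

const input = parseInput(JSON.stringify({
    url: "http://example.com",
    keywords: ["About us", "125"],
    proxyConfig: { useApifyProxy: true },
}));
console.log(input.keywords.length); // 2
```

Rejecting a numeric keyword such as `125` early makes the "numbers are also passed as strings" rule explicit instead of letting it fail later during the search.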