0.0.11 • Published 1 year ago

analyzer-ts v0.0.11

License
ISC

Page analyzer

Page Analyzer is an actor that helps users find data sources on a website. Its main purpose is to help a user quickly assess their options for extracting data from that website.

When to use page analyzer

Page analyzer can be used as a first step in web scraper development. Its goal is to automate the manual process of analyzing a website with tools such as browser developer tools or Postman in order to:

1. Analyze the structure of the website
2. Find the CSS selectors of HTML elements containing a keyword
3. Find keywords in additional sources that might not be visible on the screen, such as JSON-LD, metadata, and schema.org data
4. Observe and replicate XHR requests that might contain the data a user wants to scrape
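To illustrate the idea behind searching structured sources such as JSON-LD for keywords, here is a minimal sketch. It is not the analyzer's actual implementation: the names `Json`, `KeywordMatch`, and `findKeywordPaths` are hypothetical, and case-insensitive substring matching is an assumption made for this example.

```typescript
// Hypothetical helper, not part of the analyzer's API.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface KeywordMatch {
    keyword: string; // the keyword that was found
    path: string;    // where it was found, e.g. "offers.price" or "items[0].name"
    value: string;   // the primitive value (as text) that contains the keyword
}

// Recursively walks parsed JSON and records the path of every primitive value
// that contains one of the keywords (assumption: case-insensitive substring match).
function findKeywordPaths(data: Json, keywords: string[], path: string = ""): KeywordMatch[] {
    const matches: KeywordMatch[] = [];
    if (data === null) {
        return matches;
    }
    if (Array.isArray(data)) {
        for (let i = 0; i < data.length; i++) {
            matches.push(...findKeywordPaths(data[i], keywords, `${path}[${i}]`));
        }
        return matches;
    }
    if (typeof data === "object") {
        for (const key of Object.keys(data)) {
            const childPath = path ? `${path}.${key}` : key;
            matches.push(...findKeywordPaths(data[key], keywords, childPath));
        }
        return matches;
    }
    // Primitive value: compare as lower-cased text.
    const text = String(data).toLowerCase();
    for (const keyword of keywords) {
        if (text.indexOf(keyword.toLowerCase()) !== -1) {
            matches.push({ keyword, path, value: String(data) });
        }
    }
    return matches;
}

// Example: a made-up JSON-LD object containing the keyword "125".
const jsonLd: Json = {
    "@type": "Product",
    name: "Example product",
    offers: { price: "125", priceCurrency: "USD" },
};
const jsonLdMatches = findKeywordPaths(jsonLd, ["125"]);
// jsonLdMatches[0].path is "offers.price"
```

Returning the path of each match, rather than just a yes/no answer, mirrors the purpose described above: the user learns where in the source the data lives, which is what they need to write a scraper.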

Input

The input consists of:

1. URL of the website to be analyzed.
2. Keywords: an array of strings the analyzer will try to find in the source code of the website.

The input can be set using the visual input UI in the Apify Console, or using an INPUT.json file inside the actor's default key-value store.

{
    // URL of the website to be analyzed
    "url": "http://example.com",
    // array of strings to look for on the website
    "keywords": [
        "About us",
        // numbers are also passed as strings
        "125"
    ],
    // proxy configuration
    "proxyConfig": {
        "useApifyProxy": true
    }
}
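The input shape above can be expressed as a TypeScript type with a small validation step. This is a sketch under stated assumptions rather than the analyzer's own code: the names `AnalyzerInput` and `parseInput` are hypothetical, treating `proxyConfig` as optional is an assumption, and converting numeric keywords to strings is just one way to honor the "numbers are also passed as strings" note.

```typescript
// Hypothetical type mirroring the documented input fields.
interface AnalyzerInput {
    url: string;
    keywords: string[];
    // Assumption: the proxy configuration may be omitted.
    proxyConfig?: { useApifyProxy: boolean };
}

// Validates an already-parsed input object and normalizes keywords to strings.
function parseInput(raw: unknown): AnalyzerInput {
    if (typeof raw !== "object" || raw === null) {
        throw new Error("Input must be a JSON object");
    }
    const input = raw as Record<string, unknown>;

    const url = input["url"];
    if (typeof url !== "string" || !/^https?:\/\//i.test(url)) {
        throw new Error("`url` must be an http(s) URL string");
    }

    const rawKeywords = input["keywords"];
    if (!Array.isArray(rawKeywords)) {
        throw new Error("`keywords` must be an array");
    }
    const keywords: string[] = [];
    for (const keyword of rawKeywords) {
        if (typeof keyword !== "string" && typeof keyword !== "number") {
            throw new Error("Each keyword must be a string or a number");
        }
        // Numbers are searched for as text, so store them as strings.
        keywords.push(String(keyword));
    }

    const result: AnalyzerInput = { url, keywords };
    const proxy = input["proxyConfig"];
    if (typeof proxy === "object" && proxy !== null) {
        const useApifyProxy = (proxy as Record<string, unknown>)["useApifyProxy"] === true;
        result.proxyConfig = { useApifyProxy };
    }
    return result;
}

// Example usage with a numeric keyword that gets normalized to "125".
const exampleInput = parseInput({
    url: "http://example.com",
    keywords: ["About us", 125],
    proxyConfig: { useApifyProxy: true },
});
```

Validating up front means a malformed URL or keyword list fails with a clear message before any page is loaded.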

Output

The output of this actor is saved in the Apify key-value store of the particular actor run.

The results of the analysis can be viewed by opening the DASHBOARD.html file.
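As a rough sketch of how one might locate the dashboard outside the Apify Console, the helper below builds a record URL for a key-value store. It assumes the Apify API v2 endpoint for key-value store records (`/v2/key-value-stores/{storeId}/records/{recordKey}`); the function name `recordUrl` and the store ID `myStoreId` are hypothetical placeholders, and a non-public store may additionally require an API token.

```typescript
// Hypothetical helper: builds the URL of a record in an Apify key-value store.
// Assumption: the Apify API v2 record endpoint is used.
function recordUrl(storeId: string, recordKey: string): string {
    const base = "https://api.apify.com/v2/key-value-stores";
    return `${base}/${encodeURIComponent(storeId)}/records/${encodeURIComponent(recordKey)}`;
}

// "myStoreId" is a placeholder; use the ID of the run's default key-value store.
const dashboardUrl = recordUrl("myStoreId", "DASHBOARD.html");
```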

The analyzer also saves other files containing additional analysis data. To learn more about them, please read how the analyzer works.