1.0.0 • Published 4 years ago

puppetree v1.0.0

License
MIT

Puppetree


Puppetree is a wrapper around puppeteer with JSDOM built in, so you can scrape and crawl the web from Node using the client-side DOM API.

  • The API is the same as puppeteer's, with five additional query selectors that work like their DOM counterparts.

  • The added selectors are querySelector, querySelectorAll, getElementById, getElementsByClassName, and getElementsByTagName.

  • Each returns a HybridElement (or a collection of them for the plural selectors), which combines puppeteer's ElementHandle with the DOM's HTMLElement.
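
To make the HybridElement idea concrete, here is a minimal sketch of how two objects can be merged behind one interface with a Proxy. This is not puppetree's actual implementation: makeHybrid, fakeHandle, and fakeElement are hypothetical names, and plain objects stand in for a puppeteer ElementHandle and a JSDOM HTMLElement. Because the examples in this README say click() uses the ElementHandle API, the sketch gives the handle priority when both objects define the same name.

```javascript
// Hypothetical sketch, NOT puppetree's real code: expose a handle-like object
// and an element-like object through a single Proxy.
function makeHybrid(handle, element) {
  return new Proxy(element, {
    get(target, prop) {
      // Handle wins on name clashes (e.g. click); otherwise use the element.
      const source = prop in handle ? handle : target;
      const value = source[prop];
      // Bind methods so `this` is the object that owns them.
      return typeof value === 'function' ? value.bind(source) : value;
    },
    has(target, prop) {
      return prop in handle || prop in target;
    },
  });
}

// Plain-object stand-ins for an HTMLAnchorElement and an ElementHandle.
const fakeElement = { tagName: 'A', href: 'https://example.com/' };
const fakeHandle = {
  clicks: 0,
  async click() { this.clicks += 1; },
};

const $link = makeHybrid(fakeHandle, fakeElement);
console.log($link.href); // https://example.com/
$link.click().then(() => console.log(fakeHandle.clicks)); // 1
```

A Proxy keeps both underlying objects untouched, which is one reasonable way to get DOM properties and browser actions from the same variable.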

Getting Started

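const puppetree = require('puppetree');

// The snippets below use await, so run them inside an async function.
const url = 'https://example.com'; // placeholder address, replace with your target
const browser = await puppetree.launch();
const hybridPage = await browser.newPage();
await hybridPage.goto(url);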
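
The setup steps above leave the browser open. A minimal sketch of a wrapper that always closes it might look like this. withPage is a hypothetical helper name, not part of puppetree, and the sketch assumes puppetree keeps puppeteer's browser.close() method, as the "same API" claim suggests.

```javascript
// Hypothetical helper: open a page, run a callback, and always close the browser.
// `launcher` is any object with a puppeteer-style launch() method,
// for example require('puppetree').
async function withPage(launcher, url, work) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await work(page);
  } finally {
    // Assumption: browser.close() behaves as in puppeteer's Browser API.
    await browser.close();
  }
}

// Usage (requires puppetree to be installed):
// withPage(require('puppetree'), 'https://example.com', async (page) => {
//   const $a = await page.querySelector('a');
//   return $a && $a.href;
// }).then(console.log);
```

Passing the launcher in as a parameter keeps the helper easy to exercise with a stub when no real browser is available.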

.querySelector

const $hyperlink = await hybridPage.querySelector('a.mylink');
console.log($hyperlink.href); // Logs HTMLAnchorElement href

.querySelectorAll

const $inputs = await hybridPage.querySelectorAll('div.container input');
for (const $input of $inputs) {
    console.log($input.value); // Logs HTMLInputElement value
}

.getElementById

const $button = await hybridPage.getElementById('search');
await $button.click(); // Uses ElementHandle click api

.getElementsByClassName

const $people = await hybridPage.getElementsByClassName('person');
for (const $person of $people) {
    await $person.hover(); // Uses ElementHandle hover api
}

.getElementsByTagName

const $rows = await hybridPage.getElementsByTagName('tr');
for (const $row of $rows) {
    const $p = await $row.querySelector('td p');
    console.log($p.textContent); // Uses HTMLParagraphElement textContent
}
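
The row loop above can be turned into a reusable function that returns the collected text instead of logging it. This is a sketch under assumptions: collectRowText is a hypothetical name, and it assumes that a nested querySelector yields null when nothing matches, as the DOM method does; the README does not confirm this for puppetree.

```javascript
// Hypothetical helper: collect the text of the first `td p` in every table row.
async function collectRowText(hybridPage) {
  const texts = [];
  const $rows = await hybridPage.getElementsByTagName('tr');
  for (const $row of $rows) {
    const $p = await $row.querySelector('td p');
    // Assumption: a failed lookup yields null, so rows without a match
    // (for example header rows) are skipped.
    if ($p) texts.push($p.textContent.trim());
  }
  return texts;
}
```

Inside an async function, calling collectRowText(hybridPage) with the page from Getting Started would resolve to an array of strings, one per matching row.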