1.0.0 • Published 4 years ago
puppetree v1.0.0
Puppetree
Puppetree is a wrapper around puppeteer with JSDOM built in, which allows web scraping and crawling from Node using the client-side DOM architecture.
API usage is the same as with puppeteer. In addition, puppetree adds five query methods that work the way they do on the DOM: querySelector, querySelectorAll, getElementById, getElementsByClassName, and getElementsByTagName.
Each returns a HybridElement, or a collection of them, which combines puppeteer's ElementHandle with the DOM's HTMLElement.
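To make the idea of a hybrid element more concrete, here is a minimal sketch of one way two objects could be merged behind a single interface. It is an illustration only and is not taken from puppetree's source: `makeHybrid`, `fakeHandle`, and `fakeAnchor` are hypothetical names, and plain objects stand in for puppeteer's ElementHandle and a JSDOM HTMLElement so the sketch runs without either library.

```javascript
// Hypothetical sketch: NOT puppetree's actual implementation.
// Wraps a "handle" object (stand-in for puppeteer's ElementHandle) and a
// "DOM" object (stand-in for a JSDOM HTMLElement) behind a single Proxy.
function makeHybrid(handle, domElement) {
  return new Proxy(handle, {
    get(target, prop) {
      // Prefer members of the handle (click, hover, ...).
      if (prop in target) {
        const value = target[prop];
        return typeof value === 'function' ? value.bind(target) : value;
      }
      // Otherwise fall back to the DOM object (href, value, textContent, ...).
      const value = domElement[prop];
      return typeof value === 'function' ? value.bind(domElement) : value;
    },
    has(target, prop) {
      return prop in target || prop in domElement;
    },
  });
}

// Plain-object stand-ins so the sketch needs neither puppeteer nor jsdom.
const fakeHandle = {
  clicks: 0,
  async click() { this.clicks += 1; },
};
const fakeAnchor = { href: 'https://example.com/', tagName: 'A' };

const $link = makeHybrid(fakeHandle, fakeAnchor);
console.log($link.href);         // read through to the DOM stand-in
console.log(typeof $link.click); // method supplied by the handle stand-in
```

In this sketch the handle wins whenever both objects define the same member. Whether puppetree resolves such conflicts the same way is not stated in this README.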
Getting Started
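const puppetree = require('puppetree');
// The calls below use await, so run them inside an async function.
const browser = await puppetree.launch();
const hybridPage = await browser.newPage();
await hybridPage.goto(url); // url: address of the page to scrape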
.querySelector
const $hyperlink = await hybridPage.querySelector('a.mylink');
console.log($hyperlink.href); // Logs HTMLAnchorElement href
.querySelectorAll
const $inputs = await hybridPage.querySelectorAll('div.container input');
for (const $input of $inputs) {
  console.log($input.value); // Logs HTMLInputElement value
}
.getElementById
const $button = await hybridPage.getElementById('search');
await $button.click(); // Uses ElementHandle click api
.getElementsByClassName
const $people = await hybridPage.getElementsByClassName('person');
for (const $person of $people) {
  await $person.hover(); // Uses ElementHandle hover api
}
.getElementsByTagName
const $rows = await hybridPage.getElementsByTagName('tr');
for (const $row of $rows) {
  const $p = await $row.querySelector('td p');
  console.log($p.textContent); // Logs HTMLParagraphElement textContent
}
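The snippets above can be combined into one function that also closes the browser when it is done. The following is a minimal sketch, not an official example: `collectHrefs` and its parameters are hypothetical names, and the puppetree module is passed in as an argument so the function can be tried with a stub. It assumes `browser.close()` is available, since the README states that the rest of the API matches puppeteer.

```javascript
// Sketch only: `collectHrefs` and its parameters are hypothetical names.
// The puppetree module is injected so the function can be run against a stub.
async function collectHrefs(puppetree, url, selector) {
  const browser = await puppetree.launch();
  try {
    const hybridPage = await browser.newPage();
    await hybridPage.goto(url);
    const $links = await hybridPage.querySelectorAll(selector);
    const hrefs = [];
    for (const $link of $links) {
      hrefs.push($link.href); // DOM-style property read on the hybrid element
    }
    return hrefs;
  } finally {
    // Assumed to behave like puppeteer's Browser.close(); runs even on errors.
    await browser.close();
  }
}
```

With the real package installed, a call might look like `collectHrefs(require('puppetree'), 'https://example.com', 'a').then(console.log)`. The try/finally block is there so a failed navigation or query does not leave a browser process running.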