0.2.3 • Published 6 years ago
macabre v0.2.3
macabre
All purpose website scraper based on puppeteer. It will save to either mongo and Relational DB for now (Tested on MySQL). Used mongo native nodejs driver and also sequelize.
How it works, it will use puppeteer as headless browser to navigate to web page, and after defining the ETL function, it can be saved dynamically to the database of your choice. It will create the database and collection/table dynamically.
Getting Started
- Using
node
> 8. npm install macabre
oryarn add macabre
- Having the database of your choice up and running
- Just like in the example folder, this is the
nodejs
example:
const { Macabre } = require("macabre");
const config = {
url: "https://coinmarketcap.com/",
dialect: "mongo",
database: "test",
collection: "crypto"
};
const etl = async (page, next) => {
const array = [];
const currencies = await page.$$eval(".currency-name-container", el => {
return el.map(x => x.innerHTML);
});
const prices = await page.$$eval(".price", el => {
return el.map(x => x.innerHTML);
});
const volumes = await page.$$eval(".volume", el => {
return el.map(x => x.innerHTML);
});
currencies.forEach((n, i) => {
array.push({
name: n,
price: prices[i] || null,
volume: volumes[i] || null
});
});
next(null, array);
};
const test = new Macabre(config, etl);
More Information
- The ETL Function will return
page
as first parameter object which is the page object of puppeteer. - The second one will be
next
function to throw and error or save to database. - There are two parameter of
next
function, the first one is error, and the second one is the value to be saved. - Please refer to
puppeteer
andsequelize
for more information.
Configuration
url: string; // REQUIRED: URL of navigation
dialect: string; // REQUIRED: Database of your choice ["mongo", "mysql", "postgresql"]
database: string; // REQUIRED: Database name
collection: string; // REQUIRED: Collection or table name
username: string; // // Username of database
password: string; // Password of database
host: string; // Host of the database, default is '127.0.0.1'
pool: any; // Object of Pool config based on sequelize
port: number; // Port of the database, default is th default respected database port
storage: string; // For SQLite only