0.2.3 • Published 8 years ago
macabre
All-purpose website scraper based on puppeteer. For now it can save to either MongoDB or a relational database (tested on MySQL). It uses the native MongoDB Node.js driver and sequelize.
How it works: puppeteer runs as a headless browser and navigates to the web page. After you define the ETL function, its result is saved to the database of your choice. The database and the collection/table are created dynamically.
Getting Started
- Using `node` > 8.
- `npm install macabre` or `yarn add macabre`
- Having the database of your choice up and running
- Just like in the example folder, this is the `nodejs` example:
const { Macabre } = require("macabre");
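// Target page and destination database.
const config = {
  url: "https://coinmarketcap.com/",
  dialect: "mongo",
  database: "test",
  collection: "crypto"
};
// ETL function: `page` is the puppeteer page, `next` reports an error or saves the rows.
const etl = async (page, next) => {
  const array = [];
  // Each $$eval call collects the inner HTML of every element matching the selector.
  const currencies = await page.$$eval(".currency-name-container", el => {
    return el.map(x => x.innerHTML);
  });
  const prices = await page.$$eval(".price", el => {
    return el.map(x => x.innerHTML);
  });
  const volumes = await page.$$eval(".volume", el => {
    return el.map(x => x.innerHTML);
  });
  // Combine the three parallel lists into one record per currency.
  currencies.forEach((n, i) => {
    array.push({
      name: n,
      price: prices[i] || null,
      volume: volumes[i] || null
    });
  });
  // No error: pass the records on to be saved.
  next(null, array);
};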
const test = new Macabre(config, etl);
More Information
- The ETL function receives `page` as its first parameter, which is the page object of puppeteer.
- The second parameter is the `next` function, which is used to throw an error or to save to the database.
- The `next` function takes two parameters: the first one is the error, and the second one is the value to be saved.
- Please refer to `puppeteer` and `sequelize` for more information.
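The `(page, next)` contract described above can be sketched with explicit error handling. This is only an illustration, not code from the package: `toRows` is a hypothetical helper, the selectors are the ones from the example, and calling `next(err)` with a single argument assumes that the second parameter may be omitted when reporting an error.

```javascript
// Hypothetical helper (not part of macabre): zip parallel columns into row objects.
function toRows(names, prices, volumes) {
  return names.map((name, i) => ({
    name: name.trim(),
    price: prices[i] || null,
    volume: volumes[i] || null
  }));
}

// ETL function in the (page, next) shape described above.
const etl = async (page, next) => {
  let rows;
  try {
    // Runs in the page context; textContent avoids storing nested markup.
    const text = els => els.map(el => el.textContent);
    const names = await page.$$eval(".currency-name-container", text);
    const prices = await page.$$eval(".price", text);
    const volumes = await page.$$eval(".volume", text);
    rows = toRows(names, prices, volumes);
  } catch (err) {
    // Scraping failed: report it through the first argument of next.
    return next(err);
  }
  // Scraping succeeded: null error, rows as the value to be saved.
  return next(null, rows);
};
```

As in the example, the function would be passed as the second argument, `new Macabre(config, etl)`. Keeping the `next(null, rows)` call outside the `try` block means an exception raised while saving is not reported a second time through `next`.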
Configuration
url: string; // REQUIRED: URL of navigation
dialect: string; // REQUIRED: Database of your choice ["mongo", "mysql", "postgresql"]
database: string; // REQUIRED: Database name
collection: string; // REQUIRED: Collection or table name
username: string; // Username of database
password: string; // Password of database
host: string; // Host of the database, default is '127.0.0.1'
pool: any; // Object of Pool config based on sequelize
port: number; // Port of the database, default is the default port of the respective database
storage: string; // For SQLite only
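For a relational target, a configuration built from the options above might look like the following sketch. The credentials are placeholders, the `pool` values are ordinary sequelize pool settings chosen for illustration, and `missingKeys` is a hypothetical helper for checking the four required options before use; whether macabre performs such a check itself is not stated here.

```javascript
// The four options marked REQUIRED in the list above.
const REQUIRED_KEYS = ["url", "dialect", "database", "collection"];

// Hypothetical helper (not part of macabre): names of required options
// that are absent or not non-empty strings.
function missingKeys(config) {
  return REQUIRED_KEYS.filter(
    key => typeof config[key] !== "string" || config[key] === ""
  );
}

// Example MySQL configuration; username and password are placeholders.
const mysqlConfig = {
  url: "https://coinmarketcap.com/",
  dialect: "mysql",
  database: "test",
  collection: "crypto",                  // used as the table name
  username: "root",
  password: "secret",
  host: "127.0.0.1",
  port: 3306,                            // default MySQL port
  pool: { max: 5, min: 0, idle: 10000 }  // passed as sequelize pool config
};

const missing = missingKeys(mysqlConfig);
if (missing.length > 0) {
  throw new Error("Missing required options: " + missing.join(", "));
}
```

The same object shape, with `dialect: "mongo"` and without the relational-only options, matches the configuration used in the example.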