1.1.0 • Published 4 years ago
sqlite-simplecrawler-queue v1.1.0
SQLite queue for Simplecrawler
An implementation of the simplecrawler FetchQueue interface that uses SQLite as the queue's storage backend.
Benefit: a running job can be paused, stopped, killed or terminated without losing the queue state.
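To picture what a FetchQueue backend has to provide, here is a minimal in-memory sketch of a few callback-style methods from simplecrawler's FetchQueue interface (`add`, `exists`, `oldestUnfetchedItem`, `update`, `getLength`). It is not code from this package, covers only part of the interface, and simplifies status handling and error messages (assumptions). A SQLite backend keeps the same records in a table instead of an array, which is why they survive when the process stops.

```javascript
// Minimal in-memory sketch of part of simplecrawler's FetchQueue interface.
// Illustrative only: sqlite-simplecrawler-queue stores these records in SQLite.
class InMemoryFetchQueue {
  constructor () {
    this.items = []        // queue items; an item's id is its array index
    this.urls = new Set()  // URLs already queued, for duplicate checks
  }

  // Add a queue item unless its URL is already queued (force skips the check).
  add (queueItem, force, callback) {
    if (!force && this.urls.has(queueItem.url)) {
      return callback(new Error('Resource already exists in queue!'))
    }
    queueItem.id = this.items.length
    queueItem.status = 'queued'
    queueItem.fetched = false
    this.items.push(queueItem)
    this.urls.add(queueItem.url)
    callback(null, queueItem)
  }

  // Report whether a URL is already queued: 1 if so, 0 otherwise.
  exists (url, callback) {
    callback(null, this.urls.has(url) ? 1 : 0)
  }

  // Return the first item still waiting to be fetched, or null if none.
  oldestUnfetchedItem (callback) {
    const item = this.items.find(i => i.status === 'queued') || null
    callback(null, item)
  }

  // Merge the given fields into the item with this id.
  update (id, updates, callback) {
    const item = this.items[id]
    if (!item) return callback(new Error('No queue item with that id'))
    Object.assign(item, updates)
    callback(null, item)
  }

  // Report the total number of items in the queue.
  getLength (callback) {
    callback(null, this.items.length)
  }
}
```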
Installation
Install from GitHub:
```shell
npm install git+https://github.com/LeMoussel/SQLite-simplecrawler-queue#master
```
Install from npm:
```shell
npm install --save sqlite-simplecrawler-queue
```
Usage
All you need to provide is the path to the SQLite database file:
```javascript
// simplecrawler exports the Crawler constructor
const Crawler = require('simplecrawler')
// Assumed: the package exports the SQLiteFetchQueue class directly
const SQLiteFetchQueue = require('sqlite-simplecrawler-queue')

try {
  const sqliteDatabaseName = 'crawlsite.sqlite3'
  // Delete the database if it already exists, so the crawl starts from scratch
  SQLiteFetchQueue.dropDatabase(sqliteDatabaseName)
  // Connect to a disk file database by passing the path to the database file
  const crawlerQueue = new SQLiteFetchQueue(sqliteDatabaseName)
  // Initialize the database
  crawlerQueue.init()
  // Initialize simplecrawler and plug in the SQLite queue
  const crawler = new Crawler('http://example.com')
  crawler.maxDepth = 3
  crawler.allowInitialDomainChange = false
  crawler.filterByDomain = true
  crawler.queue = crawlerQueue
  crawler.start()
} catch (err) {
  console.error(err)
}
```
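Because the queue lives in the database file, skipping the `dropDatabase` call should let a later run continue from the saved queue instead of starting over. The helper below is a hypothetical sketch of that choice (`openQueue` is not part of the package). It assumes that `init()` keeps existing rows in place, which this README does not confirm, and it takes the queue class as a parameter so it can be exercised without a real database.

```javascript
// Hypothetical helper (not part of the package): open the crawl queue either
// from scratch or on top of an existing database file.
// Assumption: init() creates missing tables but keeps existing rows.
function openQueue (QueueClass, databaseFile, { fresh = false } = {}) {
  if (fresh) {
    // Start over: delete the previous database, as in the usage example
    QueueClass.dropDatabase(databaseFile)
  }
  const queue = new QueueClass(databaseFile)
  queue.init()
  return queue
}

// Possible usage with the real class (export name assumed):
// const queue = openQueue(SQLiteFetchQueue, 'crawlsite.sqlite3',
//   { fresh: !process.argv.includes('--resume') })
```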
Test
Run the test suite with:
```shell
npm test
```
Check the test folder for more usage examples.
Additional utilities
- Drop the queue table with the `dropQueue` method.
```javascript
// Connect to a disk file database; the second argument is the name of the queue table
const crawlerQueue = new SQLiteFetchQueue('crawlsite.sqlite3', 'queue')
// Drop the 'queue' table
crawlerQueue.dropQueue()
// Initialize the database
crawlerQueue.init()
```
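The README does not show how `dropQueue` works internally. Assuming it issues an SQLite `DROP TABLE` on the queue table named in the constructor, the hypothetical helper below (`dropTableSql`, not the package's API) builds that statement, quoting the table name as an SQLite identifier: double quotes around it, with any embedded double quote doubled, so an unusual table name cannot break the SQL.

```javascript
// Hypothetical helper (not the package's API): build the statement that
// drops a queue table. SQLite quotes identifiers with double quotes and
// escapes an embedded double quote by doubling it.
function dropTableSql (tableName) {
  const quoted = '"' + String(tableName).replace(/"/g, '""') + '"'
  // IF EXISTS makes the statement a no-op when the table is missing
  return `DROP TABLE IF EXISTS ${quoted}`
}

console.log(dropTableSql('queue')) // DROP TABLE IF EXISTS "queue"
```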
- Drop the database with the `SQLiteFetchQueue.dropDatabase` static method.
```javascript
// Delete the database if it exists
SQLiteFetchQueue.dropDatabase('crawlsite.sqlite3')
// Connect to a disk file database by passing the path to the database file
const crawlerQueue = new SQLiteFetchQueue('crawlsite.sqlite3', 'queue')
// Initialize the database
crawlerQueue.init()
```
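An SQLite database is a single file on disk, so "drop the database if it exists" plausibly comes down to deleting that file and treating a missing file as success. The sketch below shows that pattern with Node's `fs.unlinkSync`; the name `dropDatabaseFile` is hypothetical, this is an assumption about what `dropDatabase` does rather than its source, and it ignores any `-journal` or `-wal` side files SQLite may leave next to the database.

```javascript
const fs = require('fs')

// Hypothetical "drop if exists": delete the database file and report whether
// anything was removed. A missing file (ENOENT) is not treated as an error.
function dropDatabaseFile (filename) {
  try {
    fs.unlinkSync(filename)
    return true
  } catch (err) {
    if (err.code === 'ENOENT') return false
    throw err // permission problems and other errors are re-thrown
  }
}
```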
- Export the queue to disk as a JSON file with `freeze`.
```javascript
// Freeze the queue to disk as JSON
crawlerQueue.freeze('./test/www.test.com.sqlite3.json', (err, result) => {
  if (err) {
    console.error(err)
    return
  }
  console.log(`Number of rows saved to JSON file: ${result}`)
})
```
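The JSON layout that `freeze` writes is not described in this README. As an illustration only, the sketch below (hypothetical name `freezeItemsSync`, synchronous for brevity) writes queue items as a plain JSON array and returns the row count, the same kind of value the callback above reports. It assumes simplecrawler-style `fetched` and `status` fields. One design choice worth noting: items that were not fully fetched when the crawl stopped are written with status `'queued'`, so a restored crawl retries them instead of leaving them stuck mid-fetch.

```javascript
const fs = require('fs')

// Hypothetical freeze-style export (illustration, not the package's code).
// Items not fully fetched are marked 'queued' again so that a restored
// crawl retries them; the file holds a plain JSON array.
function freezeItemsSync (items, filename) {
  const snapshot = items.map(item =>
    item.fetched === true ? { ...item } : { ...item, status: 'queued' }
  )
  fs.writeFileSync(filename, JSON.stringify(snapshot, null, 2), 'utf8')
  return snapshot.length // number of rows written
}
```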
- Import the queue from a frozen JSON file on disk with `defrost`.
```javascript
// Defrost the queue from a JSON file previously written by freeze
crawlerQueue.defrost('./test/www.test.com.sqlite3.json', (err, result) => {
  if (err) {
    console.error(err)
    process.exit(1)
  }
  console.log(`Number of rows inserted: ${result}`)
})
```
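On the way back in, a restore has to do more than parse the file: it should reject a file that is not a list of queue items and rebuild whatever index the queue uses to recognize URLs it has already seen. The synchronous sketch below, with the hypothetical name `defrostItemsSync` and an assumed plain-array file format, shows both steps. In the package itself the rows presumably go back into the SQLite table, with the count delivered through the `(err, result)` callback.

```javascript
const fs = require('fs')

// Hypothetical defrost-style import (illustration, not the package's code):
// read a JSON array of queue items and rebuild a URL index so that URLs
// already known are not queued twice after the restore.
function defrostItemsSync (filename) {
  const parsed = JSON.parse(fs.readFileSync(filename, 'utf8'))
  if (!Array.isArray(parsed)) {
    throw new TypeError(`${filename} does not contain a JSON array of queue items`)
  }
  const urls = new Set(parsed.map(item => item.url))
  return { items: parsed, urls, count: parsed.length }
}
```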
License
MIT licensed; all of its dependencies are MIT or BSD licensed.