1.1.0 • Published 6 years ago
sqlite-simplecrawler-queue v1.1.0
SQLite queue for Simplecrawler
This is an implementation of the simplecrawler FetchQueue interface that uses SQLite as its backend.
Advantage: a running job can be paused, stopped, killed, or terminated without losing the queue state.
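To make the interface concrete, here is a minimal in-memory sketch of a few of the callback-style methods that simplecrawler calls on its queue (add, exists, getLength, oldestUnfetchedItem, update). The class name MemoryQueueSketch and the reduced set of item fields are assumptions made for illustration. This is not the code of this package, which keeps the same kind of items in an SQLite file so that they outlive the process.

```javascript
// Hypothetical in-memory queue; the class name and item fields are illustrative only.
class MemoryQueueSketch {
  constructor () {
    this.items = []
  }

  // Store an item unless its URL is already queued (force skips that check).
  add (queueItem, force, callback) {
    if (!force && this.items.some((item) => item.url === queueItem.url)) {
      return callback(new Error('Resource already exists in queue'))
    }
    const stored = Object.assign({}, queueItem, {
      id: this.items.length,
      status: 'queued'
    })
    this.items.push(stored)
    callback(null, stored)
  }

  // Report whether a URL is already present.
  exists (url, callback) {
    callback(null, this.items.some((item) => item.url === url))
  }

  // Report the number of stored items.
  getLength (callback) {
    callback(null, this.items.length)
  }

  // Return the first item still waiting to be fetched, or null if none is left.
  oldestUnfetchedItem (callback) {
    const next = this.items.find((item) => item.status === 'queued')
    callback(null, next || null)
  }

  // Merge updates (for example a new status) into the item with the given id.
  update (id, updates, callback) {
    const item = this.items.find((candidate) => candidate.id === id)
    if (!item) {
      return callback(new Error('No queue item found with that id'))
    }
    Object.assign(item, updates)
    callback(null, item)
  }
}

// Usage: queue one URL, then ask for the next item to fetch.
const sketchQueue = new MemoryQueueSketch()
sketchQueue.add({ url: 'http://example.com/' }, false, (err) => {
  if (err) return console.error(err)
  sketchQueue.oldestUnfetchedItem((nextErr, item) => {
    if (nextErr) return console.error(nextErr)
    console.log(item.url)
  })
})
```

Because everything here lives in an array, killing the process loses the queue. Moving the same operations onto a database file is what lets a crawl resume where it stopped.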
Installation
Install from GitHub
npm install git+https://github.com/LeMoussel/SQLite-simplecrawler-queue#master
Install from npm
npm install --save sqlite-simplecrawler-queue
Usage
All you need is the path to the SQLite database file.
// Assumption: the export shape of this package may differ; check its entry point.
const Crawler = require('simplecrawler')
const SQLiteFetchQueue = require('sqlite-simplecrawler-queue')
try {
  const sqliteDatabaseName = 'crawlsite.sqlite3'
  // Drop the database if it exists
  SQLiteFetchQueue.dropDatabase(sqliteDatabaseName)
  // Connect to an on-disk database by passing the path to the database file
  const crawlerQueue = new SQLiteFetchQueue(sqliteDatabaseName)
  // Initialize the database
  crawlerQueue.init()
  // Initialize simplecrawler and plug in the SQLite-backed queue
  const crawler = new Crawler('http://example.com')
  crawler.maxDepth = 3
  crawler.allowInitialDomainChange = false
  crawler.filterByDomain = true
  crawler.queue = crawlerQueue
  crawler.start()
} catch (err) {
  console.error(err)
}
Test
Run npm test. See the test folder for more usage examples.
Additional utilities
- Drop the queue using the dropQueue method.
// Connect to an on-disk database by passing the path to the database file
const crawlerQueue = new SQLiteFetchQueue('sqliteDatabaseName', 'queue')
// Drop the 'queue' table
crawlerQueue.dropQueue()
// Initialize the database
crawlerQueue.init()
- Drop the database using the SQLiteFetchQueue.dropDatabase static method.
// Drop the database if it exists
SQLiteFetchQueue.dropDatabase('sqliteDatabaseName')
// Connect to an on-disk database by passing the path to the database file
const crawlerQueue = new SQLiteFetchQueue('sqliteDatabaseName', 'queue')
// Initialize the database
crawlerQueue.init()
- Export (freeze) the queue to a JSON file on disk.
// Freeze the queue to a JSON file on disk
crawlerQueue.freeze('./test/www.test.com.sqlite3.json', (err, result) => {
  if (err) {
    console.error(err)
    return
  }
  console.log(`Number of rows saved to JSON file: ${result}`)
})
- Import (defrost) the queue from a frozen JSON file on disk.
// Defrost the queue from a JSON file on disk
crawlerQueue.defrost('./test/www.test.com.sqlite3.json', (err, result) => {
  if (err) {
    console.error(err)
    process.exit(1)
  }
  console.log(`Number of rows inserted: ${result}`)
})
Resources
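The freeze and defrost utilities listed under Additional utilities follow a simple pattern: write the queue rows to a JSON file and report how many were saved, then read them back later. The sketch below imitates that pattern using only Node's built-in fs, os, and path modules. The helpers freezeSketch and defrostSketch, the row fields, and the temporary file name are hypothetical and do not reflect this package's internals.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: write rows to a JSON file and call back with the row count.
function freezeSketch (rows, filename, callback) {
  fs.writeFile(filename, JSON.stringify(rows, null, 2), (err) => {
    if (err) return callback(err)
    callback(null, rows.length)
  })
}

// Hypothetical helper: read rows back from a JSON file and call back with them.
function defrostSketch (filename, callback) {
  fs.readFile(filename, 'utf8', (err, data) => {
    if (err) return callback(err)
    let rows
    try {
      rows = JSON.parse(data)
    } catch (parseErr) {
      return callback(parseErr)
    }
    callback(null, rows)
  })
}

// Usage: round-trip one row through a file in the system temp directory.
const sketchFile = path.join(os.tmpdir(), 'queue-sketch.json')
const sketchRows = [{ id: 0, url: 'http://example.com/', status: 'queued' }]
freezeSketch(sketchRows, sketchFile, (freezeErr, saved) => {
  if (freezeErr) return console.error(freezeErr)
  console.log(`Number of rows saved to JSON file: ${saved}`)
  defrostSketch(sketchFile, (defrostErr, restored) => {
    if (defrostErr) return console.error(defrostErr)
    console.log(`Number of rows restored: ${restored.length}`)
  })
})
```

Passing errors to the callback instead of throwing mirrors the (err, result) style used by the freeze and defrost calls in this README.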
License
MIT licensed; all its dependencies are MIT or BSD licensed.