1.0.10 • Published 6 years ago
local-web-crawler v1.0.10
Local web crawler
- A simple web crawler for crawling pages on selected domains.
- You can search for elements using XPath expressions.
- You can generate a map of your website, or look for specific elements such as links, inputs, etc.
Return values
The output data structure is an array of pairs:
[URL, {XPath results, neighbors}]
Parameters
Parameters are passed to the constructor:
- url of home server (like a
- maximum number of pages to crawl (use -1 for unlimited)
- array of XPath expressions to look for
Example usage
NOTE: Use the full URL, with or without www exactly as your site uses it, for example:
http://www.example.com/
because
http://www.example.com/ != http://example.com/
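To see why the two forms are not interchangeable, here is a minimal sketch that compares URLs by host using the URL class built into Node.js. The helper `sameHost` is hypothetical and only illustrates the point; it is not part of the crawler and does not claim to show how the crawler compares URLs internally.

```javascript
// Hypothetical helper: two URLs belong to the same site only if their hosts match exactly.
function sameHost(a, b) {
  return new URL(a).host === new URL(b).host
}

const withWww = 'http://www.example.com/'
const withoutWww = 'http://example.com/'

console.log(sameHost(withWww, withoutWww)) // false
console.log(sameHost(withWww, 'http://www.example.com/about')) // true
```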
const Crawler = require('./crawler')
const lookFor = ['//input', '//select']
const deep = -1
const homePage = 'http://oks.kiv.zcu.cz'
const crwlIns = new Crawler(homePage, deep, lookFor)
crwlIns.crawle()
console.log(crwlIns.getUrls()) // [string:url, json:output]
What next?
What about keeping only the outputs that contain some of the lookFor values?
(I prefer the standard way, like this)
let crawlerOut = crwlIns.getUrls()
let filtered = []
for (let i = 0; i < crawlerOut.length; i++) {
  let tmp = crawlerOut[i]
  // keep the URL if the page contains an input or a select
  if (tmp[1].input || tmp[1].select) {
    filtered.push(tmp[0])
  }
}
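If you would rather not hard-code the keys, the loop above can be sketched with Array.prototype.filter and a list of keys. This is only an illustration: the helper `pagesWithAny` is hypothetical, the sample data is two entries taken from the example output (with neibor left out for brevity), and it assumes that an XPath such as '//input' is reported under the key 'input', as the example output suggests.

```javascript
// Two entries copied from the example output (neibor omitted for brevity).
const crawlerOut = [
  ['http://oks.kiv.zcu.cz', { input: false, select: false }],
  ['http://oks.kiv.zcu.cz/Prevodnik/Prevodnik', { input: true, select: true }]
]

// Assumption: each XPath such as '//input' is reported under the key 'input'.
const keys = ['input', 'select']

// Hypothetical helper: return the URL of every page where at least one key is truthy.
function pagesWithAny(output, keys) {
  return output
    .filter(([, result]) => keys.some((key) => result[key]))
    .map(([url]) => url)
}

console.log(pagesWithAny(crawlerOut, keys))
// [ 'http://oks.kiv.zcu.cz/Prevodnik/Prevodnik' ]
```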
Output of example:
[ [ 'http://oks.kiv.zcu.cz',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Uvod"]' } ],
[ 'http://oks.kiv.zcu.cz/Prevodnik/Uvod',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/Forum/Uvod',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/OsobniCislo/Uvod',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/OsobniCislo/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Generovani","http://oks.kiv.zcu.cz/OsobniCislo/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/Prevodnik/Prevodnik',
{ input: true,
select: true,
neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/Prevodnik/Napoveda',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/Forum/Registrace',
{ input: true,
select: true,
neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/Forum/Napoveda',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/OsobniCislo/Generovani',
{ input: true,
select: true,
neibor: '["http://oks.kiv.zcu.cz/OsobniCislo/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Generovani","http://oks.kiv.zcu.cz/OsobniCislo/Napoveda"]' } ],
[ 'http://oks.kiv.zcu.cz/OsobniCislo/Napoveda',
{ input: false,
select: false,
neibor: '["http://oks.kiv.zcu.cz/OsobniCislo/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Generovani","http://oks.kiv.zcu.cz/OsobniCislo/Napoveda"]' } ] ]
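In the output above, neibor is a JSON-encoded string rather than an array, so it has to be parsed before you can build a map of the site. A minimal sketch, using two entries copied from the example output; the helper `buildSiteMap` is hypothetical and not part of the crawler, and dropping self-links is a choice made here for readability.

```javascript
// Two entries copied from the example output.
const crawlerOut = [
  ['http://oks.kiv.zcu.cz', {
    input: false,
    select: false,
    neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Uvod"]'
  }],
  ['http://oks.kiv.zcu.cz/Prevodnik/Prevodnik', {
    input: true,
    select: true,
    neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]'
  }]
]

// Hypothetical helper: map each crawled URL to the list of pages it links to.
function buildSiteMap(output) {
  const siteMap = new Map()
  for (const [url, result] of output) {
    // neibor is a JSON string, so parse it; drop links from a page to itself.
    const links = JSON.parse(result.neibor).filter((link) => link !== url)
    siteMap.set(url, links)
  }
  return siteMap
}

const siteMap = buildSiteMap(crawlerOut)
console.log(siteMap.get('http://oks.kiv.zcu.cz/Prevodnik/Prevodnik'))
// [ 'http://oks.kiv.zcu.cz/Prevodnik/Uvod', 'http://oks.kiv.zcu.cz/Prevodnik/Napoveda' ]
```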
Problems with the Chrome driver
If you have a problem with the Chrome driver, try the following commands:
apt-get install default-jre
apt-get -f install
# install chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
Note: Tested on Debian 8.1 (64-bit) and Windows 10 (64-bit).