1.0.10 • Published 6 years ago

local-web-crawler v1.0.10

License
ISC

Local web crawler

  • A simple web crawler for crawling pages on selected domains.
  • You can search for elements using XPath expressions.
  • You can generate a map of your site, or look for specific elements such as links, inputs, etc.

Return values

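The output is an array of entries, each of the form:

[URL, {XPath results, neighbors}]

In the example output below, the neighbors are stored under the key neibor as a JSON-encoded string.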
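A minimal sketch of reading one entry, assuming the entry shape seen in the example output of this README (one boolean flag per element plus a neibor key). The helper readEntry is hypothetical and not part of the package.

```javascript
// Sample entry copied from the example output in this README.
const entry = [
  'http://oks.kiv.zcu.cz/Forum/Registrace',
  {
    input: true,
    select: true,
    neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]'
  }
]

// Hypothetical helper (not part of the package): unpack one [URL, result] entry.
function readEntry([url, result]) {
  // Assumption: `neibor` holds a JSON-encoded array of URLs, as in the example output.
  const { neibor, ...found } = result
  return { url, found, neighbors: JSON.parse(neibor) }
}

const page = readEntry(entry)
console.log(page.url)
console.log(page.found)
console.log(page.neighbors)
```

Parsing the neibor string once up front gives you a real array to iterate over instead of a string.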

Parameters

Parameters are passed to the constructor:

  • url of home server (like a
  • maximum number of pages to crawl (use -1 for unlimited)
  • array of XPath expressions to look for

Example usage

NOTE: Use the full URL, with or without www, like:

http://www.example.com/

because the two forms are not the same:

http://www.example.com/ != http://example.com/

const Crawler = require('./crawler')

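// XPath expressions to look for on each crawled page
const lookFor = ['//input', '//select']
// Maximum number of pages to crawl; -1 means unlimited
const deep = -1
// Full URL of the home server
const homePage = 'http://oks.kiv.zcu.cz'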

const crwlIns = new Crawler(homePage, deep, lookFor)

crwlIns.crawle()

console.log(crwlIns.getUrls()) // [string:url, json:output]
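Since the note above says the www and non-www forms of a URL are not equal, it may help to check your URLs before starting a crawl. This is only a sketch: sameHost is a hypothetical helper, and whether the crawler itself compares hosts this way is an assumption. It relies on the standard URL class available as a global in Node.js.

```javascript
// Hypothetical helper (not part of the package): true when both URLs have the same host.
function sameHost(a, b) {
  return new URL(a).host === new URL(b).host
}

const home = 'http://www.example.com/'
console.log(sameHost(home, 'http://www.example.com/about')) // true
console.log(sameHost(home, 'http://example.com/about'))     // false
```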

What next? How about keeping only the outputs that contain some of the lookFor values? (I prefer the standard way, like this)

let crawlerOut = crwlIns.getUrls()
let filtered = [];
for (let i = 0; i < crawlerOut.length; i++) {
	let tmp = crawlerOut[i]
	if (tmp[1].input || tmp[1].select) {
		filtered.push(tmp[0])
	}
}
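The same filtering can also be sketched with the array methods filter and map. The sample data below is a shortened copy of the example output (the neibor key is left out for brevity), and the keys list assumes that result keys are the element names from the XPaths, as that output suggests.

```javascript
// Shortened sample of the crawler output; `neibor` omitted for brevity.
const crawlerOut = [
  ['http://oks.kiv.zcu.cz', { input: false, select: false }],
  ['http://oks.kiv.zcu.cz/Prevodnik/Prevodnik', { input: true, select: true }],
  ['http://oks.kiv.zcu.cz/Forum/Napoveda', { input: false, select: false }],
  ['http://oks.kiv.zcu.cz/Forum/Registrace', { input: true, select: true }]
]

// Assumption: '//input' and '//select' show up as the keys 'input' and 'select'.
const keys = ['input', 'select']

// Keep entries where at least one key is truthy, then return only their URLs.
const filtered = crawlerOut
  .filter(([, result]) => keys.some((key) => result[key]))
  .map(([url]) => url)

console.log(filtered)
```

Keeping the key names in one list means you only change one line when you look for other elements.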

Output of the example:

[ [ 'http://oks.kiv.zcu.cz',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Uvod"]' } ],
  [ 'http://oks.kiv.zcu.cz/Prevodnik/Uvod',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/Forum/Uvod',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/OsobniCislo/Uvod',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/OsobniCislo/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Generovani","http://oks.kiv.zcu.cz/OsobniCislo/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/Prevodnik/Prevodnik',
    { input: true,
      select: true,
      neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/Prevodnik/Napoveda',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/Forum/Registrace',
    { input: true,
      select: true,
      neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/Forum/Napoveda',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/Forum/Registrace","http://oks.kiv.zcu.cz/Forum/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/OsobniCislo/Generovani',
    { input: true,
      select: true,
      neibor: '["http://oks.kiv.zcu.cz/OsobniCislo/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Generovani","http://oks.kiv.zcu.cz/OsobniCislo/Napoveda"]' } ],
  [ 'http://oks.kiv.zcu.cz/OsobniCislo/Napoveda',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/OsobniCislo/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Generovani","http://oks.kiv.zcu.cz/OsobniCislo/Napoveda"]' } ] ]
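To turn this output into a map of the site, one possible approach is to parse each neibor string and store the links per page. This is a sketch, not part of the package: buildSiteMap is a hypothetical helper, and dropping links from a page to itself is a design choice made here.

```javascript
// Two entries copied from the example output above.
const sampleOut = [
  ['http://oks.kiv.zcu.cz',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Forum/Uvod","http://oks.kiv.zcu.cz/OsobniCislo/Uvod"]' }],
  ['http://oks.kiv.zcu.cz/Prevodnik/Uvod',
    { input: false,
      select: false,
      neibor: '["http://oks.kiv.zcu.cz/Prevodnik/Uvod","http://oks.kiv.zcu.cz/Prevodnik/Prevodnik","http://oks.kiv.zcu.cz/Prevodnik/Napoveda"]' }]
]

// Hypothetical helper: build a Map from page URL to the URLs it links to.
function buildSiteMap(output) {
  const siteMap = new Map()
  for (const [url, result] of output) {
    // Assumption: `neibor` is a JSON-encoded array of URLs.
    const neighbors = JSON.parse(result.neibor).filter((n) => n !== url)
    siteMap.set(url, neighbors)
  }
  return siteMap
}

const siteMap = buildSiteMap(sampleOut)
for (const [url, neighbors] of siteMap) {
  console.log(url, '->', neighbors.join(', '))
}
```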

Problems with the Chrome driver

If you have problems with the Chrome driver, try the following commands:

apt-get install default-jre
apt-get -f install

# install chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install

Note: Tested on Debian 8.1 (64-bit) and Windows 10 (64-bit).
