2.0.0 • Published 8 years ago

wget-parser v2.0.0

Weekly downloads: 4
License: MIT
Repository: github
Last release: 8 years ago

Spider parser

Parses the spider output from wget into an object structure of links.

The object can then be processed further to build a tree of a website's hierarchy, for example to generate a sitemap.
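As a rough illustration of that further processing, the sketch below groups links into a nested tree keyed by hostname and then by path segment. `buildTree` is a hypothetical helper, not part of wget-parser, and the sample data only mimics the `url.hostname` and `url.pathname` fields shown in the Output section.

```javascript
// Hypothetical helper (not provided by wget-parser): builds a nested tree
// keyed by hostname, then by each segment of the URL pathname.
function buildTree(links) {
  var root = Object.create(null);
  links.forEach(function (item) {
    var host = item.url.hostname;
    var node = root[host] = root[host] || Object.create(null);
    item.url.pathname.split('/').filter(Boolean).forEach(function (segment) {
      node = node[segment] = node[segment] || Object.create(null);
    });
  });
  return root;
}

// Sample data shaped like the parser result (only the fields used here).
var result = {
  links: [
    { url: { hostname: 'example.com', pathname: '/' } },
    { url: { hostname: 'example.com', pathname: '/docs/intro' } },
    { url: { hostname: 'example.com', pathname: '/docs/api' } }
  ],
  broken: []
};

var tree = buildTree(result.links);
console.log(JSON.stringify(tree, null, 2));
```

A sitemap generator could then walk this tree depth-first; query strings and fragments are ignored here for simplicity.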

Tested using wget v1.15 on Linux.

Usage

var parser = require('wget-parser')
  , buf = Buffer.alloc(0);    // buffer should contain the spider output
console.dir(parser(buf));     // prints the parsed object structure

The module also exposes:

  • parser.Parser: The parser class.
  • parser.Link: The class that represents a link.
  • parser.ParseStream: Parse stream class.

Stream support is available; see the test spec for example usage.

wget-parser

A program that reads from stdin and prints the result of the parse as JSON. It exits with error code 1 if any broken links are found.

cat test/fixtures/mock.txt | wget-parser
cat test/fixtures/broken.txt | wget-parser; echo $?;
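For scripts that capture the JSON rather than rely on `$?`, the following sketch derives the same pass/fail signal from the printed output. `summarize` is a hypothetical helper and relies only on the `links` and `broken` keys shown in the Output section; it mirrors the documented exit-code behaviour and is not taken from the tool's source.

```javascript
// Hypothetical helper: summarizes the JSON text printed by wget-parser.
// Only the lengths of `links` and `broken` are used, since the shape of
// individual broken entries is not documented here.
function summarize(jsonText) {
  var result = JSON.parse(jsonText);
  var broken = result.broken || [];
  return {
    total: (result.links || []).length,
    broken: broken.length,
    exitCode: broken.length > 0 ? 1 : 0
  };
}

// Sample input standing in for captured wget-parser output.
var sample = JSON.stringify({
  links: [{ link: 'http://example.com/' }],
  broken: []
});

console.log(summarize(sample));
```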

wget-spider

A program that runs wget in spider mode and pipes the output to wget-parser:

wget-spider http://google.com

Output

Example output from the parser:

{
  "links": [
    {
      "url": {
        "protocol": "http:",
        "slashes": true,
        "auth": null,
        "host": "google.com",
        "port": null,
        "hostname": "google.com",
        "hash": null,
        "search": null,
        "query": null,
        "pathname": "/",
        "path": "/",
        "href": "http://google.com/"
      },
      "link": "http://google.com/",
      "line": "--2016-02-10 16:11:57--  http://google.com/"
    },
    {
      "url": {
        "protocol": "http:",
        "slashes": true,
        "auth": null,
        "host": "www.google.co.id",
        "port": null,
        "hostname": "www.google.co.id",
        "hash": null,
        "search": "?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "query": "gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "pathname": "/",
        "path": "/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "href": "http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ"
      },
      "link": "http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
      "line": "--2016-02-10 16:11:57--  http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ"
    }
  ],
  "broken": []
}
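To make the relationship between the `line`, `link` and `url` fields concrete, here is a minimal sketch of how one wget request line could be turned into such an entry. It is an assumption, not the library's actual implementation: `parseLine` is a hypothetical name, the regular expression is inferred from the sample lines above, and `url.parse` is used because the field names in the output match those of Node's legacy URL object.

```javascript
var url = require('url');

// Hypothetical helper: matches wget request lines of the form
// "--YYYY-MM-DD HH:MM:SS--  <url>" and returns null for anything else.
function parseLine(line) {
  var match = /^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)/.exec(line);
  if (!match) {
    return null;
  }
  return {
    url: url.parse(match[1]),  // legacy Node API (assumed, see above)
    link: match[1],
    line: line
  };
}

var entry = parseLine('--2016-02-10 16:11:57--  http://google.com/');
console.log(entry.link, entry.url.hostname, entry.url.pathname);
```

Lines that do not start with a timestamp marker, such as wget's status messages, return null and would simply be skipped by a caller.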

Developer

Test

To run the test suite:

npm test

Cover

To generate code coverage run:

npm run cover

Lint

Run the source tree through jshint and jscs:

npm run lint

Clean

Remove generated files:

npm run clean

Readme

To build the readme file from the partial definitions:

npm run readme

Generated by mdp(1).

Release history: 2.0.0, 1.0.4, 1.0.3, 1.0.2 and 1.0.1, each published 8 years ago.