google-crawler v0.1.0

Weekly downloads: 2 · License: ISC · Repository: bitbucket · Last release: 10 years ago

Google Crawler

This project is an effort to turn a publicly available paste into an NPM package. The original paste is available at the following URL:

It's an Express middleware that serves raw HTML to Google's crawler according to their specification:

It allows indexing of JavaScript-heavy applications (SPAs) by serving an HTML rendering of each page when it is requested with the special _escaped_fragment_ query parameter.

It relies on a PhantomJS backend to execute the frontend's JavaScript.
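Under Google's AJAX crawling scheme, a "pretty" URL such as http://example.com/#!/products is requested by the crawler as http://example.com/?_escaped_fragment_=/products. Here is a minimal sketch (illustrative only, not this package's actual internals) of how such a middleware can spot those requests and rebuild the original URL:

function isCrawlerRequest(req) {
  // Google's crawler rewrites #!/path into ?_escaped_fragment_=/path
  return typeof req.query._escaped_fragment_ !== 'undefined';
}

function originalUrl(req) {
  // Reconstruct the pretty URL the crawler is actually asking about
  var fragment = req.query._escaped_fragment_;
  return req.protocol + '://' + req.get('host') + req.path + '#!' + fragment;
}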

Installation

This module is available through NPM:

npm install --save google-crawler

Usage

var express = require('express');
var google_crawler = require('google-crawler');

var server = express();

// Register the middleware before your routes so crawler
// requests are intercepted first.
server.use(google_crawler({
  scraper: 'http://scraper.example.com/img/'
}));

// Continue setting things up..

On your frontend, you'll want to include the following element, which tells Google's crawler that the page supports the _escaped_fragment_ scheme even when its URL contains no #! fragment:

<meta name="fragment" content="!">

Configuration

The middleware accepts the following parameters (see the sketch below):

  • shebang: a boolean determining whether or not to build URLs with a shebang (#!).
  • scraper: a URL pointing to the PhantomJS backend.
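A sketch passing both options (the values shown are illustrative, not defaults):

server.use(google_crawler({
  shebang: true,                            // rebuild crawled URLs as /#!/path
  scraper: 'http://scraper.example.com/'    // PhantomJS backend (hypothetical URL)
}));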

Sample backend

PhantomJS backends are expected to be built with phantom-crawler:

Here's a sample crawler:

// Load the phantom-crawler library into the PhantomJS context.
phantom.injectJs('crawler/crawler.js');

new Crawler()
  .chrome()   // presumably identifies as a Chrome-like user agent
  .debug()    // presumably enables verbose logging
  .crawl(function () {

    // Runs in the page context once it has rendered: serialize
    // the resulting DOM back to an HTML string.
    return [
      '<!DOCTYPE html>',
      '<html>',
        document.head.outerHTML,
        document.body.outerHTML,
      '</html>'
    ].join('\n');

  })
  .serve(require('system').env.PORT || 8888);
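To make the flow concrete, here is a simplified, hypothetical sketch of the round trip: when a crawler request arrives, the middleware asks the PhantomJS backend to render the original page and relays the resulting HTML. The ?url= query convention below is an assumption for illustration, not this package's documented protocol.

var http = require('http');

// Hypothetical helper: ask the scraper backend to render pageUrl and
// hand back the serialized HTML (the ?url= convention is assumed).
function renderWithScraper(scraperUrl, pageUrl, callback) {
  http.get(scraperUrl + '?url=' + encodeURIComponent(pageUrl), function (res) {
    var html = '';
    res.on('data', function (chunk) { html += chunk; });
    res.on('end', function () { callback(null, html); });
  }).on('error', callback);
}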