1.0.45 • Published 7 years ago

ghostwriter-middleware v1.0.45

Weekly downloads: 93
License: MIT
Repository: github
Last release: 7 years ago

Ghostwriter

Ghostwriter prerenders your JavaScript website for search engines, SEO tools, social media crawlers and your browser.

Modules:
ghostwriter-middleware
ghostwriter-apptool
ghostwriter-html-webpack-plugin
ghostwriter-service
ghostwriter-common

Ghostwriter is a replacement for the prerender.io service. In contrast to prerender.io, it does not limit serving of prerendered pages to a particular set of spiders. Quite the contrary, it serves prerendered pages to all crawlers and browsers. Therefore Ghostwriter should NOT be vulnerable to accidental cloaking.

This approach places one simple requirement on your web application: it must cope with prerendered content that is already present in the DOM, i.e. it should either discard that content and re-render, or reconcile with it. Usually, this is not an issue if your web application is structured correctly. We also added a safeguard that controls link, script and style tags added to the DOM by external libraries. You can read about it in the section "Handling of script, link and style tags" below.
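The "discard and re-render" option can be sketched as follows. This is a minimal illustration and not part of Ghostwriter: the helper name discardPrerendered and the element id 'app' are assumptions, and the container can be any object that exposes firstChild and removeChild the way DOM elements do.

```javascript
// Hypothetical helper: removes all prerendered children from the mount node
// so that the client-side application can render from scratch.
// Returns the number of removed child nodes.
function discardPrerendered(container) {
  let removed = 0;
  while (container.firstChild) {
    container.removeChild(container.firstChild);
    removed += 1;
  }
  return removed;
}

// Usage in a browser entry point (the element id 'app' is an assumption):
//   discardPrerendered(document.getElementById('app'));
//   ...then mount your application into the now-empty node.
```

Libraries that can reconcile existing markup do not need this step; the sketch only covers the simpler of the two options.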

Here you will find a complete example web application based on React: https://github.com/core-process/ghostwriter-example

Ghostwriter Service

The Ghostwriter service is provided as a Docker image and as an npm package. Just pick whichever flavor you like best.

NPM package

Install the ghostwriter-service module via:

$ npm install ghostwriter-service --save
# ... or ...
$ yarn add ghostwriter-service

The module provides the service binary ghostwriter-service, which accepts the following parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| --port | Port to listen on | 8887 |
| --database-uri | URL to a MongoDB database | mongodb://database:27017/ghostwriter |
| --keep-database | Keep current database | |

Be aware: If you do not pass the --keep-database parameter to the Ghostwriter service, it will drop and recreate the provided MongoDB database.
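Because forgetting --keep-database has destructive consequences, it can help to assemble the command line in one place. The following is a minimal sketch under that assumption; the helper name buildServiceArgs and its option names are hypothetical, and only the three flags documented above are used.

```javascript
// Hypothetical helper: builds the argument list for the ghostwriter-service
// binary from a plain options object.
function buildServiceArgs({ port, databaseUri, keepDatabase = false }) {
  if (!port || !databaseUri) {
    throw new Error('port and databaseUri are required');
  }
  const args = ['--port', String(port), '--database-uri', databaseUri];
  // Without this flag the service drops and recreates the database on start.
  if (keepDatabase) {
    args.push('--keep-database');
  }
  return args;
}

const args = buildServiceArgs({
  port: 8887,
  databaseUri: 'mongodb://database:27017/ghostwriter',
  keepDatabase: true,
});
console.log(['ghostwriter-service', ...args].join(' '));
```

The resulting array could be passed to child_process.spawn or joined into a script entry; which of the two fits better depends on your deployment.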

We recommend adding the service to the scripts in your package.json:

{
  ...
  "scripts": {
    "ghostwriter-service":
      "ghostwriter-service --port 8887 --database-uri mongodb://database:27017/ghostwriter",
    ...
  },
  ...
}

The previous entry enables you to run the service via npm:

$ npm run ghostwriter-service

Docker image

Pull the latest ghostwriter-service image via:

$ docker pull quay.io/process_team/ghostwriter-service:latest

The Docker image requires the environment variable DATABASE_URI pointing to a MongoDB database and creates a service listening on port 8888.

Run your Ghostwriter service with the following command:

$ docker run \
    -d \
    --name myghostwriter \
    -p 127.0.0.1:8887:8888 \
    --env DATABASE_URI=mongodb://database:27017/ghostwriter \
    quay.io/process_team/ghostwriter-service:latest

Be aware: The Ghostwriter service provided as a Docker container will drop and recreate the provided MongoDB database. Currently, there is no environment variable available to change this behavior.

See the Docker manual for more.

Application Backend

Ghostwriter hooks into your Express application with the help of a middleware. Install the ghostwriter-middleware module via:

$ npm install ghostwriter-middleware --save
# ... or ...
$ yarn add ghostwriter-middleware

The following example shows how to enable the middleware in your application:

...
import childProcess from 'child_process';
import ghostwriter from 'ghostwriter-middleware';

// e.g. use app name as 'token' (see config below)
const appName = require('./package.json').name;

// e.g. use git commit id as 'version'
const gitCommitId =
  childProcess.execSync('git rev-parse HEAD').toString().trim();

// e.g. do not pre-render calls to /api*, used as 'urlTest'
const urlTest =
  (url) => !url.startsWith('/api');

// setup ghostwriter middleware for express
app.use(ghostwriter({
  token: appName,
  version: gitCommitId,
  sitemaps: [ '/sitemap.xml' ],
  urlTest: urlTest,
  gwUrl: 'http://localhost:8887',
  appUrl: 'http://localhost:8888',
}));
...
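If more than one path should be excluded from pre-rendering, the urlTest function from the example can be generated from a list of prefixes. This is a minimal sketch: the factory name makeUrlTest and the '/static' prefix are assumptions for illustration. It relies only on what the example shows, namely that urlTest receives the URL and returns true when the request should be pre-rendered.

```javascript
// Hypothetical factory: returns a urlTest function that rejects every URL
// starting with one of the given prefixes and accepts everything else.
function makeUrlTest(excludedPrefixes) {
  return (url) => !excludedPrefixes.some((prefix) => url.startsWith(prefix));
}

// '/static' is an assumed path; adjust the list to your application.
const urlTest = makeUrlTest(['/api', '/static']);

console.log(urlTest('/products/42'));   // true: pre-render
console.log(urlTest('/api/products'));  // false: pass through
```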

The middleware accepts the following parameters:

| Parameter | Description | Default Value |
| --- | --- | --- |
| token | Unique name of your application instance, e.g. the name from your package.json | none (required) |
| version | A version string to identify the current version of your application, e.g. the git commit id | none (required) |
| refreshCycle | The number of hours before a rendered page needs to be refreshed | 1.0 |
| sandbox.viewportWidth | The width of the rendering viewport in pixels | 1280 |
| sandbox.viewportHeight | The height of the rendering viewport in pixels | 800 |
| sandbox.completionTimeout | The number of seconds to wait before rendering fails | 30.0 |
| sitemaps | An array of sitemap paths used to actively crawl the application | [ '/sitemap.xml' ] |
| gwUrl | URL pointing to Ghostwriter (can be on the local network) | none (required) |
| appUrl | URL pointing to your application (can be on the local network) | 'http://localhost' |
| retriesOnError | Number of retries if Ghostwriter fails | 3 |
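To catch a missing required parameter before the application starts, the options can be checked up front. The sketch below is an assumption-laden illustration: the helper name resolveConfig is hypothetical, the defaults are copied from the parameter list above, and whether the middleware itself validates or merges options this way is not stated here.

```javascript
// Defaults copied from the parameter list above.
const DEFAULTS = {
  refreshCycle: 1.0,
  sandbox: { viewportWidth: 1280, viewportHeight: 800, completionTimeout: 30.0 },
  sitemaps: ['/sitemap.xml'],
  appUrl: 'http://localhost',
  retriesOnError: 3,
};
const REQUIRED = ['token', 'version', 'gwUrl'];

// Hypothetical helper: rejects incomplete options and fills in defaults.
// The nested sandbox object is merged key by key.
function resolveConfig(options) {
  const missing = REQUIRED.filter((key) => !options[key]);
  if (missing.length > 0) {
    throw new Error('Missing required parameters: ' + missing.join(', '));
  }
  return {
    ...DEFAULTS,
    ...options,
    sandbox: { ...DEFAULTS.sandbox, ...(options.sandbox || {}) },
  };
}

const config = resolveConfig({
  token: 'my-app',
  version: 'abc123',
  gwUrl: 'http://localhost:8887',
  sandbox: { viewportWidth: 1024 },
});
console.log(config.sandbox, config.retriesOnError);
```

The resolved object could then be passed to the middleware factory in place of a hand-written literal.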

Application Frontend

Your frontend application is required to confirm the successful completion of the rendering process. This is accomplished by defining so-called 'sections' with the help of the ghostwriter-apptool module. Install the ghostwriter-apptool module via:

$ npm install ghostwriter-apptool --save
# ... or ...
$ yarn add ghostwriter-apptool

Define the 'sections' as early as possible in your entry point. The sections must be defined via the setup function before they can be confirmed, and the setup function must be called exactly once. The setup function accepts an arbitrary number of 'section' names, e.g.:

...
// define the sections that require render confirmation, as early as possible
import * as ghostwriter from 'ghostwriter-apptool';
ghostwriter.setup('newsticker', 'page');
...

Confirm the rendering of the 'sections' in your components with the done function as soon as the DOM represents the expected rendering result. Choose this moment carefully and consult the documentation of your rendering library. The done function expects a valid 'section' name as its single parameter.

Below you will find two React-based examples.

...
import React from 'react';
import * as ghostwriter from 'ghostwriter-apptool';

// a 'page' which does not load additional data
export default class SomePage extends React.Component {
  componentDidMount() {
    ghostwriter.done('page');
  }
  render() {
    return (<div className="some-page">...</div>);
  }
};
...

// a 'page' which does load additional data (confirm after rendering of data)
export default class AnotherPage extends React.Component {
  componentDidMount() {
    loadPageData((data) => {
      this.setState({ data }, () => {
        ghostwriter.done('page');
      });
    });
  }
  render() {
    return (<div className="another-page">...</div>);
  }
};
...
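The contract between setup and done described above can be modelled in a few lines. The following is a hypothetical stand-in written for illustration only, not the code of the ghostwriter-apptool module: the name createSectionTracker and the onComplete callback are assumptions. It shows why every section passed to setup must eventually be confirmed.

```javascript
// Hypothetical model of the setup/done contract; NOT the library's code.
function createSectionTracker(onComplete) {
  let known = null;    // all section names passed to setup()
  let pending = null;  // sections that still await confirmation

  return {
    // Must be called exactly once, before any call to done().
    setup(...sections) {
      if (known !== null) {
        throw new Error('setup() must be called only once');
      }
      known = new Set(sections);
      pending = new Set(sections);
    },
    // Confirms one section; fires onComplete once when none are left.
    done(section) {
      if (known === null) {
        throw new Error('setup() must be called before done()');
      }
      if (!known.has(section)) {
        throw new Error('unknown section: ' + section);
      }
      if (pending.delete(section) && pending.size === 0) {
        onComplete();
      }
    },
  };
}

const tracker = createSectionTracker(() => console.log('rendering complete'));
tracker.setup('newsticker', 'page');
tracker.done('page');        // still waiting for 'newsticker'
tracker.done('newsticker');  // logs "rendering complete"
```

In this model a section that is never confirmed keeps the page pending, which corresponds to the completion timeout of the sandbox being reached.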

Handling of script, link and style tags

All instances of the tags <script type="text/javascript">, <link rel="stylesheet"> and <style type="text/css"> will be filtered by Ghostwriter unless they are marked with the attribute data-ghostwriter-keep. This behavior might be surprising at first sight, but it is deliberate. Many external libraries clutter the DOM with these tags without proper checks for duplicates. If left uncontrolled, such libraries add these tags a second time on top of the pre-rendered ones and might trigger undefined behavior. Just add data-ghostwriter-keep to the tags you want to be part of the pre-rendered result, and you are ready to go.
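The filter rule can be expressed as a small predicate. This is a sketch of the rule as described in this section, not Ghostwriter's implementation: the function names and the { name, attributes } tag shape are assumptions. How Ghostwriter treats tags that omit the type attribute is not stated here, so the sketch matches the listed attributes literally.

```javascript
// Hypothetical model of the rule; a tag is { name, attributes }.
function isFilteredKind(tag) {
  const a = tag.attributes;
  return (
    (tag.name === 'script' && a.type === 'text/javascript') ||
    (tag.name === 'link' && a.rel === 'stylesheet') ||
    (tag.name === 'style' && a.type === 'text/css')
  );
}

// A tag survives if it is not one of the filtered kinds,
// or if it carries the data-ghostwriter-keep attribute.
function keepTag(tag) {
  return !isFilteredKind(tag) || 'data-ghostwriter-keep' in tag.attributes;
}

const tags = [
  { name: 'script', attributes: { type: 'text/javascript', src: 'lib.js' } },
  { name: 'script', attributes: { type: 'text/javascript', src: 'app.js', 'data-ghostwriter-keep': '' } },
  { name: 'meta', attributes: { charset: 'UTF-8' } },
];
console.log(tags.filter(keepTag).map((tag) => tag.attributes.src || tag.name));
```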

To make things easier, we created the module ghostwriter-html-webpack-plugin as a drop-in replacement for the html-webpack-plugin. The ghostwriter-html-webpack-plugin imports and extends the html-webpack-plugin internally. You can use ghostwriter-html-webpack-plugin exactly like you would use html-webpack-plugin. The only difference in behavior is that ghostwriter-html-webpack-plugin adds the attribute data-ghostwriter-keep to the previously mentioned tags.

Below you will find a simple example setup:

// ghostwriter-html-webpack-plugin is a drop-in
// replacement for html-webpack-plugin
var HtmlWebpackPlugin = require('ghostwriter-html-webpack-plugin');
var webpackConfig = {
  entry: 'index.js',
  output: {
    path: 'dist',
    filename: 'index_bundle.js'
  },
  plugins: [new HtmlWebpackPlugin()]
};

This will generate a file dist/index.html containing the following:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Webpack App</title>
  </head>
  <body>
    <script src="index_bundle.js" data-ghostwriter-keep></script>
  </body>
</html>

For further information, please see the documentation of the html-webpack-plugin.

Advanced: rendering targets

Ghostwriter identifies a limited set of rendering targets to support fine-tuning of the pre-rendered result. This capability can be useful, for example, to iron out specific incompatibilities between social networks and the metadata they require. Please use this feature with care and do not use it for cloaking purposes.

Currently, Ghostwriter identifies the following rendering targets:

| Identifier | Target |
| --- | --- |
| facebook | Facebook crawler |
| twitter | Twitter crawler |
| pinterest | Pinterest crawler |
| standard | All other, e.g. regular browser, Google, ... |

You can retrieve the current rendering target via the target function of the ghostwriter-apptool module. Example:

import * as ghostwriter from 'ghostwriter-apptool';
import React from 'react';
import DocumentMeta from 'react-document-meta';
...
<div className="some-page">
  <DocumentMeta
    meta={{
      property: {
        'article:author':
          ghostwriter.target() !== 'pinterest'
            ? 'https://www.facebook.com/niklas.salmoukas'
            : 'Niklas Salmoukas'
      }
    }}
    ...
  />
  ...
</div>
...
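Keeping the target-specific decision in a plain function makes it easy to test outside of a component. The following is a minimal sketch that mirrors the example above; the helper name authorMetaFor and the author values are placeholders, and the target identifiers are the ones listed in this section.

```javascript
// Hypothetical helper: picks the 'article:author' value per rendering target.
// Pinterest receives the plain name, every other target the profile URL.
function authorMetaFor(target, author) {
  return target !== 'pinterest' ? author.profileUrl : author.name;
}

// Placeholder author data for illustration.
const author = {
  name: 'Example Author',
  profileUrl: 'https://www.facebook.com/example.author',
};

console.log(authorMetaFor('pinterest', author)); // Example Author
console.log(authorMetaFor('standard', author));  // https://www.facebook.com/example.author
```

Inside a component, the first argument would come from ghostwriter.target().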

Advanced: add style hints

Ghostwriter uses PhantomJS internally to pre-render the pages. There are some edge cases which are not perfectly supported by PhantomJS. Most of these issues are ironed out by Ghostwriter internally. Still, there is one issue left which might need your intervention.

If you care about perfectly pre-rendered pages and you use modern style attributes in the DOM which are not supported by PhantomJS, you need to add these styles to the data-ghostwriter-style attribute. Setting this attribute ensures that the unsupported styles are still included correctly in the pre-rendered page.

Example: PhantomJS does not support the style object-fit. Therefore <img src="..." style="border: 0; object-fit: cover;"> would result in <img src="..." style="border: 0;">. If you render <img src="..." style="border: 0; object-fit: cover;" data-ghostwriter-style="object-fit: cover;"> instead, it will get translated to the expected <img src="..." style="border: 0; object-fit: cover;"> in the pre-rendered code.
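The effect described in the example can be reproduced with a small merge function. This is an illustration of the described outcome only, not Ghostwriter's code: the function names parseStyle and mergeStyleHints are assumptions, and the parser is deliberately naive (it would mis-split values that contain a semicolon, such as data URIs).

```javascript
// Parses "a: b; c: d;" into an ordered Map of property -> value.
function parseStyle(style) {
  const map = new Map();
  for (const decl of (style || '').split(';')) {
    const idx = decl.indexOf(':');
    if (idx === -1) continue;
    const prop = decl.slice(0, idx).trim();
    const value = decl.slice(idx + 1).trim();
    if (prop) map.set(prop, value);
  }
  return map;
}

// Hypothetical helper: merges the hints from data-ghostwriter-style into the
// inline style that survived rendering. Hints win on conflicts.
function mergeStyleHints(style, hints) {
  const merged = parseStyle(style);
  for (const [prop, value] of parseStyle(hints)) {
    merged.set(prop, value);
  }
  return [...merged].map(([prop, value]) => `${prop}: ${value};`).join(' ');
}

console.log(mergeStyleHints('border: 0;', 'object-fit: cover;'));
// prints "border: 0; object-fit: cover;"
```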

Just to be clear: if you do not use these modern styles in the DOM, or if you do not care whether the pre-rendered page matches your dynamic application exactly, just leave the attribute out.
