0.3.0 • Published 6 years ago

webmiddle-service-browser v0.3.0

License
MIT

webmiddle-service-browser

Similar to the HttpRequest service, but uses Headless Chrome to fetch HTML pages.

Install

npm install --save webmiddle-service-browser

Usage

import { PropTypes, rootContext } from 'webmiddle';
import Browser from 'webmiddle-service-browser';

const MyService = () => (
  <Browser
    name="rawHtml"
    contentType="text/html"
    url="https://news.ycombinator.com/"
    waitFor=".athing"
  />
);

rootContext.evaluate(<MyService />)
.then(resource => {
  console.log(resource.content); // the html page as a string
});

How it works

The advantage of this service is that any JavaScript contained in the page is executed, which makes it necessary for fetching SPAs (single-page applications) or any other page whose content is generated dynamically on the client side.

On the other hand, the service has a larger resource footprint, since it needs to spawn separate Headless Chrome processes that communicate with the main Node process.

The service is built on top of the puppeteer library.

It uses the CookieManager as its cookie jar, so cookies obtained from Browser calls are shared with HttpRequest calls and vice versa.

Body conversion, HTTP errors, and retries work very similarly to the HttpRequest service.

The main difference is the waitFor property, which tells the service to wait until an element matching the specified selector is found on the page.

This property can be used to wait for client-side parts of the page to be rendered before returning the resource.
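The waiting behaviour described above can be sketched with puppeteer's page API. This is a minimal illustration, not the service's actual implementation: fetchRenderedHtml and main are hypothetical names, and the sketch leaves out cookies, retries and error handling.

```javascript
// Hypothetical helper: `page` is expected to expose goto, waitForSelector
// and content, as a puppeteer Page does.
async function fetchRenderedHtml(page, url, waitFor) {
  await page.goto(url);
  if (waitFor) {
    // Resolves once an element matching the selector is present in the DOM.
    await page.waitForSelector(waitFor);
  }
  // Serialized HTML of the page after client-side scripts have run.
  return page.content();
}

// Example wiring with a real browser; assumes puppeteer is installed.
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const html = await fetchRenderedHtml(
      page,
      'https://news.ycombinator.com/',
      '.athing'
    );
    console.log(html.length);
  } finally {
    // Always release the Headless Chrome process.
    await browser.close();
  }
}
```

Passing the page in as an argument keeps the helper easy to exercise with a stub object, without launching Chrome.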

If the response content-type does not indicate an HTML document, the waitFor property is ignored and the response body is returned as is.

The default response content-type can be overridden by using the contentType property.
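The decision of whether waitFor applies can be sketched as a small check on the content type. Both function names are hypothetical, and the set of MIME types treated as HTML is an assumption; the service may use a different rule. The sketch takes the effective content type as input and does not show how the contentType property overrides the response default.

```javascript
// Hypothetical helper, not part of the package's public API.
function isHtmlContentType(contentType) {
  if (!contentType) return false;
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mimeType = contentType.split(';')[0].trim().toLowerCase();
  // Assumption: these two MIME types are the ones counted as HTML.
  return mimeType === 'text/html' || mimeType === 'application/xhtml+xml';
}

// waitFor is only honoured when a selector is given and the content is HTML.
function shouldWaitForSelector(contentType, waitFor) {
  return Boolean(waitFor) && isHtmlContentType(contentType);
}
```

For any other content type, such as application/json, the check returns false and the body would be passed through untouched.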

Properties

| Name | Description |
| --- | --- |
| name | The name of the returned resource. |
| contentType | The contentType of the returned resource. |
| url | The URL of the HTTP request. |
| method (optional) | The method of the HTTP request, e.g. 'GET', 'POST'. Defaults to 'GET'. |
| body (optional) | The body of the HTTP request. |
| httpHeaders (optional) | Additional HTTP headers to use in the HTTP request. |
| waitFor (optional) | A query selector, such as .articles, that the service needs to wait for. |