
MAX for TensorFlow.js: Human Pose Estimator

This is a TensorFlow.js port of the MAX Human Pose Estimator pre-trained model. The Human Pose Estimator was trained to detect humans in an image and identify their body parts, including the nose, neck, eyes, shoulders, elbows, wrists, hips, knees, and ankles.

Install

Browser

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@codait/max-human-pose-estimator"></script>

By default, when the @codait/max-human-pose-estimator JavaScript module is loaded, the model is automatically loaded and the cache is warmed up (by running inference against an all-zero input). To change this default behavior and prevent the model from being automatically initialized, set the attribute data-init-model="false" on the script tag for @codait/max-human-pose-estimator.
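
For example, to load the module without automatically initializing the model:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script data-init-model="false" src="https://cdn.jsdelivr.net/npm/@codait/max-human-pose-estimator"></script>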

Node.js

npm install --save @codait/max-human-pose-estimator

Usage

The complete examples for browser and Node.js environments are in the /examples directory.

Browser

Note: When loaded in a browser, the global variable poseEstimator will be available to access the API.

let image = document.getElementById('my-image')

poseEstimator
  .predict(image)
  .then(prediction => {
    console.log(prediction.posesDetected)
  })
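
The same call also accepts an HTMLVideoElement, which is useful for webcam input. A minimal sketch (the my-video element id is hypothetical) that mirrors each frame before prediction by passing true as the second argument:

let video = document.getElementById('my-video')

poseEstimator
  .predict(video, true)
  .then(prediction => {
    console.log(prediction.posesDetected)
  })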

Node.js

const { predict } = require('@codait/max-human-pose-estimator')
const { read, MIME_PNG } = require('jimp')
const { createCanvas, loadImage } = require('canvas')

// draw the image buffer onto a node-canvas element that the model can consume
const createCanvasElement = async function (imageInput) {
  const img = await loadImage(imageInput)
  const canvas = createCanvas(img.width, img.height)
  const ctx = canvas.getContext('2d')
  ctx.drawImage(img, 0, 0)
  return canvas
}

const imagePath = `${__dirname}/my-image.jpg`

read(imagePath)
  .then(imageData => imageData.scaleToFit(512, 512).getBufferAsync(MIME_PNG))
  .then(imageBuffer => createCanvasElement(imageBuffer))
  .then(imageElement => predict(imageElement))
  .then(prediction => {
    console.log(prediction.posesDetected)
  })

API

  • loadModel(init)

    Loads the model files.

    init - if true, a prediction will be triggered using an all-zero Tensor to warm up the model (this helps increase the speed of subsequent predictions when running in a browser). Default is true.

    Returns the TensorFlow.js model.

  • processInput(image, mirrorImage)

    Processes the input image into the shape and format expected by the model. The image is resized/scaled (to a maximum width or height of 432px) and converted to a 4D Tensor.

    image - an instance of HTMLImageElement, HTMLCanvasElement, or HTMLVideoElement.
    mirrorImage - if true the image will be flipped horizontally (e.g., mirrored webcam video). Default is false.

    Returns a 4D Tensor representation of the image that can be passed to the model.

  • runInference(inputTensor)

    Runs inference on the input Tensor passed to it. The output is a 4D Tensor comprised of the concatenation of the heat map and part affinity field map predictions.

    inputTensor - a 4D Tensor representing the image (as returned by processInput)

    Returns the inference results.

  • processOutput(inferenceResults)

    Processes the inference output, replacing the output Tensor with the calculated poses, the heat map, and the part affinity field map predictions.

    inferenceResults - the model output from running inference.

    Returns an object containing

    • heatMap: a 2D array with the predicted heat map
    • pafMap: a 2D array with the predicted part affinity field map
    • posesDetected: an array of human poses detected in the image
    • imageSize: an object with the width and height of the resized image
  • predict(image, mirrorImage)

    Loads the model (if not already loaded), processes the input image, runs inference, processes the inference output, and returns a prediction object. This is a convenience function that avoids having to call each of the functions (loadModel, processInput, runInference, processOutput) individually; a step-by-step sketch follows this list.

    image - an instance of HTMLImageElement, HTMLCanvasElement, or HTMLVideoElement.
    mirrorImage - if true the image will be flipped horizontally (e.g., mirrored webcam video). Default is false.

    Returns an object containing

    • heatMap: a 2D array with the predicted heat map
    • pafMap: a 2D array with the predicted part affinity field map
    • posesDetected: an array of human poses detected in the image
    • imageSize: an object with the width and height of the resized image
  • cocoUtil()

    An object containing a set of helper variables for processing the inference output:

    • parts: an array of named body parts with the part's index corresponding to its ID
    • pairs: a 2D array listing body part connections (e.g., 1, 2 = Neck to Right Shoulder)
    • pairsNetwork: a 2D array listing the part affinity field indices corresponding to each pair of body parts
    • colors: a 2D array of RGB color values
  • config(params)

    Sets or gets the configuration parameters for the post-processing calculations. Call it without params to get the current configuration values.

    params - an object containing parameters to set.

    Available parameters include:

    • nmsWindowSize: pixel size when applying non-maximum suppression to calculate peaks (default: 6)
    • nmsThreshold: minimum pixel score required when calculating peaks (default: 0.001)
    • localPAFThreshold: minimum part affinity field score when calculating possible pairs (default: 0.141)
    • partScoreThreshold: minimum part score required when calculating parts (default: 0.247)
    • pafCountThreshold: minimum number of part affinity field values required when calculating possible pairs (default: 4)
    • partCountThreshold: minimum number of parts required when calculating poses (default: 4)
  • version

    Returns the version
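
As noted in the predict entry above, the pipeline can also be run one step at a time. Below is a minimal sketch for the browser, assuming an image element with id my-image and using the documented functions; each step is awaited so the sketch works whether a function returns a value or a Promise, and the config values shown are purely illustrative:

// optionally tune the post-processing thresholds (illustrative values)
poseEstimator.config({ partScoreThreshold: 0.3, partCountThreshold: 5 })

const runPoseEstimation = async function (imageElement) {
  await poseEstimator.loadModel(false)                                   // load without the warm-up inference
  const inputTensor = await poseEstimator.processInput(imageElement)     // resize/scale and convert to a 4D Tensor
  const inferenceResults = await poseEstimator.runInference(inputTensor) // heat map + part affinity field map
  const prediction = await poseEstimator.processOutput(inferenceResults) // { heatMap, pafMap, posesDetected, imageSize }
  return prediction
}

runPoseEstimation(document.getElementById('my-image'))
  .then(prediction => console.log(prediction.posesDetected))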

Model

The model assets produced by converting the pre-trained model to the TensorFlow.js format can be found in the /model directory.

License

Apache-2.0
