MoveNet is an ultra-fast and accurate model that detects 17 keypoints of a body. The model is offered on TF Hub with two variants, Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is intended for applications that require high accuracy. Both models run faster than real time (30+ FPS) on most modern desktops and laptops, which is crucial for live fitness, health, and wellness applications. Please try it out using the live demo.
Via script tags:
<!-- Require the peer dependencies of pose-detection. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<!-- You must explicitly require a TF.js backend if you're not using the TF.js union bundle. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
<!-- Alternatively you can use the WASM backend: <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm/dist/tf-backend-wasm.js"></script> -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>
Via npm:
yarn add @tensorflow-models/pose-detection
yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
Install one of the backends.
WebGL:
yarn add @tensorflow/tfjs-backend-webgl
WASM:
yarn add @tensorflow/tfjs-backend-wasm
If you are using the Pose API via npm, you need to import the libraries first.
import * as poseDetection from '@tensorflow-models/pose-detection';
import * as tf from '@tensorflow/tfjs-core';
// Register one of the TF.js backends.
import '@tensorflow/tfjs-backend-webgl';
// import '@tensorflow/tfjs-backend-wasm';
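These imports register the backend with TF.js. You can then select it explicitly and wait for it to initialize before creating a detector; a short sketch:

// Select the registered backend explicitly and wait until it is ready.
await tf.setBackend('webgl');  // or 'wasm' if the WASM backend was imported
await tf.ready();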
Pass in poseDetection.SupportedModels.MoveNet from the poseDetection.SupportedModels enum list, along with a detectorConfig, to the createDetector method to load and initialize the MoveNet model. detectorConfig is an object that defines MoveNet-specific configurations:
- modelType (optional): specify which MoveNet variant to load from the poseDetection.movenet.modelType enum list:
  - SINGLEPOSE_LIGHTNING. Default. The fastest single-pose detector.
  - SINGLEPOSE_THUNDER. A more accurate but slower single-pose detector.
  - MULTIPOSE_LIGHTNING. Multi-pose detector that detects up to 6 poses.
- enableSmoothing (optional): A boolean indicating whether to use a temporal filter to smooth the predicted keypoints. Defaults to true. The temporal filter relies on the currentTime field of the HTMLVideoElement. You can override this timestamp by passing in your own timestamp (in milliseconds) as the third parameter of estimatePoses (see the render-loop sketch after the inference snippet below). This is useful when the input is a tensor, which doesn't have the currentTime field, or in testing, to simulate different FPS.
- modelUrl (optional): A string that specifies a custom URL for the MoveNet model. If not provided, the model specified by modelType will be loaded from TF Hub. This argument is useful for areas/countries that don't have access to the models hosted on TF Hub. It also accepts an io.IOHandler, which can be used with tfjs-react-native to load the model from the app bundle directory using bundleResourceIO.
- minPoseScore (optional): The minimum confidence score a pose needs to have to be considered a valid pose detection.
- multiPoseMaxDimension (optional): The target maximum dimension to use as the input to the multi-pose model. Must be a multiple of 32 and defaults to 256. The recommended range is [128, 512]. A higher maximum dimension results in higher accuracy but slower speed, whereas a lower maximum dimension results in lower accuracy but higher speed. The input image will be resized so that its maximum dimension is the given number, while maintaining the input image's aspect ratio. As an example: with 320 as the maximum dimension, a 640x480 input image will be resized to 320x240, and a 720x1280 image will be resized to 180x320.
- enableTracking (optional): A boolean indicating whether detected persons will be tracked across frames. If true, each pose will have an ID that uniquely identifies a person. Only used with multi-pose models. For more information about tracking, see the documentation here.
- trackerType (optional): A TrackerType indicating which type of tracker to use. Defaults to bounding box tracking.
- trackerConfig (optional): A TrackerConfig object that specifies the configuration to use for the tracker. For properties that are not specified, default values will be used.
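For reference, here is a sketch of a detectorConfig exercising several of the optional fields above; the numeric values are illustrative, not recommendations. The snippets below show the two most common configurations:

const detectorConfig = {
  modelType: poseDetection.movenet.modelType.MULTIPOSE_LIGHTNING,
  enableSmoothing: true,       // temporal keypoint filtering (the default)
  minPoseScore: 0.2,           // illustrative confidence threshold
  multiPoseMaxDimension: 256   // must be a multiple of 32; 256 is the default
};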
The following code snippet demonstrates how to load the MoveNet.SinglePose.Lightning model:
const detectorConfig = {modelType: poseDetection.movenet.modelType.SINGLEPOSE_LIGHTNING};
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet, detectorConfig);
The following code snippet demonstrates how to load the MoveNet.MultiPose.Lightning model with bounding box tracking enabled:
const detectorConfig = {
modelType: poseDetection.movenet.modelType.MULTIPOSE_LIGHTNING,
enableTracking: true,
trackerType: poseDetection.TrackerType.BoundingBox
};
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet, detectorConfig);
Now you can use the detector to detect poses. The estimatePoses method accepts both image and video in many formats, including: tf.Tensor3D, ImageData, HTMLVideoElement, HTMLImageElement, and HTMLCanvasElement.
The following code snippet demonstrates how to run the model inference:
const poses = await detector.estimatePoses(image);
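For video input you would typically run estimatePoses once per animation frame. A minimal sketch of such a loop, assuming a playing HTMLVideoElement named video and the detector created above; the third argument is the optional timestamp override described earlier and can be omitted for video input:

async function renderLoop() {
  // Second argument: optional estimation config. Third argument: optional
  // timestamp override in milliseconds, shown here only for illustration.
  const poses = await detector.estimatePoses(video, {}, performance.now());
  // ... draw the keypoints here ...
  requestAnimationFrame(renderLoop);
}
requestAnimationFrame(renderLoop);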
Please refer to the Pose API README for the basic structure of the returned poses. When running the multi-pose MoveNet model, the box field in a returned Pose will be set with a bounding box around the detected person. When tracking is enabled, the id field of a Pose will contain a unique ID that identifies a tracked person.
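A sketch of consuming the returned poses, assuming the structure described in the Pose API README (each keypoint carries x, y, score, and name; box and id are only populated as described above):

for (const pose of poses) {
  for (const keypoint of pose.keypoints) {
    if (keypoint.score > 0.3) {  // illustrative confidence threshold
      console.log(`${keypoint.name}: (${keypoint.x}, ${keypoint.y})`);
    }
  }
  if (pose.box) console.log('box:', pose.box);             // multi-pose model
  if (pose.id !== undefined) console.log('id:', pose.id);  // tracking enabled
}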
To quantify the inference speed of MoveNet, the model was benchmarked across multiple devices. The inference speed (expressed in FPS) was measured on GPU with WebGL, as well as on WebAssembly (WASM), the typical backend for devices with lower-end or no GPUs.
Each cell lists FPS as SinglePose Lightning / SinglePose Thunder / MultiPose Lightning.

| Backend | MacBook Pro 15" 2019, Intel Core i9, AMD Radeon Pro Vega 20 (FPS) | iPhone 12 (FPS) | Pixel 5 (FPS) | Desktop, Intel i9-10900K, Nvidia GTX 1070 (FPS) |
|---|---|---|---|---|
| WebGL | 104 / 77 / 54 | 51 / 43 / 24 | 34 / 12 / 8 | 87 / 82 / 62 |
| WASM with SIMD + Multithread | 42 / 21 / N/A | N/A | N/A | 71 / 30 / N/A |
Note that for multi-person detection, the number of detected persons does not impact inference speed, and the accuracy of detections is similar to that of SinglePose Lightning.
To see the model’s FPS on your device, try our demo. You can switch the model type and backends live in the demo UI to see what works best for your device.