Towards Natural Prosthetic Hand Gestures: A Common-Rig and Diffusion Inpainting Pipeline

Korea University1, University of Illinois Urbana-Champaign2, NC Research3
IEEE Engineering in Medicine and Biology Society (EMBC) 2024

Motion Overview




Abstract

Existing works on prosthetic hands focus on increasing dexterity for carrying out functional tasks. Achieving specific hand movements, such as pointing with the index finger, is desirable, but generating the hand movement itself has yet to be widely explored. In this work, we propose a pipeline for generating hand motion from body motion using the Common-Rig, a kinematic rig representation for effective motion representation, and a diffusion-based inpainting method, which has shown strengths in generalization and stability. Common rigging is applied to a motion capture dataset containing both body and hand information, and hand motions are generated conditioned on the body motions of a hand-zeroed test set. Compared to two baseline methods, the generated results of our proposed method attain smaller fingertip positional errors and diversity closer to that of the ground truth. In addition, the generated motions are deployed on a real robotic system with prosthetic hands for evaluation.




Pipeline


The overall pipeline of our work consists of two phases: 1) Dataset Rigging and 2) Hand Motion Generation. In the 'Dataset Rigging' phase, the joint positions of the Common-Rig are matched to those of the motion capture skeleton in order to convert the original motion to a Common-Rig motion. In the 'Hand Motion Generation' phase, we utilize a diffusion-based inpainting method to generate the missing hand motions from the body-only motion.




Dataset Rigging

We utilize the Common-Rig, a rigid-body structure with pre-defined link lengths, to retarget motions from a motion capture dataset that contains both body and hand movements. First, from the joint offsets of the motion capture skeleton, a forward kinematic pass is carried out to obtain the homogeneous transformation matrices. Next, the joint positions of the motion capture skeleton are extracted from these transformation matrices and set as the positional targets for the joints of the Common-Rig. Finally, an inverse kinematic pass is carried out to obtain the joint angles of the Common-Rig.
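The FK-then-IK retargeting loop above can be sketched on a toy planar chain. This is a minimal illustration, not the paper's implementation: the function names, the 2-D chain, and the damped least-squares IK solver are assumptions chosen for brevity; the actual Common-Rig operates on full 3-D homogeneous transforms.

```python
import numpy as np

def fk(link_lengths, angles):
    """Forward kinematics of a planar chain: returns all joint positions."""
    pts = [np.zeros(2)]
    heading = 0.0
    for length, angle in zip(link_lengths, angles):
        heading += angle
        pts.append(pts[-1] + length * np.array([np.cos(heading), np.sin(heading)]))
    return np.array(pts)

def ik(link_lengths, targets, iters=100, lam=1e-2, eps=1e-6):
    """Damped least-squares IK: drive every joint of the rig toward the
    positional targets extracted from the mocap skeleton."""
    n = len(link_lengths)
    q = np.zeros(n)
    for _ in range(iters):
        current = fk(link_lengths, q)[1:]
        err = (targets - current).ravel()
        # finite-difference Jacobian of all joint positions w.r.t. joint angles
        J = np.zeros((err.size, n))
        for j in range(n):
            dq = q.copy()
            dq[j] += eps
            J[:, j] = (fk(link_lengths, dq)[1:] - current).ravel() / eps
        q += np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ err)
    return q
```

Because the Common-Rig's link lengths are fixed, the IK solves for joint angles that best reproduce the mocap joint positions rather than copying them directly.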



(a) The default T-pose of the Common-Rig and the body joints used for the inverse kinematic process.
(b) The hand model for the Common-Rig and the hand joints used for the inverse kinematic process.
(c), (d) An example of the Common-Rigging process: the purple skeleton indicates a specific pose from the motion capture dataset and the orange rig indicates the pose of the Common-Rig.




Hand Motion Generation

In order to generate hand motions from hands-zeroed body motion, we utilize RePaint, a diffusion-based inpainting method originally applied in the image domain. The dataset obtained from the preceding 'Dataset Rigging' process is split into a train set and a test set, and a diffusion model is trained on the train set using DDPM. Using the pretrained diffusion model and the hands-zeroed test set, hand motions are inpainted conditioned on the body motions. Specifically, at each diffusion step, the body portion is obtained by forward-diffusing the hands-zeroed body motion, while the hand portion is obtained from the reverse denoising step of the pretrained DDPM model.
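One step of this conditioning scheme can be sketched as follows. This is a simplified illustration of the RePaint-style masking idea, not the paper's code: the function signature, the flat motion vector, and the toy `denoise` callable are assumptions, and the real method also uses resampling across steps.

```python
import numpy as np

def repaint_step(x_t, x0_body, mask, t, alphas_cumprod, denoise, rng):
    """One RePaint-style inpainting step (sketch): body dimensions
    (mask == 1) are replaced by a forward-diffused copy of the known body
    motion, while hand dimensions (mask == 0) come from the model's
    reverse denoising step."""
    a_bar = alphas_cumprod[t - 1] if t > 0 else 1.0
    # known region: forward-diffuse the clean body motion to step t-1
    noise = rng.standard_normal(x0_body.shape)
    x_body = np.sqrt(a_bar) * x0_body + np.sqrt(1.0 - a_bar) * noise
    # unknown region: one reverse step of the pretrained denoiser
    x_hand = denoise(x_t, t)
    return mask * x_body + (1.0 - mask) * x_hand
```

Iterating this from pure noise down to step 0 yields hand motions that are consistent with the observed body motion at every noise level.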

Comparison to Baseline Methods


We compare our generated results to two baselines: Supervised Learning (SL) and Body2Hands (B2H). The video above shows the ground truth and the generated results for the motion with the instruction label "Point oneself"; the two baselines were carried out on the original motion capture dataset, while our proposed method utilized the 'Dataset Rigging' process. Among the three methods, ours was the only one able to generate an index-finger pointing motion.

Real Robot

We validate the generated hand motions on a real robotic system consisting of a dual-arm setup with prosthetic hands. The dual-arm setup consists of a base with two PAPRAS arms, each connected to a PSYONIC Ability Hand.



Real robot results with their corresponding generated results of the motion with the instruction label "Point oneself".
Row 1: Whole pose of the real robot.
Row 2: Hand-zoomed poses of the real robot.
Row 3: Whole pose of the generated results visualized in simulation.
Row 4: Hand-zoomed poses of the generated results visualized in simulation.




Paper


Application

An application of the rigging process presented in the paper. The positional targets of the fingers in the YouTube video are obtained using MediaPipe Hand Landmark Detection. Inverse kinematics is applied directly to the robot model in simulation to obtain the joint angles, and the retargeted motion is deployed on the real robot.
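Before the IK step, detected landmarks must be turned into targets the robot finger can actually reach, since MediaPipe landmark spacing will not match the robot's link lengths. A minimal sketch of one way to do this, assuming per-finger landmark chains; the function name and this particular rescaling strategy are illustrative assumptions, not from the paper:

```python
import numpy as np

def rescale_chain(landmarks, rig_lengths):
    """Rescale a detected finger chain so each segment matches the robot
    hand's link length while preserving the detected segment directions.
    `landmarks` is an (n+1, 3) array of landmark positions for one finger;
    `rig_lengths` holds the robot's n link lengths."""
    landmarks = np.asarray(landmarks, float)
    pts = [landmarks[0]]
    for a, b, length in zip(landmarks, landmarks[1:], rig_lengths):
        direction = (b - a) / np.linalg.norm(b - a)
        pts.append(pts[-1] + length * direction)
    return np.array(pts)
```

The rescaled points can then serve as positional targets for the simulated robot's IK, mirroring the 'Dataset Rigging' idea of fixed link lengths with retargeted joint positions.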