[Project page] [Paper] [Arxiv] [Colab]
Cheng Qian1, Julen Urain2, Kevin Zakka3, Jan Peters2
1TU Munich, 2TU Darmstadt, 3UC Berkeley
TLDR: We train a generalist policy for controlling dexterous robot hands to play any song, using human pianist demonstration videos from the internet. We use residual reinforcement learning to learn song-specific policies from demonstrations, and a two-stage diffusion policy to generalize to new songs.
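As a rough, conceptual sketch of the residual idea (not the paper's implementation; all names below are hypothetical), the learned policy only outputs a correction on top of a nominal action derived from the human demonstration:

import numpy as np

def residual_action(nominal_action: np.ndarray, observation: np.ndarray, residual_policy) -> np.ndarray:
    # nominal_action: action retargeted from the human demonstration (e.g. fingertip targets).
    # residual_policy: learned correction network mapping observations to small action offsets.
    return nominal_action + residual_policy(observation)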
We're thrilled to announce that we've just published a Tutorial that walks you through the entire process of preparing your dataset from videos and MIDI files! 🎹🎥
📓 tutorial/data_preprocessing.ipynb
Inside the notebook, you'll learn how to:
- Estimate the homography matrix from video coordinates to real piano coordinates (see the sketch after this list)
- Extract fingering and human fingertip trajectories from videos
- Format your data for training
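As an illustration of the first step, a minimal homography sketch with OpenCV might look like the following (the corner points and fingertip pixel are hypothetical placeholders; the notebook covers the full procedure):

import numpy as np
import cv2

# Pixel coordinates of four keyboard corners in the video frame (hypothetical values).
video_pts = np.array([[412, 233], [1508, 241], [1496, 398], [420, 391]], dtype=np.float32)
# Corresponding real piano coordinates in meters (hypothetical values).
piano_pts = np.array([[0.0, 0.0], [1.22, 0.0], [1.22, 0.15], [0.0, 0.15]], dtype=np.float32)

# Estimate the 3x3 homography mapping video pixels to piano coordinates.
H, _ = cv2.findHomography(video_pts, piano_pts)

# Map a detected fingertip pixel into piano coordinates.
fingertip_px = np.array([[[900.0, 310.0]]], dtype=np.float32)
fingertip_piano = cv2.perspectiveTransform(fingertip_px, H)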
The tutorial is also available on Google Colab (see the [Colab] link above).
Follow the steps below to set up PianoMime.
Start by cloning the repository:
git clone https://github.com/sNiper-Qian/pianomime.git
Open a terminal and run the following command to install the necessary libraries:
sudo apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg
Run the following script to install additional dependencies for RoboPianist:
bash pianomime/scripts/install_deps.sh
Install the Python dependencies by running:
pip install -r pianomime/requirements.txt
(Optional) Sometimes it is necessary to install JAX at the required version:
pip install --upgrade "jax==0.4.23" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -U "jaxlib==0.4.23+cuda12.cudnn89" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
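After installation, a quick sanity check (a minimal snippet, not part of the repository) confirms the JAX version and that a GPU device is visible:

import jax

print(jax.__version__)  # expected: 0.4.23
print(jax.devices())    # should list a CUDA device if the GPU setup is correct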
Download the dataset from the following link: https://drive.google.com/file/d/1X8q-PvqyqL2X15wCZevTfAtSDfiHpYAa/view?usp=sharing
Download the checkpoints from the following link: https://drive.google.com/file/d/1-wa1UAn_mbPN87D6GIi4PS0VNDE5mbQh/view?usp=sharing
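If you prefer a scripted download, both files can also be fetched with gdown (install it via pip install gdown; the output filenames below are arbitrary placeholders):

import gdown

# Dataset (same Google Drive link as above).
gdown.download(
    "https://drive.google.com/file/d/1X8q-PvqyqL2X15wCZevTfAtSDfiHpYAa/view?usp=sharing",
    "pianomime_dataset.zip",
    fuzzy=True,
)
# Checkpoints (same Google Drive link as above).
gdown.download(
    "https://drive.google.com/file/d/1-wa1UAn_mbPN87D6GIi4PS0VNDE5mbQh/view?usp=sharing",
    "pianomime_checkpoints.zip",
    fuzzy=True,
)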
We also provide a tutorial for generating a dataset from videos and MIDI files.
You can find the step-by-step guide here: Data Preparation Tutorial
This notebook will walk you through the process of converting your video and MIDI data into a structured dataset, ready for training.
Please use the following citation:
@misc{qian2024pianomimelearninggeneralistdexterous,
title={PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations},
author={Cheng Qian and Julen Urain and Kevin Zakka and Jan Peters},
year={2024},
eprint={2407.18178},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.18178},
}
The simulation environment is based on RoboPianist.
The diffusion policy is adapted from Diffusion Policy.
The inverse-kinematics controller is adapted from Pink.
The human demonstration videos are downloaded from the YouTube channel PianoX.
This project is licensed under the MIT License. See the LICENSE file for details.
