The Cambridge-Imperial APE (Action-Pose-Estimation) dataset is collected for 3D human pose estimation. Different from other public datasets, the APE dataset provides multimodal data:
The APE dataset contains 245 sequences from 7 subjsects performing 7 different categories of actions. Videos of each subject were recorded in different unconstrained environments, changing camera poses and moving background objects.
The settings is considered challenging for traditional 3D human pose estimation because:
In citing the APE dataset, please refer to:
Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-modality Regression Forest
Tsz-Ho Yu, Tae-Kyun Kim, Roberto Cipolla
In proceedings of 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2013
[IEEE Xplore Website]
The action label of each sequence is stored in activityLabel.txt, which assigns an action class of each folder.
The APE dataset provides Both RGB images and depth-maps of the video sequences. They are stored in folders indicated by the action class and subject, e.g. clap04 is the folder that contains the "clapping" sequences performed by subject 4.
Inside each sequence folder, RGB frames and depth-maps are named as "imageX.jpg" and "depthX.jpg" respectively, where X is the frame index. For example, depth00567.jpg is the depth-map for frame 568.
3D pose data are stored in text files indicated by its corresponding folder, e.g. clap04.txt contains the 3D pose data for the clapping action performed by subject 4, corresponding to folder "clap04".
The format of the pose file is described below:
[N (# of frame)] [No use] [No use] [joint 1] [joint 2] ... [joint 15] . . (N lines) . . [No use] [No use] [joint 1] [joint 2] ... [joint 15]
The first line contains the number of frame N in the video sequence, which is followed by N lines describing the 3D pose of the subject in each frame. The first two numbers in each line are place holders and not used at this moment, then each joint is represetned by a 14 numbers:
[joint 1] := [X] [Y] [Z] [location_conf] [R00] [R01] [R02] [R10] [R11] [R12] [R20] [R21] [R22] [rotation_conf]
[X,Y,Z] is the 3D spatial location of the joint. [location_conf] indicates the confidence of its location, a joint is not seen when its confidence is zero (occlusions, missing from data, etc.). The orientation of teh joint is represented by a 3x3 rotation matrix with elements [R00 to R22], there rotation-conf is the confidence of the joint's orientation.
The APE dataset is released under the MIT license.