H2O - Action

Organized by tkwon

First phase starts: May 7, 2022, midnight UTC
Competition ends: Never

H2O: Two Hands Manipulating Objects for First Person Interaction Recognition

Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds. For more detailed information, please visit our project page.

 

Method         Val accuracy   Test accuracy   Modalities
C2D            76.10          70.66           RGB
I3D [1]        85.15          75.21           RGB
SlowFast [2]   86.00          77.69           RGB
H+O [3]        80.49          68.88           train: RGB+hand+obj, test: RGB
ST-GCN [4]     83.47          73.86           train: RGB+hand+obj, test: RGB
TA-GCN [5]     86.78          79.25           train: RGB+hand+obj, test: RGB

 

The last three baselines use RGB images, hand poses, and object poses for training and only RGB images at test time. In this challenge (ECCV'22), we expect you to follow one of these two settings:

  • use ground-truth hand and object poses for both training and testing, or
  • use RGB images, hand poses, and object poses for training and only RGB images for testing.

Please indicate which modalities you use in the method description section (e.g. hand+obj, RGB, or train:RGB+hand+obj test:RGB).

 

* We do not allow the use of pre-trained models in this ECCV'22 competition.

 

References

[1] Carreira et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, CVPR 2017

[2] Feichtenhofer et al. SlowFast Networks for Video Recognition, ICCV 2019

[3] Tekin et al. H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions, CVPR 2019

[4] Yan et al. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition, AAAI 2018

[5] Kwon et al. H2O: Two Hands Manipulating Objects for First Person Interaction Recognition, ICCV 2021

 

Evaluation Criteria

The dataset is divided into three splits: training, validation, and test. You must train your model using only the training set and select your model on the validation set. Submissions are evaluated only on the test set. For action labels, we compute action accuracy. The baseline validation and test accuracies are 86.78 and 79.25, respectively, as reported in [1].
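For concreteness, here is a minimal sketch of how this action accuracy can be computed from two label dictionaries. The file names, the ground-truth file, and the dictionary layout (which mirrors the submission format below) are assumptions, not the official evaluation script.

    # Minimal sketch of action accuracy: the fraction of test sequences whose
    # predicted action label matches the ground truth. File names and the
    # ground-truth JSON layout are assumptions, not the official evaluation code.
    import json

    def action_accuracy(pred_path, gt_path):
        with open(pred_path) as f:
            pred = json.load(f)
        with open(gt_path) as f:
            gt = json.load(f)
        ids = [k for k in gt if k.isdigit()]   # skip metadata keys such as "modality"
        correct = sum(pred.get(k) == gt[k] for k in ids)
        return correct / len(ids)

    # e.g. action_accuracy("action_labels.json", "ground_truth_labels.json")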


References
[1] Taein Kwon, Bugra Tekin, Jan Stühmer, Federica Bogo, and Marc Pollefeys. "H2O: Two Hands Manipulating Objects for First Person Interaction Recognition." ICCV 2021.

Terms and Conditions

You agree that the DATASET: (a) shall only be downloaded if you agree to these terms; (b) is to be used only for academic purposes; (c) will not be used for commercial purposes; (d) will not be transferred to any third party. Furthermore, you agree that any publication based on, or containing, the DATASET shall include a reference to the DATASET provided under these terms.

Submission Format

To submit your results to the leaderboard, you must construct a submission zip file containing the following file:

  • action_labels.json - Action label prediction on the test set

 

JSON Submission Format

For action prediction, use the action id as the key and the predicted action label number as the value in the JSON file:

action_labels.json
{"modality": "hand+obj", "1": 32, "2": 11, "3": 14, ... "241": 16, "242": 22}

Zip this JSON file into answer.zip to submit your result to CodaLab.
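For reference, a minimal packaging sketch is shown below. The prediction dictionary is a placeholder and must cover every test sequence id; the file names (action_labels.json, answer.zip) follow the format above.

    # Minimal sketch: write action_labels.json and pack it into answer.zip.
    # The predictions dict is a placeholder; fill it with every test sequence id
    # (string key) and the predicted action label id (integer value).
    import json
    import zipfile

    predictions = {"1": 32, "2": 11, "3": 14}   # placeholder values
    submission = {"modality": "hand+obj", **predictions}

    with open("action_labels.json", "w") as f:
        json.dump(submission, f)

    with zipfile.ZipFile("answer.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("action_labels.json")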

Leaderboard

#   Username    Score
1   IMOU_ALG    0.9711
2   Necca       0.9669
3   debaumann   0.9463