HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos

Simone Alberto Peirone • Francesca Pistilli • Giuseppe Averta

Welcome to the official repository of our paper "HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos", accepted at ICCV 2025.

📝 Abstract

Human activities are particularly complex and variable, which makes it challenging for deep learning models to reason about them. However, we note that such variability does have an underlying structure, composed of a hierarchy of patterns of related actions. We argue that such structure can emerge naturally from unscripted videos of human activities, and can be leveraged to better reason about their content. We present HiERO, a weakly-supervised method to enrich video segment features with the corresponding hierarchical activity threads. By aligning video clips with their narrated descriptions, HiERO infers contextual, semantic and temporal reasoning with a hierarchical architecture. We prove the potential of our enriched features on multiple video-text alignment benchmarks (EgoMCQ, EgoNLQ) with minimal additional training, and in zero-shot on procedure learning tasks (EgoProceL and Ego4D Goal-Step). Notably, HiERO achieves state-of-the-art performance on all the benchmarks, and on procedure learning tasks it outperforms fully-supervised methods by a large margin (+12.5% F1 on EgoProceL) in zero-shot. Our results prove the relevance of using knowledge of the hierarchy of human activities for multiple reasoning tasks in egocentric vision.

🔎 Getting started

🏗️ Environment and data setup

First, clone the current repository and create the required directories:

git clone --recursive git@github.com:sapeirone/HiERO.git && cd HiERO
mkdir -p checkpoints data/ego4d/raw/annotations data/ego4d/raw/features

Then, create a Python virtual environment and install all the required dependencies:

python -m venv .env
source .env/bin/activate
pip install -r requirements.txt -f https://data.pyg.org/whl/torch-2.4.0+cu124.html --extra-index-url https://download.pytorch.org/whl/

Finally, download the Ego4d annotations and the pre-extracted features and copy (or link) them into the data/ego4d/raw directory. Download the EgoClip and EgoMCQ annotations from EgoVLP and copy them into the data/ego4d/raw/annotations directory.

The resulting data/ego4d/raw should be structured as follows:

data/ego4d/raw
          ├─── annotations
          │    ├─── v1
          │    │    │ ego4d.json
          │    │    │ ...
          │    ├─── egoclip.csv
          │    └─── egomcq.json
          │
          └─── features
               ├─── omnivore_video_swinl
               │    │ 64b355f3-ef49-4990-8622-9e9eef68b495.pth
               │    │ ...
               │
               └─── egovlp
                    │ 64b355f3-ef49-4990-8622-9e9eef68b495.pth
                    │ ...
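
Before launching training, it can help to verify that the layout matches. The following is a minimal sketch and not part of the repository: the helper name `missing_entries` and the `REQUIRED` list are assumptions derived from the tree above, and the check only looks at presence, not at file contents.

```python
from pathlib import Path

# Relative paths taken from the directory tree above. Both feature
# directories shown there are listed; drop the ones you do not use.
REQUIRED = [
    "annotations/v1/ego4d.json",
    "annotations/egoclip.csv",
    "annotations/egomcq.json",
    "features/omnivore_video_swinl",
    "features/egovlp",
]


def missing_entries(root, required=REQUIRED):
    """Return the required paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("data/ego4d/raw")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```

Run it from the repository root so that the relative path data/ego4d/raw resolves correctly. Symbolic links are followed, so linked directories count as present.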

Pretrained components

For HiERO (EgoVLP), we use the frozen text encoder from EgoVLP. Download the weights (egovlp_text.pth, egovlp_txt_proj.pth) and copy them into the pretrained directory:

pretrained
     ├─── egovlp_text.pth
     └─── egovlp_txt_proj.pth

🚀 Training and validation

EgoClip/EgoMCQ

Training and validation on EgoClip/EgoMCQ are implemented in train.py and validate.py. Pre-trained checkpoints are provided in the Model Zoo section.

Training on EgoClip:

python train.py --config-name=omnivore save_to=path/to/ckpt/dir

Validation on EgoMCQ:

python validate.py --config-name=omnivore resume_from=path/to/ckpt/dir/model.pth
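
To script both steps, the two commands above can be assembled programmatically. This is a sketch under stated assumptions: the helper names `train_command`, `validate_command` and `run_pipeline` are hypothetical, only the options shown above are used (`--config-name`, `save_to`, `resume_from`), and the checkpoint file name model.pth is taken from the validation example.

```python
import subprocess
import sys
from pathlib import Path


def train_command(config, ckpt_dir):
    """Build the EgoClip training command shown above (hypothetical helper)."""
    # sys.executable is used instead of "python" so the active environment is reused.
    return [sys.executable, "train.py", f"--config-name={config}", f"save_to={ckpt_dir}"]


def validate_command(config, ckpt_dir):
    """Build the EgoMCQ validation command, assuming the checkpoint is saved as model.pth."""
    ckpt = Path(ckpt_dir) / "model.pth"
    return [
        sys.executable,
        "validate.py",
        f"--config-name={config}",
        f"resume_from={ckpt.as_posix()}",
    ]


def run_pipeline(config, ckpt_dir):
    """Train, then validate; check=True stops at the first failing step."""
    for cmd in (train_command(config, ckpt_dir), validate_command(config, ckpt_dir)):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("omnivore", "path/to/ckpt/dir")` from the repository root would run training followed by validation with the same configuration.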

EgoProceL

Please refer to egoprocel/README.md.

Ego4d Goal-Step

Please refer to ego4d_goalstep/README.md.

🐘 Model Zoo

Tip

HiERO is backbone-agnostic and can be easily extended to new feature extractors. See here for more!

We provide pretrained checkpoints of HiERO for a set of feature extractors. Omnivore Video Swin-L features are part of the official Ego4d release and can be downloaded following the official docs. We provide pre-extracted features for the remaining backbones according to the following table.

Backbone	Config file	Features	Checkpoint	EgoMCQ Inter (%)	EgoMCQ Intra (%)
Omnivore	`omnivore`	ego4d docs	hiero_omnivore.pth	90.1	53.4
EgoVLP	`egovlp`	ego4d_egovlp.zip	hiero_egovlp.pth	91.6	59.6
LaViLa	`lavila`	ego4d_LaViLa-L.zip	hiero_lavila-l.pth	94.6	64.4
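
For scripted evaluation, the table can be mirrored as a small lookup structure. The sketch below is illustrative only: the `MODEL_ZOO` dictionary and the helpers `best_backbone` and `validation_args` are not part of the repository, the values are copied from the table above, and storing downloaded checkpoints in the checkpoints directory is an assumption.

```python
# Values copied from the Model Zoo table; accuracies are EgoMCQ percentages.
MODEL_ZOO = {
    "Omnivore": {"config": "omnivore", "checkpoint": "hiero_omnivore.pth", "inter": 90.1, "intra": 53.4},
    "EgoVLP": {"config": "egovlp", "checkpoint": "hiero_egovlp.pth", "inter": 91.6, "intra": 59.6},
    "LaViLa": {"config": "lavila", "checkpoint": "hiero_lavila-l.pth", "inter": 94.6, "intra": 64.4},
}


def best_backbone(metric="intra"):
    """Return the backbone with the highest value for `metric` (hypothetical helper)."""
    return max(MODEL_ZOO, key=lambda name: MODEL_ZOO[name][metric])


def validation_args(backbone, ckpt_dir="checkpoints"):
    """Arguments for validate.py, assuming the checkpoint was saved under `ckpt_dir`."""
    entry = MODEL_ZOO[backbone]
    return [
        f"--config-name={entry['config']}",
        f"resume_from={ckpt_dir}/{entry['checkpoint']}",
    ]
```

For example, `validation_args("EgoVLP")` yields the arguments to append to `python validate.py` when evaluating the EgoVLP checkpoint.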

Acknowledgements

This study was carried out within the FAIR - Future Artificial Intelligence Research and received funding from the European Union Next-GenerationEU (PIANO NAZIONALE DI RIPRESA E RESILIENZA (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO 1.3 – D.D. 1555 11/10/2022, PE00000013). This manuscript reflects only the authors’ views and opinions; neither the European Union nor the European Commission can be considered responsible for them. We acknowledge the CINECA award under the ISCRA initiative for the availability of high-performance computing resources and support.

This codebase is based on our previous works EgoPack and Hier-EgoPack.

Cite Us

@InProceedings{Peirone_2025_ICCV,
    author    = {Peirone, Simone Alberto and Pistilli, Francesca and Averta, Giuseppe},
    title     = {HiERO: Understanding the Hierarchy of Human Behavior Enhances Reasoning on Egocentric Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {19862-19871}
}

