Segment-Level Skill Assessment in Procedural Videos
Michele Mazzamuto¹², Daniele Di Mauro², Gianpiero Francesca³, Giovanni Maria Farinella¹, Antonino Furnari¹
Co-first authors
¹ IPLAB, University of Catania
² Next Vision s.r.l., Italy
³ Toyota Motor Europe, Belgium
Contains the annotation data and metadata for five procedural video datasets:
- taxonomy.yaml: A YAML file mapping each dataset to its list of procedural tasks and skill categories
- assembly101/: Assembly101 dataset containing assembly and disassembly task annotations
- egoexo4d/: Ego-Exo4D dataset containing bike repair tasks (chain cleaning/lubrication, wheel installation/removal, flat tire fixing)
- epic-tent/: EPIC-Tent dataset containing tent assembly annotations
- ikea/: IKEA furniture assembly dataset with annotations for various furniture pieces (TV bench, side table, coffee table, shelf drawer)
- meccano/: Meccano toy assembly dataset
Each dataset subdirectory contains:
- taxonomy.json: Task-specific taxonomy and skill definitions
- train.txt, val.txt, test.txt: Train/validation/test splits. Each line describes one compared video pair in the form winner,loser,global goal->subgoal
- rounds/: Contains YAML files (numbered 1.yaml through 6.yaml) organizing video segment pairs for comparative judgment across six annotation rounds. Each file maps task subtasks to lists of paired video segments to be compared for skill assessment.
- rankings/: Contains skill ranking files (named rank_after_round_{n}.yml, i.e., 1.yml through 6.yml) with Elo ratings, Swiss scores, z-scores, and expert level assessments for each video segment after each annotation round.
- transcripts_coarse/: Coarse-grained action transcripts
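As an illustration of the split-file line format described above (winner,loser,global goal->subgoal), here is a minimal Python sketch of a parser. The `PairRecord` type and `parse_split_line` function are hypothetical names introduced for this example and are not part of the repository. The sketch assumes that video identifiers contain no commas and that the goal and subgoal are separated by the first `->`.

```python
from typing import NamedTuple


class PairRecord(NamedTuple):
    """One line of train.txt / val.txt / test.txt (hypothetical container)."""
    winner: str
    loser: str
    goal: str
    subgoal: str


def parse_split_line(line: str) -> PairRecord:
    # Split on the first two commas only, so commas inside the task label survive.
    winner, loser, task = line.strip().split(",", 2)
    # Assumption: the first "->" separates the global goal from the subgoal.
    goal, _, subgoal = task.partition("->")
    return PairRecord(winner.strip(), loser.strip(), goal.strip(), subgoal.strip())
```

For example, `parse_split_line("clip_a,clip_b,assembly->attach wheel")` yields a record whose `winner` is `clip_a` and whose `subgoal` is `attach wheel`; the identifiers here are made up.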
Contains Amazon Mechanical Turk (AMT) crowdsourced comparative skill assessment results across six annotation rounds per task. Each subdirectory corresponds to a dataset:
- assembly101/: 12 JSON files (6 rounds each for assembly and disassembly tasks)
  - Format: assembly_round_X_results.json and disassembly_round_X_results.json
- egoexo4d/: 6 JSON files (one per round for bike repair tasks)
  - Format: bike_round_X_results.json
- epic-tent/: Similar structure with tent assembly results
- ikea/: Similar structure with furniture assembly results
- meccano/: Similar structure with meccano assembly results
Each result file contains comparative judgment data with:
- Input.video1_url and Input.video2_url: Paired video identifiers being compared
- AnswerExperienced1 and AnswerExperienced2: Skill ratings from annotators (0-5 scale)
- winner: The video deemed to demonstrate higher skill level
- agreement_rate: Inter-annotator agreement on the judgment
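The fields listed above can be consumed along the following lines. This is a sketch under stated assumptions, not a description of the actual file layout: it assumes each result file is a JSON list of flat objects keyed by the field names above, and that agreement_rate lies in [0, 1]. The function names and the 0.8 threshold are hypothetical.

```python
import json
from pathlib import Path


def load_results(path):
    # Assumption: the file holds a JSON list of per-comparison records.
    return json.loads(Path(path).read_text(encoding="utf-8"))


def filter_confident(records, min_agreement=0.8):
    """Keep (video1, video2, winner) triples whose agreement meets the threshold."""
    kept = []
    for rec in records:
        # Assumption: agreement_rate is a number in [0, 1]; missing values count as 0.
        if float(rec.get("agreement_rate", 0.0)) >= min_agreement:
            kept.append(
                (rec["Input.video1_url"], rec["Input.video2_url"], rec["winner"])
            )
    return kept
```

If the files turn out to be nested differently (for instance keyed by subtask), only `load_results` would need to change.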
To download each video dataset, follow its documentation. In particular:
- assembly101: follow the instructions to download here
- egoexo4d: follow the instructions to download here
- epic-tent: follow the instructions to download here
- ikea: follow the instructions to download here
- meccano: follow the instructions to download here
We share a script (tools/extract_clips.py) to extract the annotated clips. The current implementation expects, inside every dataset directory, a videos/ directory containing the original videos and a clips/ directory where the output will be written. For Ego-Exo4D, you should also copy keystep_val.json and keystep_train.json from the original dataset into the egoexo4d directory.
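Before running the extraction script, it may help to confirm that the expected layout is in place. The following is a small sketch of such a check; `missing_inputs` is a hypothetical helper, not part of tools/extract_clips.py, and it only tests for the directories and files named above.

```python
from pathlib import Path


def missing_inputs(dataset_dir):
    """Return the expected entries that are absent from a dataset directory."""
    root = Path(dataset_dir)
    missing = []
    # videos/ holds the original videos, clips/ receives the extracted output.
    for sub in ("videos", "clips"):
        if not (root / sub).is_dir():
            missing.append(sub + "/")
    # Ego-Exo4D additionally needs the keystep annotation files.
    if root.name == "egoexo4d":
        for name in ("keystep_train.json", "keystep_val.json"):
            if not (root / name).is_file():
                missing.append(name)
    return missing
```

An empty list means the layout matches what the text above describes; it does not verify that the videos themselves are complete.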