egoPPG: Heart Rate Estimation from Eye Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks (ICCV 2025)
Björn Braun, Rayan Armani, Manuel Meier, Max Moebus, Christian Holz
Sensing, Interaction & Perception Lab, Department of Computer Science, ETH Zürich, Switzerland
egoPPG is a novel vision task for egocentric systems to recover a person’s cardiac activity to aid downstream vision tasks.
Our method, PulseFormer, continuously estimates the person’s photoplethysmogram (PPG) from areas around the eyes and fuses motion cues from the headset’s inertial measurement unit (IMU) to track HR values.
We demonstrate egoPPG’s downstream benefit for a key task on EgoExo4D, an existing egocentric dataset, where PulseFormer’s HR estimates improve proficiency estimation by 14%.

To train and validate PulseFormer, we collected a dataset of 13+ hours of eye tracking videos from Project Aria and contact-based PPG signals as well as an electrocardiogram (ECG) for ground-truth HR values. Similar to EgoExo4D, 25 participants performed diverse everyday activities such as office work, cooking, dancing, and exercising, which induced significant natural motion and HR variation (44–164 bpm).
To download the egoPPG-DB dataset, which we recorded ourselves for training and evaluating egoPPG, please visit the following link: egoPPG-DB dataset.
You will need to sign a Data Transfer and Use Agreement (DTUA) form to agree to our terms of use. Please note that only members of an institution (e.g., a PI or professor) can sign this DTUA. After you have signed the DTUA, you will receive a download link via email. The dataset is around 220 GB in size and may be used for non-commercial, academic research purposes only.
To create the environment that we used for our paper, simply run:
conda env create -f environment.yml
conda activate egoPPG
The following three folders contain all code related to the egoPPG task of predicting HR from eye-tracking videos on the egoPPG-DB and EgoExo4D datasets:
- configs/: contains all config files to run preprocessing and ML experiments
- preprocessing/: code to preprocess the egoPPG-DB and EgoExo4D datasets to get training and inference data for ML models
- ml/: code to run ML experiments for HR estimation from eye-tracking videos from egoPPG-DB and EgoExo4D using different models
The proficiency_estimation/ folder contains all code related to the downstream proficiency estimation task on EgoExo4D, which uses the TimeSformer architecture together with the HR values estimated in the egoPPG task. Code will be up soon.
Run all scripts from the root folder (egoPPG) due to relative imports.
- Adjust the config file in configs/preprocessing/config_preprocessing_egoppg.yaml according to your paths and desired preprocessing parameters (downsampling, upsampling, window size, etc.).
- Run the preprocessing script using the config file config_preprocessing_egoppg.yaml:
python -m preprocessing.preprocessing_egoppg --cfg_path configs/preprocessing/config_preprocessing_egoppg.yaml
To predict HR on EgoExo4D (for the egoPPG task), first preprocess the EgoExo4D dataset so that you can then run inference with the models trained on egoPPG-DB.
- To preprocess the EgoExo4D dataset, update the configs/preprocessing/config_preprocessing_egoexo4d.yaml file according to your paths and desired preprocessing parameters (downsampling, upsampling, window size, etc.).
- Run the preprocessing script using the config file config_preprocessing_egoexo4d.yaml:
python -m preprocessing.preprocessing_egoppg --cfg_path configs/preprocessing/config_preprocessing_egoexo4d.yaml
Important:
- The eye tracking videos of the EgoExo4D dataset are originally recorded at only 10 fps, whereas the egoPPG-DB dataset is recorded at 30 fps.
- We found that upsampling the EgoExo4D eye-tracking videos to 30 fps using linear interpolation between frames during preprocessing improves the HR estimation performance. Therefore, we recommend the following settings when aiming to predict HR on EgoExo4D:
- config_preprocessing_egoexo4d.yaml: set upsampling factor to 3 (and downsampling factor to 1)
- config_preprocessing_egoppg.yaml: set downsampling factor to 3 and upsampling factor to 3. This first downsamples the egoPPG-DB data from 30 fps to 10 fps and then upsamples it back to 30 fps using linear interpolation, matching exactly the preprocessing of the EgoExo4D data. This ensures a smaller domain gap between training and inference data when training on egoPPG-DB and inferring on EgoExo4D.
- Note: This data will be saved into a folder named Down1_*_Up3, where 1 is the effective downsampling factor (downsampling // upsampling = 3 // 3 = 1).
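The down/upsampling round trip recommended above can be sketched as follows. This is a minimal NumPy illustration of linear interpolation between consecutive frames; the actual implementation in preprocessing/ may differ in details such as boundary handling.

```python
import numpy as np

def upsample_frames(frames: np.ndarray, factor: int) -> np.ndarray:
    """Linearly interpolate between consecutive frames along the time axis.

    A (T, H, W) stack becomes ((T - 1) * factor + 1, H, W).
    """
    T = len(frames)
    t_new = np.linspace(0.0, T - 1, (T - 1) * factor + 1)
    lo = np.floor(t_new).astype(int)        # left neighbour frame index
    hi = np.minimum(lo + 1, T - 1)          # right neighbour frame index
    w = (t_new - lo)[:, None, None]         # interpolation weight in [0, 1)
    return (1.0 - w) * frames[lo] + w * frames[hi]

# Round trip matching the recommended egoPPG-DB settings: 30 fps -> 10 fps -> ~30 fps.
video_30fps = np.arange(30.0)[:, None, None] * np.ones((1, 4, 4))
video_10fps = video_30fps[::3]                 # downsample by factor 3
video_restored = upsample_frames(video_10fps, 3)
```

Applying the same round trip to the training data keeps the temporal characteristics of egoPPG-DB and EgoExo4D inputs aligned.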
Run all scripts from the root folder (egoPPG) due to relative imports.
To train and evaluate ML models for the egoPPG task, you simply have to run the ml/main_ml.py script with different config files located in configs/ml/. The config files specify which model is used and on which dataset the model is trained and evaluated (egoPPG-DB or EgoExo4D). You also have to specify the paths to the preprocessed data in the config files (important: adjust to the configs chosen during the preprocessing step). For example, to train and evaluate the PulseFormer model on egoPPG-DB for HR estimation, run:
python -m ml.main_ml --cfg_path configs/ml/egoppg_egoppg_PulseFormer.yaml
- TOOLBOX_MODE: specify whether you want to train and test or only test
- INPUT_SIGNALS/LABEL_SIGNALS: specify which input and label signals to use (e.g., eye videos as input and PPG as label). The names have to match the names used during preprocessing.
- DATA_PATH: overall path to the folder where you saved the preprocessed data of all datasets during the preprocessing step
- FILE_PATH: path for small files, such as logs; can be the same as DATA_PATH or a different folder
- CACHED_PATH: path to the dataset on which you want to train and test; train and test can be specified separately if you, e.g., want to train on egoPPG-DB and test on EgoExo4D
- MODEL_NAME: specify which model to use (PulseFormer, TS-CAN, DeepPhys, etc.)
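Put together, such a config might look roughly like this. The field names come from the list above, but the values are placeholders, and the real YAML files in configs/ml/ contain additional fields:

```python
# Illustrative only: the real configs are YAML files in configs/ml/;
# all paths and values below are placeholders.
example_config = {
    "TOOLBOX_MODE": "train_and_test",    # or test-only
    "INPUT_SIGNALS": ["eye_video"],      # names must match the preprocessing output
    "LABEL_SIGNALS": ["ppg"],
    "DATA_PATH": "/data/preprocessed",   # root folder of all preprocessed datasets
    "FILE_PATH": "/data/logs",           # small files (logs); may equal DATA_PATH
    "CACHED_PATH": {                     # train and test datasets can differ
        "train": "/data/preprocessed/egoPPG-DB",
        "test": "/data/preprocessed/EgoExo4D",
    },
    "MODEL_NAME": "PulseFormer",
}
```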
To add a new model, you need to create three new files and make changes in two other files:
- Create a new model file in ml/models/, e.g., my_model.py, which contains the architecture of your model. You can use existing model files as a template.
- Create a new trainer file in ml/trainer/, e.g., my_model_trainer.py, which contains the training and evaluation logic for your model. You can use existing trainer files as a template.
- Create a new config file in configs/ml/, e.g., egoppg_egoppg_my_model.yaml, which specifies the parameters for training and evaluating your model. Naming is usually in the format: trainset_testset_model.yaml.
- In ml/trainer/__init__.py, import your new trainer class.
- In ml/main_ml.py, add a new condition to instantiate your trainer class in BOTH the train_and_test and the test function.
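The dispatch you extend in ml/main_ml.py can be imagined as follows. This is a simplified sketch; the actual function and class names in the repository may differ:

```python
# Simplified sketch of the trainer dispatch; class names are illustrative.
class PulseFormerTrainer:
    def __init__(self, config):
        self.config = config

class MyModelTrainer:  # your new trainer from ml/trainer/my_model_trainer.py
    def __init__(self, config):
        self.config = config

def build_trainer(config):
    """Instantiate the trainer selected by MODEL_NAME.

    In the repository this logic appears in both train_and_test and test,
    so a new model needs a condition in BOTH places.
    """
    name = config["MODEL_NAME"]
    if name == "PulseFormer":
        return PulseFormerTrainer(config)
    elif name == "MyModel":  # <- the condition you add for a new model
        return MyModelTrainer(config)
    raise ValueError(f"Unknown MODEL_NAME: {name}")

trainer = build_trainer({"MODEL_NAME": "MyModel"})
```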
To use a new dataset, you need to create a new preprocessing script and a new config file:
- Create a new preprocessing script in preprocessing/, e.g., preprocessing_my_dataset.py, which contains the logic to preprocess your dataset. You can use existing preprocessing scripts as a template.
- Create a new config file in configs/preprocessing/, e.g., config_preprocessing_my_dataset.yaml, which specifies the parameters for preprocessing your dataset. Naming is usually in the format: config_preprocessing_dataset.yaml.
- After the preprocessing is done, you can create new config files in configs/ml/ to train and evaluate models on your new dataset.
- The validation set size is set to 10% of the training set. Adjust in ml/ml_helper.py if needed.
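The 10% validation split can be sketched like this. It is a simplified stand-in for the logic in ml/ml_helper.py, not the actual code:

```python
import random

def split_train_val(samples, val_fraction=0.1, seed=0):
    """Hold out a fraction of the training samples for validation."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_train_val(list(range(100)), val_fraction=0.1)
```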
For all training and inference, we used one GeForce RTX 4090 GPU.
The following table contains the results reported in our paper, obtained with random seed 0 only.
| Model | MAE ↓ | RMSE ↓ | MAPE ↓ | r ↑ |
|---|---|---|---|---|
| Yue et al. [Yue et al., 2023] | 29.63 | 32.99 | 37.86 | 0.10 |
| DeepPhys [Chen et al., 2018] | 28.26 | 31.97 | 36.68 | 0.08 |
| TS-CAN [Liu et al., 2020] | 26.32 | 32.39 | 29.13 | 0.11 |
| ContrastPhys+ [Sun et al., 2024] | 19.12 | 24.13 | 22.57 | 0.21 |
| RhythmMamba [Zou et al., 2024] | 15.05 | 19.78 | 17.46 | -0.16 |
| Baseline eyes | 14.60 | 18.18 | 18.37 | 0.20 |
| PhysMamba [Yan et al., 2024] | 13.94 | 16.86 | 17.76 | 0.61 |
| RhythmFormer [Zou et al., 2024] | 13.13 | 17.43 | 14.73 | 0.51 |
| Baseline skin | 12.40 | 15.54 | 15.29 | 0.50 |
| PhysNet [Yu et al., 2019] | 12.09 | 15.43 | 15.14 | 0.66 |
| PhysFormer [Yu et al., 2022] | 10.71 | 13.97 | 12.69 | 0.72 |
| PulseFormer w/o SA (ours) | 10.49 | 13.62 | 12.83 | 0.73 |
| FactorizePhys [Joshi et al., 2024] | 10.07 | 13.43 | 12.36 | 0.67 |
| PulseFormer w/o MITA (ours) | 8.82 | 12.03 | 10.82 | 0.81 |
| 🔥 PulseFormer (ours) | 7.67 | 10.69 | 9.45 | 0.85 |
Table: Results for HR prediction from eye-tracking videos using different models (PulseFormer, PulseFormer w/o SA, PulseFormer w/o MITA, and established rPPG baselines).
PulseFormer w/o MITA refers to the model PhysNetSA in our repository here.
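For reference, the four metrics in the tables follow their standard definitions (MAE and RMSE in bpm, MAPE in percent, r the Pearson correlation); a small sketch over per-window HR estimates:

```python
import numpy as np

def hr_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """MAE/RMSE in bpm, MAPE in percent, and Pearson correlation r."""
    err = pred - gt
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err) / gt) * 100.0),
        "r": float(np.corrcoef(pred, gt)[0, 1]),
    }

# Toy example with three HR windows (bpm).
m = hr_metrics(np.array([72.0, 90.0, 120.0]), np.array([70.0, 95.0, 118.0]))
```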
In addition to the results reported in our paper, we also evaluated PulseFormer and several baselines across three random seeds to better estimate the mean performance and its variance across seeds/machines. The mean and standard deviation (STD) across the three seeds are reported below:
| Model | MAE ↓ | RMSE ↓ | MAPE ↓ | r ↑ |
|---|---|---|---|---|
| Baseline eyes | 14.60 | 18.18 | 18.37 | 0.20 |
| FactorizePhys [Joshi et al., 2024] | 13.55 ± 1.13 | 16.64 ± 0.96 | 17.55 ± 1.52 | 0.67 ± 0.03 |
| Baseline skin | 12.40 | 15.54 | 15.29 | 0.50 |
| PhysFormer [Yu et al., 2022] | 11.52 ± 0.45 | 15.53 ± 0.51 | 12.68 ± 0.54 | 0.64 ± 0.03 |
| PhysNet [Yu et al., 2019] | 11.32 ± 0.43 | 14.61 ± 0.43 | 14.22 ± 0.52 | 0.69 ± 0.02 |
| PulseFormer w/o MITA (ours) | 9.68 ± 0.59 | 12.67 ± 0.61 | 12.06 ± 0.91 | 0.80 ± 0.01 |
| 🔥 PulseFormer (ours) | 8.53 ± 0.62 | 11.64 ± 0.70 | 10.49 ± 0.74 | 0.82 ± 0.03 |
Table: Results for HR prediction from eye-tracking videos using different models (averaged across three random seeds).
PulseFormer w/o MITA refers to the model PhysNetSA in our repository here.
Code will be up soon. To do:
- Explain usage (training and inference)
- Explain create_split_files: script to create the split files for proficiency estimation task
- Show number of used/excluded takes
The EgoExo4D data has to be downloaded from the official EgoExo4D repository: https://ego-exo4d-data.org/#intro. You need the annotations, the VRS files (for the IMU data), the eye-tracking (ET) videos, and the POV videos (for the downstream proficiency estimation task). Alternatively, you can run PulseFormer without the motion-informed temporal attention (MITA) module; then you do not need the VRS files (IMU data). To get the data needed for the proficiency estimation, run the following command with the Ego4D downloader:
egoexo -o PATH_SAVE_FOLDER --parts take_vrs_noimagestream metadata annotations downscaled_takes/448
If you find our paper, code or dataset useful for your research, please cite our work.
@article{braun2025egoppg,
title={egoppg: Heart rate estimation from eye-tracking cameras in egocentric systems to benefit downstream vision tasks},
author={Braun, Bj{\"o}rn and Armani, Rayan and Meier, Manuel and Moebus, Max and Holz, Christian},
journal={arXiv preprint arXiv:2502.20879},
year={2025}
}
Make sure to also check out egoEMOTION, our work on emotion recognition from egocentric vision systems. egoEMOTION includes over 50 hours of recordings from 43 participants doing emotion-elicitation tasks and naturalistic activities while self-reporting their affective state using the Circumplex Model and Mikels’ Wheel as well as their personality via the Big Five model. Each session provides synchronized data from the Project Aria glasses, videos of the participants' faces, nose PPG, and physiological baselines (ECG, EDA, and breathing rate) for reference.
The structure of the code in this repository is strongly inspired by the rPPG-Toolbox. Make sure to also check it out for other rPPG methods and datasets!