This package provides tools for downloading and preprocessing the raw THINGS-EEG2 data, and for generating image embeddings with various vision models.
> [!WARNING]
> This repository builds upon the original data processing by Gifford et al. (2022). Please check out their original code and the corresponding paper.
> We are in no way associated with the authors. Nonetheless, we hope that this makes things easier (pun intended) to use.
If you only need the CLI functionality, you can run it with a single command:

```bash
# Using uv
uvx --from things_eeg2_dataset things-eeg2

# Using pixi
pixi exec --with things_eeg2_dataset things-eeg2
```

To install from source and work on the package:

```bash
git clone [email protected]:ZEISS/things_eeg2_dataset.git
cd things_eeg2_dataset
uv sync
uv pip install --editable .
source .venv/bin/activate
things-eeg2 --help
things-eeg2 --install-completion
# Then restart your shell
# Example for zsh:
source ~/.zshrc
```

Alternatively, add the package to your own project:

```bash
# Using UV
uv init
uv add things_eeg2_dataset
source .venv/bin/activate
things-eeg2 --help
things-eeg2 --install-completion
# Then restart your shell
# Example for zsh:
source ~/.zshrc
```

```bash
# Using pixi
pixi init
pixi add things_eeg2_dataset
pixi shell
things-eeg2 --help
things-eeg2 --install-completion
# Then restart your shell
# Example for zsh:
source ~/.zshrc
```

The data structure created by the CLI is defined in `paths.py`, which contains the ground-truth layout used throughout the project.
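For a quick look at that layout without opening the file, you can print the module's source. The import path `things_eeg2_dataset.paths` below is an assumption based on the file name mentioned above.

```python
# Minimal sketch: print the ground-truth path definitions from paths.py.
# The import path is assumed from the file name; adjust it if the module
# lives elsewhere in the package.
import inspect

from things_eeg2_dataset import paths

print(inspect.getsource(paths))
```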
The package supports multiple state-of-the-art vision models for generating image embeddings:
| Model | Embedder Class | Description |
|---|---|---|
| `open-clip-vit-h-14` | `OpenClipViTH14Embedder` | OpenCLIP ViT-H/14 (SDXL image encoder) |
| `openai-clip-vit-l-14` | `OpenAIClipVitL14Embedder` | OpenAI CLIP ViT-L/14 |
| `dinov2` | `DinoV2Embedder` | DINOv2 with registers (self-supervised) |
| `ip-adapter` | `IPAdapterEmbedder` | IP-Adapter Plus projections |
Each embedder generates three kinds of output (the sketch after this list shows where the ViT-H-14 shapes come from):

- Pooled embeddings: single vector per image (e.g., `(1024,)` for ViT-H-14)
- Full sequence embeddings: all tokens (e.g., `(257, 1280)` for ViT-H-14)
- Text embeddings: corresponding text features from the image captions
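To make these shapes concrete, here is a standalone sketch that computes a pooled ViT-H/14 image embedding with the `open_clip` library directly. It is independent of this package's embedder classes; the pretrained tag and the example image path are illustrative assumptions.

```python
# Standalone sketch (not this package's API): pooled ViT-H/14 embedding via open_clip.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"  # pretrained tag is an assumption
)
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # any RGB image
with torch.no_grad():
    pooled = model.encode_image(image)

print(pooled.shape)  # torch.Size([1, 1024]) -> one 1024-d vector per image
# The full sequence (257, 1280) corresponds to 256 patch tokens + 1 class token
# at the transformer width of 1280, taken before pooling and projection.
```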
Output Files:

```
embeddings/
├── ViT-H-14_features_training.safetensors        # Pooled embeddings
├── ViT-H-14_features_training_full.safetensors   # Full token sequences
├── ViT-H-14_features_test.safetensors
└── ViT-H-14_features_test_full.safetensors
```
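The generated `.safetensors` files can be inspected with the `safetensors` library. The sketch below assumes the tensors were stored in a PyTorch-compatible format and reads them generically, since the key naming scheme inside the files is not documented here.

```python
# Minimal sketch: list the keys and shapes stored in one embedding file.
from safetensors.torch import load_file

embeddings = load_file("embeddings/ViT-H-14_features_training.safetensors")

for key, tensor in embeddings.items():
    print(key, tuple(tensor.shape))
```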
To load the preprocessed EEG data together with the matching image embeddings, construct a `ThingsEEGDataset`:

```python
from things_eeg2_dataset.dataloader import ThingsEEGDataset

dataset = ThingsEEGDataset(
    image_model="ViT-H-14",
    data_path="/path/to/processed_data",
    img_directory_training="/path/to/images/train",
    img_directory_test="/path/to/images/test",
    embeddings_dir="/path/to/embeddings",
    train=True,
    time_window=(0.0, 1.0),
)
```

See `things_eeg2_dataloader/README.md` for detailed usage.
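If `ThingsEEGDataset` implements the standard `torch.utils.data.Dataset` protocol (an assumption; see `things_eeg2_dataloader/README.md` for the authoritative interface), it can be batched with a regular PyTorch `DataLoader`:

```python
# Sketch: wrap the dataset from the example above in a standard PyTorch DataLoader.
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

for batch in loader:
    # The exact sample structure (EEG tensor, embedding, label, ...) is defined
    # by the package; inspect the first batch and stop.
    print(type(batch))
    break
```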
We are happy users of the THINGS-EEG2 dataset but are not associated with the original authors. If you use this code, please cite the THINGS-EEG2 paper:
Gifford, A. T., Lahner, B., Saba-Sadiya, S., Vilas, M. G., Lascelles, A., Oliva, A., ... & Cichy, R. M. (2022). The THINGS-EEG2 dataset. Scientific Data.
This project follows the original THINGS-EEG2 license terms.

