feat(bids): Add BIDS dataset loader for neuroimaging data #7886
Open
The-Obstacle-Is-The-Way wants to merge 12 commits into huggingface:main
This was referenced Nov 29, 2025
CloseChoice (Contributor) reviewed on Nov 30, 2025
I looked over the code and tested this, and it looks absolutely fantastic. I also uploaded a dataset to test:

```python
from datasets import load_dataset

ds = load_dataset("TobiasPitters/ds004884-mini")
ex = ds['train'][0]
ex['nifti']
```

or for streaming:

```python
from datasets import load_dataset

ds = load_dataset("TobiasPitters/ds004884-mini", streaming=True)
ex = next(iter(ds['train']))
ex['nifti']
```

Here's how it's visualized:
@neurolabusc FYI
By the way, using this branch (and niivue), I created: https://huggingface.co/spaces/TobiasPitters/bids-neuroimaging
```python
"run": datasets.Value("string"),
"path": datasets.Value("string"),
"nifti": datasets.Nifti(),
"metadata": datasets.Value("string"),
```
Contributor
I think this might be something for another PR, but having a dict-like object here would be more useful. I'm not quite sure how we could achieve that: maybe through pyarrow's map and union types, or with a dedicated feature for BIDSMetadata (or for dictionaries in general?).
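Until such a feature exists, a consumer could decode the string column on the fly. The following is a minimal sketch, assuming the `metadata` column holds a JSON-encoded sidecar (an assumption; the diff only shows that it is a string). `parse_metadata` and the sample row are hypothetical and not part of the PR.

```python
import json


def parse_metadata(example):
    """Return a copy of the example with `metadata` decoded into a dict.

    Assumes `metadata` is a JSON string, or empty/None when no sidecar exists.
    """
    raw = example.get("metadata")
    decoded = json.loads(raw) if raw else {}
    return {**example, "metadata": decoded}


# Hypothetical row shaped like the features in the diff above.
row = {
    "run": "1",
    "path": "sub-01/func/sub-01_task-rest_run-1_bold.nii.gz",
    "metadata": '{"RepetitionTime": 2.0}',
}
parsed = parse_metadata(row)
```

Decoding at read time keeps the stored schema a plain string, which sidesteps the question of how to type heterogeneous sidecar fields in Arrow.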
444e464 to 599d670
This was referenced Dec 14, 2025
- Remove deprecated `trust_remote_code=True` from tests (not needed for packaged modules)
- Fix ruff linting errors (import sorting, trailing newlines)
- Apply ruff formatter for consistent code style
- Convert set() generators to set comprehensions (C401)
- Update setup.py to include nibabel in BIDS extra
- Update docs to clarify nibabel is included
- Add nibabel availability check in _info()
- Move os import to module level
- Update test skipif to check both pybids and nibabel
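An availability check of the kind this commit describes could look roughly like the sketch below. It is an illustration only, not the code in the PR: `REQUIRED_MODULES`, `missing_dependencies`, and `ensure_bids_dependencies` are hypothetical names. The only facts relied on are that pybids is imported as `bids` and nibabel as `nibabel`.

```python
import importlib.util

# Hypothetical mapping: pip package name -> importable top-level module name.
REQUIRED_MODULES = {"pybids": "bids", "nibabel": "nibabel"}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    return [
        pkg
        for pkg, mod in modules.items()
        if importlib.util.find_spec(mod) is None
    ]


def ensure_bids_dependencies():
    """Raise an ImportError naming every optional dependency that is absent."""
    missing = missing_dependencies()
    if missing:
        raise ImportError(
            "BIDS loading requires: " + ", ".join(missing) + ". Install them with pip."
        )
```

The same `missing_dependencies()` result could drive a test `skipif` condition, so the loader and its tests agree on what counts as "installed".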
599d670 to 267c86c
Summary
Adds native BIDS (Brain Imaging Data Structure) dataset loading support using PyBIDS, enabling a `load_dataset('bids', data_dir='/path/to/bids')` workflow for neuroimaging researchers.

Contributes to #7804 (Support scientific data formats). BIDS is a widely used standard for organizing neuroimaging data, built on NIfTI files.
Changes
Core Implementation
- `src/datasets/packaged_modules/bids/bids.py` - GeneratorBasedBuilder implementation
- `src/datasets/packaged_modules/bids/__init__.py` - Module exports
- `src/datasets/packaged_modules/__init__.py` - Registration with module registry
- `src/datasets/config.py` - `PYBIDS_AVAILABLE` config flag
- `setup.py` - Optional `pybids>=0.21.0` + nibabel dependency

Features
Documentation & Tests
- `docs/source/bids_dataset.mdx` - User guide with examples
- `tests/packaged_modules/test_bids.py` - Unit tests (4 tests)

Usage
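As a rough illustration of what `data_dir` is expected to point at, the sketch below builds a tiny BIDS-shaped directory using only the standard library. `make_minimal_bids_tree` is a hypothetical helper, the image file is an empty placeholder rather than a valid NIfTI, and the commented-out `load_dataset` call assumes this PR's branch with pybids and nibabel installed.

```python
import json
import tempfile
from pathlib import Path


def make_minimal_bids_tree(root: Path) -> Path:
    """Create a tiny BIDS-shaped directory with a placeholder image file."""
    root.mkdir(parents=True, exist_ok=True)
    # dataset_description.json is required at the root of every BIDS dataset.
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "toy", "BIDSVersion": "1.10.1"})
    )
    anat = root / "sub-01" / "anat"
    anat.mkdir(parents=True, exist_ok=True)
    # Placeholder only: a real dataset needs a valid NIfTI file here.
    (anat / "sub-01_T1w.nii.gz").touch()
    return root


root = make_minimal_bids_tree(Path(tempfile.mkdtemp()) / "toy_bids")
files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

# With this PR's branch, pybids, and nibabel installed, and real NIfTI data in place:
# from datasets import load_dataset
# ds = load_dataset("bids", data_dir=str(root))
```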
Test plan
- `pytest tests/packaged_modules/test_bids.py`
- `make quality` passes (ruff check)

Context
This PR is part of the neuroimaging initiative discussed with @TobiasPitters. Follows the BIDS 1.10.1 specification and leverages the existing Nifti feature for NIfTI file handling.
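BIDS encodes entities such as subject, session, and run as `key-value` pairs in each filename. The PR uses PyBIDS for indexing; the sketch below is only a stdlib illustration of the naming convention, and `parse_entities` is a hypothetical helper that does not appear in the PR.

```python
from pathlib import PurePosixPath


def parse_entities(path: str) -> dict:
    """Split a BIDS filename into its key-value entities and trailing suffix."""
    name = PurePosixPath(path).name
    stem = name.split(".", 1)[0]      # drop extensions such as .nii.gz
    *pairs, suffix = stem.split("_")  # the last chunk is the suffix, e.g. "bold"
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


entities = parse_entities(
    "sub-01/ses-02/func/sub-01_ses-02_task-rest_run-1_bold.nii.gz"
)
```

Entity values stay strings here, which matches the string-typed `run` and `path` columns shown in the review diff.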
Related PRs: