Add NIfTI neuroimaging file support #3272

@The-Obstacle-Is-The-Way

Description

The Dataset Viewer crashes when loading datasets containing NIfTI (.nii, .nii.gz) neuroimaging files with the error:

Feature type 'Nifti' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', 'TranslationVariableLanguages', 'LargeList', 'List', 'Array2D', 'Array3D', 'Array4D', 'Array5D', 'Audio', 'Image', 'Video', 'Pdf']

Affected Dataset

https://huggingface.co/datasets/hugging-science/arc-aphasia-bids

This is a 273 GB neuroimaging dataset in BIDS format containing NIfTI brain scans.

Root Cause

The datasets library added Nifti feature type support in version 4.4.0 (PR huggingface/datasets#7815), but the Dataset Viewer currently pins datasets==4.1.1, which predates that release and therefore does not include the feature type.
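To make the version gap concrete, here is a minimal sketch of a check that compares a datasets version string against 4.4.0, the release the paragraph above cites for Nifti support. The names `NIFTI_MIN_VERSION`, `parse_version`, and `supports_nifti` are hypothetical helpers for illustration, not part of datasets or the Dataset Viewer.

```python
import re

# Release that, per this issue, first shipped the Nifti feature type.
NIFTI_MIN_VERSION = (4, 4, 0)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a version string into integers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_nifti(datasets_version: str) -> bool:
    """Return True if the given datasets version is at least 4.4.0."""
    return parse_version(datasets_version) >= NIFTI_MIN_VERSION


if __name__ == "__main__":
    for candidate in ("4.1.1", "4.4.0", "4.4.1"):
        print(candidate, supports_nifti(candidate))
```

In a real environment the installed version could be read with `importlib.metadata.version("datasets")` and passed to `supports_nifti`.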

Proposed Solution

  1. Bump datasets dependency from 4.1.1 to ^4.4.1
  2. Add nibabel ^5.0.0 dependency for NIfTI file handling
  3. Implement NIfTI support in the viewer pipeline:
    • Add NiftiSource TypedDict and create_nifti_file() in asset.py
    • Add nifti() handler in features.py
    • Update url_preparator.py for Nifti URL signing
    • Update rows.py to prevent truncation of Nifti columns
    • Update rows_utils.py for multithreaded Nifti uploads
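As a rough illustration of the first sub-item above, the sketch below shows one possible shape for a `NiftiSource` TypedDict and a `create_nifti_file()` helper. It is not the implementation referred to in this issue: the single `src` field, the function's parameters, the local-directory storage, and the `base_url` argument are all assumptions made for the example, and the actual code in asset.py may look quite different.

```python
import tempfile
from pathlib import Path
from typing import TypedDict


class NiftiSource(TypedDict):
    # Hypothetical field: the URL from which the stored file would be served.
    src: str


# File extensions named in this issue for NIfTI data.
NIFTI_SUFFIXES = (".nii", ".nii.gz")


def create_nifti_file(
    data: bytes, filename: str, assets_dir: Path, base_url: str
) -> NiftiSource:
    """Write raw NIfTI bytes under assets_dir and return a reference to them.

    All parameters are assumptions for this sketch; a local directory stands
    in for whatever storage backend the viewer actually uses.
    """
    if not filename.endswith(NIFTI_SUFFIXES):
        raise ValueError(f"not a NIfTI filename: {filename!r}")
    assets_dir.mkdir(parents=True, exist_ok=True)
    (assets_dir / filename).write_bytes(data)
    return NiftiSource(src=f"{base_url.rstrip('/')}/{filename}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = create_nifti_file(
            b"placeholder bytes", "scan.nii.gz", Path(tmp), "https://example.org/assets/"
        )
        print(source["src"])
```

Returning a small dictionary with a URL, rather than the file contents, keeps the row payload light; the payload bytes here are placeholders and are not validated as a real NIfTI image.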

Implementation

I have a working implementation ready with full test coverage. Will submit a PR shortly.

Impact

This would enable viewing of neuroimaging datasets on the Hugging Face Hub, supporting the neuroscience research community.
