Proposal: Protein 3D Structure Visualization for Dataset Viewer #7930
Proposal: Protein 3D Structure Visualization for HuggingFace Dataset Viewer
Executive Summary
This proposal outlines adding 3D protein structure visualization to the HuggingFace Dataset Viewer, enabling users to interactively view PDB and mmCIF molecular structures directly within the dataset preview interface.
Data Type Support (Updated Architecture)
Supported formats (from recent PRs):
.pdb, .ent extensions via PdbFolder builder
.cif, .mmcif extensions via MmcifFolder builder
New Implementation Pattern (One Row = One Structure):
Both PRs have been refactored to follow the ImageFolder pattern, where each row in the dataset contains one complete protein structure file. This is the recommended ML-friendly approach:
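A minimal sketch of the one-row-per-structure layout, using only the Python standard library. It is not the PdbFolder or MmcifFolder builder code; the function name build_rows and the file_name and format columns are hypothetical, chosen for illustration. Only the structure column and the four extensions come from this proposal.

```python
from pathlib import Path

# Extension-to-format mapping, taken from the supported formats in this proposal.
EXTENSION_TO_FORMAT = {
    ".pdb": "pdb",
    ".ent": "pdb",
    ".cif": "mmcif",
    ".mmcif": "mmcif",
}


def build_rows(data_dir):
    """Return one row per structure file found under data_dir (hypothetical helper)."""
    rows = []
    for path in sorted(Path(data_dir).rglob("*")):
        fmt = EXTENSION_TO_FORMAT.get(path.suffix.lower())
        if fmt is None or not path.is_file():
            continue  # skip directories and files with unsupported extensions
        rows.append(
            {
                "file_name": path.name,  # hypothetical column
                "format": fmt,           # hypothetical column
                # The complete file content, so a viewer can render it directly.
                "structure": path.read_text(encoding="utf-8", errors="replace"),
            }
        )
    return rows
```

Keeping the whole file in a single cell means a viewer never has to reassemble a structure from atom-level rows, which mirrors how ImageFolder keeps one image per row.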
Key Components:
What gets visualized:
Not applicable (1D sequence only):
Visualization Library Comparison
Bundle sizes verified by downloading actual distribution files from npm/CDN (January 2026)
Recommendation: 3Dmol.js
Primary choice: 3Dmol.js
Rationale:
Why not Mol*? As Georgia noted, Mol* is heavy (~1.3 MB gzipped). While it's the industry standard for RCSB PDB, it's overkill for a dataset preview where users just need to verify that the structure data looks correct.
Alternative for power users: If users need advanced features like density maps, ligand interactions, or sequence alignment overlay, consider PDBe Molstar as an optional "full viewer" mode.
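To illustrate how small a 3Dmol.js preview can be, here is a hedged sketch that wraps one structure string in a standalone HTML page. It is not the Dataset Viewer's actual frontend code. The helper name render_preview_html, the CDN URL, the viewer size, and the choice to pass "cif" as the 3Dmol.js format string for mmCIF input are all assumptions to verify against the 3Dmol.js documentation.

```python
import json

# Assumed CDN location of the 3Dmol.js bundle; pin a specific version in practice.
THREEDMOL_CDN = "https://3Dmol.org/build/3Dmol-min.js"

# Assumed mapping from dataset format names to 3Dmol.js parser names.
FORMAT_TO_PARSER = {"pdb": "pdb", "mmcif": "cif"}

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><script src="__CDN__"></script></head>
<body>
<div id="viewer" style="width: 480px; height: 360px; position: relative;"></div>
<script>
  const viewer = $3Dmol.createViewer(document.getElementById("viewer"), {backgroundColor: "white"});
  viewer.addModel(__DATA__, "__PARSER__");
  viewer.setStyle({}, {cartoon: {color: "spectrum"}});
  viewer.zoomTo();
  viewer.render();
</script>
</body>
</html>
"""


def render_preview_html(structure, fmt="pdb"):
    """Return an HTML page that shows one structure as a cartoon (hypothetical helper)."""
    if fmt not in FORMAT_TO_PARSER:
        raise ValueError(f"unsupported structure format: {fmt!r}")
    # json.dumps yields a quoted JavaScript string literal; escaping "</" keeps
    # the embedded file content from closing the surrounding script tag early.
    data_literal = json.dumps(structure).replace("</", "<\\/")
    page = PAGE_TEMPLATE.replace("__CDN__", THREEDMOL_CDN)
    page = page.replace("__PARSER__", FORMAT_TO_PARSER[fmt])
    # Substitute the data last so its content is never scanned for placeholders.
    return page.replace("__DATA__", data_literal)
```

Writing the returned string to a file and opening it in a browser should be enough to check that a row's structure loads and renders.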
Summary
Recommended approach:
Backend implementation (Updated):
structure column contains the complete file content ready for 3D rendering
Next Steps