Proposal: Protein 3D Structure Visualization for Dataset Viewer #7930
Proposal: Protein 3D Structure Visualization for HuggingFace Dataset Viewer
Executive Summary
This proposal outlines adding 3D protein structure visualization to the HuggingFace Dataset Viewer, enabling users to interactively view PDB and mmCIF molecular structures directly within the dataset preview interface.
Data Type Support
Supported formats (from recent PRs):
What gets visualized:
Not applicable (1D sequence only):
Visualization Library Comparison
Bundle sizes verified by downloading actual distribution files from npm/CDN (January 2026)
Recommendation: 3Dmol.js
Primary choice: 3Dmol.js
Rationale:
Why not Mol*? As Georgia noted, Mol* is heavy (~1.3 MB gzipped). It is the industry standard at RCSB PDB, but it is overkill for a dataset preview, where users only need to verify that the structure data looks correct.
Alternative for power users: If users need advanced features like density maps, ligand interactions, or sequence alignment overlay, consider PDBe Molstar as an optional "full viewer" mode.
Architecture for Dataset Viewer Integration
Lazy Loading Pattern (React/Next.js)
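As a minimal sketch of the idea, assuming the viewer library is pulled in with a dynamic `import()` only when a structure row is first opened: the helper below caches the in-flight promise so the library is requested at most once. The name `createLazyLoader` is hypothetical and not part of React, Next.js, or 3Dmol.js.

```typescript
// Hypothetical helper: wraps an async loader so the underlying import
// is triggered at most once, no matter how many viewers mount.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((err: unknown) => {
        // Forget a failed attempt so a later call can retry the download.
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}

// Assumed usage in the app (left as a comment so the sketch runs without the package):
//   const load3Dmol = createLazyLoader(() => import("3dmol"));
//   const lib = await load3Dmol();
```

Caching the promise rather than the resolved module means two viewers mounting at the same time share one request instead of racing.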
Core Viewer Component (3Dmol.js)
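The core rendering step can be sketched as follows, assuming a viewer object that exposes the 3Dmol.js methods `addModel`, `setStyle`, `zoomTo`, and `render`. The `StructureViewer` interface and the `showStructure` function are illustrative names, not part of the library; the interface only declares the subset of the viewer surface used here so that the sketch stays self-contained.

```typescript
// Illustrative subset of the 3Dmol.js viewer surface used by this sketch.
interface StructureViewer {
  addModel(data: string, format: string): unknown;
  setStyle(selection: object, style: object): unknown;
  zoomTo(): unknown;
  render(): unknown;
}

// Format strings passed to addModel; "cif" is assumed here for mmCIF input.
type ViewerFormat = "pdb" | "cif";

// Hypothetical helper: load raw file content into a viewer and draw it.
function showStructure(
  viewer: StructureViewer,
  content: string,
  format: ViewerFormat
): void {
  viewer.addModel(content, format); // parse the raw text
  viewer.setStyle({}, { cartoon: { color: "spectrum" } }); // {} selects all atoms
  viewer.zoomTo(); // fit the whole model in view
  viewer.render();
}

// Assumed browser usage (requires the 3Dmol.js global and a sized DOM element):
//   const viewer = $3Dmol.createViewer(element, { backgroundColor: "white" });
//   showStructure(viewer, fileText, "pdb");
```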
Integration Points in Dataset Viewer
File Type Detection
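One possible way to decide whether a file should get the 3D viewer is to check the extension first and fall back to sniffing the first meaningful line of the content. This is a sketch under assumptions: the function name, the extension list, and the set of PDB record names checked are illustrative choices, and `.cif` files are treated as mmCIF even though the extension is also used for small-molecule CIF.

```typescript
type StructureFormat = "pdb" | "mmcif";

// Hypothetical helper: returns the structure format, or null when the
// file does not look like a 3D structure (for example a FASTA sequence).
function detectStructureFormat(
  filename: string,
  head: string = ""
): StructureFormat | null {
  const name = filename.toLowerCase();
  if (name.endsWith(".pdb") || name.endsWith(".ent")) return "pdb";
  if (name.endsWith(".cif") || name.endsWith(".mmcif")) return "mmcif";

  // Fall back to content: take the first non-empty, non-comment line.
  let firstLine = "";
  for (const line of head.split(/\r?\n/)) {
    if (line.trim() !== "" && !line.startsWith("#")) {
      firstLine = line;
      break;
    }
  }
  // mmCIF files open with a data block header such as "data_1ABC".
  if (firstLine.startsWith("data_")) return "mmcif";
  // PDB files start with fixed-width record names.
  if (/^(HEADER|REMARK|CRYST1|MODEL|ATOM|HETATM)\b/.test(firstLine)) return "pdb";
  return null;
}
```

Only a short prefix of the file needs to be passed as `head`, so detection does not require downloading the full structure.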
Data Flow
UI/UX Considerations
Viewer Controls
Style Dropdown Options
Loading State
Implementation Phases
Phase 1: Basic Viewer (MVP)
Phase 2: Enhanced Features
Phase 3: Advanced (Optional)
Bundle Impact Analysis
Without lazy loading: +150 KB to initial bundle (acceptable but not ideal)
With lazy loading:
Comparison with other viewers:
The protein viewer is comparable to other specialized viewers and well within acceptable limits for lazy-loaded content.
Alternative Approach: CDN Loading
If bundle size is critical:
Pros: Zero bundle impact
Cons: External dependency, potential availability issues
Files to Modify (in dataset-viewer repo)
Since dataset-viewer is closed-source, this proposal should be shared with the HuggingFace team. They would need to:
package.json - Add 3dmol dependency
components/viewers/ProteinViewer.tsx
components/viewers/Protein3DViewerCore.tsx
Summary
Recommended approach:
Why 3Dmol.js over Mol*?
Key insight: The PDB and mmCIF loaders we implemented (PRs #7925, #7926) extract the 3D coordinates needed for visualization. The viewer just needs to consume the raw file content.
Next Steps