Pure Go implementation of the HDF5 file format - No CGo required
A modern, pure Go library for reading and writing HDF5 files without CGo dependencies. v0.13.0: HDF5 2.0.0 compatibility with security hardening, AI/ML datatypes, and 86.1% code coverage!
- ✅ Pure Go - No CGo, no C dependencies, cross-platform
- ✅ Modern Design - Built with Go 1.25+ best practices
- ✅ HDF5 2.0.0 Compatibility - Read/Write: v0, v2, v3 superblocks | Format Spec v4.0 with checksum validation
- ✅ Full Dataset Reading - Compact, contiguous, chunked layouts with GZIP
- ✅ Rich Datatypes - Integers, floats, strings (fixed/variable), compounds
- ✅ Memory Efficient - Buffer pooling and smart memory management
- ✅ Production Ready - Read support feature-complete
- ✍️ Comprehensive Write Support - Datasets, groups, attributes + Smart Rebalancing!
go get github.com/scigolib/hdf5

package main

import (
	"fmt"
	"log"

	"github.com/scigolib/hdf5"
)

func main() {
	// Open HDF5 file
	file, err := hdf5.Open("data.h5")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// Walk through file structure
	file.Walk(func(path string, obj hdf5.Object) {
		switch v := obj.(type) {
		case *hdf5.Group:
			fmt.Printf("📁 %s (%d children)\n", path, len(v.Children()))
		case *hdf5.Dataset:
			fmt.Printf("📊 %s\n", path)
		}
	})
}

Output:
📁 / (2 children)
📊 /temperature
📁 /experiments/ (3 children)
- Installation Guide - Install and verify the library
- Quick Start Guide - Get started in 5 minutes
- Reading Data - Comprehensive guide to reading datasets and attributes
- Datatypes Guide - HDF5 to Go type mapping
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
- API Reference - GoDoc documentation
- Architecture Overview - How it works internally
- Performance Tuning - B-tree rebalancing strategies for optimal performance
- Rebalancing API - Complete API reference for rebalancing options
- Examples - Working code examples (7 examples with detailed documentation)
When deleting many attributes, B-trees can become sparse (wasted disk space, slower searches). This library offers 4 rebalancing strategies:
Default (no rebalancing): fast deletions, but the B-tree may become sparse.
// No options = no rebalancing (like HDF5 C library)
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate)
Use for: Append-only workloads, small files (<100MB)
Lazy (batch processing): rebalances once a threshold is reached.
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithLazyRebalancing(
		hdf5.LazyThreshold(0.05),         // Trigger at 5% underflow
		hdf5.LazyMaxDelay(5*time.Minute), // Force rebalance after 5 min
	),
)
Use for: Batch deletion workloads, medium/large files (100-500MB)
Performance: ~2% overhead, occasional 100-500ms pauses
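To make the two lazy knobs concrete, here is a minimal sketch of the kind of decision they imply. It does not show the library's internals: `lazyPolicy` and `shouldRebalance` are hypothetical names, and reading "underflow" as the fraction of underfull B-tree nodes is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// lazyPolicy is a hypothetical stand-in for the two lazy-rebalancing knobs;
// it is not a type from the library.
type lazyPolicy struct {
	threshold float64       // assumed meaning: fraction of underfull nodes that triggers a rebalance
	maxDelay  time.Duration // upper bound on how long a rebalance may be deferred
}

// shouldRebalance reports whether a deferred rebalance is due: either the
// underfull ratio reached the threshold, or the deadline has passed.
func (p lazyPolicy) shouldRebalance(underfull, total int, sinceLast time.Duration) bool {
	if total == 0 || underfull == 0 {
		return false // nothing to rebalance
	}
	ratio := float64(underfull) / float64(total)
	return ratio >= p.threshold || sinceLast >= p.maxDelay
}

func main() {
	p := lazyPolicy{threshold: 0.05, maxDelay: 5 * time.Minute}
	fmt.Println(p.shouldRebalance(2, 100, time.Minute))    // 2% underfull, recent: false
	fmt.Println(p.shouldRebalance(6, 100, time.Minute))    // 6% underfull: true
	fmt.Println(p.shouldRebalance(1, 100, 10*time.Minute)) // deadline passed: true
}
```

Deferring the work this way is what produces the occasional batch pause: many deletions are absorbed cheaply, then one rebalance pays for them all.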
Incremental (background processing): rebalances in a background goroutine.
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithLazyRebalancing(), // Prerequisite!
	hdf5.WithIncrementalRebalancing(
		hdf5.IncrementalBudget(100*time.Millisecond),
		hdf5.IncrementalInterval(5*time.Second),
	),
)
defer fw.Close() // Stops background goroutine
Use for: Large files (>500MB), continuous operations, TB-scale data
Performance: ~4% overhead, zero user-visible pause
Smart (auto-tuning): the library detects the workload and selects the optimal mode.
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
	hdf5.WithSmartRebalancing(
		hdf5.SmartAutoDetect(true),
		hdf5.SmartAutoSwitch(true),
	),
)
Use for: Unknown workloads, mixed operations, research environments
Performance: ~6% overhead, adapts automatically
| Mode | Deletion Speed | Pause Time | Use Case |
|---|---|---|---|
| Default | 100% (baseline) | None | Append-only, small files |
| Lazy | 95% (10-100x faster than immediate!) | 100-500ms batches | Batch deletions |
| Incremental | 92% | None (background) | Large files, continuous ops |
| Smart | 88% | Varies | Unknown workloads |
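The use-case column above can be condensed into a small decision helper. This is only a sketch: `suggestMode` and its string results are hypothetical names rather than library API, the cut-offs are the 100MB and 500MB figures quoted in this section, and MB is assumed to mean 2^20 bytes.

```go
package main

import "fmt"

const mb = 1 << 20 // assumption: MB means 2^20 bytes

// suggestMode maps the guidance in the table to a mode name.
// It is a hypothetical helper, not part of the library.
func suggestMode(fileSize int64, workloadKnown, deletesAttributes bool) string {
	switch {
	case !workloadKnown:
		return "smart" // unknown or mixed workloads
	case !deletesAttributes || fileSize < 100*mb:
		return "default" // append-only or small files
	case fileSize <= 500*mb:
		return "lazy" // batch deletions on medium/large files
	default:
		return "incremental" // large files, continuous operations
	}
}

func main() {
	fmt.Println(suggestMode(20*mb, true, true))   // default
	fmt.Println(suggestMode(300*mb, true, true))  // lazy
	fmt.Println(suggestMode(2048*mb, true, true)) // incremental
	fmt.Println(suggestMode(300*mb, false, true)) // smart
}
```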
Learn more:
- Performance Tuning Guide: Comprehensive guide with benchmarks, recommendations, troubleshooting
- Rebalancing API Reference: Complete API documentation
- Examples: 4 working examples demonstrating each mode
Version: v0.13.0 (RELEASED 2025-11-13 - HDF5 2.0.0 Compatibility) ✅
HDF5 2.0.0 Ready: Security-hardened with AI/ML datatypes, Format Spec v4.0 compliance, and 86.1% coverage! 🎉
- File Structure:
  - Superblock parsing (v0, v2, v3) with checksum validation (CRC32)
  - Object headers v1 (legacy HDF5 < 1.8) with continuations
  - Object headers v2 (modern HDF5 >= 1.8) with continuations
  - Groups (traditional symbol tables + modern object headers)
  - B-trees (leaf + non-leaf nodes for large files)
  - Local heaps (string storage)
  - Global Heap (variable-length data)
  - Fractal heap (direct blocks for dense attributes) ✨ NEW
- Dataset Reading:
  - Compact layout (data in object header)
  - Contiguous layout (sequential storage)
  - Chunked layout with B-tree indexing
  - GZIP/Deflate compression
  - Filter pipeline for compressed data ✨ NEW
- Datatypes (Read + Write):
  - Basic types: int8-64, uint8-64, float32/64
  - AI/ML types: FP8 (E4M3, E5M2), bfloat16 - IEEE 754 compliant ✨ NEW
  - Strings: Fixed-length (null-terminated, space-padded, or null-padded), variable-length (via Global Heap)
  - Advanced types: Arrays, Enums, References (object/region), Opaque
  - Compound types: Struct-like with nested members
- Attributes:
  - Compact attributes (in object header) ✨ NEW
  - Dense attributes (fractal heap foundation) ✨ NEW
  - Attribute reading for groups and datasets ✨ NEW
  - Full attribute API (Group.Attributes(), Dataset.Attributes()) ✨ NEW
- Navigation: Full file tree traversal via Walk()
- Code Quality:
  - Test coverage: 86.1% overall (target: >70%) ✅
  - Lint issues: 0 (34+ linters) ✅
  - TODO items: 0 (all resolved) ✅
  - Official HDF5 test suite: 433 files, 98.2% pass rate ✅
- Security ✨ NEW:
  - 4 CVEs fixed (CVE-2025-7067, CVE-2025-6269, CVE-2025-2926, CVE-2025-44905) ✅
  - Overflow protection throughout (SafeMultiply, buffer validation) ✅
  - Security limits: 1GB chunks, 64MB attributes, 16MB strings ✅
  - 39 security test cases, all passing ✅
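As an illustration of the overflow protection and size limits listed under Security, the following sketch shows one common way to do a checked multiplication in Go. It is not the library's `SafeMultiply`: the lowercase `safeMultiply` and `chunkBytes` helpers are hypothetical, and treating the 1GB chunk limit as 2^30 bytes is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

var errOverflow = errors.New("size computation overflows uint64")

// safeMultiply returns a*b, or an error if the product does not fit in uint64.
func safeMultiply(a, b uint64) (uint64, error) {
	hi, lo := bits.Mul64(a, b) // full 128-bit product
	if hi != 0 {
		return 0, errOverflow
	}
	return lo, nil
}

// maxChunkBytes mirrors the 1GB chunk limit (assumed to be 2^30 bytes).
const maxChunkBytes = 1 << 30

// chunkBytes computes a chunk's byte size from its dimensions and element
// size, rejecting both arithmetic overflow and sizes above the limit.
func chunkBytes(dims []uint64, elemSize uint64) (uint64, error) {
	size := elemSize
	for _, d := range dims {
		var err error
		if size, err = safeMultiply(size, d); err != nil {
			return 0, err
		}
	}
	if size > maxChunkBytes {
		return 0, fmt.Errorf("chunk of %d bytes exceeds %d byte limit", size, maxChunkBytes)
	}
	return size, nil
}

func main() {
	fmt.Println(chunkBytes([]uint64{128, 128}, 8))         // 131072 <nil>
	fmt.Println(chunkBytes([]uint64{1 << 40, 1 << 40}, 8)) // rejected: overflow
}
```

Checking before allocating matters because chunk dimensions come from the file itself, so a crafted file could otherwise request a wrapped-around or enormous buffer.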
Production-ready write support with all features! ✅
Dataset Operations:
- ✅ Create datasets (all layouts: contiguous, chunked, compact)
- ✅ Write data (all datatypes including compound)
- ✅ Dataset resizing with unlimited dimensions
- ✅ Variable-length datatypes: strings, ragged arrays
- ✅ Compression (GZIP, Shuffle, Fletcher32)
- ✅ Array and enum datatypes
- ✅ References and opaque types
- ✅ Attribute writing (dense & compact storage)
- ✅ Attribute modification/deletion
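To show what the Shuffle and GZIP filters in the list above do to a chunk, here is a standard-library sketch of the two stages. It does not use the library's filter pipeline: `shuffle`, `unshuffle`, `encodeChunk` and `decodeChunk` are hypothetical names, Fletcher32 is left out, and the deflate stage assumes zlib-wrapped streams, which is what HDF5's deflate filter stores.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// shuffle regroups bytes so that byte 0 of every element comes first,
// then byte 1 of every element, and so on. Trailing bytes that do not
// form a whole element are copied unchanged.
func shuffle(data []byte, elemSize int) []byte {
	out := make([]byte, len(data))
	if elemSize <= 1 {
		copy(out, data)
		return out
	}
	n := len(data) / elemSize
	for i := 0; i < n; i++ {
		for j := 0; j < elemSize; j++ {
			out[j*n+i] = data[i*elemSize+j]
		}
	}
	copy(out[n*elemSize:], data[n*elemSize:])
	return out
}

// unshuffle is the inverse of shuffle.
func unshuffle(data []byte, elemSize int) []byte {
	out := make([]byte, len(data))
	if elemSize <= 1 {
		copy(out, data)
		return out
	}
	n := len(data) / elemSize
	for i := 0; i < n; i++ {
		for j := 0; j < elemSize; j++ {
			out[i*elemSize+j] = data[j*n+i]
		}
	}
	copy(out[n*elemSize:], data[n*elemSize:])
	return out
}

// encodeChunk applies shuffle, then zlib compression.
func encodeChunk(raw []byte, elemSize int) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(shuffle(raw, elemSize)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeChunk reverses encodeChunk: inflate first, then unshuffle.
func decodeChunk(stored []byte, elemSize int) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(stored))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	plain, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	return unshuffle(plain, elemSize), nil
}

func main() {
	raw := []byte{1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0} // three little-endian uint32 values
	fmt.Println(shuffle(raw, 4))                      // [1 2 3 0 0 0 0 0 0 0 0 0]
	stored, err := encodeChunk(raw, 4)
	if err != nil {
		panic(err)
	}
	back, err := decodeChunk(stored, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(raw, back)) // true
}
```

Shuffling first groups the mostly-zero high bytes together, which tends to give the deflate stage longer runs to compress.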
Links:
- ✅ Hard links (full support)
- ✅ Soft links (symbolic references - full support)
- ✅ External links (cross-file references - full support)
Read Enhancements:
- ✅ Hyperslab selection (data slicing) - 10-250x faster!
- ✅ Efficient partial dataset reading
- ✅ Stride and block support
- ✅ Chunk-aware reading (reads ONLY needed chunks)
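The idea behind chunk-aware hyperslab reading can be sketched in one dimension: work out which chunks the selection touches and skip the rest. This is an illustrative sketch rather than the library's implementation; the `hyperslab` type and `chunksTouched` function are hypothetical, and the selection uses the usual HDF5 start/stride/count/block meaning with stride assumed to be at least as large as block.

```go
package main

import "fmt"

// hyperslab describes a 1-D selection: count blocks of `block` elements,
// the k-th block starting at start + k*stride.
type hyperslab struct {
	start, stride, count, block uint64
}

// chunksTouched returns the ascending indices of the chunks that contain at
// least one selected element, for chunks of chunkSize elements.
func chunksTouched(h hyperslab, chunkSize uint64) []uint64 {
	if chunkSize == 0 || h.block == 0 {
		return nil
	}
	var out []uint64
	for k := uint64(0); k < h.count; k++ {
		begin := h.start + k*h.stride
		first := begin / chunkSize
		last := (begin + h.block - 1) / chunkSize
		for c := first; c <= last; c++ {
			// Blocks are visited in ascending order, so comparing with the
			// last appended index is enough to avoid duplicates.
			if len(out) == 0 || out[len(out)-1] < c {
				out = append(out, c)
			}
		}
	}
	return out
}

func main() {
	// Three blocks of 5 elements, 100 apart, over chunks of 50 elements.
	sel := hyperslab{start: 10, stride: 100, count: 3, block: 5}
	fmt.Println(chunksTouched(sel, 50)) // [0 2 4]
}
```

In this example a 250-element dataset has five chunks, but only three need to be read and decompressed.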
Validation:
- ✅ Official HDF5 Test Suite: 98.2% pass rate (380/387 files)
- ✅ Production quality confirmed
Future Enhancements:
- ⚠️ Advanced filters (LZF, SZIP)
- ⚠️ Thread-safety with mutexes + SWMR mode
- ⚠️ Parallel I/O
Next Steps - See ROADMAP.md for complete timeline and versioning strategy.
- Go 1.25 or later
- No external dependencies for the library
# Clone repository
git clone https://github.com/scigolib/hdf5.git
cd hdf5
# Run tests
go test ./...
# Build examples
go build ./examples/...
# Build tools
go build ./cmd/...
# Run all tests
go test ./...
# Run with race detector
go test -race ./...
# Run with coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
Contributions are welcome! This is an early-stage project and we'd love your help.
Before contributing:
- Read CONTRIBUTING.md - Git workflow and development guidelines
- Check open issues
- Review the Architecture Overview
Ways to contribute:
- 🐛 Report bugs
- 💡 Suggest features
- 📝 Improve documentation
- 🔧 Submit pull requests
- ⭐ Star the project
| Feature | This Library | gonum/hdf5 | go-hdf5/hdf5 |
|---|---|---|---|
| Pure Go | Yes | CGo wrapper | Yes |
| Reading | Full | Full | Limited |
| Writing | Full | Full | No |
| HDF5 1.8+ | Yes | No | |
| Advanced Datatypes | All | Yes | No |
| Test Suite Validation | 98.2% (433 files) | No | |
| Maintained | Active | Inactive | |
| Thread-safe | No | | |
* Different File instances are independent. Concurrent access to same File requires user synchronization (standard Go practice). Full thread-safety with mutexes + SWMR mode planned for future releases.
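The user-side synchronization mentioned in the note can be as simple as a mutex around the shared handle. The sketch below uses a stand-in `reader` type instead of the library's File so that it runs on its own; `reader`, `guardedReader` and `parallelReads` are all hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// reader stands in for a shared file handle; it is a hypothetical type,
// not the library's File.
type reader struct {
	reads int // mutable state that concurrent callers would race on
}

func (r *reader) read() { r.reads++ }

// guardedReader serializes access to one shared handle.
type guardedReader struct {
	mu sync.Mutex
	r  reader
}

func (g *guardedReader) read() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.r.read()
}

// parallelReads issues n reads from n goroutines and returns how many
// were recorded.
func parallelReads(n int) int {
	var g guardedReader
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.read()
		}()
	}
	wg.Wait()
	return g.r.reads
}

func main() {
	fmt.Println(parallelReads(100)) // 100
}
```

Opening a separate File per goroutine avoids the lock entirely, since different instances are independent.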
This project is licensed under the MIT License - see the LICENSE file for details.
- The HDF Group for the HDF5 format specification
- gonum/hdf5 for inspiration
- All contributors to this project
Professor Ancha Baranova - This project would not have been possible without her invaluable help and support. Her assistance was crucial in bringing this library to life.
- 📖 Documentation - Architecture and guides
- 🐛 Issue Tracker
- 💬 Discussions - Community Q&A and announcements
- 🌐 HDF Group Forum - Official HDF5 community discussion
Status: Stable - HDF5 2.0.0 compatible with security hardening
Version: v0.13.0 (4 CVEs fixed, AI/ML datatypes, 86.1% coverage, 0 lint issues)
Last Updated: 2025-11-13
Built with ❤️ by the HDF5 Go community
Recognized by HDF Group Forum