ExperimentHub provides an interface to access and manage data from the Bioconductor ExperimentHub service directly in Python.
It is designed to work within the BiocPy ecosystem, converting R data objects (like SingleCellExperiment or SummarizedExperiment) into their Python equivalents (e.g., SummarizedExperiment) using rds2py.
Note
This is an experimental package. It may not work with all RDS files from ExperimentHub. Currently, this package filters ExperimentHub resources to provide access to:
- File Formats: .rds
- R Classes: SingleCellExperiment, SummarizedExperiment, RangedSummarizedExperiment, GRanges, etc.
Files are converted to their respective BiocPy representations or common Python formats.
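The filtering described in the note can be pictured with a small sketch. Everything here is a hypothetical stand-in for illustration and not part of the experimenthub API: the `Resource` dataclass, its `path` and `rdataclass` fields, the two constants, `is_supported`, and the made-up `demo-*` identifiers. The class set only mirrors the names listed in the note; the package's real list may be longer.

```python
from dataclasses import dataclass

# Hypothetical constants mirroring the note; the package's real filter may differ.
SUPPORTED_EXTENSIONS = (".rds",)
SUPPORTED_CLASSES = {
    "SingleCellExperiment",
    "SummarizedExperiment",
    "RangedSummarizedExperiment",
    "GRanges",
}


@dataclass
class Resource:
    """Illustrative stand-in for one metadata row (not the package's record type)."""

    ehub_id: str
    path: str        # file name of the stored resource
    rdataclass: str  # R class of the stored object


def is_supported(resource: Resource) -> bool:
    """Keep a resource only if both its file format and its R class are supported."""
    has_supported_ext = resource.path.lower().endswith(SUPPORTED_EXTENSIONS)
    return has_supported_ext and resource.rdataclass in SUPPORTED_CLASSES


# Made-up rows: only the first one passes both checks.
resources = [
    Resource("demo-1", "counts.rds", "SingleCellExperiment"),
    Resource("demo-2", "counts.h5", "SingleCellExperiment"),  # wrong file format
    Resource("demo-3", "model.rds", "lm"),                    # unsupported R class
]
kept = [r.ehub_id for r in resources if is_supported(r)]
print(kept)
```

Checking the format and the class separately makes it easy to see why a given resource is missing from the registry.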
To get started, install the package from PyPI:
pip install experimenthub
The registry manages the local cache of ExperimentHub metadata and resources. On the first run, it downloads the metadata database.
from experimenthub import ExperimentHubRegistry
# Initialize the registry (downloads metadata if needed)
eh = ExperimentHubRegistry()
ExperimentHub contains thousands of datasets. Use the search() method to find resources by title, description, or species.
# Search for mouse-related datasets
results = eh.search("mus musculus")
# Print the first few matches
for record in results[:5]:
    print(f"{record.ehub_id}: {record.title}")
# Output:
# EH1041: Brain scRNA-seq data, sample ...,
# EH1042: Brain scRNA-seq data, gene ...,
# ...
You can retrieve detailed metadata for a specific ID.
record = eh.get_record("EH4663")
print(f"Title: {record.title}")
print(f"Species: {record.species}")
print(f"Genome: {record.genome}")
print(f"Description: {record.description}")
print(f"R Class: {record.preparer_dataclass}")
# Output:
# Title: Lohoff biorXiv spatial coordinates (sample 2)
# Species: Mus musculus
# Genome: mm10
# Description: Cell spatial coordinates for sample 2 for the E8.5 seqFISH dataset from biorXiv
# R Class: character
The load() method handles the download, caching, and loading of the dataset.
If the resource is an R data file (.rds) containing a supported Bioconductor object (e.g., SingleCellExperiment), it is automatically read and converted to an equivalent Python object using rds2py.
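The download, cache, and load sequence described above can be sketched generically. This is a minimal illustration of the cache-then-load pattern, not the package's implementation: `load_with_cache`, `fake_download`, the `.rds` cache file naming, and the `demo-1` identifier are all hypothetical. A simple byte decoder stands in for the rds2py conversion step.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_with_cache(
    resource_id: str,
    cache_dir: str,
    download: Callable[[str], bytes],
    parse: Callable[[bytes], Any],
) -> Any:
    """Download a resource once, then serve later loads from the local cache."""
    cached = Path(cache_dir) / f"{resource_id}.rds"  # hypothetical cache layout
    if not cached.exists():
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(download(resource_id))
    return parse(cached.read_bytes())


# Stand-in downloader that records each call, so cache hits are observable.
calls = []


def fake_download(resource_id: str) -> bytes:
    calls.append(resource_id)
    return b"payload for " + resource_id.encode()


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_cache("demo-1", tmp, fake_download, bytes.decode)
    second = load_with_cache("demo-1", tmp, fake_download, bytes.decode)

print(first)
print(len(calls))  # the second load is served from the cache
```

Passing the downloader and parser in as callables keeps the caching logic testable without network access.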
# Load a data.frame as a BiocFrame object
data = eh.load("EH4663")
print(data)
# BiocFrame with 8425 rows and 3 columns
# x y z
# <FloatList> <FloatList> <IntegerList>
# embryo1_Pos0_cell10_z5 0.7084368794499625 -2.7071263060540645 5
# embryo1_Pos0_cell100_z5 0.9763043488304248 -2.517971233335359 5
# embryo1_Pos0_cell101_z5 0.9749347757408557 -2.6739635081030855 5
# ... ... ...
# embryo1_Pos28_cell97_z5 -1.3992279805347039 3.1761928631722824 5
# embryo1_Pos28_cell98_z5 -1.389353519722718 3.1349508225406666 5
# embryo1_Pos28_cell99_z5 -1.394992277928857 2.5812717935734355 5
This project has been set up using BiocSetup and PyScaffold.