experimenthub

ExperimentHub provides an interface to access and manage data from the Bioconductor ExperimentHub service directly in Python.

It is designed to work within the BiocPy ecosystem: it uses rds2py to convert R data objects (such as SingleCellExperiment or SummarizedExperiment) into their Python equivalents (e.g., the BiocPy SummarizedExperiment).

Note

This is an experimental package. It may not work with all RDS files from ExperimentHub. Currently, this package filters ExperimentHub resources to provide access to:

  • File Formats: .rds
  • R Classes: SingleCellExperiment, SummarizedExperiment, RangedSummarizedExperiment, GRanges, etc.

Files are converted to their respective BiocPy representations or common Python formats.

Install

To get started, install the package from PyPI:

pip install experimenthub

Usage

Initialize the Registry

The registry manages the local cache of ExperimentHub metadata and resources. On the first run, it downloads the metadata database.

from experimenthub import ExperimentHubRegistry

# Initialize the registry (downloads metadata if needed)
eh = ExperimentHubRegistry()

Searching for Resources

ExperimentHub contains thousands of datasets. Use the search() method to find resources by title, description, or species.

# Search for mouse-related datasets
results = eh.search("mus musculus")

# Print the first few matches
for record in results[:5]:
    print(f"{record.ehub_id}: {record.title}")
# Output:
# EH1041: Brain scRNA-seq data, sample ...,
# EH1042: Brain scRNA-seq data, gene ...,
# ...
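
A free-text search can return more than you want, so it may help to narrow the results afterwards. The sketch below filters a list of records by their species attribute. `filter_by_species` is a hypothetical helper, not part of the package, and the demo records are made-up stand-ins so the snippet runs without network access; with a real registry you would pass in the output of eh.search().

```python
from types import SimpleNamespace


def filter_by_species(records, species):
    """Keep records whose `species` attribute equals `species`, ignoring case.

    Hypothetical helper: it only assumes each record exposes a `species`
    attribute, as the records shown in this README do.
    """
    wanted = species.strip().lower()
    return [
        r for r in records
        if (getattr(r, "species", None) or "").strip().lower() == wanted
    ]


# Made-up stand-in records (not real ExperimentHub entries).
demo_records = [
    SimpleNamespace(ehub_id="DEMO_1", title="Example mouse dataset", species="Mus musculus"),
    SimpleNamespace(ehub_id="DEMO_2", title="Example human dataset", species="Homo sapiens"),
    SimpleNamespace(ehub_id="DEMO_3", title="Example without species", species=None),
]

mouse_only = filter_by_species(demo_records, "mus musculus")
print([r.ehub_id for r in mouse_only])
```

Records with a missing species are dropped rather than raising an error, which seems the safer default when metadata is incomplete.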

Inspecting Metadata

You can retrieve detailed metadata for a specific ID.

record = eh.get_record("EH4663")

print(f"Title: {record.title}")
print(f"Species: {record.species}")
print(f"Genome: {record.genome}")
print(f"Description: {record.description}")
print(f"R Class: {record.preparer_dataclass}")

# Output:
# Title: Lohoff biorXiv spatial coordinates (sample 2)
# Species: Mus musculus
# Genome: mm10
# Description: Cell spatial coordinates for sample 2 for the E8.5 seqFISH dataset from biorXiv
# R Class: character

Loading Data

The load() method handles the download, caching, and loading of the dataset.

If the resource is an R data file (.rds) containing a supported Bioconductor object (e.g., SingleCellExperiment), it is automatically read and converted to an equivalent Python object using rds2py.

# Load a data.frame as a BiocFrame object
data = eh.load("EH4663")

print(data)
# BiocFrame with 8425 rows and 3 columns
#                                           x                   y             z
#                                 <FloatList>         <FloatList> <IntegerList>
#  embryo1_Pos0_cell10_z5  0.7084368794499625 -2.7071263060540645             5
# embryo1_Pos0_cell100_z5  0.9763043488304248  -2.517971233335359             5
# embryo1_Pos0_cell101_z5  0.9749347757408557 -2.6739635081030855             5
#                                         ...                 ...           ...
# embryo1_Pos28_cell97_z5 -1.3992279805347039  3.1761928631722824             5
# embryo1_Pos28_cell98_z5  -1.389353519722718  3.1349508225406666             5
# embryo1_Pos28_cell99_z5  -1.394992277928857  2.5812717935734355             5

Note

This project has been set up using BiocSetup and PyScaffold.
