SPACE: STRING proteins as complementary embeddings

Introduction

This is the official repository for the Bioinformatics paper "SPACE: STRING proteins as complementary embeddings", for which we precomputed:

  • cross-species network embeddings
  • ProtT5 sequence embeddings

for all eukaryotic proteins in STRING v12.0.

You can download all the embeddings from the STRING website:

  • protein.network.embeddings.v12.0.h5
  • protein.sequence.embeddings.v12.0.h5

Reproduce the results in the paper

Please follow this document.

How to Cite

If you use this work in your research, please cite the SPACE paper:

Hu, Dewei, et al. "SPACE: STRING proteins as complementary embeddings." Bioinformatics (2025): btaf496. https://doi.org/10.1101/2024.11.25.625140

and the STRING database:

Szklarczyk, D., Nastou, K., Koutrouli, M., Kirsch, R., Mehryary, F., Hachilif, R., ... & von Mering, C. (2025). The STRING database in 2025: protein networks with directionality of regulation. Nucleic Acids Research, 53(D1), D730-D737. https://doi.org/10.1093/nar/gkae1113

How to load the embeddings

The following code reads the cross-species network embedding file for human (NCBI taxonomy ID 9606), 9606.protein.network.embeddings.v12.0.h5.

Python example

Install h5py first (run this in a shell, not in Python):

pip install h5py

Then read the file:

import h5py

filename = '9606.protein.network.embeddings.v12.0.h5'

with h5py.File(filename, 'r') as f:
    # print every attribute stored on the 'metadata' object
    meta_keys = f['metadata'].attrs.keys()
    for key in meta_keys:
        print(key, f['metadata'].attrs[key])

    # load both datasets fully into memory
    embeddings = f['embeddings'][:]
    proteins = f['proteins'][:]

    # protein names are stored as bytes, convert them to strings
    proteins = [p.decode('utf-8') for p in proteins]
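
Once the file is loaded, a common next step is to look up the embedding of a single protein by its STRING ID. The sketch below is not part of the repository: the helper name `build_protein_index` is hypothetical, the two IDs and the tiny vectors are illustrative stand-ins for the `proteins` and `embeddings` arrays read above, and it assumes that row i of `embeddings` belongs to entry i of `proteins`.

```python
def build_protein_index(proteins):
    """Map each protein ID to its row number.

    Accepts IDs as bytes (as stored in the HDF5 file) or as str.
    Hypothetical helper, not part of the SPACE repository.
    """
    index = {}
    for row, protein_id in enumerate(proteins):
        if isinstance(protein_id, bytes):
            protein_id = protein_id.decode('utf-8')
        index[protein_id] = row
    return index


# Toy stand-ins for the arrays loaded from the .h5 file (illustrative values)
proteins = [b'9606.ENSP00000000233', b'9606.ENSP00000000412']
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

index = build_protein_index(proteins)

# Assumes row i of `embeddings` corresponds to entry i of `proteins`
vector = embeddings[index['9606.ENSP00000000412']]
print(vector)
```

The same function should work unchanged on the arrays returned by h5py, since it only iterates over the IDs and indexes rows by position.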

R example

Install the rhdf5 package from Bioconductor to read the embedding files. The following code reads the embedding file 9606.protein.network.embeddings.v12.0.h5.

# Install required packages if not already installed
# (rhdf5 is distributed through Bioconductor, not CRAN)
# install.packages("BiocManager")
# BiocManager::install("rhdf5")

# Load the library
library(rhdf5)

filename <- '9606.protein.network.embeddings.v12.0.h5'

# Read and print the attributes attached to the "metadata" object
meta_keys <- h5readAttributes(filename, "metadata")
for (key in names(meta_keys)) {
    print(paste(key, meta_keys[[key]]))
}

embeddings <- h5read(filename, "embeddings")
proteins <- h5read(filename, "proteins")

Read combined files

Read the combined network embedding file of all eukaryotes with Python

import h5py

filename = 'protein.network.embeddings.v12.0.h5'

with h5py.File(filename, 'r') as f:
    meta_keys = f['metadata'].attrs.keys()
    for key in meta_keys:
        print(key, f['metadata'].attrs[key])
  
    species = '4932'  # NCBI taxonomy ID; 4932 is brewer's yeast
    embeddings = f['species'][species]['embeddings'][:]
    proteins = f['species'][species]['proteins'][:]

    # protein names are stored as bytes, convert them to strings
    proteins = [p.decode('utf-8') for p in proteins]
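
Because the network and sequence embeddings are provided as separate files, using them together means matching proteins by ID first. The sketch below shows one simple way to do that, by concatenating the two vectors of every protein present in both sets. It is an illustration only and not necessarily the combination used in the paper: the function name `align_and_concatenate` is hypothetical, the IDs and values are toy data, and it assumes the sequence file can be read into ID and vector lists the same way as the network file.

```python
def align_and_concatenate(net_ids, net_vecs, seq_ids, seq_vecs):
    """Join network and sequence vectors for proteins found in both sets.

    Returns the shared IDs (in network-file order) and one concatenated
    vector per shared ID. Hypothetical helper, not part of the repository.
    """
    seq_row = {protein_id: row for row, protein_id in enumerate(seq_ids)}
    shared_ids, combined = [], []
    for row, protein_id in enumerate(net_ids):
        other = seq_row.get(protein_id)
        if other is None:
            continue  # protein has no sequence embedding, skip it
        shared_ids.append(protein_id)
        combined.append(list(net_vecs[row]) + list(seq_vecs[other]))
    return shared_ids, combined


# Toy data: IDs and values are illustrative, not taken from the real files
net_ids = ['4932.YAL001C', '4932.YAL002W', '4932.YAL003W']
net_vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
seq_ids = ['4932.YAL003W', '4932.YAL001C']  # different order, one ID missing
seq_vecs = [[0.5], [0.1]]

shared_ids, combined = align_and_concatenate(net_ids, net_vecs, seq_ids, seq_vecs)
print(shared_ids)
print(combined)
```

Matching through a dictionary, instead of assuming both files list proteins in the same order, keeps the result correct even if the orders differ or one file lacks some proteins.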

Read the combined file with R

library(rhdf5)

filename <- 'protein.network.embeddings.v12.0.h5'

# Read and print the attributes attached to the "metadata" object
meta_keys <- h5readAttributes(filename, "metadata")
for (key in names(meta_keys)) {
    print(paste(key, meta_keys[[key]]))
}

species <- '4932'  # for brewer's yeast
embeddings <- h5read(filename, paste0('species/', species, '/embeddings'))
proteins <- h5read(filename, paste0('species/', species, '/proteins'))

Contact

dewei.hu@sund.ku.dk.

License

MIT.
