ma-compbio/Steamboat

Steamboat

Steamboat is an interpretable machine learning framework leveraging a self-supervised, multi-head attention model that uniquely decomposes the gene expression of a cell into multiple key factors:

  • intrinsic cell programs,
  • neighboring cell communication, and
  • long-range interactions.

Together, these factors are used to generate cell embeddings, cell networks, and reconstructed gene expression.

[Figure: fig1-v3-abstract]

System requirements

Hardware

Steamboat can run on a laptop, desktop, or server. The experiments were done on a desktop computer with a 6-core Ryzen 5 3600 CPU and an RTX 3080 GPU. A GPU can significantly reduce the time needed to train the models.

Operating system

Steamboat is Python-based and runs on all mainstream operating systems. It has been tested on Windows 10 and Springdale Linux.

Software dependencies

Package         Tested with
Python          3.11.5
Torch           2.1.2 (with CUDA 12.1)
Scanpy          1.9.6
Squidpy         1.5.0
Scipy           1.11.4
Numpy           1.26.2
Networkx        3.1
Matplotlib      3.8.0
Seaborn         0.13.2
Scikit-learn    1.2.2
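
To see how a local environment compares with the versions listed above, a short check along the following lines may help. This is only a sketch: the helper name `compare_versions` is hypothetical, and the dictionary keys assume the usual PyPI distribution names (for example `scikit-learn`, `torch`), which are not stated in this README. It uses only the Python standard library.

```python
from importlib import metadata

# Versions copied from the "Tested with" table above.
# Keys are the PyPI distribution names (an assumption, not taken from this README).
TESTED = {
    "torch": "2.1.2",
    "scanpy": "1.9.6",
    "squidpy": "1.5.0",
    "scipy": "1.11.4",
    "numpy": "1.26.2",
    "networkx": "3.1",
    "matplotlib": "3.8.0",
    "seaborn": "0.13.2",
    "scikit-learn": "1.2.2",
}


def compare_versions(tested=TESTED):
    """Return {package: (installed_version_or_None, tested_version)}."""
    report = {}
    for name, tested_version in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, tested_version)
    return report


if __name__ == "__main__":
    for name, (installed, tested_version) in compare_versions().items():
        shown = "missing" if installed is None else installed
        print(f"{name:<14} installed: {shown:<12} tested: {tested_version}")
```

A different installed version does not necessarily mean Steamboat will fail; the table only records what was tested.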

Installation

We recommend using Miniconda to create a virtual environment.

# Python 3.11 matches the tested version listed under "Software dependencies"
conda create -n steamboat python=3.11
conda activate steamboat

Please follow the official guide to install the appropriate PyTorch version for your system and hardware. Then install the required packages with pip install -r requirements.txt.

Steamboat can be imported directly after adding its directory to the path.

git clone https://github.com/ma-compbio/Steamboat

import sys
sys.path.append("/path/of/the/cloned/repository")

Depending on your network, installation may take 10 to 30 minutes.
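
When adding the cloned directory to the path from a script or notebook, it can be convenient to verify that the directory exists and to avoid appending it twice. The sketch below shows one way to do that; the helper `add_repo_to_path` is hypothetical and not part of Steamboat.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir):
    """Append a cloned repository directory to sys.path, at most once."""
    repo = Path(repo_dir).expanduser().resolve()
    if not repo.is_dir():
        raise FileNotFoundError(f"No such directory: {repo}")
    repo_str = str(repo)
    if repo_str not in sys.path:
        sys.path.append(repo_str)  # same effect as the sys.path.append call above
    return repo_str
```

For example, calling add_repo_to_path("/path/of/the/cloned/repository") before import steamboat raises a clear error if the path is mistyped, rather than a later ModuleNotFoundError.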

Basic workflow

import steamboat as sf # "sf" = "Steamboat Factorization"
import steamboat.tools

First, make a list (adatas) of one or more AnnData objects, and preprocess them.

adatas = sf.prep_adatas(adatas, log_norm=True)
dataset = sf.make_dataset(adatas)

Create a Steamboat model and fit it to the data.

model = sf.Steamboat(short_features, n_heads=10, n_scales=3)
model = model.to("cuda")  # only if GPU acceleration is supported
model.fit(dataset)
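
Since the move to "cuda" only applies when a GPU is available, the device can be chosen at run time instead of being hard-coded. The following is a minimal sketch, assuming the model accepts the standard PyTorch device strings as the model.to("cuda") call above suggests; the helper `pick_device` is hypothetical and not part of Steamboat.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so fall back to the CPU label
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
# model = model.to(device)  # `model` is the Steamboat model created above
```

With this pattern the same script runs unchanged on machines with and without a GPU, at the cost of slower training on the CPU.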

After training, you can check the trained metagenes.

sf.tools.plot_all_transforms(model, top=1)

For clustering and segmentation, run the following lines. Change the resolution to your liking.

sf.tools.neighbors(adata)
sf.tools.leiden(adata, resolution=0.1)
sf.tools.segment(adata, resolution=0.5)

Demos

A few example Jupyter notebooks are included in the examples folder:

  1. Illustration (simulated)
  2. Ovarian cancer data
  3. Mouse brain
  4. Colorectal cancer data

The simulation demo takes about five minutes to run. Training on the mouse brain data takes one hour. The other demos take about ten minutes each.

The data used in these examples are available on Google Drive.

Documentation

For the full API and real data examples, please visit our documentation.

About

Steamboat is an interpretable framework to study multiscale cellular interactions in tissues
