This repository contains the codebase accompanying the paper "Fundamental errors in RNA velocity arising from the omission of cell growth."
In this work, we investigate current metabolic-labeling RNA velocity methods using stochastic simulations and analyses of existing datasets. We focus on the effects of cell growth on RNA velocity estimates.
This repository is organized into two main components:
- ssa_simulations: Stochastic simulation analyses
- existing_data_analyses: Analyses of published metabolic labeling datasets
The simulation workflow in ssa_simulations is organized into sequential steps:
- 0_Gillespie_Sims_Cell_Growth: Simulations of growing cell volumes
- 1_Gillespie_Sims_Gillespie: Simulations of mRNA kinetics in growing cells
- 2_Gillespie_Sims_Cell_Sequencing: Simulation of the sequencing process
- 3_Gillespie_Sims_Analysis: RNA velocity inference and evaluation of parameter recovery
- 4_Comparing_Effects: Comparison of different simulation conditions and additional figures
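Steps 0 and 1 are built on Gillespie stochastic simulations. To illustrate only the direct method itself, the sketch below simulates a single gene with constitutive transcription and first-order degradation in a cell of fixed size. Cell growth, volume-dependent rates, division, labeling, and sequencing are deliberately left out, and the function name gillespie_birth_death and its rate parameters are hypothetical; they are not taken from the repository's notebooks.

```python
import random


def gillespie_birth_death(k_tx, k_deg, t_max, n0=0, seed=None):
    """Gillespie direct-method simulation of one mRNA species.

    Illustrative sketch only (hypothetical, not the repository's model):
    transcription fires at constant rate k_tx, each molecule degrades at
    rate k_deg, and the cell volume is held fixed (no growth or division).
    Returns the event times and the mRNA count after each event.
    """
    if k_tx <= 0:
        raise ValueError("k_tx must be positive")
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_tx = k_tx            # transcription propensity (zeroth order)
        a_deg = k_deg * n      # degradation propensity (first order in n)
        a_total = a_tx + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_max:
            break
        # Pick the reaction that fires with probability proportional to its propensity.
        if rng.random() * a_total < a_tx:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_tx=10.0, k_deg=1.0, t_max=50.0, seed=0)
print(len(times), counts[-1])
```

For this fixed-volume birth-death process the stationary mRNA distribution is Poisson with mean k_tx / k_deg, which makes it a convenient sanity check before adding growth-dependent effects.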
Because steps 0–2 are computationally intensive and time-consuming, we provide a subset of their outputs in the repository.
The notebooks 3_Gillespie_Sims_Analysis.ipynb and 4_Comparing_Effects.ipynb can be run directly to reproduce the analyses and figures presented in the paper.
For 3_Gillespie_Sims_Analysis.ipynb, index the combinations array in cell 3 with the parameters in cell 2 to view the results of different simulations.
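As a hedged illustration of selecting one simulation, the sketch below assumes that the combinations array is a Cartesian product of parameter lists, in the order those lists are given. The parameter names and values here are hypothetical placeholders; the actual lists and their order are whatever cell 2 of the notebook defines.

```python
from itertools import product

# Hypothetical parameter grid; substitute the lists defined in cell 2 of the notebook.
growth_rates = [0.0, 0.5, 1.0]
degradation_rates = [0.1, 1.0]
labeling_times = [1.0, 2.0]

# Assumed construction: one entry per (growth, degradation, labeling) triple.
combinations = list(product(growth_rates, degradation_rates, labeling_times))


def combination_index(combos, *params):
    """Return the position of a given parameter tuple in the combinations list."""
    return combos.index(tuple(params))


idx = combination_index(combinations, 0.5, 1.0, 2.0)
print(idx, combinations[idx])  # 7 (0.5, 1.0, 2.0)
```

Looking the tuple up by value, rather than hard-coding an integer index, keeps the selection correct if the parameter lists are reordered or extended.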
The induction_simulations folder contains simulations of gene induction at varying growth rates. Its structure is similar to that of ssa_simulations.
The primary downstream analyses are performed in Induction_Investigation.ipynb. As before, the early stages of the simulations are computationally intensive, so we provide a subset of the simulation output in the repository.
The existing_data_analyses folder contains code for analyzing the existing metabolic labeling RNA-seq datasets used in the paper.
For each dataset, download the data from GEO and place it in ./data/{author}{year} (for example, ./data/gupta2022).
The analysis notebooks and scripts should then run without modification.
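A minimal sketch of preparing the expected directory layout is shown below. The helper names dataset_dir and ensure_dataset_dir are hypothetical and not part of the repository. The sketch also assumes that ./data is resolved relative to the directory the analysis code is run from, and that author names are written in lowercase as in the gupta2022 example.

```python
from pathlib import Path


def dataset_dir(author: str, year: int, root: str = "data") -> Path:
    """Build the {root}/{author}{year} path, e.g. data/gupta2022 (hypothetical helper)."""
    return Path(root) / f"{author}{year}"


def ensure_dataset_dir(author: str, year: int, root: str = "data") -> Path:
    """Create the dataset folder if it does not exist yet and return its path."""
    path = dataset_dir(author, year, root)
    path.mkdir(parents=True, exist_ok=True)
    return path


print(dataset_dir("gupta", 2022))  # data/gupta2022
```

After creating the folder, move the files downloaded from GEO into it before running the notebooks.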