Commit 7c3499b

Merge pull request #6 from biomedbigdata/dev
Release 0.1.0
2 parents 7c8a7d2 + 94d0c34 commit 7c3499b

6 files changed

+155
-1232
lines changed

README.md

Lines changed: 23 additions & 61 deletions
@@ -1,103 +1,75 @@
11
[![PyPI version](https://badge.fury.io/py/dysregnet.svg)](https://badge.fury.io/py/dysregnet)
22

33
# DysRegNet package
4-
5-
64
DysRegNet is a method for inferring patient-specific regulatory alterations (dysregulations) from gene expression profiles. DysRegNet uses linear models to account for confounders and residual-derived z-scores to assess significance.
7-
8-
95
## Installation
106
To install the package from PyPI please run:
11-
12-
`pip install dysregnet`
13-
7+
```bash
8+
pip install dysregnet
9+
```
1410

1511
or you can install it from git:
16-
17-
`git clone https://github.com/biomedbigdata/DysRegNet_package.git && cd DysRegNet_package`
18-
19-
`python setup.py install`
20-
21-
12+
```bash
13+
git clone https://github.com/biomedbigdata/DysRegNet_package.git && cd DysRegNet_package
14+
python setup.py install
15+
```
2216

2317
## Data input
18+
The inputs of the package are the following Pandas DataFrame objects:
2419

25-
The inputs of the package are the following Pandas DataFrame object:
26-
27-
28-
- expression_data - Gene expression matrix with the format: patients as rows (first column - patients/samples ids), and genes as columns.
20+
- expression_data - Gene expression matrix in the format: patients as rows (first column: patient/sample IDs) and genes as columns.
2921
- GRN - Gene Regulatory Network (GRN) with two columns in the following order ['TF', 'target'].
3022
- meta - Metadata with the first column containing patients/samples ids and other columns for the condition and the covariates.
3123

32-
3324
The patient or sample IDs must be the same in "expression_data" and "meta". Additionally, gene names or IDs must match the ones in the "GRN" DataFrame.
3425

3526
In the condition column of the meta DataFrame, the control samples should be encoded as 0 and case samples as 1.
3627

3728
The gene regulatory network should be provided by the user. You can either use an experimentally validated GRN or learn it from control samples. We recommend using software like [arboreto](https://github.com/aertslab/arboreto), since its output can be passed directly to DysRegNet.
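
As a rough illustration of the three inputs described above, the following sketch builds toy DataFrames and checks the stated constraints (matching sample IDs, GRN genes present in the expression matrix, condition encoded as 0/1). All sample IDs, gene names, and values are made up, and the checks are not part of the package; they only mirror the requirements listed in this section.

```python
import pandas as pd

# Hypothetical toy data: sample IDs, gene names and values are invented.
expr = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],   # first column: patient/sample IDs
    "TF1": [2.1, 1.9, 2.0, 5.2],          # remaining columns: genes
    "GENE_A": [4.0, 3.8, 4.1, 0.5],
})

# GRN with exactly two columns, in the order ['TF', 'target'].
grn = pd.DataFrame({"TF": ["TF1"], "target": ["GENE_A"]})

# Metadata: first column holds the sample IDs, then condition and covariates.
meta = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],
    "condition": [0, 0, 0, 1],            # 0 = control, 1 = case
    "birth_days_to": [-20000, -18500, -22000, -25000],
})

# Sanity checks mirroring the requirements stated above (not package code).
same_ids = set(expr.iloc[:, 0]) == set(meta.iloc[:, 0])
grn_genes = set(grn["TF"]) | set(grn["target"])
missing_genes = grn_genes - set(expr.columns[1:])
valid_condition = set(meta["condition"]) <= {0, 1}

print(same_ids, missing_genes, valid_condition)
```

Running such checks before calling the package can help catch mismatched IDs early.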
3829

39-
40-
41-
42-
4330
## Parameters
44-
45-
4631
Additionally, you can provide the following parameters:
4732

48-
49-
5033
- conCol: Column name for the condition in the meta DataFrame.
5134

5235
- CatCov: List of categorical covariates. They should match the names of their columns in the meta DataFrame.
5336

5437
- ConCov: List of continuous covariates. They should match the names of their columns in the meta DataFrame.
5538

56-
- zscoring: Boolean, default: False. zscoring of expression data (if needed).
39+
- zscoring: If True, DysRegNet will scale the expression of each gene and all continuous confounders based on their mean and standard deviation in the control samples.
5740

5841
- bonferroni_alpha: P-value threshold for multiple testing correction
5942

60-
- normaltest: Boolean. If True, Run a normality test for residuals "scipy.stats.normaltest". If residuals are not normal, the edge will not be considered in the analysis.
43+
- normaltest: If True, DysRegNet runs a normality test for residuals "scipy.stats.normaltest". If residuals are not normal, the edge will not be considered in the analysis.
6144

62-
- normaltest_alpha: p-value threshold for normaltest (if True).
45+
- normaltest_alpha: P-value threshold for normaltest (if True).
6346

6447
- R2_threshold: R-squared (R2) threshold from 0 to 1 (optional). If the fit is weaker, the edge will not be considered in the analysis.
6548

66-
- direction_condition: Boolean. If True: only include dysregulations that are relevant for the interactions (down-regulation of an activation or up-regulation of a suppression). Please check the paper for more details.
49+
- direction_condition: If True, DysRegNet will only consider case samples with positive residuals (target gene overexpressed) for models with a negative TF coefficient as potentially dysregulated. Similarly, for positive TF coefficients, only case samples with negative residuals are considered. Please check the paper for more details.
6750

51+
The parameters are also annotated with docstrings for more details.
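
To make the `zscoring`, `direction_condition`, and `bonferroni_alpha` descriptions above more concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the helper names are hypothetical, the sample standard deviation and the two-sided normal p-value are assumptions, and the number of tests used for the Bonferroni correction is left as a free parameter.

```python
from statistics import NormalDist, mean, stdev


def zscore_by_controls(values, is_control):
    """Scale all values by the mean and standard deviation of the controls only."""
    controls = [v for v, c in zip(values, is_control) if c]
    mu, sigma = mean(controls), stdev(controls)  # sample std: an assumption
    return [(v - mu) / sigma for v in values]


def passes_direction_condition(tf_coefficient, residual):
    """Negative TF coefficient: keep positive residuals; positive: keep negative."""
    if tf_coefficient < 0:
        return residual > 0
    if tf_coefficient > 0:
        return residual < 0
    return False


def is_significant(z, alpha, n_tests):
    """Compare a two-sided normal p-value with a Bonferroni-adjusted threshold."""
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha / n_tests


# Three controls followed by one case sample (made-up numbers).
scaled = zscore_by_controls([1.0, 2.0, 3.0, 10.0], [True, True, True, False])
print(scaled)
print(passes_direction_condition(tf_coefficient=-0.8, residual=2.5))
print(is_significant(z=8.0, alpha=0.05, n_tests=100))
```

Scaling against the controls only means that case samples are expressed in units of control variability, which is what makes a large case value stand out.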
6852

6953
## Get Started
70-
71-
72-
Please note that the functions are annotated with docstrings for more details.
73-
7454
Import the package and pandas:
75-
76-
7755
```python
7856
import dysregnet
7957
import pandas as pd
8058
```
8159

82-
83-
8460
Define the confounding variables or the design matrix
85-
8661
```python
87-
# The condition column
62+
# define condition column (0 indicates control, 1 indicates case)
8863
conCol='condition'
8964

90-
# categorical variable columns in meta dataframe.
91-
# these columns will be transformed to variables for regression
65+
# define categorical confounder columns in meta dataframe
9266
CatCov=['race','gender']
9367

94-
# continuous variable columns in meta dataframe.
68+
# define continuous confounder columns in meta dataframe.
9569
ConCov=['birth_days_to']
9670
```
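
For intuition on how categorical and continuous confounders can enter a regression, the sketch below one-hot encodes the categorical columns with `pandas.get_dummies` and joins them with the continuous ones. This is only an assumption about a typical encoding, not a description of what DysRegNet does internally, and the metadata values are invented.

```python
import pandas as pd

# Hypothetical metadata; category labels and values are invented.
meta = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4"],
    "condition": [0, 0, 1, 1],
    "race": ["group_a", "group_b", "group_a", "group_b"],
    "gender": ["female", "male", "male", "female"],
    "birth_days_to": [-20000, -18500, -22000, -25000],
})

CatCov = ["race", "gender"]
ConCov = ["birth_days_to"]

# One indicator column per non-reference category (drop_first avoids collinearity).
dummies = pd.get_dummies(meta[CatCov], drop_first=True, dtype=float)

# Continuous covariates are kept as numeric columns next to the indicators.
design = pd.concat([meta[ConCov].astype(float), dummies], axis=1)
print(design)
```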
9771

98-
9972
Run DysRegNet
100-
10173
```python
10274
data=dysregnet.run(expression_data=expr,
10375
meta=meta,
@@ -107,46 +79,36 @@ data=dysregnet.run(expression_data=expr,
10779
ConCov=ConCov,
10880
direction_condition=True,
10981
normaltest=True,
110-
R2_threshold=.2 )
82+
R2_threshold=.2)
11183

112-
# results table
84+
# get the patient-specific dysregulated networks
11385
data.get_results()
11486

115-
# or a binary result
116-
87+
# or with binary edges
11788
data.get_results_binary()
11889

11990
# get R2 values, coefficients, and coefficient p-values for all models/edges
12091
data.get_model_stats()
121-
12292
```
12393

124-
The expected run time for the installation and running the demo dataset on a "normal" desktop computer is around 3~5 minutes.
125-
126-
127-
12894
## The output
129-
13095
The package outputs a data frame that represents patient-specific dysregulated edges. The columns represent edges, and the rows are patient IDs.
13196

132-
In the result table, a value of 0 means that the edge is not significantly dysregulated (different from control samples). Otherwise, the z-score is reported, with a positive in case of activation and a negative sign in case of repression (different than the sign of the residual).
97+
In the result table, a value of 0 means that the edge is not significantly dysregulated (different from control samples). Otherwise, the z-score is reported.
13398

13499
The method "get_results_binary()" outputs binarized dysregulations instead of z-scores.
135100

101+
"get_model_stats()" outputs R2 values, coefficients, and coefficient p-values for all models/edges.
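
The relationship between the z-score table and its binarized form can be sketched as follows. The table here is a hand-made stand-in with invented edge labels and patient IDs; the actual column format returned by the package may differ, and treating every nonzero entry as a dysregulated edge is an assumption based on the description above.

```python
import pandas as pd

# Hypothetical result table: rows are patient IDs, columns are edges.
results = pd.DataFrame(
    {"TF1->GENE_A": [0.0, -3.2], "TF2->GENE_B": [4.1, 0.0]},
    index=["p1", "p2"],
)

# 0 means "not significantly dysregulated"; any other value is a z-score.
binary = (results != 0).astype(int)

# Simple summaries: dysregulated edges per patient and patients per edge.
edges_per_patient = binary.sum(axis=1)
patients_per_edge = binary.sum(axis=0)

print(binary)
print(edges_per_patient)
```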
136102

137103
## Example
138104

139105
A simple example for running DysRegNet:
140106
([Notebook](https://github.com/biomedbigdata/DysRegNet_package/blob/main/test.ipynb)/[Google Colab](https://colab.research.google.com/github/biomedbigdata/DysRegNet_package/blob/main/test.ipynb)).
141107

142-
143108
You will need to download the demo dataset and extract the files into test dataset/
144109

145110
Link for the demo dataset: https://figshare.com/ndownloader/files/35142652
146111

147-
148-
149112
## Cite
150-
151113
"DysRegNet: Patient-specific and confounder-aware dysregulated network inference"
152-
Olga Lazareva*, Zakaria Louadi*, Johannes Kersting, Jan Baumbach, David B. Blumenthal, Markus List. bioRxiv 2022.04.29.490015; doi: https://doi.org/10.1101/2022.04.29.490015. * equal first-authors
114+
Johannes Kersting*, Olga Lazareva*, Zakaria Louadi*, David B. Blumenthal, Jan Baumbach, Markus List. bioRxiv 2022.04.29.490015; doi: https://doi.org/10.1101/2022.04.29.490015. * equal first-authors
