DysRegNet is a method for inferring patient-specific regulatory alterations (dysregulations) from gene expression profiles. DysRegNet uses linear models to account for confounders and residual-derived z-scores to assess significance.

## Installation

To install the package from PyPI, please run:

```bash
pip install dysregnet
```

Or you can install it from git:

`git clone https://github.com/biomedbigdata/DysRegNet_package.git && cd DysRegNet_package`

The inputs of the package are the following Pandas DataFrame objects:
- expression_data - Gene expression matrix in the format: patients as rows (first column - patient/sample IDs), and genes as columns.
- GRN - Gene Regulatory Network (GRN) with two columns in the following order ['TF', 'target'].
- meta - Metadata with the first column containing patient/sample IDs and other columns for the condition and the covariates.

The patient or sample IDs must be the same in "expression_data" and "meta". Additionally, gene names or IDs must match the ones in the "GRN" DataFrame.

In the condition column of the meta DataFrame, the control samples should be encoded as 0 and case samples as 1.

The gene regulatory network should be provided by the user. You can either use an experimentally validated GRN or learn it from control samples. We recommend using software like [arboreto](https://github.com/aertslab/arboreto) since you can use its output directly with DysRegNet.
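
The input requirements above can be checked before running the analysis. The following is a minimal sketch with toy data, not part of the package: the helper `check_inputs`, the sample IDs, the gene names, and the column names `sample`, `condition`, `sex`, and `age` are all hypothetical.

```python
import pandas as pd

# Toy expression matrix: one row per patient, first column holds the sample IDs.
expression_data = pd.DataFrame(
    {
        "sample": ["P1", "P2", "P3", "P4"],
        "TF1": [1.2, 0.8, 1.1, 2.5],
        "GENE_A": [2.3, 1.9, 2.1, 0.4],
        "GENE_B": [0.5, 0.7, 0.6, 0.9],
    }
)

# Toy GRN with two columns in the order ['TF', 'target'].
GRN = pd.DataFrame({"TF": ["TF1", "TF1"], "target": ["GENE_A", "GENE_B"]})

# Toy metadata: sample IDs first, then the condition (0 = control, 1 = case)
# and covariates. All column names here are hypothetical.
meta = pd.DataFrame(
    {
        "sample": ["P1", "P2", "P3", "P4"],
        "condition": [0, 0, 0, 1],
        "sex": ["f", "m", "f", "m"],
        "age": [54, 61, 47, 66],
    }
)


def check_inputs(expression_data, GRN, meta, conCol):
    """Hypothetical helper: return a list of problems (empty if none are found)."""
    problems = []
    # Sample IDs are expected in the first column of both tables.
    if set(expression_data.iloc[:, 0]) != set(meta.iloc[:, 0]):
        problems.append("sample IDs differ between expression_data and meta")
    if list(GRN.columns[:2]) != ["TF", "target"]:
        problems.append("GRN columns must be ['TF', 'target'] in this order")
    # Every gene named in the GRN should be a column of the expression matrix.
    measured = set(expression_data.columns[1:])
    grn_genes = set(GRN.iloc[:, 0]) | set(GRN.iloc[:, 1])
    missing = sorted(grn_genes - measured)
    if missing:
        problems.append(f"GRN genes missing from expression_data: {missing}")
    # Controls must be encoded as 0 and cases as 1.
    if not meta[conCol].isin([0, 1]).all():
        problems.append(f"column '{conCol}' must contain only 0 and 1")
    return problems


problems = check_inputs(expression_data, GRN, meta, conCol="condition")
print(problems or "inputs look consistent")
```

Running such a check first tends to surface ID mismatches early, before any models are fitted.
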
## Parameters

Additionally, you can provide the following parameters:

- conCol: Column name for the condition in the meta DataFrame.

- CatCov: List of categorical variable names. They should match the names of their columns in the meta DataFrame.

- ConCov: List of continuous covariates. They should match the names of their columns in the meta DataFrame.

- zscoring: If True, DysRegNet will scale the expression of each gene and all continuous confounders based on their mean and standard deviation in the control samples.

- bonferroni_alpha: P-value threshold for multiple testing correction.

- normaltest: If True, DysRegNet runs a normality test on the residuals ("scipy.stats.normaltest"). If the residuals are not normal, the edge will not be considered in the analysis.

- normaltest_alpha: P-value threshold for normaltest (if True).

- R2_threshold: R-squared (R2) threshold from 0 to 1 (optional). If the fit is weaker, the edge will not be considered in the analysis.

- direction_condition: If True, DysRegNet will only consider case samples with positive residuals (target gene overexpressed) for models with a negative TF coefficient as potentially dysregulated. Similarly, for positive TF coefficients, only case samples with negative residuals are considered. Please check the paper for more details.

The parameters are also annotated with docstrings for more details.
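
To illustrate what the zscoring option describes, here is a minimal pandas sketch of scaling by control statistics. It is not the package's code: the function name `scale_by_controls` and the toy values are hypothetical, and the use of the sample standard deviation (pandas default, `ddof=1`) is an assumption.

```python
import pandas as pd


def scale_by_controls(values: pd.DataFrame, is_control: pd.Series) -> pd.DataFrame:
    """Z-score each column using the mean and standard deviation of the control rows only."""
    controls = values[is_control]
    # Assumption: sample standard deviation (ddof=1), which is the pandas default.
    return (values - controls.mean()) / controls.std()


expr = pd.DataFrame(
    {"GENE_A": [1.0, 2.0, 3.0, 6.0], "GENE_B": [10.0, 10.0, 13.0, 4.0]},
    index=["P1", "P2", "P3", "P4"],
)
condition = pd.Series([0, 0, 0, 1], index=expr.index)  # 0 = control, 1 = case

scaled = scale_by_controls(expr, condition == 0)
print(scaled)
```

After this scaling the control samples have mean 0 and standard deviation 1 per gene, while case samples are expressed relative to the controls.
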
## Get Started

Import the package and pandas:

```python
import dysregnet
import pandas as pd
```

Define the confounding variables or the design matrix
```python
# get R2 values, coefficients, and coefficient p-values for all models/edges
data.get_model_stats()
```

The expected run time for installing the package and running the demo dataset on a "normal" desktop computer is around 3 to 5 minutes.

## The output

The package outputs a data frame that represents patient-specific dysregulated edges. The columns represent edges, and the rows are patient IDs.

In the result table, a value of 0 means that the edge is not significantly dysregulated (different from control samples). Otherwise, the z-score is reported.
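
To make the meaning of these entries more concrete, the following is a simplified sketch of how one row of values for a single edge could arise. It is not the package's implementation: it fits one TF to one target without covariates, assumes normally distributed residuals, and the function `edge_zscores`, its arguments, and the toy numbers are hypothetical.

```python
import math

import numpy as np


def edge_zscores(tf, target, is_case, n_tests, bonferroni_alpha=0.01,
                 direction_condition=True):
    """Z-scores of case residuals for one TF -> target edge; 0.0 where not significant."""
    tf = np.asarray(tf, dtype=float)
    target = np.asarray(target, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)

    # Fit target ~ intercept + slope * TF on the control samples only.
    X_ctrl = np.column_stack([np.ones((~is_case).sum()), tf[~is_case]])
    coef, *_ = np.linalg.lstsq(X_ctrl, target[~is_case], rcond=None)
    intercept, slope = coef

    # Control residuals act as the reference distribution for the case residuals.
    res_ctrl = target[~is_case] - (intercept + slope * tf[~is_case])
    res_case = target[is_case] - (intercept + slope * tf[is_case])
    z = (res_case - res_ctrl.mean()) / res_ctrl.std(ddof=1)

    # Two-sided p-value under a standard normal, Bonferroni-corrected for n_tests.
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    significant = p * n_tests < bonferroni_alpha

    if direction_condition:
        # Keep only residuals whose sign opposes the sign of the TF coefficient.
        significant &= np.sign(res_case) == -np.sign(slope)

    return np.where(significant, z, 0.0)


# Six controls that roughly follow target = 2 * TF, followed by three case samples.
tf = [1, 2, 3, 4, 5, 6, 3, 4, 4]
target = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9, 6.0, 2.0, 14.0]
is_case = [False] * 6 + [True] * 3

z_directed = edge_zscores(tf, target, is_case, n_tests=2)
z_any = edge_zscores(tf, target, is_case, n_tests=2, direction_condition=False)
print(z_directed)
print(z_any)
```

In this toy example the first case sample follows the control relationship and gets 0, the second falls far below the prediction and keeps its negative z-score, and the third lies far above the prediction, so it is only reported when the direction condition is switched off.
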

The method "get_results_binary()" outputs binarized dysregulations instead of z-scores.
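
As a rough illustration of what binarizing means here, a z-score table can be turned into a 0/1 table with plain pandas. This sketch assumes that every non-zero z-score maps to 1; the table, its edge labels, and the patient IDs are hypothetical and do not come from the package.

```python
import pandas as pd

# Hypothetical result table: rows are patients, columns are edges,
# 0 means the edge is not significantly dysregulated.
zscores = pd.DataFrame(
    {"TF1->GENE_A": [0.0, -4.2, 0.0], "TF1->GENE_B": [3.7, 0.0, 0.0]},
    index=["P1", "P2", "P3"],
)

# Assumption: any non-zero z-score counts as a dysregulated edge.
binary = (zscores != 0).astype(int)

# Number of dysregulated edges per patient.
edges_per_patient = binary.sum(axis=1)
print(binary)
print(edges_per_patient)
```

A binary table like this is convenient for counting dysregulated edges per patient or per edge.
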

"get_model_stats()" outputs R2 values, coefficients, and coefficient p-values for all models/edges.

You will need to download the demo dataset and extract the files into test dataset/

Link for the demo dataset: https://figshare.com/ndownloader/files/35142652

## Cite

"DysRegNet: Patient-specific and confounder-aware dysregulated network inference"
Johannes Kersting*, Olga Lazareva*, Zakaria Louadi*, David B. Blumenthal, Jan Baumbach, Markus List. bioRxiv 2022.04.29.490015; doi: https://doi.org/10.1101/2022.04.29.490015. * equal first-authors