Commit 194f4a7

Merge pull request #43 from openproblems-bio/jalil
config files are updated to new resources requirements
2 parents c1ec044 + 8f3fe4a commit 194f4a7

27 files changed: +482 -263 lines changed

docs/source/extending.rst

Lines changed: 2 additions & 3 deletions
@@ -10,7 +10,7 @@ Examples of GRN inference methods include GRNBoost2, CellOracle, and SCENIC. The
  Each method requires a `config.vsh` file together
  with a `script.py`. Additionally, the method can have extra files to store and organize the code, such as `helper`, which are stored in the same folder and called by the `script.py`.

- The overlook of `config.vsh` is as follows. However, refer to the `src/methods/dummpy/config.yaml` for the updated formatting.
+ The overlook of `config.vsh` is as follows. However, refer to the `src/methods/` folder for the updated formatting.

  .. code-block:: yaml
  :caption: Example of a `config.vsh` file
@@ -38,7 +38,6 @@ The overlook of `config.vsh` is as follows. However, refer to the `src/methods/d
  - type: python
  packages: [ grnboost2 ] # additional packages required for your method. see different methods for examples as this could get complicated. or, use your image and omit this.

- - type: native
  runners: # this is for the nextflow pipeline.
  - type: executable
  - type: nextflow
@@ -77,7 +76,7 @@ Your `script.py` should have the following structure:
  X=None,
  uns={
  "method_id": "method_name",
- "dataset_id": "dataset_name", # one of op, norman, etc.
+ "dataset_id": "dataset_name",
  "prediction": net[["source", "target", "weight"]]
  }
  )

docs/source/index.rst

Lines changed: 4 additions & 2 deletions
@@ -21,14 +21,15 @@ For information on evaluation metrics, refer to the :doc:`evaluation` section.

  To integrate your GRN inference method, metric, or dataset, follow the instructions in the :doc:`extending` section.

-
- Currently, 5 multi-omics GRN inference and 5 transcriptomics-based methods are integrated into geneRNIB. You can find the latest integrated GRN inference methods on that page.
+ To see the comparitive performance of the integrated GRN inference methods, refer to the :doc:`leaderboard` section.

  .. image:: images/grn_models.png
  :width: 70%
  :align: center
  ----

+ Pls see the GitHub page for the list of currently integrated methods. The methods are implemented in Python and R, and they can be used to infer GRNs from the datasets provided by geneRNIB.
+
  In addition, three baseline methods are integrated into geneRNIB. These methods are used to evaluate the performance of new methods. The baseline methods are:

  - **Negative control**: Randomly assigns weights to edges. GRN inference methods should outperform this method.
@@ -64,5 +65,6 @@ Contents
  dataset
  evaluation
  extending
+ leaderboard

docs/source/inference.rst

Lines changed: 2 additions & 9 deletions
@@ -13,10 +13,7 @@ The inference datasets can be downloaded and stored in the `resources/grn_benchm

  aws s3 sync s3://openproblems-data/resources/grn/grn_benchmark/inference_data resources/grn_benchmark/inference_data --no-sign-request

- ### 2. Available Datasets
- The available datasets include **op, nakatake, replogle, adamson,** and **norman**. Each dataset provides RNA data. Additionally, the `op` dataset includes paired multiome ATAC and RNA data.
-
- ### 3. GRN Inference Guidelines
+ ### 2. GRN Inference Guidelines
  When performing GRN inference, please consider the following:

  - We evaluate only the **top TF-gene pairs**, currently limited to **50,000 edges**, ranked by their assigned weight.
@@ -27,7 +24,7 @@ When performing GRN inference, please consider the following:
  - `target`: Target gene
  - `weight`: Regulatory importance/likelihood score

- ### 4. Saving the Inferred Network
+ ### 3. Saving the Inferred Network
  Since geneRNIB works with **AnnData**, your inferred network should be saved in this format.

  #### **Python Example: Saving a Network with AnnData**
@@ -72,7 +69,3 @@ For R, use the following approach:

  ### Next Steps
  Once you have inferred GRNs for one or more datasets, proceed to the next section to run the evaluation.
-
- ---
-
- This version improves readability, corrects typos, enhances formatting, and ensures consistency in terminology. Let me know if you need further refinements! 🚀
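
The inference guidelines in this file say that only the top TF-gene pairs (currently 50,000 edges) are evaluated, ranked by weight. As a rough illustration only, the sketch below trims a network to its top edges before submission. It assumes a hypothetical CSV export with a header row and `source,target,weight` columns, and it sorts on the raw weight value; whether signed weights should instead be ranked by magnitude is not stated in the diff. The function name `top_edges` is made up for this example.

```shell
# Hypothetical helper: keep the header plus the N highest-weight edges of a CSV network.
# Assumes columns are source,target,weight and that the file has one header line.
top_edges() {
    local file="$1" n="${2:-50000}"
    # Emit the header unchanged.
    head -n 1 "$file"
    # Sort the remaining rows by the third column (weight), largest first, and keep N rows.
    tail -n +2 "$file" | LC_ALL=C sort -t, -k3,3gr | head -n "$n"
}
```

A call such as `top_edges net.csv 50000 > net_top.csv` would write the trimmed network to a new file; both file names are placeholders.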

docs/source/leaderboard.rst

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+
+ Leaderboard
+ =================
+ The overal comparitive performance of the integrated GRN inference methods is summarized in the leaderboard below.
+
+ .. image:: images/leaderboard.png
+ :width: 90%
+ :align: center
+ ----
+
+ The individual performance of the methods on each dataset is summarized below.
+
+ .. image:: images/op.png
+ :width: 90%
+ :align: center
+ ----
+
+ .. image:: images/nakatake.png
+ :width: 90%
+ :align: center
+ ----
+
+ .. image:: images/norman.png
+ :width: 90%
+ :align: center
+ ----
+
+ .. image:: images/adamson.png
+ :width: 90%
+ :align: center
+ ----
+
+ .. image:: images/replogle.png
+ :width: 90%
+ :align: center
+ ----

scripts/experiments/readme.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+ - run_sc_bulk: aim is to evaluate if sc format of data outperformed pseudobulked versions. Thus, it runs grn benchmark for two versions of data: sc vs pseudobulked. it only uses pearson correlation as inference metric.
Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
  #SBATCH --error=logs/%j.err
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=2
- #SBATCH --time=20:00:00
- #SBATCH --mem=1000GB
+ #SBATCH --time=10:00:00
+ #SBATCH --mem=1500GB
  #SBATCH --partition=cpu
  #SBATCH --mail-type=END,FAIL
  #SBATCH --mail-user=jalil.nourisa@gmail.com

scripts/labels_tw.config

Lines changed: 9 additions & 0 deletions
@@ -39,6 +39,14 @@ process {


  // Resource labels
+ withLabel: {lowtime: 1.h}
+ withLabel: {midtime: 4.h}
+ withLabel: {hightime: 8.h}
+ withLabel: {veryhightime: 24.h}
+ withLabel: {onedaytime: 24.h}
+ withLabel: {onedaytime: 24.h}
+ withLabel: {twodaytime: 28.h}
+
  withLabel: lowcpu { cpus = 5 }
  withLabel: midcpu { cpus = 15 }
  withLabel: highcpu { cpus = 30 }
@@ -54,6 +62,7 @@ process {
  memory = { get_memory( 100.GB * task.attempt ) }
  disk = { 200.GB * task.attempt }
  }
+
  withLabel: veryhighmem {
  memory = { get_memory( 200.GB * task.attempt ) }
  disk = { 400.GB * task.attempt }

scripts/run_all.sh

Lines changed: 24 additions & 3 deletions
@@ -1,10 +1,10 @@
  set -e

- datasets=('replogle') #'replogle' 'op' 'nakatake' 'adamson' 'norman'
+ datasets=('replogle') #'replogle' 'op' 'nakatake' 'adamson' 'norman' xaira_HEK293T xaira_HEK293T parsescience
  run_local=true # set to true to run locally, false to run on AWS

- run_grn_inference=false
- run_grn_evaluation=true
+ run_grn_inference=true
+ run_grn_evaluation=false
  run_download=false


@@ -14,6 +14,16 @@ for dataset in "${datasets[@]}"; do
  echo "Running GRN inference for dataset: $dataset"
  if [ "$run_local" = true ]; then
  echo "Running locally"
+
+ file="resources/results/$dataset/trace.txt"
+
+ if [ -f "$file" ]; then
+
+ dir=$(dirname "$file")
+ base=$(basename "$file" .txt)
+ today=$(date +%Y-%m-%d)
+ cp "$file" "${dir}/${base}_${today}.txt"
+ fi
  else
  echo "Running on AWS"
  fi
@@ -23,6 +33,17 @@ for dataset in "${datasets[@]}"; do

  if [ "$run_grn_evaluation" = true ]; then
  if [ "$run_local" = false ]; then
+
+ file="resources/results/$dataset/trace.txt"
+
+ if [ -f "$file" ]; then
+ echo "Making a copy of previous trace file"
+ dir=$(dirname "$file")
+ base=$(basename "$file" .txt)
+ today=$(date +%Y-%m-%d)
+ cp "$file" "${dir}/${base}_${today}.txt"
+ fi
+
  echo "Downloading inference results from AWS"
  aws s3 sync s3://openproblems-data/resources/grn/results/$dataset resources/results/$dataset
  fi
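
The two hunks above add the same trace-file backup block in two places. One possible way to factor it, shown here only as a sketch and not as anything present in the repository, is a small function. The name `backup_with_date` is hypothetical; the body reuses the `dirname`, `basename`, `date`, and `cp` calls from the diff.

```shell
# Hypothetical helper: copy FILE to a date-stamped sibling (e.g. trace_2025-01-31.txt)
# so that a later run or download does not overwrite the previous trace.
backup_with_date() {
    local file="$1"
    # Nothing to back up if the file does not exist yet.
    [ -f "$file" ] || return 0
    local dir base today
    dir=$(dirname "$file")
    base=$(basename "$file" .txt)
    today=$(date +%Y-%m-%d)
    cp "$file" "${dir}/${base}_${today}.txt"
}
```

Each of the two call sites could then be reduced to `backup_with_date "resources/results/$dataset/trace.txt"`. Note that a second run on the same day would overwrite that day's copy, which matches the behaviour of the code in the diff.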

scripts/run_grn_inference.sh

Lines changed: 10 additions & 3 deletions
@@ -113,9 +113,16 @@ HERE
  fi
  }

- # Example usage:
- append_entry "$DATASET" "[pearson_corr, negative_control, positive_control]"
- # append_entry "$DATASET" "[scprint]" "true"
+ if [[ "$DATASET" =~ ^(replogle|parsescience|xaira_HEK293T)$ ]]; then
+ # append_entry "$DATASET" "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic]"
+ # append_entry "$DATASET" "[scprint]" "true"
+ append_entry "$DATASET" "[scenic]"
+ elif [ "$DATASET" = "op" ]; then
+ append_entry "$DATASET" "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic, scprint, figr, scenicplus, celloracle, granie, scglue]"
+ else
+ append_entry "$DATASET" "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic, scprint]"
+ fi
+

  # --- Final configuration ---
  if [ "$RUN_LOCAL" = true ]; then
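
The hunk above selects a method list per dataset with an `if`/`elif`/`else` chain and a regex match. An equivalent `case` statement may be easier to extend when new datasets are added. The sketch below is an illustration under that assumption, not the repository's code; `methods_for_dataset` is a hypothetical name, and the method lists are copied from the active (uncommented) lines of the diff.

```shell
# Hypothetical helper: print the method list used for a given dataset.
methods_for_dataset() {
    case "$1" in
        replogle|parsescience|xaira_HEK293T)
            # Large datasets: only scenic is active in this commit.
            echo "[scenic]" ;;
        op)
            # The op dataset also runs the multi-omics methods.
            echo "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic, scprint, figr, scenicplus, celloracle, granie, scglue]" ;;
        *)
            # Every other dataset: transcriptomics-based methods only.
            echo "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic, scprint]" ;;
    esac
}
```

With this helper, the call in the script could read `append_entry "$DATASET" "$(methods_for_dataset "$DATASET")"`, where `append_entry` is the function already defined earlier in `run_grn_inference.sh`.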

scripts/run_process_data.sh

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ set -e
  # python src/process_data/nakatake/script.py
  # python src/process_data/norman/script.py

- # python src/process_data/opsca/script.py
+ python src/process_data/opsca/script.py
  # python src/process_data/replogle/script.py #--run_test #--run_test
- python src/process_data/xaira/script.py #--run_test
+ # python src/process_data/xaira/script.py #--run_test
  # python src/process_data/parse_bioscience/script.py #--run_test
