openproblems-bio
diff --git a/docs/source/extending.rst b/docs/source/extending.rst
Lines changed: 2 additions & 3 deletions
diff --git a/docs/source/index.rst b/docs/source/index.rst
Lines changed: 4 additions & 2 deletions
diff --git a/docs/source/inference.rst b/docs/source/inference.rst
Lines changed: 2 additions & 9 deletions
diff --git a/docs/source/leaderboard.rst b/docs/source/leaderboard.rst
Lines changed: 36 additions & 0 deletions
diff --git a/scripts/experiments/readme.md b/scripts/experiments/readme.md
Lines changed: 1 addition & 0 deletions
diff --git a/scripts/experiments/run_process_data.sh b/scripts/experiments/run_sc_bulk.sh
scripts/experiments/run_process_data.sh renamed to scripts/experiments/run_sc_bulk.sh
Lines changed: 2 additions & 2 deletions
diff --git a/scripts/labels_tw.config b/scripts/labels_tw.config
Lines changed: 9 additions & 0 deletions
diff --git a/scripts/run_all.sh b/scripts/run_all.sh
Lines changed: 24 additions & 3 deletions
diff --git a/scripts/run_grn_inference.sh b/scripts/run_grn_inference.sh
Lines changed: 10 additions & 3 deletions
diff --git a/scripts/run_process_data.sh b/scripts/run_process_data.sh
Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ Examples of GRN inference methods include GRNBoost2, CellOracle, and SCENIC. The
 Each method requires a `config.vsh` file together
 with a `script.py`. Additionally, the method can have extra files to store and organize the code, such as `helper`, which are stored in the same folder and called by the `script.py`.
 
-The overlook of `config.vsh` is as follows. However, refer to the `src/methods/dummpy/config.yaml` for the updated formatting.
+An overview of `config.vsh` is shown below; refer to the `src/methods/` folder for the up-to-date format.
 
 .. code-block:: yaml
    :caption: Example of a `config.vsh` file
@@ -38,7 +38,6 @@ The overlook of `config.vsh` is as follows. However, refer to the `src/methods/d
             - type: python
                 packages: [ grnboost2 ] # additional packages required for your method. see different methods for examples as this could get complicated. or, use your image and omit this.
     
-        - type: native
     runners: # this is for the nextflow pipeline.
         - type: executable
         - type: nextflow
@@ -77,7 +76,7 @@ Your `script.py` should have the following structure:
         X=None,
         uns={
             "method_id": "method_name",
-            "dataset_id": "dataset_name", # one of op, norman, etc.
+            "dataset_id": "dataset_name", 
             "prediction": net[["source", "target", "weight"]]
         }
     )
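
The `uns` payload in the hunk above carries three entries: `method_id`, `dataset_id`, and a `prediction` table with `source`, `target`, and `weight` columns. Below is a minimal sketch of a pre-save sanity check. It assumes the prediction is held as a plain list of dicts rather than the DataFrame used in `script.py`, and `check_uns_payload` is a hypothetical helper, not part of geneRNIB.

```python
REQUIRED_UNS_KEYS = ("method_id", "dataset_id", "prediction")
REQUIRED_EDGE_FIELDS = ("source", "target", "weight")


def check_uns_payload(uns):
    """Return a list of problems found in a hypothetical `uns` payload."""
    problems = []
    # Every top-level key named in the documented structure must be present.
    for key in REQUIRED_UNS_KEYS:
        if key not in uns:
            problems.append(f"missing uns key: {key}")
    # Every predicted edge needs all three documented fields.
    for i, edge in enumerate(uns.get("prediction", [])):
        missing = [f for f in REQUIRED_EDGE_FIELDS if f not in edge]
        if missing:
            problems.append(f"edge {i} is missing: {', '.join(missing)}")
    return problems


# Toy payload; the IDs and gene names are illustrative only.
uns = {
    "method_id": "method_name",
    "dataset_id": "dataset_name",
    "prediction": [
        {"source": "TF1", "target": "GENE_A", "weight": 0.9},
        {"source": "TF2", "target": "GENE_B"},  # weight deliberately omitted
    ],
}
problems = check_uns_payload(uns)
print(problems)
```

An empty list means the payload has the documented shape; anything else lists what to fix before writing the file.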
 
@@ -21,14 +21,15 @@ For information on evaluation metrics, refer to the :doc:`evaluation` section.
 
 To integrate your GRN inference method, metric, or dataset, follow the instructions in the :doc:`extending` section. 
 
-
-Currently, 5 multi-omics GRN inference and 5 transcriptomics-based methods are integrated into geneRNIB. You can find the latest integrated GRN inference methods on that page.
+To see the comparative performance of the integrated GRN inference methods, refer to the :doc:`leaderboard` section.
 
 .. image:: images/grn_models.png
    :width: 70%
    :align: center
 ----
 
+Please see the GitHub page for the list of currently integrated methods. The methods are implemented in Python and R and can be used to infer GRNs from the datasets provided by geneRNIB.
+
 In addition, three baseline methods are integrated into geneRNIB. These methods are used to evaluate the performance of new methods. The baseline methods are:
 
 - **Negative control**: Randomly assigns weights to edges. GRN inference methods should outperform this method.
@@ -64,5 +65,6 @@ Contents
    dataset
    evaluation
    extending
+   leaderboard
 
 
@@ -13,10 +13,7 @@ The inference datasets can be downloaded and stored in the `resources/grn_benchm
 
    aws s3 sync s3://openproblems-data/resources/grn/grn_benchmark/inference_data resources/grn_benchmark/inference_data --no-sign-request
 
-### 2. Available Datasets  
-The available datasets include **op, nakatake, replogle, adamson,** and **norman**. Each dataset provides RNA data. Additionally, the `op` dataset includes paired multiome ATAC and RNA data.
-
-### 3. GRN Inference Guidelines  
+### 2. GRN Inference Guidelines  
 When performing GRN inference, please consider the following:  
 
 - We evaluate only the **top TF-gene pairs**, currently limited to **50,000 edges**, ranked by their assigned weight.  
@@ -27,7 +24,7 @@ When performing GRN inference, please consider the following:
   - `target`: Target gene  
   - `weight`: Regulatory importance/likelihood score  
 
-### 4. Saving the Inferred Network  
+### 3. Saving the Inferred Network  
 Since geneRNIB works with **AnnData**, your inferred network should be saved in this format.
 
 #### **Python Example: Saving a Network with AnnData**  
@@ -72,7 +69,3 @@ For R, use the following approach:
 
 ### Next Steps  
 Once you have inferred GRNs for one or more datasets, proceed to the next section to run the evaluation.
-
----
-
-This version improves readability, corrects typos, enhances formatting, and ensures consistency in terminology. Let me know if you need further refinements! 🚀
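
The inference guidelines in the hunk above say that only the top TF-gene pairs are evaluated, currently capped at 50,000 edges ranked by weight. A minimal sketch of that trimming step follows, using plain Python dicts with the documented `source`, `target`, and `weight` fields. The text does not say whether ranking uses signed or absolute weight; this sketch assumes absolute value. `top_edges` is a hypothetical helper name.

```python
def top_edges(edges, max_edges=50_000):
    """Keep the highest-ranked edges, at most `max_edges` of them."""
    # Assumption: rank by absolute weight so strong repressive edges are kept.
    ranked = sorted(edges, key=lambda e: abs(e["weight"]), reverse=True)
    return ranked[:max_edges]


# Toy network; gene names are illustrative only.
edges = [
    {"source": "TF1", "target": "GENE_A", "weight": 0.2},
    {"source": "TF2", "target": "GENE_B", "weight": -0.8},
    {"source": "TF1", "target": "GENE_C", "weight": 0.5},
]
kept = top_edges(edges, max_edges=2)
print([(e["source"], e["target"]) for e in kept])
```

Trimming before saving keeps the output small and makes it explicit which edges will actually be scored.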
@@ -0,0 +1,36 @@
+
+Leaderboard
+=================
+The overall comparative performance of the integrated GRN inference methods is summarized in the leaderboard below.
+  
+.. image:: images/leaderboard.png
+   :width: 90%
+   :align: center
+----
+
+The individual performance of the methods on each dataset is summarized below.
+
+.. image:: images/op.png
+   :width: 90%
+   :align: center
+----
+
+.. image:: images/nakatake.png
+   :width: 90%
+   :align: center
+----
+
+.. image:: images/norman.png
+   :width: 90%
+   :align: center
+----
+
+.. image:: images/adamson.png
+   :width: 90%
+   :align: center
+----
+
+.. image:: images/replogle.png
+   :width: 90%
+   :align: center
+----
@@ -0,0 +1 @@
+- run_sc_bulk: evaluates whether single-cell (sc) data outperforms its pseudobulked version. It runs the GRN benchmark on both versions of the data (sc vs. pseudobulked) and uses only Pearson correlation as the inference method.
@@ -4,8 +4,8 @@
 #SBATCH --error=logs/%j.err
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=2
-#SBATCH --time=20:00:00
-#SBATCH --mem=1000GB
+#SBATCH --time=10:00:00
+#SBATCH --mem=1500GB
 #SBATCH --partition=cpu
 #SBATCH --mail-type=END,FAIL      
 #SBATCH --mail-user=jalil.nourisa@gmail.com   
 
@@ -39,6 +39,13 @@ process {
 
 
   // Resource labels
+  withLabel: lowtime { time = 1.h }
+  withLabel: midtime { time = 4.h }
+  withLabel: hightime { time = 8.h }
+  withLabel: veryhightime { time = 24.h }
+  withLabel: onedaytime { time = 24.h }
+  withLabel: twodaytime { time = 48.h }
+
   withLabel: lowcpu { cpus = 5 }
   withLabel: midcpu { cpus = 15 }
   withLabel: highcpu { cpus = 30 }
@@ -54,6 +62,7 @@ process {
     memory = { get_memory( 100.GB * task.attempt ) }
     disk = { 200.GB * task.attempt }
   }
+
   withLabel: veryhighmem {
     memory = { get_memory( 200.GB * task.attempt ) }
     disk = { 400.GB * task.attempt }
 
@@ -1,10 +1,10 @@
 set -e
 
-datasets=('replogle') #'replogle' 'op' 'nakatake' 'adamson' 'norman' 
+datasets=('replogle') #'replogle' 'op' 'nakatake' 'adamson' 'norman'  xaira_HEK293T xaira_HEK293T parsescience
 run_local=true # set to true to run locally, false to run on AWS
 
-run_grn_inference=false
-run_grn_evaluation=true
+run_grn_inference=true
+run_grn_evaluation=false
 run_download=false
 
 
@@ -14,6 +14,16 @@ for dataset in "${datasets[@]}"; do
         echo "Running GRN inference for dataset: $dataset"
         if [ "$run_local" = true ]; then
             echo "Running locally"
+            
+            file="resources/results/$dataset/trace.txt"
+
+            if [ -f "$file" ]; then
+                
+                dir=$(dirname "$file")
+                base=$(basename "$file" .txt)
+                today=$(date +%Y-%m-%d)
+                cp "$file" "${dir}/${base}_${today}.txt"
+            fi
         else
             echo "Running on AWS"
         fi
@@ -23,6 +33,17 @@ for dataset in "${datasets[@]}"; do
 
     if [ "$run_grn_evaluation" = true ]; then
         if [ "$run_local" = false ]; then
+            
+            file="resources/results/$dataset/trace.txt"
+
+            if [ -f "$file" ]; then
+                echo "Making a copy of previous trace file"
+                dir=$(dirname "$file")
+                base=$(basename "$file" .txt)
+                today=$(date +%Y-%m-%d)
+                cp "$file" "${dir}/${base}_${today}.txt"
+            fi
+
             echo "Downloading inference results from AWS"
             aws s3 sync  s3://openproblems-data/resources/grn/results/$dataset resources/results/$dataset 
         fi 
 
@@ -113,9 +113,16 @@ HERE
   fi
 }
 
-# Example usage:
-append_entry "$DATASET" "[pearson_corr, negative_control, positive_control]"
-# append_entry "$DATASET" "[scprint]" "true"
+if [[ "$DATASET" =~ ^(replogle|parsescience|xaira_HEK293T)$ ]]; then
+  # append_entry "$DATASET" "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic]"
+  # append_entry "$DATASET" "[scprint]" "true"
+  append_entry "$DATASET" "[scenic]"
+elif [ "$DATASET" = "op" ]; then
+  append_entry "$DATASET" "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic, scprint, figr, scenicplus, celloracle, granie, scglue]"
+else
+  append_entry "$DATASET" "[pearson_corr, negative_control, positive_control, grnboost, ppcor, portia, scenic, scprint]"
+fi
+
 
 # --- Final configuration ---
 if [ "$RUN_LOCAL" = true ]; then
 
@@ -17,7 +17,7 @@ set -e
 # python src/process_data/nakatake/script.py 
 # python src/process_data/norman/script.py
 
-# python src/process_data/opsca/script.py 
+python src/process_data/opsca/script.py 
 # python src/process_data/replogle/script.py  #--run_test  #--run_test
-python src/process_data/xaira/script.py    #--run_test
+# python src/process_data/xaira/script.py    #--run_test
 # python src/process_data/parse_bioscience/script.py  #--run_test