- If the user chooses the Splice-Junction-aware approach (which we call the "TR" (transcriptomic) for simplicity) of RCRUNCH, some additional steps are performed to identify reads that map across splice junctions. That is, after all the preprocessing steps, the remaining alignments for foreground (CLIP) samples are used to select the most expressed transcript isoform for each gene and construct a dataset-specific transcriptome. Then the genome and transcriptome alignment files are jointly analyzed to identify the highest scoring alignment for each read. Peaks are then detected either on the genome or the transcriptome (see RCRUNCH model), treating individual transcripts as chromosomes. This approach allows for the detection and proper quantification of RBP binding sites in the vicinity or even spanning splice junctions.
+ If the user chooses the splice-junction-aware approach of RCRUNCH (which we call "TR", for transcriptomic, for simplicity), some additional steps are performed to identify reads that map across splice junctions. That is, after all the preprocessing steps, the remaining alignments for foreground (CLIP) samples are used to select the most expressed transcript isoform for each gene and to construct a dataset-specific transcriptome. Then the genome and transcriptome alignment files are jointly analyzed to identify the highest-scoring alignment for each read. Peaks are then detected either on the genome (essentially the pre-mRNAs) or on the transcriptome (see the [RCRUNCH model](#RCRUNCH_model)). This approach allows for the detection and proper quantification of RBP binding sites in the vicinity of, or even spanning, splice junctions.
At the heart of RCRUNCH lies the RCRUNCH model for the detection of RBP-binding regions. Genome/transcriptome-wide identification of peaks corresponding to individual binding sites for an RBP is time-consuming. For this reason, RCRUNCH implements a two-step process:
1. Identify broader genomic regions that are enriched in reads in the foreground (CLIP) compared to the background sample
2. Identify individual peaks within these selected broader windows
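
The first of the two steps above can be illustrated with a toy filter. This is only a sketch of the general idea of selecting windows enriched in foreground reads; it is not the statistical model that RCRUNCH uses. The input format (tab-separated window ID, foreground count, background count), the function name `enriched_windows`, the pseudocount of 1, and the default fold threshold are all assumptions made for illustration.

```bash
# Toy illustration only: NOT the RCRUNCH model.
# Reads tab-separated lines "window_id<TAB>fg_count<TAB>bg_count" on stdin and
# prints the IDs of windows whose pseudocount-smoothed foreground/background
# ratio is at least the given threshold (default: 2).
enriched_windows() {
    local min_fold="${1:-2}"
    awk -F '\t' -v min_fold="$min_fold" '
        # A pseudocount of 1 on both counts avoids division by zero.
        ($2 + 1) / ($3 + 1) >= min_fold { print $1 }
    '
}

# Example usage:
# printf 'w1\t30\t4\nw2\t5\t6\n' | enriched_windows 2
```

A real analysis would also have to account for library size and for the variability of the background, which this sketch ignores.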
- > 📖 Please read the "Methods" Section of the manuscript for an extensive description of RCRUNCH.
+ > 📖 Please read the "Methods" Section of the [manuscript](https://www.biorxiv.org/content/10.1101/2022.07.06.498949v1) for an extensive description of RCRUNCH.
### <span style="color:blue">Motif analysis</span>
The last part of RCRUNCH is the de-novo prediction of binding motifs and the computation of enrichment scores for known (e.g. from [ATtRACT](https://attract.cnic.es/search)) and de-novo motifs for the RBP of interest.
<figcaption align = "center"><b> Overview of the RCRUNCH analysis steps </b></figcaption>
</div>
@@ -70,7 +71,7 @@ for your system (Linux). Be sure to select Python 3 option.
The workflow was built and tested with `miniconda 4.7.12`.
Other versions are not guaranteed to work as expected.
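
To see whether a local installation matches the tested version, the output of `conda --version` can be compared against it. The following is a minimal sketch: the helper name `is_tested_conda` is hypothetical, and it assumes both that `conda --version` prints a line of the form `conda X.Y.Z` and that the Miniconda release number matches the bundled conda version.

```bash
# Hypothetical helper: succeeds only if the given version line reports 4.7.12,
# the version the workflow was built and tested with.
is_tested_conda() {
    # $1 is a line such as "conda 4.7.12" (assumed output format of `conda --version`).
    [ "${1#conda }" = "4.7.12" ]
}

# Example usage (requires conda on PATH):
# is_tested_conda "$(conda --version)" || echo "Warning: untested conda version" >&2
```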
- In addition to Miniconda, you will need the [Mamba](https://github.com/mamba-org/mamba) package manager, which -if you don't have it yet- needs to be installed in
+ In addition to Miniconda, you are strongly advised to use the [Mamba](https://github.com/mamba-org/mamba) package manager, which, if you don't have it yet, needs to be installed in
the `base` conda environment with:
```bash
@@ -98,7 +99,7 @@ conda activate rcrunch
```
### 5. Test
- To ensure that the version is working properly you can test it by:
+ To ensure that the code is working properly, you can test it as follows:
1. Activate the Conda environment with:
```bash
conda activate rcrunch
@@ -114,22 +115,25 @@ or for **SLURM** workload manager,
> ✨ For your convenience a pre-filled [config.yaml](config.yaml) file is available.
+ ### 6. Execution of RCRUNCH
- > 💡 If you want to execute the dataset described in the pre-filled [config.yaml](config.yaml), you must fetch all the required files (clip samples, genome, attract, rna-central) by running: <br> `bash scripts/get_extra_annotation.sh` and then follow the steps described below
### 6a. Fill in the necessary config file
In order to run RCRUNCH, please fill in the organism-related data and the experiment-dependent parameters for the different samples in the file `config.yaml`.
- > ✨ You can replace the files and values accordingly. Leave as is if you want to execute the full dataset example.
+ > ✨ For your convenience, a [pre-filled config.yaml](config.yaml) file, based on a real example, is available. It can be adapted as needed.
+ If you want to execute RCRUNCH with this config as is, you need to download the required files by running:
+ ```
+ bash get_extra_annotation.sh
+ ```
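
After the download, it may be worth confirming that the files referenced in `config.yaml` are actually present before starting a run. The following is a minimal sketch: `missing_files` is a hypothetical helper, and the paths shown in the usage comment are placeholders, not paths taken from the RCRUNCH configuration.

```bash
# Hypothetical helper: prints every argument that is not an existing regular
# file and returns non-zero if at least one of them is missing.
missing_files() {
    local rc=0 path
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            printf '%s\n' "$path"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage with placeholder paths (replace with the paths from your config.yaml):
# missing_files path/to/genome.fa path/to/annotation.gtf || echo "Some inputs are missing" >&2
```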
### 6b. Dry run and DAG generation (optional)
- You can generate a dry run of the pipeline by running:
+ You can perform a test execution of the pipeline without producing any actual results (referred to as a dry run) by running:
```bash
snakemake \
-np \
@@ -138,6 +142,8 @@ snakemake \
--configfile config.yaml
```
+ This shows the jobs that would be run during an actual execution.
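
A dry run can also serve as a guard before the real run. The following is a minimal sketch, assuming that Snakemake exits with a non-zero status when the dry run fails. The wrapper name `dry_run_ok` and the `SNAKEMAKE` override variable are hypothetical and are not part of RCRUNCH or Snakemake.

```bash
# Hypothetical wrapper: performs a dry run (-n) with the given arguments and
# reports through its exit status whether it succeeded. The executable can be
# overridden via the SNAKEMAKE variable (an assumption of this sketch).
dry_run_ok() {
    "${SNAKEMAKE:-snakemake}" -n "$@" > /dev/null 2>&1
}

# Example usage; pass the same options as in your own dry-run command:
# dry_run_ok --configfile config.yaml && echo "Dry run succeeded"
```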
A directed acyclic graph (DAG) of the run can be generated by running: