Commit 0b9c6c0

Author: BIOPZ-Katsantoni Maria
Merge branch 'main' of https://github.com/zavolanlab/RCRUNCH into snakemake_env_fix
2 parents f96cbae + bd7bd67

2 files changed: 21 additions, 13 deletions

README.md

Lines changed: 17 additions & 11 deletions
@@ -17,21 +17,22 @@ RCRUNCH consists of the following components:
 
 
 ### <span style="color:green">Splice-Junction-aware (transcriptomic) approach</span>
-If the user chooses the Splice-Junction-aware approach (which we call the "TR" (transcriptomic) for simplicity) of RCRUNCH, some additional steps are performed to identify reads that map across splice junctions. That is, after all the preprocessing steps, the remaining alignments for foreground (CLIP) samples are used to select the most expressed transcript isoform for each gene and construct a dataset-specific transcriptome. Then the genome and transcriptome alignment files are jointly analyzed to identify the highest scoring alignment for each read. Peaks are then detected either on the genome or the transcriptome (see RCRUNCH model), treating individual transcripts as chromosomes. This approach allows for the detection and proper quantification of RBP binding sites in the vicinity or even spanning splice junctions.
+If the user chooses the Splice-Junction-aware approach (which we call the "TR" (transcriptomic) for simplicity) of RCRUNCH, some additional steps are performed to identify reads that map across splice junctions. That is, after all the preprocessing steps, the remaining alignments for foreground (CLIP) samples are used to select the most expressed transcript isoform for each gene and construct a dataset-specific transcriptome. Then the genome and transcriptome alignment files are jointly analyzed to identify the highest scoring alignment for each read. Peaks are then detected either on the genome (essentially the pre-mRNAs) or the transcriptome (see [RCRUNCH](#RCRUNCH_model) model). This approach allows for the detection and proper quantification of RBP binding sites in the vicinity or even spanning splice junctions.
 
-### <span style="color:red">RCRUNCH model</span>
+### <span style="color:red" id="RCRUNCH_model">RCRUNCH model</span>
 
 At the heart of RCRUNCH lies the RCRUNCH model for the detection of RBP-binding regions. Genome/transcriptome-wide identification of peaks corresponding to individual binding sites for an RBP is time consuming. For this reason RCRUNCH implements a two-step process:
 1. Identify broader genomic regions that are enriched in reads in the foreground (CLIP) compared to the background sample
 2. Identify individual peaks within these selected broader windows
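Editorial note: the two-step idea described in the README text above can be illustrated with a toy sketch. This is not RCRUNCH's actual statistical model (see the manuscript's Methods for that); it only shows the general "select enriched windows first, then look for peaks inside them" pattern. The function names, the fixed window size, the pseudocount, and the simple ratio threshold are all assumptions made for illustration.

```python
from typing import List


def enriched_windows(fg: List[int], bg: List[int], window: int,
                     min_ratio: float, pseudocount: float = 1.0) -> List[int]:
    """Step 1 (toy version): return start offsets of fixed-size windows whose
    foreground/background read-count ratio is at least min_ratio."""
    starts = []
    for start in range(0, len(fg), window):
        fg_sum = sum(fg[start:start + window])
        bg_sum = sum(bg[start:start + window])
        # Pseudocount avoids division by zero in windows without background reads.
        ratio = (fg_sum + pseudocount) / (bg_sum + pseudocount)
        if ratio >= min_ratio:
            starts.append(start)
    return starts


def peak_in_window(fg: List[int], start: int, window: int) -> int:
    """Step 2 (toy version): position of maximal foreground coverage in a window."""
    segment = fg[start:start + window]
    return start + max(range(len(segment)), key=segment.__getitem__)


def call_peaks(fg: List[int], bg: List[int],
               window: int = 5, min_ratio: float = 2.0) -> List[int]:
    """Run both steps: only windows that pass step 1 are searched for peaks."""
    return [peak_in_window(fg, s, window)
            for s in enriched_windows(fg, bg, window, min_ratio)]


# Hypothetical per-position read counts for a 15-nt region.
fg = [0, 0, 1, 0, 0, 2, 9, 3, 1, 0, 1, 1, 1, 1, 1]
bg = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
peaks = call_peaks(fg, bg)
```

Restricting the peak search to windows that pass the first step is what saves time: most of the genome or transcriptome is never examined at single-peak resolution.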
 
-> 📖 Please read the "Methods" Section of the manuscript for an extensive description of RCRUNCH.
+> 📖 Please read the "Methods" Section of the [manuscript](https://www.biorxiv.org/content/10.1101/2022.07.06.498949v1) for an extensive description of RCRUNCH.
 
 ### <span style="color:blue">Motif analysis</span>
 The last part of RCRUNCH is the de-novo prediction of binding motifs and the computation of enrichment scores for known (e.g. from [ATtRACT](https://attract.cnic.es/search)) and de-novo motifs for the RBP of interest.
 
 <div align="left">
-<img width="50%" align="center" src=images/rcrunch_components.png>
+<img width="100%" align="center" src=images/rcrunch_components.png>
+<figcaption align = "center"><b> Overview of the RCRUNCH analysis steps </b></figcaption>
 </div>
 
 
@@ -70,7 +71,7 @@ for your system (Linux). Be sure to select Python 3 option.
 The workflow was built and tested with `miniconda 4.7.12`.
 Other versions are not guaranteed to work as expected.
 
-In addition to Miniconda, you will need the [Mamba](https://github.com/mamba-org/mamba) package manager, which -if you don't have it yet- needs to be installed in
+In addition to Miniconda, you are strongly advised to use [Mamba](https://github.com/mamba-org/mamba) package manager, which -if you don't have it yet- needs to be installed in
 the `base` conda environment with:
 
 ```bash
@@ -98,7 +99,7 @@ conda activate rcrunch
 ```
 
 ### 5. Test
-To ensure that the version is working properly you can test it by:
+To ensure that the code is working properly you can test it by:
 1. Activate the Conda environment with:
 ```bash
 conda activate rcrunch
@@ -114,22 +115,25 @@ or for **SLURM** workload manager,
 bash test/test_singularity_execution/test_slurm.sh
 ```
 
-### 6. Execution of RCRUNCH
 
-> ✨ For your convenience a pre-filled [config.yaml](config.yaml) file is available.
+### 6. Execution of RCRUNCH
 
-> 💡 If you want to execute the dataset described in the pre-filled [config.yaml](config.yaml), you must fetch all the required files (clip samples, genome, attract, rna-central) by running: <br> `bash scripts/get_extra_annotation.sh` and then follow the steps described below
 
 ### 6a. Fill in the necessary config file
 
 In order to run RCRUNCH, please fill in the organism related data and the experiment-dependent parameters for the different samples in the file `config.yaml`.
 
-> ✨ You can replace the files and values accordingly. Leave as is if you want to execute the full dataset example.
+> ✨ For your convenience a [pre-filled config.yaml](config.yaml) file is available, based on a real example. This can be adapted accordingly.
+If you want to execute RCRUNCH with this config as is you need to download the required files by running:
+```
+bash get_extra_annotation.sh
+```
 
 
 ### 6b. Dry run and DAG generation (optional)
 
-You can generate a dry run of the pipeline by running:
+You can perform a test execution of the pipeline without producing any actual results (referred to as dry run) by running:
+
 ```bash
 snakemake \
 -np \
@@ -138,6 +142,8 @@ snakemake \
 --configfile config.yaml
 ```
 
+This will show the jobs that will be called upon actual execution.
+
 A directed acyclic graph (dag) of the run can be generated by running:
 ```bash
 snakemake --dag -np --use-singularity | dot -Tpng > dag.png

config.yaml

Lines changed: 4 additions & 2 deletions
@@ -119,7 +119,7 @@
 dup_type: "umis",
 # types of duplicates to be removed
 # options: "umis", "duplicates", "with_duplicates"
-# the latter option means that no deduplication occurs
+# the latter option means that no deduplication is performed
 mate1_3p: "input_files/mate1_3p.fasta",
 # fasta file containing adapters of 3p of mate1 to be removed
 # multiple adapters are used here as described in analysis
@@ -166,7 +166,7 @@
 fragment_size: 80
 # estimated fragment size of the experiment
 
-#________RCRUNCH specific options - DEFAULTS - Use with care__________________
+#___RCRUNCH specific options - DEFAULTS - Be careful of the values you use ___
 
 seq_type: "pe"
 # paired end - currently only paired-end seq are allowed
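Editorial note: the constraints stated in the config comments above (the three allowed `dup_type` values, paired-end only for `seq_type`, a numeric `fragment_size`) could be checked before launching a run. The sketch below is hypothetical and not part of RCRUNCH: `check_options` is an invented helper, and it assumes the three values have already been extracted into a flat dictionary, whereas in the real `config.yaml` they may sit at different nesting levels.

```python
# Allowed values copied from the comments in config.yaml shown in the diff.
ALLOWED_DUP_TYPES = {"umis", "duplicates", "with_duplicates"}


def check_options(options: dict) -> list:
    """Return human-readable problems; an empty list means the checks passed.

    `options` is assumed to be a flat dict holding just these three keys
    (an illustration-only simplification of the real config layout).
    """
    problems = []

    dup_type = options.get("dup_type")
    if dup_type not in ALLOWED_DUP_TYPES:
        problems.append(
            f"dup_type must be one of {sorted(ALLOWED_DUP_TYPES)}, got {dup_type!r}"
        )

    # The config comment states that only paired-end data is currently allowed.
    if options.get("seq_type") != "pe":
        problems.append("seq_type must be 'pe' (only paired-end is supported)")

    fragment_size = options.get("fragment_size")
    if not isinstance(fragment_size, int) or fragment_size <= 0:
        problems.append("fragment_size must be a positive integer")

    return problems


# Values taken from the pre-filled example in the diff.
example = {"dup_type": "umis", "seq_type": "pe", "fragment_size": 80}
example_problems = check_options(example)
```

Returning a list of problems rather than raising on the first one lets a user fix every misconfigured value in a single pass.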
@@ -180,6 +180,8 @@
 # number of randomisations of 'training' and 'test' sets
 # in the cross-validation of motif enrichment
 # the more randomisations the more reliable the result
+# too many runs will make RCRUNCH slower
+# a number ~2-10 is suggested
 random_sequences_per_peak: 20
 # used in the background creation during the motif analysis step
 