Home
Samantha edited this page Nov 20, 2020
Welcome to the iCLIP wiki!
To prepare your local filesystem, clone the iCLIP GitHub repository and load Snakemake.
- Clone the GitHub repository to your local filesystem.
# Clone the repository from GitHub
git clone https://github.com/RBL-NCI/iCLIP.git
# Change your working directory to the iCLIP repo
cd iCLIP/
- Load Snakemake into your environment.
# Recommend running snakemake>=5.19
module load snakemake/5.24.1

This pipeline requires three input files, all of which must be in the iCLIP/config directory. These files are:
- config.yaml - this file contains the directory paths and user parameters for the analysis.
  - source_dir: path to the Snakemake file, within the cloned iCLIP repository; example: '/path/to/iCLIP/workflow'
  - out_dir: path to the created output directory, where output will be stored; example: '/path/to/output/'
  - multiplex_manifest: path to the multiplex manifest (see specific details below); example: '/path/to/multiplex_manifest.tsv'
  - sample_manifest: path to the sample manifest (see specific details below); example: '/path/to/sample_manifest.tsv'
  - fastq_dir: path to the gzipped multiplexed fastq files; example: '/path/to/raw/fastq/files'
  - novoalign_reference: reference database to use, one of ['hg38', 'mm10']; example: 'mm10'
  - minimum_count: integer value of the minimum number of peaks for count; example: 2
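Before launching a run, it can help to confirm that the loaded configuration carries every key listed above. The following is a minimal sketch, not part of the iCLIP repository: `check_config`, `REQUIRED_KEYS`, and `ALLOWED_REFERENCES` are hypothetical names, and the dictionary is assumed to have been loaded from config.yaml already (for instance with PyYAML's `yaml.safe_load`).

```python
# Hypothetical helper for illustration; not part of the iCLIP repository.
REQUIRED_KEYS = (
    "source_dir", "out_dir", "multiplex_manifest", "sample_manifest",
    "fastq_dir", "novoalign_reference", "minimum_count",
)
ALLOWED_REFERENCES = ("hg38", "mm10")


def check_config(cfg):
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    ref = cfg.get("novoalign_reference")
    if ref is not None and ref not in ALLOWED_REFERENCES:
        problems.append(
            f"novoalign_reference must be one of {ALLOWED_REFERENCES}, got {ref!r}"
        )
    count = cfg.get("minimum_count")
    # bool is a subclass of int in Python, so reject it explicitly.
    if count is not None and (isinstance(count, bool) or not isinstance(count, int)):
        problems.append(f"minimum_count must be an integer, got {count!r}")
    return problems


# Values copied from the examples in the list above.
example = {
    "source_dir": "/path/to/iCLIP/workflow",
    "out_dir": "/path/to/output/",
    "multiplex_manifest": "/path/to/multiplex_manifest.tsv",
    "sample_manifest": "/path/to/sample_manifest.tsv",
    "fastq_dir": "/path/to/raw/fastq/files",
    "novoalign_reference": "mm10",
    "minimum_count": 2,
}
print(check_config(example))  # prints []
```

The sketch only checks key presence and value types; it does not verify that the paths exist on disk.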
- multiplex_manifest.tsv - this manifest maps fastq files to their multiplex sample IDs.
  - file_name: the full file name of the multiplexed sample, which must be unique; example: 'SIM_iCLIP_S1.R1_001.fastq'
  - multiplex: the multiplex ID associated with the fastq file, which must be unique. These names must match the multiplex column of the sample_manifest.tsv file; example: 'SIM_iCLIP_S1'

  An example multiplex_manifest.tsv file:

      file_name                    multiplex
      SIM_iCLIP_S1.R1_001.fastq    SIM_iCLIP_S1

- sample_manifest.tsv
  - multiplex: the multiplex ID associated with the fastq file; it need not be unique. These names must match the multiplex column of the multiplex_manifest.tsv file; example: 'SIM_iCLIP_S1'
  - sample: the final sample name; this must be unique
  - barcode: the barcode used to identify the multiplexed sample; this must be unique per each multiplex sample name but can repeat between samples
  - adaptor: the adaptor used, to be removed from the sample; this may or may not be unique
  - group: CNTRL must be used for control samples, but any other alphanumeric group designation is accepted

  An example sample_manifest.tsv file:

      multiplex       sample          group    barcode      adaptor
      SIM_iCLIP_S1    Ro_Clip         CLIP     NNNTGGCNN    AGATCGGAAGAGCGGTTCAG
      SIM_iCLIP_S1    Control_Clip    CNTRL    NNNCGGANN    AGATCGGAAGAGCGGTTCAG
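The uniqueness and cross-reference rules for the two manifests can be checked before a run. The sketch below is one possible way to do so and is not part of the iCLIP repository: `read_tsv`, `duplicates`, and `check_manifests` are hypothetical names, and the barcode rule is read here as "unique within one multiplex ID", which is an assumption about the intended meaning.

```python
import csv
import io
from collections import Counter


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def duplicates(values):
    """Return the sorted values that occur more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


def check_manifests(multiplex_rows, sample_rows):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for column in ("file_name", "multiplex"):
        for value in duplicates(row[column] for row in multiplex_rows):
            problems.append(f"multiplex manifest: duplicate {column} {value!r}")
    for value in duplicates(row["sample"] for row in sample_rows):
        problems.append(f"sample manifest: duplicate sample {value!r}")
    # Assumption: barcodes may repeat across multiplex IDs, but not within one.
    pairs = ((r["multiplex"], r["barcode"]) for r in sample_rows)
    for mux, barcode in duplicates(pairs):
        problems.append(f"sample manifest: barcode {barcode!r} repeated in {mux!r}")
    # Every multiplex ID used by a sample must appear in the multiplex manifest.
    known = {row["multiplex"] for row in multiplex_rows}
    for mux in sorted({r["multiplex"] for r in sample_rows} - known):
        problems.append(f"sample manifest: multiplex {mux!r} not in multiplex manifest")
    for row in sample_rows:
        if not row["group"].isalnum():
            problems.append(f"sample manifest: group {row['group']!r} is not alphanumeric")
    return problems


# The example manifests from this page, written as tab-separated text.
MULTIPLEX_TSV = "file_name\tmultiplex\nSIM_iCLIP_S1.R1_001.fastq\tSIM_iCLIP_S1\n"
SAMPLE_TSV = (
    "multiplex\tsample\tgroup\tbarcode\tadaptor\n"
    "SIM_iCLIP_S1\tRo_Clip\tCLIP\tNNNTGGCNN\tAGATCGGAAGAGCGGTTCAG\n"
    "SIM_iCLIP_S1\tControl_Clip\tCNTRL\tNNNCGGANN\tAGATCGGAAGAGCGGTTCAG\n"
)
print(check_manifests(read_tsv(MULTIPLEX_TSV), read_tsv(SAMPLE_TSV)))  # prints []
```

To check real files, the text passed to `read_tsv` could be read from the paths given as multiplex_manifest and sample_manifest in config.yaml.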