Samantha edited this page Nov 20, 2020 · 24 revisions

Welcome to the iCLIP wiki!

Getting Started

To get started, clone the iCLIP GitHub repository to your local filesystem and load Snakemake.

  1. Clone the GitHub repository to your local filesystem.
# Clone Repository from Github
git clone https://github.com/RBL-NCI/iCLIP.git

# Change your working directory to the iCLIP repo
cd iCLIP/
  2. Load Snakemake into your environment.
# Snakemake >= 5.19 is recommended
module load snakemake/5.24.1
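
To confirm that the loaded Snakemake meets the recommended minimum, a small version check can help. The following is a minimal sketch, not part of the iCLIP pipeline: `version_ge` is a hypothetical helper name, and it assumes purely numeric dotted versions such as 5.24.1.

```shell
# version_ge A B: succeed if dotted version A >= B.
# Hypothetical helper (not part of iCLIP); assumes numeric fields only.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                              # versions are equal
    }'
}

# Only run the check when snakemake is actually on the PATH.
if command -v snakemake >/dev/null 2>&1; then
    installed="$(snakemake --version)"
    if version_ge "$installed" 5.19; then
        echo "snakemake $installed meets the recommended minimum"
    else
        echo "snakemake $installed is older than the recommended 5.19" >&2
    fi
fi
```

Fields are compared numerically, so 5.4 is treated as older than 5.19, which a plain string comparison would get wrong.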

Preparing Config and Manifests

This pipeline requires three input files, all of which must be located in the iCLIP/config directory:

  1. config.yaml - this file will contain directory paths and user parameters for analysis.

    • source_dir: path to the Snakemake file, within the cloned iCLIP repository; example: '/path/to/iCLIP/workflow'
    • out_dir: path to created output directory, where output will be stored; example: '/path/to/output/'
    • multiplex_manifest: path to the multiplex manifest (see specific details below); example: '/path/to/multiplex_manifest.tsv'
    • sample_manifest: path to the sample manifest (see specific details below); example: '/path/to/sample_manifest.tsv'
    • fastq_dir: path to gzipped multiplexed fastq files; example: '/path/to/raw/fastq/files'
    • novoalign_reference: selection of reference database ['hg38', 'mm10']; example: 'mm10'
    • minimum_count: integer value; the minimum number of peaks for count; example: 2
  2. multiplex_manifest.tsv - this manifest maps each fastq file to its multiplexed sample ID (multiplexID).

    • file_name: the full file name of the multiplexed sample, which must be unique; example: 'SIM_iCLIP_S1.R1_001.fastq'
    • multiplex: the multiplexID associated with the fastq file, which must be unique. These names must match the multiplex column of the sample_manifest.tsv file; example: 'SIM_iCLIP_S1'
    An example multiplex_manifest.tsv file:
    
    file_name                        multiplex
    SIM_iCLIP_S1.R1_001.fastq        SIM_iCLIP_S1
    
  3. sample_manifest.tsv - this manifest will include the information needed to identify each sample within a multiplexed fastq file.

    • multiplex: the multiplexID associated with the fastq file; it need not be unique. These names must match the multiplex column of the multiplex_manifest.tsv file; example: 'SIM_iCLIP_S1'
    • sample: the final sample name; this must be unique
    • barcode: the barcode used to identify the sample within the multiplexed file; this must be unique within each multiplexID but can repeat across multiplexIDs
    • adaptor: the adaptor used, to be removed from the sample; this may or may not be unique
    • group: CNTRL must be used for control samples, but any other alphanumeric group designation is accepted

     An example sample_manifest.tsv file:
         multiplex       sample          group       barcode     adaptor
         SIM_iCLIP_S1    Ro_Clip         CLIP        NNNTGGCNN   AGATCGGAAGAGCGGTTCAG
         SIM_iCLIP_S1    Control_Clip    CNTRL       NNNCGGANN   AGATCGGAAGAGCGGTTCAG
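
The cross-references between the two manifests described above can be checked before launching the pipeline. The following is a minimal sketch, not part of the iCLIP workflow: `check_manifests` is a hypothetical helper, and it assumes both files are tab-separated, begin with a header row, and use the column order shown in the examples (multiplex manifest: file_name, multiplex; sample manifest: multiplex first, sample second).

```shell
# check_manifests MULTIPLEX_TSV SAMPLE_TSV
# Hypothetical helper (not part of iCLIP). Succeeds only if every multiplex ID
# in the sample manifest is declared in the multiplex manifest and every
# sample name is unique. Problems are reported on stderr.
check_manifests() {
    awk -F '\t' '
        NF == 0   { next }                   # ignore blank lines
        FNR == 1  { next }                   # skip the header row of each file
        NR == FNR { known[$2] = 1; next }    # first file: remember multiplex IDs
        !($1 in known) { printf "unknown multiplex: %s\n", $1; bad = 1 }
        seen[$2]++     { printf "duplicate sample: %s\n", $2; bad = 1 }
        END { exit bad + 0 }
    ' "$1" "$2" >&2
}
```

Assuming the example file names above, it could be called as `check_manifests config/multiplex_manifest.tsv config/sample_manifest.tsv`; a non-zero exit status means at least one inconsistency was reported.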
    
