Documentation

This page describes the command-line parameters and output files of each script in the guide design pipeline.


generate_consensus.R

Generate target genome as consensus sequence of sampled sequenced genomes.

  • --input, -i: file path to aligned sequences
  • --type, -t: whether input is a CLUSTAL .aln alignment or FASTA file
  • --name, -n: name of newly generated consensus sequence
  • --output, -o: file path to write output file

OUTPUT:

  • FASTA file
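
To make the idea of a consensus sequence concrete, here is a minimal Python sketch of a per-column majority vote over aligned sequences. It is an illustration only, not the code in generate_consensus.R: the function names `consensus` and `to_fasta` are hypothetical, and the choice to drop columns where a gap is the most common character is an assumption.

```python
from collections import Counter

def consensus(aligned, gap="-"):
    """Majority-vote consensus over equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        # Most common character in this alignment column.
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:  # assumption: skip columns where a gap wins the vote
            out.append(base)
    return "".join(out)

def to_fasta(name, seq, width=60):
    """Format one named sequence as a FASTA record."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

For example, `to_fasta("consensus", consensus(["ACGT", "ACGA", "ACTA"]))` yields a single record whose sequence is `ACGA`.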

break_genome.R

Generate all possible target windows from input target genome.

  • --input, -i: file path to target sequence (FASTA file format)
  • --window, -w: size of target window (default = 20nt)
  • --enzyme, -e: Cas enzyme for detection; currently only supports Cas13a or Cas12
  • --pfs_length, -p: length of PFS/PAM sequence to evaluate (default = 4nt)
  • --genome-strand, -g: whether input sequence is + or - strand
  • --strand, -s: whether to target + or - strand
  • --out, -o: file path to write output file

OUTPUT:

  • targets.txt: target sequences, one line per target
  • targets.fa: target sequences, FASTA format
  • spacers.txt: spacer sequences, one line per target
  • windows.txt: table of position, target sequence, spacer sequence, strand, PFS or PAM, %GC, and %A

If --enzyme Cas13a is specified, break_genome.R will return all possible target windows specified by --strand, as well as a 4nt protospacer flanking sequence (PFS).

If --enzyme Cas12 is specified, break_genome.R will return all possible target windows on both + and - strands, as well as a 4nt protospacer adjacent motif (PAM).
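
The window enumeration described above can be sketched in Python as follows. This is a simplified, single-strand illustration rather than the logic of break_genome.R. Several details are assumptions: the spacer is taken as the reverse complement of the target, the flanking sequence is read immediately downstream of the window, windows without a full flanking sequence are skipped, and the names `revcomp` and `windows` are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def windows(genome, window=20, pfs_length=4):
    """Enumerate every target window on the given strand with its flank."""
    rows = []
    # Stop early enough that every window still has a full flanking sequence.
    for start in range(len(genome) - window - pfs_length + 1):
        target = genome[start:start + window]
        flank = genome[start + window:start + window + pfs_length]
        rows.append({
            "start": start + 1,                  # 1-based position
            "target": target,
            "spacer": revcomp(target),           # assumption: spacer pairs with target
            "flank": flank,                      # assumption: flank lies downstream
            "gc": 100 * sum(b in "GC" for b in target) / window,
            "a": 100 * target.count("A") / window,
        })
    return rows
```

Each dictionary corresponds loosely to one row of windows.txt (position, target, spacer, flanking sequence, %GC, %A).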

generate_pairwise_alignments.R

Generate pairwise alignments of sampled genomes to target genome.

  • --genome, -g: file path to input target genome (FASTA file format)
  • --input, -i: file path to sampled genomes (FASTA file format)
  • --num_cores, -n: number of cores to parallelize over (default n = 1)
  • --chunk_size, -c: number of sampled genomes to align at a time for parallelization (default c = 10)
  • --align_type, -a: alignment "type" argument for Biostrings::pairwiseAlignment() (default a = "global")
  • --output, -o: file path to write aligned sequences to

OUTPUT:

  • text matrix of aligned sequences
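
The interaction between --num_cores and --chunk_size can be pictured with the Python sketch below: the sampled genomes are split into batches, and the batches are handed to a pool of workers. This is only an analogy for the batching pattern. The real script runs in R with Biostrings::pairwiseAlignment(), and `make_chunks`, `map_chunks`, and the caller-supplied `align_chunk` function are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def make_chunks(items, chunk_size=10):
    """Split a list into consecutive batches of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def map_chunks(align_chunk, items, num_cores=1, chunk_size=10):
    """Apply align_chunk to each batch in parallel and flatten the results."""
    batches = make_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        # pool.map returns results in the same order as the input batches.
        results = list(pool.map(align_chunk, batches))
    return [row for batch in results for row in batch]
```

Smaller chunks spread work more evenly across workers, while larger chunks reduce scheduling overhead.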

score_sensitivity.R

Calculate sensitivity (%targeted of sampled genomes, allowing 0 or 1 mismatches) per target window.

Requires pairwise alignments, as generated by generate_pairwise_alignments.R.

  • --genome, -g: file path to target sequence (FASTA file format)
  • --input, -i: file path to text file containing pairwise alignments of sampled genomes to target sequence
  • --window, -w: size of target window (default = 20nt)
  • --out, -o: file path to write output file

OUTPUT:

  • score_sensitivity.txt: table of position, strand, # genomes with 0 mismatches, # genomes with 1 mismatch, # genomes with 2 mismatches, sensitivity allowing 0 mismatches, sensitivity allowing 0 or 1 mismatches
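
As a rough illustration of the sensitivity calculation, the Python sketch below counts mismatches between one target window and the corresponding region of each sampled genome. It assumes every aligned genome has already been projected onto the target's coordinates (same length as the target, gaps written as "-"). That input layout, along with the names `mismatches` and `window_sensitivity`, is an assumption and may not match the script's actual input format.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def window_sensitivity(target, aligned_genomes, start, window=20):
    """Mismatch counts and sensitivity for the window at 0-based start."""
    ref = target[start:start + window]
    counts = {0: 0, 1: 0, 2: 0}
    for genome in aligned_genomes:
        n = mismatches(ref, genome[start:start + window])
        if n in counts:  # genomes with more than 2 mismatches are not tallied
            counts[n] += 1
    total = len(aligned_genomes)
    return {
        "n_mm0": counts[0],
        "n_mm1": counts[1],
        "n_mm2": counts[2],
        "sens_0": counts[0] / total,                 # perfect matches only
        "sens_01": (counts[0] + counts[1]) / total,  # allowing up to 1 mismatch
    }
```

With four sampled genomes of which two match perfectly and one has a single mismatch, the sketch reports 0.5 and 0.75 for the two sensitivity values.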

score_multisegment_sensitivity.R

Calculate sensitivity (%targeted of sampled genomes) per target window.

  • --file, -f: file path header to alignment files
  • --type, -t: whether input is a CLUSTAL .aln alignment or FASTA file
  • --segments, -s: number of genome segments
  • --mismatch, -m: number of mismatches tolerated for targeting crRNAs
  • --window, -w: size of target window (default = 20nt)
  • --out, -o: file path to write output file

If the target genome contains multiple segments, score_multisegment_sensitivity.R assumes that the input alignment filenames contain their segment numbers (e.g., influenzaA3.aln).

OUTPUT:

  • score_sensitivity.txt: table of segment, position, strand, and sensitivity
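
The filename convention for segmented genomes can be pictured as the --file header followed by the segment number and a file extension. The Python sketch below builds such names under two assumptions that the text above does not confirm: segments are numbered from 1, and the number is appended directly to the header. The function name `segment_paths` is hypothetical.

```python
def segment_paths(header, segments, extension=".aln"):
    """Expected alignment filenames, one per genome segment."""
    # Assumption: segments are numbered 1..segments with no separator.
    return [f"{header}{i}{extension}" for i in range(1, segments + 1)]
```

Under these assumptions, a header of `influenzaA` with 3 segments gives `influenzaA1.aln`, `influenzaA2.aln`, and `influenzaA3.aln`.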

score_human_CoV_specificity.R

Calculate specificity (1 - %targeted of human coronaviruses) per target window.

  • --genome, -g: bowtie index prefix
  • --enzyme, -e: Cas enzyme for detection
  • --mismatch, -m: number of mismatches tolerated for targeting crRNAs
  • --num_human_CoV, -n: number of other human coronaviruses (denominator for specificity)
  • --bowtie, -b: file path to system installation/executable of bowtie
  • --out, -o: file path to write output file

If --enzyme Cas13a is specified, bowtie will not search for reverse complement alignments.

Requires targets.fa and windows.txt to be in the --out directory.

OUTPUT:

  • score_human_CoV_specificity.txt: table of segment, position, strand, # human coronaviruses targeted, specificity
  • bowtie_<genome>_unmapped.fa: FASTA file of unaligned target windows
  • bowtie_<genome>_mapped.sam: SAM alignment file of aligned target windows
  • bowtie_<genome>_mapped.bowtiestats: bowtie alignment report
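
The specificity formula (1 minus the fraction of human coronaviruses targeted) can be sketched in Python as below. The sketch assumes that alignment hits have already been reduced to (target window, reference name) pairs and that "# human coronaviruses targeted" means the number of distinct references hit. Both points are assumptions, and the function name `specificity` and the identifiers in the usage note are hypothetical.

```python
from collections import defaultdict

def specificity(hits, target_ids, num_human_cov):
    """Specificity per target: 1 - (distinct references hit / num_human_cov)."""
    refs = defaultdict(set)
    for target_id, reference in hits:
        refs[target_id].add(reference)  # repeated hits on one reference count once
    # Targets with no hits get an empty set and therefore a specificity of 1.0.
    return {t: 1 - len(refs[t]) / num_human_cov for t in target_ids}
```

For instance, a window that aligns to 2 of 4 other human coronaviruses receives a specificity of 0.5, and a window with no alignments receives 1.0.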

score_RNAfold_crRNAs.R

Compute crRNA folding structures.

  • --enzyme, -e: Cas enzyme for detection
  • --cas_repeat, -c: crRNA repeat sequence
  • --spacer, -s: example "good" spacer to evaluate crRNA hairpin structure
  • --rnafold, -r: file path to system installation/executable of RNAfold
  • --out, -o: file path to write output file

Requires spacers.txt and windows.txt to be in the --out directory.

OUTPUT:

  • crRNAs.txt: crRNA sequences, one per line
  • crRNAs_RNAfold.txt: RNAfold output from processing crRNAs.txt
  • score_RNAfold_crRNAs.txt: table of segment, position, strand, crRNA sequence, crRNA secondary structure, MFE, whether repeat sequence matches hairpin structure of "good" spacer, # basepaired positions in spacer
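
Two of the reported columns can be derived from a dot-bracket structure string of the kind RNAfold prints: whether the repeat folds like the reference hairpin, and how many spacer positions are base-paired. The Python sketch below shows one way to compute them. It assumes the repeat precedes the spacer in the crRNA, which the text above does not state, and the function names are hypothetical.

```python
def build_crrna(cas_repeat, spacer):
    """Concatenate repeat and spacer (assumption: repeat comes first)."""
    return cas_repeat + spacer

def spacer_basepairs(structure, repeat_length):
    """Count paired positions, '(' or ')', in the spacer part of the structure."""
    return sum(c in "()" for c in structure[repeat_length:])

def repeat_matches(structure, good_structure, repeat_length):
    """True if the repeat region folds identically to the reference crRNA."""
    return structure[:repeat_length] == good_structure[:repeat_length]
```

A crRNA whose spacer region is all dots has 0 base-paired spacer positions, which is generally the pattern expected of the "good" reference spacer.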

align_bowtie.R

Calculate the number of alignments to an off-target genome.

  • --genome, -g: bowtie index prefix
  • --enzyme, -e: Cas enzyme for detection
  • --mismatch, -m: number of mismatches tolerated for targeting crRNAs
  • --omit, -v: file path to text file of transcripts to omit, one per line
  • --bowtie, -b: file path to system installation/executable of bowtie
  • --out, -o: file path to write output file

If --enzyme Cas13a is specified, bowtie will not search for reverse complement alignments.

Requires the bowtie index to be located in /ref_data.

OUTPUT:

  • alignment_cts_<genome>.txt: segment, position, strand, # alignments
  • bowtie_<genome>_unmapped.fa: FASTA file of unaligned target windows
  • bowtie_<genome>_mapped.sam: SAM alignment file of aligned target windows
  • bowtie_<genome>_mapped.bowtiestats: bowtie alignment report
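
Counting alignments per target window from a SAM file, while skipping omitted transcripts, might look like the Python sketch below. It relies only on the standard SAM layout (tab-separated fields; QNAME, FLAG, and RNAME in the first three columns; FLAG bit 4 marking an unmapped read). How align_bowtie.R actually parses its output is not described above, so treat this as an assumption. The function name `count_alignments` is hypothetical.

```python
from collections import Counter

def count_alignments(sam_lines, omit=()):
    """Number of reported alignments per query name, excluding omitted references."""
    omit = set(omit)
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 4 or rname in omit:     # unmapped, or reference is on the omit list
            continue
        counts[qname] += 1
    return counts
```

Because a `Counter` returns 0 for missing keys, target windows with no surviving alignments can be looked up without special handling.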

helper.R

Contains add_column() function to compile summary table.

Example: ../outputs/covid/cas13a_20nt/compile_scores.R

  • dat: data.frame containing columns "start" and "strand"
  • fname: file path to file containing metric to be added
  • dat_column: name of column to be added/replaced in dat
  • fname_column: name of column in fname to be pulled
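
The role of the four arguments can be illustrated with a Python analogue that joins a metric file onto a summary table by "start" and "strand". This is not the R function in helper.R. It assumes the metric file is tab-delimited and carries its own "start" and "strand" columns, and the helper `add_column_from_rows` is a hypothetical name introduced for the sketch.

```python
import csv

def add_column_from_rows(dat, metric_rows, dat_column, fname_column):
    """Add or replace dat_column in each row of dat, matching on (start, strand)."""
    lookup = {(str(r["start"]), r["strand"]): r[fname_column] for r in metric_rows}
    for row in dat:
        # Rows without a match in the metric table get None.
        row[dat_column] = lookup.get((str(row["start"]), row["strand"]))
    return dat

def add_column(dat, fname, dat_column, fname_column, sep="\t"):
    """Read the metric file at fname and join one of its columns onto dat."""
    with open(fname, newline="") as handle:
        metric_rows = list(csv.DictReader(handle, delimiter=sep))
    return add_column_from_rows(dat, metric_rows, dat_column, fname_column)
```

Calling `add_column` once per score file builds up the summary table one metric at a time.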