This page describes the parameters used to run the guide design pipeline.
Generate target genome as consensus sequence of sampled sequenced genomes.
- --input, -i: file path to aligned sequences
- --type, -t: whether input is a CLUSTAL .aln alignment or FASTA file
- --name, -n: name of newly generated consensus sequence
- --output, -o: file path to write output file
OUTPUT:
- FASTA file
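The idea of a consensus sequence can be sketched in a few lines. The Python below is only an illustration and is not the pipeline's R script: the majority-vote rule, the handling of gap columns, and the function names `consensus` and `to_fasta` are all assumptions made for this example.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Majority-vote consensus over equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned_seqs):
        # Most common character in this column; ties go to the one seen first.
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":  # assumption: drop columns where a gap is most common
            out.append(base)
    return "".join(out)

def to_fasta(name, seq, width=60):
    """Format a single FASTA record with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

aln = ["ACGTAC", "ACGTTC", "AC-TAC"]  # toy alignment, not real data
print(to_fasta("consensus", consensus(aln)))
```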
Generate all possible target windows from input target genome.
- --input, -i: file path to target sequence (FASTA file format)
- --window, -w: size of target window (default = 20nt)
- --enzyme, -e: Cas enzyme for detection; currently only supports Cas13a or Cas12
- --pfs_length, -p: length of PFS/PAM sequence to evaluate (default = 4nt)
- --genome-strand, -g: whether input sequence is + or - strand
- --strand, -s: whether to target + or - strand
- --out, -o: file path to write output file
OUTPUT:
- targets.txt: target sequences, one line per target
- targets.fa: target sequences, FASTA format
- spacers.txt: spacer sequences, one line per target
- windows.txt: table of position, target sequence, spacer sequence, strand, PFS or PAM, %GC, and %A
If --enzyme Cas13a is specified, break_genome.R will return all possible target windows specified by --strand, as well as a 4nt protospacer flanking sequence (PFS).
If --enzyme Cas12 is specified, break_genome.R will return all possible target windows on both + and - strands, as well as a 4nt protospacer adjacent motif (PAM).
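As a rough illustration of window generation, the Python sketch below slides a fixed-size window along one strand and reports the columns listed for windows.txt. It is not break_genome.R: it handles only the strand as given, and it assumes that the spacer is the reverse complement of the target and that the flanking sequence sits immediately 3' of the window. Neither detail is stated on this page.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def windows(genome, window=20, pfs_length=4):
    """Enumerate every window that still has a full flanking sequence."""
    genome = genome.upper()
    rows = []
    last_start = len(genome) - window - pfs_length
    for start in range(last_start + 1):
        target = genome[start:start + window]
        # Assumption: the PFS/PAM is taken immediately 3' of the window.
        flank = genome[start + window:start + window + pfs_length]
        rows.append({
            "start": start + 1,  # 1-based position
            "target": target,
            "spacer": revcomp(target),  # assumption: spacer pairs with target
            "flank": flank,
            "gc": 100.0 * sum(b in "GC" for b in target) / window,
            "a": 100.0 * target.count("A") / window,
        })
    return rows

for row in windows("ACGTACGTAC", window=4, pfs_length=2):
    print(row)
```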
Generate pairwise alignments of sampled genomes to target genome.
- --genome, -g: file path to input target genome (FASTA file format)
- --input, -i: file path to sampled genomes (FASTA file format)
- --num_cores, -n: number of cores to parallelize over (default n = 1)
- --chunk_size, -c: number of sampled genomes to align at a time for parallelization (default c = 10)
- --align_type, -a: alignment "type" argument for Biostrings::pairwiseAlignment() (default a = "global")
- --output, -o: file path to write aligned sequences to
OUTPUT:
- text matrix of aligned sequences
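How a chunk size and a worker count might interact can be sketched as follows. This Python example only illustrates the chunking pattern and does not reproduce generate_pairwise_alignments.R: `align_chunk` is a hypothetical placeholder that returns its input unchanged, and threads stand in for cores.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, chunk_size=10):
    """Yield consecutive slices of at most chunk_size items."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]

def align_chunk(chunk):
    # Hypothetical placeholder: a real implementation would align each
    # sampled genome in the chunk to the target genome here.
    return list(chunk)

def align_all(genomes, num_workers=1, chunk_size=10):
    """Process chunks in parallel and return results in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the order the chunks were submitted.
        results = list(pool.map(align_chunk, chunked(genomes, chunk_size)))
    return [row for chunk in results for row in chunk]

genomes = ["genome_%d" % i for i in range(25)]  # toy names, not real data
print([len(c) for c in chunked(genomes, 10)])
print(len(align_all(genomes, num_workers=2, chunk_size=10)))
```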
Calculate sensitivity (%targeted of sampled genomes, allowing 0 or 1 mismatches) per target window.
Requires pairwise alignments, as generated by generate_pairwise_alignments.R.
- --genome, -g: file path to target sequence (FASTA file format)
- --input, -i: file path to text file containing pairwise alignments of sampled genomes to target sequence
- --window, -w: size of target window (default = 20nt)
- --out, -o: file path to write output file
OUTPUT:
- score_sensitivity.txt: table of position, strand, # genomes with 0 mismatches, # genomes with 1 mismatch, # genomes with 2 mismatches, sensitivity allowing 0 mismatches, sensitivity
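The sensitivity calculation can be illustrated with a small Python sketch. It assumes every aligned genome is already in target coordinates (same length as the target, with gaps as characters) and that at least one genome is supplied. The function and column names are invented for this example and are not taken from the R script.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def score_sensitivity(target, aligned, window=20):
    """Per-window fraction of genomes matched with 0 or at most 1 mismatch."""
    n = len(aligned)  # assumption: n > 0
    rows = []
    for start in range(len(target) - window + 1):
        ref = target[start:start + window]
        mm = [hamming(ref, seq[start:start + window]) for seq in aligned]
        zero = sum(m == 0 for m in mm)
        one = sum(m == 1 for m in mm)
        rows.append({
            "start": start + 1,  # 1-based position
            "mismatch_0": zero,
            "mismatch_1": one,
            "sensitivity_0": zero / n,
            "sensitivity_01": (zero + one) / n,
        })
    return rows

aligned = ["ACGTAC", "ACGAAC", "TCGAAC"]  # toy alignment rows
for row in score_sensitivity("ACGTAC", aligned, window=4):
    print(row)
```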
Calculate sensitivity (%targeted of sampled genomes) per target window.
- --file, -f: file path header to alignment files
- --type, -t: whether input is a CLUSTAL .aln alignment or FASTA file
- --segments, -s: number of genome segments
- --mismatch, -m: number of mismatches tolerated for targeting crRNAs
- --window, -w: size of target window (default = 20nt)
- --out, -o: file path to write output file
If the target genome contains multiple segments, score_multisegment_sensitivity.R assumes that the input alignment filenames contain their segment numbers (e.g. influenzaA3.aln).
OUTPUT:
- score_sensitivity.txt: table of segment, position, strand, and sensitivity
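The filename convention for multi-segment genomes can be pictured with the short Python sketch below. It assumes that segments are numbered from 1 and that the number sits directly between the file path header and the extension, as in the influenzaA3.aln example. The helper `segment_files` is hypothetical and is not part of the pipeline.

```python
def segment_files(header, n_segments, ext=".aln"):
    """Map each segment number to its expected alignment filename."""
    # Assumption: segments are numbered 1..n_segments with no padding.
    return {seg: "%s%d%s" % (header, seg, ext)
            for seg in range(1, n_segments + 1)}

files = segment_files("influenzaA", 8)
print(files[3])
```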
Calculate specificity (1 - %targeted of human coronaviruses) per target window.
- --genome, -g: bowtie index prefix
- --enzyme, -e: Cas enzyme for detection
- --mismatch, -m: number of mismatches tolerated for targeting crRNAs
- --num_human_CoV, -n: number of other human coronaviruses (denominator for specificity)
- --bowtie, -b: file path to system installation/executable of bowtie
- --out, -o: file path to write output file
If --enzyme Cas13a is specified, bowtie will not search for reverse complement alignments.
Requires targets.fa and windows.txt to be in out directory.
OUTPUT:
- score_human_CoV_specificity.txt: table of segment, position, strand, # human coronaviruses targeted, specificity
- bowtie_<genome>_unmapped.fa: FASTA file of unaligned target windows
- bowtie_<genome>_mapped.sam: SAM alignment file of aligned target windows
- bowtie_<genome>_mapped.bowtiestats: bowtie alignment report
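The specificity formula (1 minus the fraction of human coronaviruses targeted) is simple enough to show directly. In this Python sketch the input is a hypothetical mapping from window ID to the list of coronaviruses it aligned to, and each virus is assumed to count once per window however many hits it has. The actual script's bookkeeping may differ.

```python
def specificity(hits_by_window, num_human_cov):
    """Return 1 - (distinct viruses hit / num_human_cov) for each window."""
    return {
        window: 1 - len(set(viruses)) / num_human_cov
        for window, viruses in hits_by_window.items()
    }

# Toy input: window IDs and virus names are placeholders.
hits = {"window_1": ["229E", "OC43", "229E"], "window_2": []}
print(specificity(hits, num_human_cov=4))
```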
Compute crRNA folding structures.
- --enzyme, -e: Cas enzyme for detection
- --cas_repeat, -c: crRNA repeat sequence
- --spacer, -s: example "good" spacer to evaluate crRNA hairpin structure
- --rnafold, -r: file path to system installation/executable of RNAfold
- --out, -o: file path to write output file
Requires spacers.txt and windows.txt to be in out directory.
OUTPUT:
- crRNAs.txt: crRNA sequences, one per line
- crRNAs_RNAfold.txt: RNAfold output from processing crRNAs.txt
- score_RNAfold_crRNAs.txt: table of segment, position, strand, crRNA sequence, crRNA secondary structure, MFE, whether repeat sequence matches hairpin structure of "good" spacer, # basepaired positions in spacer
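Two of the reported columns can be derived from a dot-bracket structure string, as in this Python sketch. It does not call RNAfold, and it assumes the repeat sits 5' of the spacer so that the first `repeat_len` characters of the structure belong to the repeat. The sequences and structures are toy values, and the function names are invented for illustration.

```python
def build_crrna(cas_repeat, spacer):
    """Join repeat and spacer (assumption: repeat is 5' of the spacer)."""
    return cas_repeat + spacer

def paired_in_spacer(structure, repeat_len):
    """Count base-paired positions within the spacer part of the structure."""
    return sum(c in "()" for c in structure[repeat_len:])

def repeat_matches(structure, good_structure, repeat_len):
    """True if the repeat folds the same way as in the 'good' reference."""
    return structure[:repeat_len] == good_structure[:repeat_len]

repeat_len = 12
good = "((((....))))........"       # toy fold of repeat + "good" spacer
candidate = "((((....))))..((..))"  # toy fold of repeat + candidate spacer
print(paired_in_spacer(candidate, repeat_len))
print(repeat_matches(candidate, good, repeat_len))
```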
Calculate the number of alignments to an off-target genome.
- --genome, -g: bowtie index prefix
- --enzyme, -e: Cas enzyme for detection
- --mismatch, -m: number of mismatches tolerated for targeting crRNAs
- --omit, -v: file path to text file of transcripts to omit, one per line
- --bowtie, -b: file path to system installation/executable of bowtie
- --out, -o: file path to write output file
If --enzyme Cas13a is specified, bowtie will not search for reverse complement alignments.
Requires the bowtie index to be located in /ref_data.
OUTPUT:
- alignment_cts_<genome>.txt: segment, position, strand, # alignments
- bowtie_<genome>_unmapped.fa: FASTA file of unaligned target windows
- bowtie_<genome>_mapped.sam: SAM alignment file of aligned target windows
- bowtie_<genome>_mapped.bowtiestats: bowtie alignment report
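Counting alignments per target window from a SAM file might look like the Python sketch below. It relies only on the standard SAM layout (header lines start with `@`, columns 1 to 3 are query name, flag, and reference name, and flag bit 0x4 marks an unmapped read). Whether the pipeline counts alignments this way, and how it applies the omit list, is an assumption.

```python
from collections import Counter

def count_alignments(sam_lines, omit=()):
    """Count mapped SAM records per query name, skipping omitted references."""
    omit = set(omit)
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname in omit:
            continue  # unmapped read, or reference on the omit list
        counts[qname] += 1
    return counts

# Toy SAM records: names and coordinates are placeholders.
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "w1\t0\ttxA\t100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "w1\t0\ttxB\t250\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "w2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
print(count_alignments(sam, omit=["txB"]))
```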
Contains add_column() function to compile summary table.
Example: ../outputs/covid/cas13a_20nt/compile_scores.R
- dat: data.frame containing columns "start" and "strand"
- fname: file path to file containing metric to be added
- dat_column: name of column to be added/replaced in dat
- fname_column: name of column in fname to be pulled
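The role of the four arguments can be illustrated with a Python analogue that uses lists of dictionaries in place of data.frames. This is not the R add_column() function: it takes rows that have already been read from the metric file instead of a file path, and it assumes rows are matched on the "start" and "strand" columns. That matching rule is inferred from the description of dat and is not stated on this page.

```python
def add_column(dat, metric_rows, dat_column, fname_column):
    """Copy fname_column from metric_rows into dat under the name dat_column."""
    # Assumption: rows are matched on the (start, strand) pair.
    lookup = {(r["start"], r["strand"]): r[fname_column] for r in metric_rows}
    for row in dat:
        # Rows with no match get None; an existing column is overwritten.
        row[dat_column] = lookup.get((row["start"], row["strand"]))
    return dat

# Toy rows standing in for the summary table and one metric file.
dat = [{"start": 1, "strand": "+"}, {"start": 2, "strand": "+"}]
metric = [{"start": 1, "strand": "+", "sensitivity": 0.98}]
print(add_column(dat, metric, "sensitivity_01", "sensitivity"))
```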