Documentation

This page describes the parameters for each script in the guide design pipeline.

`generate_consensus.R`

Generate target genome as consensus sequence of sampled sequenced genomes.

--input, -i: file path to aligned sequences
--type, -t: whether input is a CLUSTAL .aln alignment or FASTA file
--name, -n: name of newly generated consensus sequence
--output, -o: file path to write output file

OUTPUT:

FASTA file
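
As a rough illustration of what a consensus sequence is, the sketch below takes a per-column majority vote over aligned sequences and formats the result as FASTA. It is written in Python for illustration and is not the code in the R script; the function names `consensus` and `to_fasta` are hypothetical, and dropping columns where a gap is the most common symbol is an assumption.

```python
from collections import Counter

def consensus(aligned_seqs, gap="-"):
    """Majority-vote consensus over equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    out = []
    for column in zip(*aligned_seqs):
        # Most frequent symbol in this alignment column (ties: first seen wins).
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:  # assumption: columns where a gap wins are dropped
            out.append(base)
    return "".join(out)

def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with wrapped lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"

aligned = ["ACGT-A", "ACGA-A", "TCGTCA"]
print(to_fasta("consensus", consensus(aligned)), end="")
```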

`break_genome.R`

Generate all possible target windows from input target genome.

--input, -i: file path to target sequence (FASTA file format)
--window, -w: size of target window (default = 20nt)
--enzyme, -e: Cas enzyme for detection; currently only supports Cas13a or Cas12
--pfs_length, -p: length of PFS/PAM sequence to evaluate (default = 4nt)
--genome-strand, -g: whether input sequence is + or - strand
--strand, -s: whether to target + or - strand
--out, -o: file path to write output file

OUTPUT:

targets.txt: target sequences, one line per target
targets.fa: target sequences, FASTA format
spacers.txt: spacer sequences, one line per target
windows.txt: table of position, target sequence, spacer sequence, strand, PFS or PAM, %GC, and %A

If --enzyme Cas13a is specified, break_genome.R will return all possible target windows specified by --strand, as well as a 4nt protospacer flanking sequence (PFS).

If --enzyme Cas12 is specified, break_genome.R will return all possible target windows on both + and - strands, as well as a 4nt protospacer adjacent motif (PAM).
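
The window enumeration can be pictured with the following Python sketch, which slides a fixed-size window along a + strand sequence and records the columns listed for windows.txt. It is an illustration under stated assumptions, not the script's implementation: the names `enumerate_windows` and `reverse_complement` are hypothetical, the flank is assumed to be the nucleotides immediately 3' of the window, and the spacer is assumed to be the reverse complement of the target.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))

def enumerate_windows(genome, window=20, flank_length=4):
    """List every window that still has a full-length flank after it."""
    genome = genome.upper()
    rows = []
    last_start = len(genome) - window - flank_length
    for start in range(last_start + 1):
        target = genome[start:start + window]
        # Assumption: the PFS/PAM is read immediately 3' of the window.
        flank = genome[start + window:start + window + flank_length]
        rows.append({
            "start": start + 1,                    # 1-based position
            "target": target,
            "spacer": reverse_complement(target),  # assumption, see lead-in
            "strand": "+",
            "flank": flank,
            "gc": 100 * sum(b in "GC" for b in target) / window,
            "a": 100 * target.count("A") / window,
        })
    return rows

for row in enumerate_windows("AACCGGTTAA", window=4, flank_length=2):
    print(row["start"], row["target"], row["spacer"], row["flank"], row["gc"], row["a"])
```

Handling the - strand (and both strands for Cas12) would repeat the same loop on the reverse complement of the input; that step is omitted here.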

`generate_pairwise_alignments.R`

Generate pairwise alignments of sampled genomes to target genome.

--genome, -g: file path to input target genome (FASTA file format)
--input, -i: file path to sampled genomes (FASTA file format)
--num_cores, -n: number of cores to parallelize over (default n = 1)
--chunk_size, -c: number of sampled genomes to align at a time for parallelization (default c = 10)
--align_type, -a: alignment "type" argument for Biostrings::pairwiseAlignment() (default a = "global")
--output, -o: file path to write aligned sequences to

OUTPUT:

text matrix of aligned sequences
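
To show how --chunk_size and --num_cores interact, here is a small Python sketch that splits the sampled genomes into chunks and hands each chunk to a worker. It only illustrates the chunking pattern: `align_one` is a hypothetical placeholder for the real alignment call (the script uses Biostrings::pairwiseAlignment() in R), and a thread pool stands in for whatever parallel backend the script actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, chunk_size=10):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def align_in_chunks(genomes, align_one, num_cores=1, chunk_size=10):
    """Apply align_one to every genome, one chunk per worker task."""
    def align_chunk(batch):
        return [align_one(g) for g in batch]

    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        # pool.map yields chunk results in input order.
        results = pool.map(align_chunk, chunk(genomes, chunk_size))
        return [aligned for batch in results for aligned in batch]

# str.upper is a stand-in for a real alignment function.
print(align_in_chunks(["acgt", "ttga", "ccat"], str.upper, num_cores=2, chunk_size=2))
```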

`score_sensitivity.R`

Calculate sensitivity (%targeted of sampled genomes, allowing 0 or 1 mismatches) per target window.

Requires pairwise alignments, as generated by generate_pairwise_alignments.R.

--genome, -g: file path to target sequence (FASTA file format)
--input, -i: file path to text file containing pairwise alignments of sampled genomes to target sequence
--window, -w: size of target window (default = 20nt)
--out, -o: file path to write output file

OUTPUT:

score_sensitivity.txt: table of position, strand, # genomes with 0 mismatches, # genomes with 1 mismatch, # genomes with 2 mismatches, sensitivity allowing 0 mismatches, sensitivity
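
The sensitivity calculation can be sketched in Python as follows: for each window, count mismatches between the target and each aligned genome, then report the fraction of genomes at or below the mismatch threshold. This is a minimal sketch, not the script's code. It assumes every aligned genome is already in target coordinates (same length as the target, gaps written as "-"), and the names `score_sensitivity` and `count_mismatches` are hypothetical.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def score_sensitivity(target, aligned_genomes, window=20, max_mismatch=1):
    """Per-window fraction of genomes with at most max_mismatch mismatches."""
    if not aligned_genomes:
        raise ValueError("need at least one aligned genome")
    n = len(aligned_genomes)
    rows = []
    for start in range(len(target) - window + 1):
        ref = target[start:start + window]
        mism = [count_mismatches(ref, g[start:start + window])
                for g in aligned_genomes]
        rows.append({
            "start": start + 1,  # 1-based position
            "ct_0": sum(m == 0 for m in mism),
            "ct_1": sum(m == 1 for m in mism),
            "sensitivity": sum(m <= max_mismatch for m in mism) / n,
        })
    return rows

genomes = ["ACGTAC", "ACGAAC", "TTGTAC"]
for row in score_sensitivity("ACGTAC", genomes, window=4, max_mismatch=1):
    print(row)
```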

`score_multisegment_sensitivity.R`

Calculate sensitivity (%targeted of sampled genomes) per target window.

--file, -f: file path header to alignment files
--type, -t: whether input is a CLUSTAL .aln alignment or FASTA file
--segments, -s: number of genome segments
--mismatch, -m: number of mismatches tolerated for targeting crRNAs
--window, -w: size of target window (default = 20nt)
--out, -o: file path to write output file

If the target genome contains multiple segments, score_multisegment_sensitivity.R assumes that the input alignment filenames contain their segment numbers (e.g., influenzaA3.aln).

OUTPUT:

score_sensitivity.txt: table of segment, position, strand, and sensitivity
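
One plausible reading of the filename convention is header, then segment number, then extension. The Python sketch below builds the expected filenames under that reading. The function name `segment_files`, the 1-based numbering, and the exact way the header and number are joined are all assumptions; check them against your own alignment files.

```python
def segment_files(header, segments, extension="aln"):
    """Expected alignment filenames, assuming <header><segment>.<extension>."""
    return [f"{header}{i}.{extension}" for i in range(1, segments + 1)]

print(segment_files("influenzaA", 3))
```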

`score_human_CoV_specificity.R`

Calculate specificity (1 - %targeted of human coronaviruses) per target window.

--genome, -g: bowtie index prefix
--enzyme, -e: Cas enzyme for detection
--mismatch, -m: number of mismatches tolerated for targeting crRNAs
--num_human_CoV, -n: number of other human coronaviruses (denominator for specificity)
--bowtie, -b: file path to system installation/executable of bowtie
--out, -o: file path to write output file

If --enzyme Cas13a is specified, bowtie will not search for reverse complement alignments.

Requires targets.fa and windows.txt to be in out directory.

OUTPUT:

score_human_CoV_specificity.txt: table of segment, position, strand, # human coronaviruses targeted, specificity
bowtie_<genome>_unmapped.fa: FASTA file of unaligned target windows
bowtie_<genome>_mapped.sam: SAM alignment file of aligned target windows
bowtie_<genome>_mapped.bowtiestats: bowtie alignment report
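
The specificity formula (1 minus the fraction of other human coronaviruses targeted) can be sketched in Python as below. The input format is an assumption made for illustration: `hits_by_window` is a hypothetical mapping from a target window to the names of the coronaviruses it aligned to, with repeated names counted once.

```python
def score_specificity(hits_by_window, num_human_cov):
    """Specificity per window: 1 - (# distinct coronaviruses hit) / total."""
    if num_human_cov <= 0:
        raise ValueError("num_human_cov must be positive")
    return {
        window: 1 - len(set(hits)) / num_human_cov
        for window, hits in hits_by_window.items()
    }

hits = {"window_1": [], "window_2": ["HKU1", "OC43", "HKU1"]}
print(score_specificity(hits, num_human_cov=4))
```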

`score_RNAfold_crRNAs.R`

Compute crRNA folding structures.

--enzyme, -e: Cas enzyme for detection
--cas_repeat, -c: crRNA repeat sequence
--spacer, -s: example "good" spacer to evaluate crRNA hairpin structure
--rnafold, -r: file path to system installation/executable of RNAfold
--out, -o: file path to write output file

Requires spacers.txt and windows.txt to be in out directory.

OUTPUT:

crRNAs.txt: crRNA sequences, one per line
crRNAs_RNAfold.txt: RNAfold output from processing crRNAs.txt
score_RNAfold_crRNAs.txt: table of segment, position, strand, crRNA sequence, crRNA secondary structure, MFE, whether repeat sequence matches hairpin structure of "good" spacer, # basepaired positions in spacer
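
The last two columns of score_RNAfold_crRNAs.txt can be understood with a short Python sketch that works on RNAfold's dot-bracket strings. It assumes the crRNA is the repeat followed by the spacer, and it compares the repeat region of each structure with the same region of the "good" crRNA's structure by simple string equality, which is a simplification (a bracket in the repeat could pair with a base in the spacer). The function names are hypothetical.

```python
def paired_positions(structure):
    """Count base-paired positions in a dot-bracket structure string."""
    return sum(ch in "()" for ch in structure)

def summarize_crrna(repeat, spacer, structure, good_structure):
    """Summarize one crRNA given its structure and a reference structure."""
    n = len(repeat)  # assumption: the repeat occupies the first n positions
    return {
        "crRNA": repeat + spacer,
        "repeat_matches": structure[:n] == good_structure[:n],
        "spacer_basepairs": paired_positions(structure[n:]),
    }

# Toy sequences and structures, not real crRNAs.
good = "((..))...."
print(summarize_crrna("GACCAC", "AAUU", "((..))(..)", good))
print(summarize_crrna("GACCAC", "AAUU", "(....)....", good))
```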

`align_bowtie.R`

Calculate the number of alignments to an off-target genome.

--genome, -g: bowtie index prefix
--enzyme, -e: Cas enzyme for detection
--mismatch, -m: number of mismatches tolerated for targeting crRNAs
--omit, -v: file path to text file of transcripts to omit, one per line
--bowtie, -b: file path to system installation/executable of bowtie
--out, -o: file path to write output file

If --enzyme Cas13a is specified, bowtie will not search for reverse complement alignments.

Requires the bowtie index to be located in /ref_data.

OUTPUT:

alignment_cts_<genome>.txt: segment, position, strand, # alignments
bowtie_<genome>_unmapped.fa: FASTA file of unaligned target windows
bowtie_<genome>_mapped.sam: SAM alignment file of aligned target windows
bowtie_<genome>_mapped.bowtiestats: bowtie alignment report
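
Counting alignments per target window from a SAM file can be sketched in Python as below. It relies only on the standard SAM layout (header lines start with "@"; the first three tab-separated fields are read name, flag, and reference name; flag bit 4 marks an unmapped read). The function name `count_alignments`, the read names, and the way the omit list is applied are assumptions for illustration, not the script's code.

```python
from collections import Counter

def count_alignments(sam_lines, omit=()):
    """Count mapped alignments per read name, skipping omitted references."""
    omit = set(omit)
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 4 or rname in omit:
            continue  # unmapped read, or a transcript on the omit list
        counts[qname] += 1
    return counts

sam = [
    "@HD\tVN:1.0",
    "w1\t0\ttxA\t10\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "w1\t16\ttxB\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "w2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_alignments(sam, omit=["txB"]))
```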

`helper.R`

Contains add_column() function to compile summary table.

Example: ../outputs/covid/cas13a_20nt/compile_scores.R

dat: data.frame containing columns "start" and "strand"
fname: file path to file containing metric to be added
dat_column: name of column to be added/replaced in dat
fname_column: name of column in fname to be pulled
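
The behaviour described by these arguments resembles a keyed column merge. The Python sketch below shows that idea with plain dictionaries. It is not the R helper: this version takes already-parsed rows instead of a file path, and matching rows on "start" and "strand" is an assumption based on the columns that dat is required to contain.

```python
def add_column(dat, fname_rows, dat_column, fname_column):
    """Copy fname_column from fname_rows into dat as dat_column, matching
    rows on (start, strand). Rows without a match get None."""
    lookup = {(r["start"], r["strand"]): r[fname_column] for r in fname_rows}
    for row in dat:
        row[dat_column] = lookup.get((row["start"], row["strand"]))
    return dat

dat = [{"start": 1, "strand": "+"}, {"start": 2, "strand": "+"}]
scores = [{"start": 1, "strand": "+", "sensitivity": 0.9}]
print(add_column(dat, scores, "sensitivity_1", "sensitivity"))
```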
