
Command Line

Damien Farrell edited this page Jan 11, 2025 · 2 revisions

Usage

Run snipgenie for the command line interface or snipgenie-gui for the desktop version. At minimum you need a reference genome and reads in fastq format as input.

Command line options

The snipgenie command runs the entire process based on a set of options given at the terminal:

  -h, --help            show this help message and exit
  -i FILE, --input FILE
                        input folder(s)
  -M FILE, --manifest FILE
                        manifest file with samples, optional - overrides input
  -r FILE, --reference FILE
                        reference genome filename
  -S SPECIES, --species SPECIES
                        set the species reference genome, overrides -r. possible values are
                         Mbovis-AF212297, MTB-H37Rv, MAP-K10, M.smegmatis-
                        MC2155, Mycoplasmabovis-PG45, Sars-Cov-2
  -g FILE, --genbank_file FILE
                        annotation file, optional
  -t THREADS, --threads THREADS
                        cpu threads to use
  -sep LABELSEP, --labelsep LABELSEP
                        symbol to split the sample labels on if parsing filenames
  -x LABELINDEX, --labelindex LABELINDEX
                        position to extract label in split filenames
  -w, --overwrite       overwrite intermediate files
  -U, --unmapped        whether to save unmapped reads
  -Q QUALITY, --quality QUALITY
                        right trim quality, default 25
  -f FILTERS, --filters FILTERS
                        variant calling post-filters
  -m MASK, --mask MASK  supply mask regions from a bed file
  -pf PROXIMITY, --proximity PROXIMITY
                        proximity filter value, set 0 to not apply filter
  -u, --uninformative   keep uninformative sites when calling variants
  -p PLATFORM, --platform PLATFORM
                        sequencing platform, change to ont if using oxford nanopore
  -a ALIGNER, --aligner ALIGNER
                        aligner to use, bwa, subread, bowtie or minimap2
  -b, --buildtree       whether to build a phylogenetic tree, requires RaXML
  -N BOOTSTRAPS, --bootstraps BOOTSTRAPS
                        number of bootstraps to build tree
  -o FILE, --outdir FILE
                        Results folder
  -old, --old_method    use old calling method
  -q, --qc              QC report
  -s, --stats           Calculate read length and mapping stats
  -d, --dummy           Check samples but don't run
  -T, --test            Test run
  -v, --version         Get version
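To illustrate how -sep and -x work together when sample labels are parsed from filenames, the following shell sketch splits a file name on a separator and picks one field. The file name is made up, and the sketch assumes the index counts from zero; a dummy run (-d) is a way to check the samples before committing to a full run.

```shell
# Hypothetical paired-end read file name
filename="sampleA_S12_L001_R1.fastq.gz"
labelsep="_"      # value that would be passed with -sep
labelindex=0      # value that would be passed with -x (assumed zero-based)

# cut counts fields from 1, so shift the zero-based index by one
label=$(printf '%s\n' "$filename" | cut -d "$labelsep" -f "$((labelindex + 1))")
echo "$label"     # prints sampleA
```

With these values both files of a read pair (R1 and R2) would map to the same label, sampleA.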

Examples

Call with your own reference fasta file:

snipgenie -r reference.fa -i data_files -o results

Use a built-in species genome as the reference. This will also supply an annotation file. The current options are Mbovis-AF212297, MTB-H37Rv, MAP-K10, M.smegmatis-MC2155:

snipgenie -S Mbovis-AF212297 -i data_files -o results

Provide more than one folder:

snipgenie -r reference.fa -i data_files1 -i data_files2 -o results

Provide an annotation (genbank format) for consequence calling:

snipgenie -r reference.fa -g reference.gb -i data_files -o results

Add your own filters and provide threads:

snipgenie -r reference.fa -i data_files -t 8 -o results \
 -f 'QUAL>=40 && INFO/DP>=20 && MQ>40'

Aligners

You can use any one of the following aligners: bwa, subread, bowtie or minimap2, selected with the -a option. The chosen aligner should be present on your system, unless you are using the Windows version. Note that for Oxford Nanopore reads you should use minimap2 and specify the platform as 'ont' with the -p option.
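As a small sketch, you could check that the aligner is on your PATH before starting a run. The folder and file names in the commented command are placeholders; the -a and -p flags are those listed in the options above.

```shell
aligner="minimap2"

# Check whether the chosen aligner is installed and on the PATH
if command -v "$aligner" >/dev/null 2>&1; then
    aligner_status="found"
else
    aligner_status="missing"
fi
echo "$aligner: $aligner_status"

# If found, an Oxford Nanopore run could look like this (placeholder paths):
# snipgenie -r reference.fa -i ont_reads -a minimap2 -p ont -o results
```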

Mask file

You can selectively mask SNP sites, such as those contained in transposons or repetitive regions, so that they are not included in the output. Provide a bed file with the -m option containing the following columns: chromosome name, start and end coordinates of the regions. There is currently a built-in mask file used for M.bovis, and if you select this genome as reference using the --species option it will be used automatically.

LT708304.1 	 105359 	 106751
LT708304.1 	 131419 	 132910
LT708304.1 	 149570 	 151187
LT708304.1 	 306201 	 307872

Proximity filter

By default a filter is run that excludes any variant positions within a given distance of each other. This is to prevent false positives resulting from alignment artifacts. The default value is 10. You can set the value using the -pf option. Use 0 to ignore the filter.
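To show the idea, here is a small awk sketch of a proximity filter applied to a sorted list of made-up variant positions. It is an illustration only, not snipgenie's own code, and it assumes that two sites count as too close when they are fewer than the given number of bases apart.

```shell
proximity=10    # same meaning as the value passed with -pf

# Keep only positions with no neighbour closer than $proximity bases
kept=$(printf '%s\n' 100 105 200 350 358 359 500 |
    awk -v d="$proximity" '
        { pos[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                near_prev = (i > 1  && pos[i] - pos[i-1] < d)
                near_next = (i < NR && pos[i+1] - pos[i] < d)
                if (!(near_prev || near_next)) print pos[i]
            }
        }')
echo "$kept"    # prints 200 and 500, one per line
```

In this sketch a value of 0 removes nothing, since no distance between two positions is less than zero.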

Variant calling method

The older method was to split the genome into chunks and run bcftools mpileup for all samples together, then concatenate the results into one large bcf file. This file was then used for bcftools call. This method has been replaced by the more typical approach of running mpileup on individual files, running the calling, then merging the results with bcftools merge. In both cases filters are applied at the end. The reason for the change is that the old method was inflexible for very large numbers of samples, because the entire mpileup has to be re-run when new samples are added. The old method is still available by specifying -old (or --old_method) at the command line.
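The per-sample approach can be sketched with plain bcftools commands as below. This is a rough outline under stated assumptions, not the exact commands snipgenie runs: the function names, file names, ploidy setting and filter expression are all chosen for illustration.

```shell
# Call variants for one aligned sample; writes <sample>.bcf plus an index.
call_sample() {
    ref=$1
    bam=$2
    out="${bam%.bam}.bcf"
    # --ploidy 1 assumes a haploid organism (an assumption of this sketch)
    bcftools mpileup -Ou -f "$ref" "$bam" |
        bcftools call -mv --ploidy 1 -Ob -o "$out"
    bcftools index "$out"
}

# Merge the per-sample calls, then apply filters at the end.
merge_and_filter() {
    merged=$1
    shift
    bcftools merge -Ob -o "$merged" "$@"
    bcftools view -i 'QUAL>=40' -Ob -o filtered.bcf "$merged"
}

# Example usage (placeholder file names; needs bcftools and real inputs):
# for bam in sample1.bam sample2.bam; do call_sample reference.fa "$bam"; done
# merge_and_filter merged.bcf sample1.bcf sample2.bcf
```

With this layout, adding a new sample only needs one more call_sample run followed by a repeat of the merge step, rather than a full mpileup over every sample.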
