Command Line
Run snipgenie for the command-line interface or snipgenie-gui for the desktop version. At minimum you need a reference genome and reads in fastq format as input.
The command-line tool runs the entire process according to the options given at the terminal::
-h, --help show this help message and exit
-i FILE, --input FILE
input folder(s)
-M FILE, --manifest FILE
manifest file with samples, optional - overrides input
-r FILE, --reference FILE
reference genome filename
-S SPECIES, --species SPECIES
set the species reference genome, overrides -r. possible values are
Mbovis-AF212297, MTB-H37Rv, MAP-K10, M.smegmatis-
MC2155, Mycoplasmabovis-PG45, Sars-Cov-2
-g FILE, --genbank_file FILE
annotation file, optional
-t THREADS, --threads THREADS
cpu threads to use
-sep LABELSEP, --labelsep LABELSEP
symbol to split the sample labels on if parsing filenames
-x LABELINDEX, --labelindex LABELINDEX
position to extract label in split filenames
-w, --overwrite overwrite intermediate files
-U, --unmapped whether to save unmapped reads
-Q QUALITY, --quality QUALITY
right trim quality, default 25
-f FILTERS, --filters FILTERS
variant calling post-filters
-m MASK, --mask MASK supply mask regions from a bed file
-pf PROXIMITY, --proximity PROXIMITY
proximity filter value, set 0 to not apply filter
-u, --uninformative keep uninformative sites when calling variants
-p PLATFORM, --platform PLATFORM
sequencing platform, change to ont if using oxford nanopore
-a ALIGNER, --aligner ALIGNER
aligner to use, bwa, subread, bowtie or minimap2
-b, --buildtree whether to build a phylogenetic tree, requires RAxML
-N BOOTSTRAPS, --bootstraps BOOTSTRAPS
number of bootstraps to build tree
-o FILE, --outdir FILE
Results folder
-old, --old_method use old calling method
-q, --qc QC report
-s, --stats Calculate read length and mapping stats
-d, --dummy Check samples but don't run
-T, --test Test run
-v, --version Get version
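The -sep and -x options listed above control how sample labels are derived from filenames. As a rough illustration only, the idea can be sketched in Python as splitting the file name on a separator and taking one of the pieces. The function name sample_label and the default values used here are hypothetical, and snipgenie's own parsing may differ in detail.

```python
import os

def sample_label(filename, sep="_", index=0):
    """Derive a sample label from a read file name.

    Assumption: the label is the piece at position `index` after
    splitting the base file name on `sep`. The defaults here are
    illustrative, not snipgenie's documented defaults.
    """
    base = os.path.basename(filename)
    return base.split(sep)[index]

# Illustrative file names (not taken from the snipgenie documentation)
print(sample_label("data_files/S12_L001_R1_001.fastq.gz"))
print(sample_label("data_files/run1-S12-R1.fastq.gz", sep="-", index=1))
```

Both calls give the label S12: the first takes the leading piece before the first underscore, the second takes the second piece when splitting on a hyphen.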
Call with your own reference fasta file:
snipgenie -r reference.fa -i data_files -o results
Use a built-in species genome as the reference. This also supplies an annotation file. The current options, as listed in the help output above, are Mbovis-AF212297, MTB-H37Rv, MAP-K10, M.smegmatis-MC2155, Mycoplasmabovis-PG45 and Sars-Cov-2:
snipgenie -S Mbovis-AF212297 -i data_files -o results
Provide more than one folder:
snipgenie -r reference.fa -i data_files1 -i data_files2 -o results
Provide an annotation (genbank format) for consequence calling:
snipgenie -r reference.fa -g reference.gb -i data_files -o results
Add your own filters and provide threads:
snipgenie -r reference.fa -i data_files -t 8 -o results \
-f 'QUAL>=40 && INFO/DP>=20 && MQ>40'
You can use any one of the following aligners, selected with the -a option: bwa, subread, bowtie or minimap2. These should be present on your system, unless you are using the Windows version. Note that for Oxford Nanopore reads you should use minimap2 and set the platform to 'ont' with the -p option.
You can selectively mask SNP sites, such as those contained in transposons or repetitive regions, so that they are excluded from the output. To do this, provide a bed file with the -m option containing the following columns: chromosome name, start coordinate and end coordinate of each region. There is currently a built-in mask file for M.bovis, and if you select this genome as the reference using the --species option it will be used automatically. An example of the format:
LT708304.1 105359 106751
LT708304.1 131419 132910
LT708304.1 149570 151187
LT708304.1 306201 307872
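To make the effect of a mask file concrete, here is a minimal Python sketch that reads regions in the three-column layout shown above and tests whether a variant position falls inside one of them. It assumes the usual BED convention (0-based start, end exclusive) and 1-based variant positions; the helper names read_bed and is_masked are hypothetical, and this is not snipgenie's own implementation.

```python
import io

def read_bed(handle):
    """Parse BED-style lines into {chromosome: [(start, end), ...]}."""
    regions = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions

def is_masked(regions, chrom, pos):
    """True if the 1-based position lies in any region on chrom.

    Assumes BED coordinates: 0-based start, end exclusive, so a
    1-based position p is inside a region when start < p <= end.
    """
    return any(start < pos <= end for start, end in regions.get(chrom, []))

# Two of the example regions from the mask file shown above
bed_text = """LT708304.1 105359 106751
LT708304.1 131419 132910
"""
mask = read_bed(io.StringIO(bed_text))

print(is_masked(mask, "LT708304.1", 105400))  # inside the first region
print(is_masked(mask, "LT708304.1", 120000))  # between the two regions
```

A linear scan is enough for a handful of regions; for large mask files a sorted list with binary search or an interval tree would be the more usual choice.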
By default a filter is run that excludes any variant positions within a given distance of each other. This is to prevent false positives resulting from alignment artifacts. The default value is 10. You can set the value using the -pf option. Use 0 to ignore the filter.
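The proximity filter described above can be illustrated with a short Python sketch. It assumes that "within a given distance" means a gap of at most dist bases between neighbouring variant positions, and that every position in such a cluster is dropped. The function name proximity_filter is hypothetical and the exact boundary handling in snipgenie may differ.

```python
def proximity_filter(positions, dist=10):
    """Return the positions that have no neighbour within `dist` bases.

    Assumptions: positions are on a single chromosome, a neighbour
    counts as "close" when the gap is <= dist, and dist=0 disables
    the filter entirely.
    """
    pos = sorted(positions)
    if dist <= 0:
        return pos
    keep = []
    for i, p in enumerate(pos):
        close_prev = i > 0 and p - pos[i - 1] <= dist
        close_next = i < len(pos) - 1 and pos[i + 1] - p <= dist
        if not (close_prev or close_next):
            keep.append(p)
    return keep

sites = [100, 105, 200, 500, 510, 900]
print(proximity_filter(sites, dist=10))  # [200, 900]
print(proximity_filter(sites, dist=0))   # all six sites, filter disabled
```

With the default of 10, the pairs 100/105 and 500/510 are removed because each site has a neighbour within 10 bases, while 200 and 900 are kept.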
The older method was to split the genome into chunks and run bcftools mpileup for all samples together, then concatenate the results into one large BCF file. This file was then used for bcftools call. This method has been replaced by the more typical approach of running mpileup on each sample's file individually, calling variants, then merging the results with bcftools merge. In both cases filters are applied at the end. The reason for the change is that the old method scaled poorly to very large numbers of samples, because the entire mpileup had to be re-run whenever new samples were added. The old method is still available by specifying -old (or --old_method) at the command line.