Desktop Application

Damien Farrell edited this page Jan 22, 2025 · 9 revisions

This page describes the GUI layout and functionality.

Layout

The application layout shown below contains the main sample table in the centre pane with dock panes containing other elements like the log window or plugins. These panes can be dragged to any side or undocked. The log window at the bottom here will show output from running processes including information when errors occur.

Usage

  • The first step is always to set an output folder for the results. Choose Settings->Set Output Folder. Note that if this folder already contains results from a previous run of the command line tool, the program will load that sample table and use whatever output files are in the folder. If you move the output folder, it has to be re-loaded.
  • You must then set a reference genome, either by using a preset species or by loading your own fasta sequence. To load a preset use the Preset Genomes menu. The reference should currently be a single chromosome. Preset genomes have an annotation (genbank format) and sometimes a mask file (bed format) associated with them, but you can run without these.
  • Save the project. It will be saved as a single file with a .snipgenie extension. This saves the loaded tables and settings and some plugin results. Result files in the output folder aren't saved inside the project.
  • Load the fastq files you wish to analyse. These can be selected individually or an entire folder can be added. Use File->Add Folder or File->Add Fastq Files. When the files are loaded the samples table will be updated to reflect the files and their assigned labels. See importing files.
  • Once files are loaded you can begin analysis. Prior to alignment you may want to check your files for contamination, though this can be done at a later stage to exclude samples causing problems (e.g. samples that have poor depth).
  • Note that you can run the whole workflow in one go using Workflow->Run Workflow. The alignment and calling steps can also be run separately, which is convenient if you want to process just a few files initially when dealing with a large dataset. The first step is to align the fastq files. Select the rows in the table for the files to be aligned, or select all rows to align everything.
  • After alignment you will see the table updated to reflect the bam files created.
  • Rows in the table can be selected at any time. If you right click on the table you will get some functions you can run on the selected rows. Quality checks similar to fastqc (though more basic) can be performed on any sample.
  • Removing samples - Samples can be removed or added at any time. When you remove a sample, its bam and variant calling files will also be removed. You will then need to re-run variant calling to get a new merged vcf, or simply re-run the whole workflow.
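
To illustrate the idea of files being assigned labels when they are loaded, here is a minimal sketch of grouping fastq file names into samples. The naming pattern (`_R1`/`_R2` or `_1`/`_2` suffixes) and the function name `group_fastq_files` are assumptions made for this example. This is not necessarily how the program derives its labels.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <label>_R1.fastq.gz, <label>_2.fq, <label>.fastq ...
FASTQ_PATTERN = re.compile(
    r"^(?P<label>.+?)(?:_R?[12])?(?:_001)?\.(?:fastq|fq)(?:\.gz)?$"
)

def group_fastq_files(filenames):
    """Group fastq file names by a sample label derived from the file name."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        match = FASTQ_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a fastq file, so it is ignored
        samples[match.group("label")].append(name)
    return dict(samples)

files = ["sampleA_R2.fastq.gz", "sampleA_R1.fastq.gz", "sampleB_1.fq", "notes.txt"]
for label, paths in group_fastq_files(files).items():
    print(label, paths)
```

Paired files end up under one label, which corresponds to one row per sample in a table.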

Example

Here is a screen grab of some of the basic workflow steps.

Generate a test project

If you just want to try the program without your own data, you can generate simulated data, run the workflow on it and load the result as a project. Use Settings->Create Test Data. This will take a few moments to generate the fastq files and then align and call.

Viewing results tables

A number of the tabular outputs can be viewed inside the application when they are available.

SNP table

This table shows the SNPs for all samples (rows) at each position (columns). It is loaded from the core.txt file in the results, which holds the filtered SNPs and is what is used to make the final phylogeny. The table is shown below.

Viewing alignments

You can right click on any cell to view the alignments for that sample at the position of the SNP (column). If there is a variant it will be visible as a letter in every read that differs from the reference at that position. This is a fairly rudimentary text based view. For better graphical views you should use IGV.

SNP distance table

This shows the distance matrix of samples derived from the core SNP alignment above. The samples can be sorted by their order in the phylogeny stored in the project.
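
As an illustration of what a SNP distance matrix contains, the sketch below counts the positions at which each pair of samples differs in a small core alignment. The sample names and sequences are made up, and this simple count ignores ambiguous bases and gaps, so it is a sketch of the idea rather than the program's own calculation.

```python
from itertools import combinations

def snp_distance_matrix(alignment):
    """Return pairwise SNP distances as a nested dict keyed by sample name."""
    names = list(alignment)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Count the positions where the two samples carry different bases
        dist = sum(x != y for x, y in zip(alignment[a], alignment[b]))
        matrix[a][b] = matrix[b][a] = dist
    return matrix

# Hypothetical core SNP alignment: one string of variable sites per sample
core = {"s1": "ACGTA", "s2": "ACGTT", "s3": "TCGAT"}
distances = snp_distance_matrix(core)
print(distances["s1"]["s3"])
```

The matrix is symmetric with zeros on the diagonal, so each pair only needs to be compared once.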

VCF table

This lets you view the content of vcf files that are the product of variant calls. It is mainly useful for debugging errors or for checking the depth and quality values at a position/site. By default the filtered SNPs vcf will be displayed but you can load other vcf/bcf files from the file system.
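
For checking depth and quality outside the application, an uncompressed vcf can also be read as plain text. The sketch below assumes the standard tab-separated VCF columns and that depth is stored under the `DP` key of the INFO column. The example records and the function name `parse_vcf_records` are made up for illustration.

```python
def parse_vcf_records(lines):
    """Extract position, alleles, quality and depth from VCF text lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        chrom, pos, _id, ref, alt, qual, _filter, info = line.split("\t")[:8]
        # INFO is a ';' separated list of KEY=VALUE pairs and bare flags
        info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "qual": None if qual == "." else float(qual),
            "depth": int(info_map["DP"]) if "DP" in info_map else None,
        })
    return records

# Made-up example records
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1057\t.\tC\tT\t225.4\tPASS\tDP=48;MQ=60",
    "chr1\t2532\t.\tG\tA\t.\tPASS\tDB;MQ=60",
]
for rec in parse_vcf_records(example):
    print(rec["pos"], rec["qual"], rec["depth"])
```

Compressed vcf.gz or bcf files would need to be decompressed or converted first, for example with bcftools.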

CSQ table

This table shows the contents of the csq.tsv file that is produced by the consequence calling step. This file is only present if a genbank annotation file was provided when running the variant calling. It shows the effect of each identified SNP in terms of the amino acid changes in the annotated proteins along the genome. SNPs in intergenic regions or non-coding genes won't show up here.

View tree

A maximum likelihood tree built from the SNP alignment (with RAxML or FastTree if installed) can be viewed.

Simple plots from the table data

Simple graphical (bar, scatter or histogram) plots can be made from numerical data in the table e.g. to plot percent coverage across samples. The plots are made using the right hand toolbar that is present in some of the tables.

Adding sample metadata

You can merge sample meta data into the samples table using Settings->Merge Meta Data. This is useful if you have another table that identifies the samples in some meaningful way and you want to combine it with the results. You can also use this information to colour the phylogenetic tree if you have made one, or simply take the merged table and use it elsewhere.
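
Conceptually, merging meta data is a join on a shared sample identifier. The sketch below shows that idea with plain dictionaries. The column names (`sample`, `coverage`, `herd`) and the function `merge_metadata` are hypothetical and only serve the example.

```python
def merge_metadata(samples, metadata, key="sample"):
    """Left-join metadata rows onto sample rows using a shared key column."""
    lookup = {row[key]: row for row in metadata}
    merged = []
    for row in samples:
        extra = lookup.get(row[key], {})
        # Values from the samples table win if both tables share a column name
        merged.append({**extra, **row})
    return merged

# Hypothetical tables
samples = [
    {"sample": "s1", "coverage": 98.2},
    {"sample": "s2", "coverage": 75.0},
]
metadata = [{"sample": "s1", "herd": "H1"}]

for row in merge_metadata(samples, metadata):
    print(row)
```

Samples without a matching meta data row are kept unchanged, so no results are lost by the merge.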