Commit 9992e02

Updated documentation
1 parent b7e8b61 commit 9992e02

4 files changed

+24
-8
lines changed

docs/MANUAL.md

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -79,9 +79,14 @@ Puerto 1054 1076 400_3_out_R* 60 -
7979
Call variants with iVar
8080
----
8181

82-
iVar uses the output of the `samtools mpileup` command to call variants - single nucleotide variants(SNVs) and indels. In order to call variants correctly, the reference file used for alignment must be passed to iVar using the `-r` flag. The output of `samtools pileup` is piped into `ivar variants` to generate a .tsv file with the variants. There are two parameters that can be set for variant calling using iVar - minimum quality(Default: 20) and minimum frequency(Default: 0.03). Minimum quality is the minimum quality for a base to be counted towards the ungapped depth to canculate iSNV frequency at a given position. For insertions, the quality metric is discarded and the mpileup depth is used directly. Minimum frequency is the minimum frequency required for a SNV or indel to be reported. iVar can also identify codons and translate variants into amino acids using a GFF file in the [GFF3](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) format containing the required coding regions (CDS). In absence of a GFF file, iVar will not perform the translation and "NA" will be added to the output file in place of the reference and alternate codons and amino acids. The GFF file in the GFF3 format can be downloaded via ftp from NCBI RefSeq/Genbank. They are usually the files with the extension ".gff.gz". For example, the GFF file for Zaire Ebolavirus can be found [here](ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/Zaire_ebolavirus/all_assembly_versions/GCF_000848505.1_ViralProj14703). More details on GFF3 files hosted by NCBI can be found in their ftp [FAQs](https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/).
82+
iVar uses the output of the `samtools mpileup` command to call variants - single nucleotide variants(SNVs) and indels. In order to call variants correctly, the reference file used for alignment must be passed to iVar using the `-r` flag. The output of `samtools pileup` is piped into `ivar variants` to generate a .tsv file with the variants. There are two parameters that can be set for variant calling using iVar - minimum quality(Default: 20) and minimum frequency(Default: 0.03). Minimum quality is the minimum quality for a base to be counted towards the ungapped depth to canculate iSNV frequency at a given position. For insertions, the quality metric is discarded and the mpileup depth is used directly. Minimum frequency is the minimum frequency required for a SNV or indel to be reported.
83+
#### Amino acid translation of iSNVs
8384

84-
Some RNA viruses such as Ebola virus, might have polymerase slippage causing the insertion of a couple of nucleotides. More details can be found here[https://viralzone.expasy.org/857?outline=all_by_protein]. iVar can account for this editing and identify the correct open reading frames. The user will have to specify two additional parameters: EditPosition and EditSequence in the "attributes" column of the GFF file to account for this. A test example is given here,
85+
iVar can identify codons and translate variants into amino acids using a GFF file in the [GFF3](https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md) format containing the required coding regions (CDS). In absence of a GFF file, iVar will not perform the translation and "NA" will be added to the output file in place of the reference and alternate codons and amino acids. The GFF file in the GFF3 format can be downloaded via ftp from NCBI RefSeq/Genbank. They are usually the files with the extension ".gff.gz". For example, the GFF file for Zaire Ebolavirus can be found [here](ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/Zaire_ebolavirus/all_assembly_versions/GCF_000848505.1_ViralProj14703). More details on GFF3 files hosted by NCBI can be found in their ftp [FAQs](https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/).
86+
87+
#### Account for RNA editing through polymerase slippage
88+
89+
Some RNA viruses such as Ebola virus, might have polymerase slippage causing the insertion of a couple of nucleotides. More details can be found [here](https://viralzone.expasy.org/857?outline=all_by_protein). iVar can account for this editing and identify the correct open reading frames. The user will have to specify two additional parameters, **EditPosition**: Position at which edit occurs and **EditSequence**: The sequence that is inserted at the given position, in the "attributes" column of the GFF file to account for this. A test example is given below,
8590

8691
```
8792
test Genbank CDS 2 292 . + . ID=id-testedit1;Note=PinkFloyd;EditPosition=100;EditSequence=A

docs/html/manualpage.html

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -117,8 +117,10 @@ <h2><a class="anchor" id="autotoc_md15"></a>
117117
<p>Example BED file</p>
118118
<div class="fragment"><div class="line">Puerto 28 52 400_1_out_L 60 +</div><div class="line">Puerto 482 504 400_1_out_R 60 -</div><div class="line">Puerto 359 381 400_2_out_L 60 +</div><div class="line">Puerto 796 818 400_2_out_R 60 -</div><div class="line">Puerto 658 680 400_3_out_L* 60 +</div><div class="line">Puerto 1054 1076 400_3_out_R* 60 -</div><div class="line">.</div><div class="line">.</div><div class="line">.</div><div class="line">.</div></div><!-- fragment --><h2><a class="anchor" id="autotoc_md16"></a>
119119
Call variants with iVar</h2>
120-
<p>iVar uses the output of the <code>samtools mpileup</code> command to call variants - single nucleotide variants(SNVs) and indels. In order to call variants correctly, the reference file used for alignment must be passed to iVar using the <code>-r</code> flag. The output of <code>samtools pileup</code> is piped into <code>ivar variants</code> to generate a .tsv file with the variants. There are two parameters that can be set for variant calling using iVar - minimum quality(Default: 20) and minimum frequency(Default: 0.03). Minimum quality is the minimum quality for a base to be counted towards the ungapped depth to canculate iSNV frequency at a given position. For insertions, the quality metric is discarded and the mpileup depth is used directly. Minimum frequency is the minimum frequency required for a SNV or indel to be reported. iVar can also identify codons and translate variants into amino acids using a GFF file in the https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md "GFF3" format containing the required coding regions (CDS). In absence of a GFF file, iVar will not perform the translation and "NA" will be added to the output file in place of the reference and alternate codons and amino acids.</p>
121-
<p>Some RNA viruses such as Ebola virus, might have polymerase slippage causing the insertion of a couple of nucleotides. More details can be found here[<a href="https://viralzone.expasy.org/857?outline=all_by_protein">https://viralzone.expasy.org/857?outline=all_by_protein</a>]. iVar can account for this editing and identify the correct open reading frames. The user will have to specify two additional parameters: EditPosition and EditSequence in the "attributes" column of the GFF file to account for this. A test example is given here,</p>
120+
<p>iVar uses the output of the <code>samtools mpileup</code> command to call variants - single nucleotide variants(SNVs) and indels. In order to call variants correctly, the reference file used for alignment must be passed to iVar using the <code>-r</code> flag. The output of <code>samtools pileup</code> is piped into <code>ivar variants</code> to generate a .tsv file with the variants. There are two parameters that can be set for variant calling using iVar - minimum quality(Default: 20) and minimum frequency(Default: 0.03). Minimum quality is the minimum quality for a base to be counted towards the ungapped depth to canculate iSNV frequency at a given position. For insertions, the quality metric is discarded and the mpileup depth is used directly. Minimum frequency is the minimum frequency required for a SNV or indel to be reported. </p><h4>Amino acid translation of iSNVs</h4>
121+
<p>iVar can identify codons and translate variants into amino acids using a GFF file in the https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md "GFF3" format containing the required coding regions (CDS). In absence of a GFF file, iVar will not perform the translation and "NA" will be added to the output file in place of the reference and alternate codons and amino acids. The GFF file in the GFF3 format can be downloaded via ftp from NCBI RefSeq/Genbank. They are usually the files with the extension ".gff.gz". For example, the GFF file for Zaire Ebolavirus can be found <a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/Zaire_ebolavirus/all_assembly_versions/GCF_000848505.1_ViralProj14703">here</a>. More details on GFF3 files hosted by NCBI can be found in their ftp <a href="https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/">FAQs</a>.</p>
122+
<h4>Account for RNA editing through polymerase slippage</h4>
123+
<p>Some RNA viruses such as Ebola virus, might have polymerase slippage causing the insertion of a couple of nucleotides. More details can be found <a href="https://viralzone.expasy.org/857?outline=all_by_protein">here</a>. iVar can account for this editing and identify the correct open reading frames. The user will have to specify two additional parameters, <b>EditPosition</b>: Position at which edit occurs and <b>EditSequence</b>: The sequence that is inserted at the given position, in the "attributes" column of the GFF file to account for this. A test example is given below,</p>
122124
<div class="fragment"><div class="line">test Genbank CDS 2 292 . + . ID=id-testedit1;Note=PinkFloyd;EditPosition=100;EditSequence=A</div><div class="line">test Genbank CDS 2 292 . + . ID=id-testedit2;Note=AnotherBrickInTheWall;EditPosition=102;EditSequence=AA</div></div><!-- fragment --><p>If a certain base is present in multiple CDSs, iVar will add a new row for each CDS frame and distinguish the rows by adding the ID (specified in attributes of GFF) of the GFF feature used for the translation. This is shown for position 42 in the example output below. There are two rows with two different GFF features: id-test3 and id-test4.</p>
123125
<p>Command: </p><div class="fragment"><div class="line">Usage: samtools mpileup -A -d 0 -B -Q 0 &lt;input.bam&gt; | ivar variants -p &lt;prefix&gt; [-q &lt;min-quality&gt;] [-t &lt;min-frequency-threshold&gt;] [-m &lt;minimum depth&gt;] [-r &lt;reference-fasta&gt;] [-g GFF file]</div><div class="line"></div><div class="line">Note : samtools mpileup output must be piped into ivar variants</div><div class="line"></div><div class="line">Input Options Description</div><div class="line"> -q Minimum quality score threshold to count base (Default: 20)</div><div class="line"> -t Minimum frequency threshold(0 - 1) to call variants (Default: 0.03)</div><div class="line"> -m Minimum read depth to call variants (Default: 0)</div><div class="line"> -r Reference file used for alignment. This is used to translate the nucleotide sequences and identify intra host single nucleotide variants</div><div class="line"> -g A GFF file in the GFF3 format can be supplied to specify coordinates of open reading frames (ORFs). In absence of GFF file, amino acid translation will not be done.</div><div class="line"></div><div class="line">Output Options Description</div><div class="line"> -p (Required) Prefix for the output tsv variant file</div></div><!-- fragment --><p>Example Usage: </p><div class="fragment"><div class="line">samtools mpileup -A -d 600000 -F 0 -B -Q 0 test.trimmed.bam | ivar variants -p test -q 20 -t 0.03 -r test_reference.fa -g test.gff</div></div><!-- fragment --><p>The command above will generate a test.tsv file.</p>
124126
<p>Example of output .tsv file.</p>

docs/latex/manualpage.tex

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -79,9 +79,13 @@
7979
.
8080
\end{DoxyCode}
8181
\hypertarget{manualpage_autotoc_md16}{}\subsection{Call variants with i\+Var}\label{manualpage_autotoc_md16}
82-
i\+Var uses the output of the {\ttfamily samtools mpileup} command to call variants -\/ single nucleotide variants(\+S\+N\+Vs) and indels. In order to call variants correctly, the reference file used for alignment must be passed to i\+Var using the {\ttfamily -\/r} flag. The output of {\ttfamily samtools pileup} is piped into {\ttfamily ivar variants} to generate a .tsv file with the variants. There are two parameters that can be set for variant calling using i\+Var -\/ minimum quality(\+Default\+: 20) and minimum frequency(Default\+: 0.\+03). Minimum quality is the minimum quality for a base to be counted towards the ungapped depth to canculate i\+S\+NV frequency at a given position. For insertions, the quality metric is discarded and the mpileup depth is used directly. Minimum frequency is the minimum frequency required for a S\+NV or indel to be reported. i\+Var can also identify codons and translate variants into amino acids using a G\+FF file in the https\+://github.com/\+The-\/\+Sequence-\/\+Ontology/\+Specifications/blob/master/gff3.\+md \char`\"{}\+G\+F\+F3\char`\"{} format containing the required coding regions (C\+DS). In absence of a G\+FF file, i\+Var will not perform the translation and \char`\"{}\+N\+A\char`\"{} will be added to the output file in place of the reference and alternate codons and amino acids.
82+
i\+Var uses the output of the {\ttfamily samtools mpileup} command to call variants -\/ single nucleotide variants(\+S\+N\+Vs) and indels. In order to call variants correctly, the reference file used for alignment must be passed to i\+Var using the {\ttfamily -\/r} flag. The output of {\ttfamily samtools pileup} is piped into {\ttfamily ivar variants} to generate a .tsv file with the variants. There are two parameters that can be set for variant calling using i\+Var -\/ minimum quality(\+Default\+: 20) and minimum frequency(Default\+: 0.\+03). Minimum quality is the minimum quality for a base to be counted towards the ungapped depth to canculate i\+S\+NV frequency at a given position. For insertions, the quality metric is discarded and the mpileup depth is used directly. Minimum frequency is the minimum frequency required for a S\+NV or indel to be reported. \paragraph*{Amino acid translation of i\+S\+N\+Vs}
8383

84-
Some R\+NA viruses such as Ebola virus, might have polymerase slippage causing the insertion of a couple of nucleotides. More details can be found here\mbox{[}\href{https://viralzone.expasy.org/857?outline=all_by_protein}{\tt https\+://viralzone.\+expasy.\+org/857?outline=all\+\_\+by\+\_\+protein}\mbox{]}. i\+Var can account for this editing and identify the correct open reading frames. The user will have to specify two additional parameters\+: Edit\+Position and Edit\+Sequence in the \char`\"{}attributes\char`\"{} column of the G\+FF file to account for this. A test example is given here,
84+
i\+Var can identify codons and translate variants into amino acids using a G\+FF file in the https\+://github.com/\+The-\/\+Sequence-\/\+Ontology/\+Specifications/blob/master/gff3.\+md \char`\"{}\+G\+F\+F3\char`\"{} format containing the required coding regions (C\+DS). In absence of a G\+FF file, i\+Var will not perform the translation and \char`\"{}\+N\+A\char`\"{} will be added to the output file in place of the reference and alternate codons and amino acids. The G\+FF file in the G\+F\+F3 format can be downloaded via ftp from N\+C\+BI Ref\+Seq/\+Genbank. They are usually the files with the extension \char`\"{}.\+gff.\+gz\char`\"{}. For example, the G\+FF file for Zaire Ebolavirus can be found \href{ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/Zaire_ebolavirus/all_assembly_versions/GCF_000848505.1_ViralProj14703}{\tt here}. More details on G\+F\+F3 files hosted by N\+C\+BI can be found in their ftp \href{https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/}{\tt F\+A\+Qs}.
85+
86+
\paragraph*{Account for R\+NA editing through polymerase slippage}
87+
88+
Some R\+NA viruses such as Ebola virus, might have polymerase slippage causing the insertion of a couple of nucleotides. More details can be found \href{https://viralzone.expasy.org/857?outline=all_by_protein}{\tt here}. i\+Var can account for this editing and identify the correct open reading frames. The user will have to specify two additional parameters, {\bfseries Edit\+Position}\+: Position at which edit occurs and {\bfseries Edit\+Sequence}\+: The sequence that is inserted at the given position, in the \char`\"{}attributes\char`\"{} column of the G\+FF file to account for this. A test example is given below,
8589

8690

8791
\begin{DoxyCode}

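For readers checking their own GFF files against the documentation added in this commit, the following is a minimal sketch of how the `EditPosition` and `EditSequence` attributes could be read from the "attributes" column of a GFF3 line. It is not iVar's implementation; the function name `parse_gff_edit` is hypothetical, and the sketch assumes the nine GFF3 columns are separated by whitespace and that the attributes column itself contains no spaces, as in the test example from the manual.

```python
def parse_gff_edit(line):
    """Return (feature_id, edit_position, edit_sequence) for one GFF3 line.

    Hypothetical helper for illustration only; not part of iVar.
    Missing attributes are returned as None.
    """
    # GFF3 has nine columns; the ninth holds semicolon-separated key=value pairs.
    fields = line.strip().split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 GFF3 columns, got %d" % len(fields))

    attributes = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()

    position = attributes.get("EditPosition")
    return (
        attributes.get("ID"),
        int(position) if position is not None else None,
        attributes.get("EditSequence"),
    )


# Test example line taken from the manual text in this commit.
example = "test Genbank CDS 2 292 . + . ID=id-testedit1;Note=PinkFloyd;EditPosition=100;EditSequence=A"
print(parse_gff_edit(example))
```

A CDS line without the two attributes yields `None` for both, which is one way to spot features for which no edit has been declared.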