Commit c1f9f1f

Update documentation
1 parent a824949 commit c1f9f1f

3 files changed: +150 -35 lines changed

doc/bcftools.1

Lines changed: 80 additions & 19 deletions
@@ -2,12 +2,12 @@
22
.\" Title: bcftools
33
.\" Author: [see the "AUTHOR(S)" section]
44
.\" Generator: Asciidoctor 2.0.15.dev
5-
.\" Date: 2023-03-28
5+
.\" Date: 2023-05-30
66
.\" Manual: \ \&
77
.\" Source: \ \&
88
.\" Language: English
99
.\"
10-
.TH "BCFTOOLS" "1" "2023-03-28" "\ \&" "\ \&"
10+
.TH "BCFTOOLS" "1" "2023-05-30" "\ \&" "\ \&"
1111
.ie \n(.g .ds Aq \(aq
1212
.el .ds Aq '
1313
.ss \n[.ss] 0
@@ -51,7 +51,7 @@ standard input (stdin) and outputs to the standard output (stdout). Several
5151
commands can thus be combined with Unix pipes.
5252
.SS "VERSION"
5353
.sp
54-
This manual page was last updated \fB2023\-03\-28 13:46 BST\fP and refers to bcftools git version \fB1.17\-15\-gf2d2fdf8+\fP.
54+
This manual page was last updated \fB2023\-05\-30 09:18 BST\fP and refers to bcftools git version \fB1.17\-50\-ga8249495+\fP.
5555
.SS "BCF1"
5656
.sp
5757
The obsolete BCF1 format output by versions of samtools <= 0.1.19 is \fBnot\fP
@@ -1359,10 +1359,6 @@ Alias for \fB\-d exact\fP
13591359
.RS 4
13601360
Read file names from \fIFILE\fP, one file name per line.
13611361
.RE
1362-
\fB\-G, \-\-drop\-genotypes\fP
1363-
.RS 4
1364-
drop individual genotype information.
1365-
.RE
13661362
.sp
13671363
\fB\-l, \-\-ligate\fP
13681364
.RS 4
@@ -1522,24 +1518,24 @@ include only sites for which \fIEXPRESSION\fP is true. For valid expressions see
15221518
\fB\-I, \-\-iupac\-codes\fP
15231519
.RS 4
15241520
output variants in the form of IUPAC ambiguity codes determined from FORMAT/GT fields. By default all
1525-
samples are used and can be subset with \f(CR\-s, \-\-samples\fP and \f(CR\-S, \-\-samples\-file\fP. Use \f(CR\-s \-\fP to ignore
1521+
samples are used and can be subset with \fB\-s, \-\-samples\fP and \fB\-S, \-\-samples\-file\fP. Use \fB\-s \-\fP to ignore
15261522
samples and use only the REF and ALT columns. NOTE: prior to version 1.17 the IUPAC codes were determined solely
15271523
from REF,ALT columns and sample genotypes were not considered.
15281524
.RE
15291525
.sp
15301526
\fB\-\-mark\-del\fP \fICHAR\fP
15311527
.RS 4
1532-
instead of removing sequence, insert CHAR for deletions
1528+
instead of removing sequence, insert character CHAR for deletions
15331529
.RE
15341530
.sp
1535-
\fB\-\-mark\-ins\fP \fIuc\fP|\fIlc\fP
1531+
\fB\-\-mark\-ins\fP \fIuc\fP|\fIlc\fP|\fICHAR\fP
15361532
.RS 4
1537-
highlight inserted sequence in uppercase (uc) or lowercase (lc), leaving the rest of the sequence as is
1533+
highlight inserted sequence in uppercase (uc), lowercase (lc), or a provided character CHAR, leaving the rest of the sequence as is
15381534
.RE
15391535
.sp
15401536
\fB\-\-mark\-snv\fP \fIuc\fP|\fIlc\fP
15411537
.RS 4
1542-
highlight substitutions in uppercase (uc) or lowercase (lc), leaving the rest of the sequence as is
1538+
highlight substitutions in uppercase (uc), lowercase (lc), or a provided character CHAR, leaving the rest of the sequence as is
15431539
.RE
15441540
.sp
15451541
\fB\-m, \-\-mask\fP \fIFILE\fP
@@ -1567,12 +1563,12 @@ write output to a file
15671563
.sp
15681564
\fB\-s, \-\-samples\fP \fILIST\fP
15691565
.RS 4
1570-
apply variants of the listed samples. See also the option \f(CR\-I, \-\-iupac\-codes\fP
1566+
apply variants of the listed samples. See also the option \fB\-I, \-\-iupac\-codes\fP
15711567
.RE
15721568
.sp
15731569
\fB\-S, \-\-samples\-file\fP \fIFILE\fP
15741570
.RS 4
1575-
apply variants of the samples listed in the file. See also the option \f(CR\-I, \-\-iupac\-codes\fP
1571+
apply variants of the samples listed in the file. See also the option \fB\-I, \-\-iupac\-codes\fP
15761572
.RE
15771573
.sp
15781574
\fBExamples:\fP
@@ -1591,6 +1587,44 @@ apply variants of the samples listed in the file. See also the option \f(CR\-I,
15911587
.fam
15921588
.fi
15931589
.if n .RE
1590+
.sp
1591+
\fBNotes:\fP
1592+
.RS 4
1593+
Masking options are applied in the following order
1594+
.sp
1595+
.RS 4
1596+
.ie n \{\
1597+
\h'-04' 1.\h'+01'\c
1598+
.\}
1599+
.el \{\
1600+
. sp -1
1601+
. IP " 1." 4.2
1602+
.\}
1603+
mask regions with \fB\-\-mask\-with\fP character if \fB\-\-mask\fP is given. All overlapping VCF variants are ignored
1604+
.RE
1605+
.sp
1606+
.RS 4
1607+
.ie n \{\
1608+
\h'-04' 2.\h'+01'\c
1609+
.\}
1610+
.el \{\
1611+
. sp -1
1612+
. IP " 2." 4.2
1613+
.\}
1614+
replace sequence not mentioned in the VCF with the requested character if \fB\-\-absent\fP is given
1615+
.RE
1616+
.sp
1617+
.RS 4
1618+
.ie n \{\
1619+
\h'-04' 3.\h'+01'\c
1620+
.\}
1621+
.el \{\
1622+
. sp -1
1623+
. IP " 3." 4.2
1624+
.\}
1625+
finally apply \fB\-\-mark\-del\fP, \fB\-\-mark\-ins\fP, \fB\-\-mark\-snv\fP masks
1626+
.RE
1627+
.RE
15941628
.SS "bcftools convert \fI[OPTIONS]\fP \fIFILE\fP"
15951629
.SS "VCF input options:"
15961630
.sp
@@ -1920,13 +1954,13 @@ convert from TSV (tab\-separated values) format (such as generated by
19201954
\fB\-c, \-\-columns\fP \fIlist\fP
19211955
.RS 4
19221956
comma\-separated list of fields in the input file. In the current
1923-
version, the fields CHROM, POS, ID, and AA are expected and
1924-
can appear in arbitrary order, columns which should be ignored in the input
1957+
version, the fields CHROM, POS, ID, and AA or REF, ALT are expected and
1958+
can appear in arbitrary order. Columns which should be ignored in the input
19251959
file can be indicated by "\-".
19261960
The AA field lists alleles on the forward reference strand,
19271961
for example "CC" or "CT" for diploid genotypes or "C"
19281962
for haploid genotypes (sex chromosomes). Insertions and deletions
1929-
are not supported yet, missing data can be indicated with "\-\-".
1963+
are supported only with REF and ALT but not with AA. Missing data can be indicated with "\-\-" or ".".
19301964
.RE
19311965
.sp
19321966
\fB\-f, \-\-fasta\-ref\fP \fIfile\fP
@@ -1950,7 +1984,10 @@ file of sample names. See \fBCommon Options\fP
19501984
.nf
19511985
.fam C
19521986
# Convert 23andme results into VCF
1953-
bcftools convert \-c ID,CHROM,POS,AA \-s SampleName \-f 23andme\-ref.fa \-\-tsv2vcf 23andme.txt \-Oz \-o out.vcf.gz
1987+
bcftools convert \-c ID,CHROM,POS,AA \-s SampleName \-f 23andme\-ref.fa \-\-tsv2vcf 23andme.txt \-o out.vcf.gz
1988+
1989+
# Convert tab\-delimited file into a sites\-only VCF (no genotypes), in this example first column to be ignored
1990+
bcftools convert \-c \-,CHROM,POS,REF,ALT \-f ref.fa \-\-tsv2vcf calls.txt \-o out.bcf
19541991
.fam
19551992
.fi
19561993
.if n .RE
@@ -1999,6 +2036,12 @@ aminoacids, with \fB\-B 1\fP only an abbreviated version such as \fI25E..329>25G
19992036
written.
20002037
.RE
20012038
.sp
2039+
\fB\-\-dump\-gff\fP \fIFILE\fP
2040+
.RS 4
2041+
dump the parsed GFF into a gzipped FILE. Intended for debugging purposes,
2042+
shows how is the input GFF viewed by the program.
2043+
.RE
2044+
.sp
20022045
\fB\-e, \-\-exclude\fP \fIEXPRESSION\fP
20032046
.RS 4
20042047
exclude sites for which \fIEXPRESSION\fP is true. For valid expressions see
@@ -2032,6 +2075,17 @@ An example of a minimal working GFF file:
20322075
# the gene (determined from the transcript\(aqs "Parent=gene:" attribute), and the biotype
20332076
# (the most interesting is "protein_coding").
20342077
#
2078+
# Empty and commented lines are skipped, the following GFF columns are required
2079+
# 1. chromosome
2080+
# 2. IGNORED
2081+
# 3. type (CDS, exon, three_prime_UTR, five_prime_UTR, gene, transcript, etc.)
2082+
# 4. start of the feature (1\-based)
2083+
# 5. end of the feature (1\-based)
2084+
# 6. IGNORED
2085+
# 7. strand (+ or \-)
2086+
# 8. phase (0, 1, 2 or .)
2087+
# 9. semicolon\-separated attributes (see below)
2088+
#
20352089
# Attributes required for
20362090
# gene lines:
20372091
# \- ID=gene:<gene_id>
@@ -2171,6 +2225,13 @@ see \fBCommon Options\fP
21712225
see \fBCommon Options\fP
21722226
.RE
21732227
.sp
2228+
\fB\-\-unify\-chr\-names\fP \fI0\fP|\fI1\fP
2229+
.RS 4
2230+
Automatically detect and unify chromosome naming conventions in the GFF, fasta
2231+
and VCF, such as "chrX" vs "X". The chromosome names in the output VCF will match
2232+
that of the input VCF. The default is to attempt the automatic translation.
2233+
.RE
2234+
.sp
21742235
\fB\-\-write\-index\fP
21752236
.RS 4
21762237
Automatically index the output file
@@ -6143,4 +6204,4 @@ BCFtools wiki page: \c
61436204
.SH "COPYING"
61446205
.sp
61456206
The MIT/Expat License or GPL License, see the LICENSE document for details.
6146-
Copyright (c) Genome Research Ltd.
6207+
Copyright (c) Genome Research Ltd.
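
Note on the masking order documented for bcftools consensus in this diff: the three steps (regions from --mask first, then --absent, then the --mark-* options) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how bcftools is implemented: the function `toy_consensus` and all of its parameters are hypothetical, it handles SNVs only, positions are 0-based, and it assumes that positions already masked in step 1 are left untouched by step 2 (the manual text does not say this explicitly).

```python
def toy_consensus(ref, snvs, mask=(), mask_with="N", absent=None, mark_snv=None):
    """Toy model of the documented masking order (hypothetical, SNVs only).

    ref      -- reference sequence as a string
    snvs     -- dict mapping 0-based position to the alternate base
    mask     -- iterable of (start, end) half-open regions to mask
    absent   -- character for positions not mentioned in the variants, or None
    mark_snv -- "uc", "lc" or None, mimicking the idea of --mark-snv
    """
    seq = list(ref)
    masked = set()

    # Step 1: mask regions first; variants overlapping a mask are ignored.
    for start, end in mask:
        for i in range(start, end):
            seq[i] = mask_with
            masked.add(i)
    applied = {pos: alt for pos, alt in snvs.items() if pos not in masked}

    # Step 2: replace sequence not mentioned in the variants.
    # Assumption of this sketch: masked positions keep the mask character.
    if absent is not None:
        for i in range(len(seq)):
            if i not in applied and i not in masked:
                seq[i] = absent

    # Step 3: finally apply the variants, marking substitutions by case.
    for pos, alt in applied.items():
        if mark_snv == "lc":
            alt = alt.lower()
        elif mark_snv == "uc":
            alt = alt.upper()
        seq[pos] = alt

    return "".join(seq)
```

With a reference of "ACGTACGT", SNVs at positions 1 and 5, and a mask over positions 4 to 6, the variant at position 5 is dropped because it overlaps the mask, while the variant at position 1 is applied and marked last.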

doc/bcftools.html

Lines changed: 59 additions & 13 deletions
@@ -50,7 +50,7 @@ <h2 id="_description">DESCRIPTION</h2>
5050
<div class="sect2">
5151
<h3 id="_version">VERSION</h3>
5252
<div class="paragraph">
53-
<p>This manual page was last updated <strong>2023-03-28 13:46 BST</strong> and refers to bcftools git version <strong>1.17-15-gf2d2fdf8+</strong>.</p>
53+
<p>This manual page was last updated <strong>2023-05-30 09:18 BST</strong> and refers to bcftools git version <strong>1.17-50-ga8249495+</strong>.</p>
5454
</div>
5555
</div>
5656
<div class="sect2">
@@ -1271,21 +1271,21 @@ <h3 id="consensus">bcftools consensus <em>[OPTIONS]</em> <em>FILE</em></h3>
12711271
<dt class="hdlist1"><strong>-I, --iupac-codes</strong></dt>
12721272
<dd>
12731273
<p>output variants in the form of IUPAC ambiguity codes determined from FORMAT/GT fields. By default all
1274-
samples are used and can be subset with <code>-s, --samples</code> and <code>-S, --samples-file</code>. Use <code>-s -</code> to ignore
1274+
samples are used and can be subset with <strong>-s, --samples</strong> and <strong>-S, --samples-file</strong>. Use <strong>-s -</strong> to ignore
12751275
samples and use only the REF and ALT columns. NOTE: prior to version 1.17 the IUPAC codes were determined solely
12761276
from REF,ALT columns and sample genotypes were not considered.</p>
12771277
</dd>
12781278
<dt class="hdlist1"><strong>--mark-del</strong> <em>CHAR</em></dt>
12791279
<dd>
1280-
<p>instead of removing sequence, insert CHAR for deletions</p>
1280+
<p>instead of removing sequence, insert character CHAR for deletions</p>
12811281
</dd>
1282-
<dt class="hdlist1"><strong>--mark-ins</strong> <em>uc</em>|<em>lc</em></dt>
1282+
<dt class="hdlist1"><strong>--mark-ins</strong> <em>uc</em>|<em>lc</em>|<em>CHAR</em></dt>
12831283
<dd>
1284-
<p>highlight inserted sequence in uppercase (uc) or lowercase (lc), leaving the rest of the sequence as is</p>
1284+
<p>highlight inserted sequence in uppercase (uc), lowercase (lc), or a provided character CHAR, leaving the rest of the sequence as is</p>
12851285
</dd>
12861286
<dt class="hdlist1"><strong>--mark-snv</strong> <em>uc</em>|<em>lc</em></dt>
12871287
<dd>
1288-
<p>highlight substitutions in uppercase (uc) or lowercase (lc), leaving the rest of the sequence as is</p>
1288+
<p>highlight substitutions in uppercase (uc), lowercase (lc), or a provided character CHAR, leaving the rest of the sequence as is</p>
12891289
</dd>
12901290
<dt class="hdlist1"><strong>-m, --mask</strong> <em>FILE</em></dt>
12911291
<dd>
@@ -1308,11 +1308,11 @@ <h3 id="consensus">bcftools consensus <em>[OPTIONS]</em> <em>FILE</em></h3>
13081308
</dd>
13091309
<dt class="hdlist1"><strong>-s, --samples</strong> <em>LIST</em></dt>
13101310
<dd>
1311-
<p>apply variants of the listed samples. See also the option <code>-I, --iupac-codes</code></p>
1311+
<p>apply variants of the listed samples. See also the option <strong>-I, --iupac-codes</strong></p>
13121312
</dd>
13131313
<dt class="hdlist1"><strong>-S, --samples-file</strong> <em>FILE</em></dt>
13141314
<dd>
1315-
<p>apply variants of the samples listed in the file. See also the option <code>-I, --iupac-codes</code></p>
1315+
<p>apply variants of the samples listed in the file. See also the option <strong>-I, --iupac-codes</strong></p>
13161316
</dd>
13171317
</dl>
13181318
</div>
@@ -1331,6 +1331,27 @@ <h3 id="consensus">bcftools consensus <em>[OPTIONS]</em> <em>FILE</em></h3>
13311331
# For more examples see http://samtools.github.io/bcftools/howtos/consensus-sequence.html</pre>
13321332
</div>
13331333
</div>
1334+
<div class="dlist">
1335+
<dl>
1336+
<dt class="hdlist1"><strong>Notes:</strong></dt>
1337+
<dd>
1338+
<p>Masking options are applied in the following order</p>
1339+
<div class="olist arabic">
1340+
<ol class="arabic">
1341+
<li>
1342+
<p>mask regions with <strong>--mask-with</strong> character if <strong>--mask</strong> is given. All overlapping VCF variants are ignored</p>
1343+
</li>
1344+
<li>
1345+
<p>replace sequence not mentioned in the VCF with the requested character if <strong>--absent</strong> is given</p>
1346+
</li>
1347+
<li>
1348+
<p>finally apply <strong>--mark-del</strong>, <strong>--mark-ins</strong>, <strong>--mark-snv</strong> masks</p>
1349+
</li>
1350+
</ol>
1351+
</div>
1352+
</dd>
1353+
</dl>
1354+
</div>
13341355
</div>
13351356
<div class="sect2">
13361357
<h3 id="convert">bcftools convert <em>[OPTIONS]</em> <em>FILE</em></h3>
@@ -1665,13 +1686,13 @@ <h4 id="_tsv_conversion">TSV conversion:</h4>
16651686
<dt class="hdlist1"><strong>-c, --columns</strong> <em>list</em></dt>
16661687
<dd>
16671688
<p>comma-separated list of fields in the input file. In the current
1668-
version, the fields CHROM, POS, ID, and AA are expected and
1669-
can appear in arbitrary order, columns which should be ignored in the input
1689+
version, the fields CHROM, POS, ID, and AA or REF, ALT are expected and
1690+
can appear in arbitrary order. Columns which should be ignored in the input
16701691
file can be indicated by "-".
16711692
The AA field lists alleles on the forward reference strand,
16721693
for example "CC" or "CT" for diploid genotypes or "C"
16731694
for haploid genotypes (sex chromosomes). Insertions and deletions
1674-
are not supported yet, missing data can be indicated with "--".</p>
1695+
are supported only with REF and ALT but not with AA. Missing data can be indicated with "--" or ".".</p>
16751696
</dd>
16761697
<dt class="hdlist1"><strong>-f, --fasta-ref</strong> <em>file</em></dt>
16771698
<dd>
@@ -1693,7 +1714,10 @@ <h4 id="_tsv_conversion">TSV conversion:</h4>
16931714
<div class="listingblock">
16941715
<div class="content">
16951716
<pre># Convert 23andme results into VCF
1696-
bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -Oz -o out.vcf.gz</pre>
1717+
bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -o out.vcf.gz
1718+
1719+
# Convert tab-delimited file into a sites-only VCF (no genotypes), in this example first column to be ignored
1720+
bcftools convert -c -,CHROM,POS,REF,ALT -f ref.fa --tsv2vcf calls.txt -o out.bcf</pre>
16971721
</div>
16981722
</div>
16991723
</div>
@@ -1749,6 +1773,11 @@ <h3 id="csq">bcftools csq <em>[OPTIONS]</em> <em>FILE</em></h3>
17491773
aminoacids, with <strong>-B 1</strong> only an abbreviated version such as <em>25E..329&gt;25G..94</em> will be
17501774
written.</p>
17511775
</dd>
1776+
<dt class="hdlist1"><strong>--dump-gff</strong> <em>FILE</em></dt>
1777+
<dd>
1778+
<p>dump the parsed GFF into a gzipped FILE. Intended for debugging purposes,
1779+
shows how is the input GFF viewed by the program.</p>
1780+
</dd>
17521781
<dt class="hdlist1"><strong>-e, --exclude</strong> <em>EXPRESSION</em></dt>
17531782
<dd>
17541783
<p>exclude sites for which <em>EXPRESSION</em> is true. For valid expressions see
@@ -1778,6 +1807,17 @@ <h3 id="csq">bcftools csq <em>[OPTIONS]</em> <em>FILE</em></h3>
17781807
# the gene (determined from the transcript's "Parent=gene:" attribute), and the biotype
17791808
# (the most interesting is "protein_coding").
17801809
#
1810+
# Empty and commented lines are skipped, the following GFF columns are required
1811+
# 1. chromosome
1812+
# 2. IGNORED
1813+
# 3. type (CDS, exon, three_prime_UTR, five_prime_UTR, gene, transcript, etc.)
1814+
# 4. start of the feature (1-based)
1815+
# 5. end of the feature (1-based)
1816+
# 6. IGNORED
1817+
# 7. strand (+ or -)
1818+
# 8. phase (0, 1, 2 or .)
1819+
# 9. semicolon-separated attributes (see below)
1820+
#
17811821
# Attributes required for
17821822
# gene lines:
17831823
# - ID=gene:&lt;gene_id&gt;
@@ -1900,6 +1940,12 @@ <h3 id="csq">bcftools csq <em>[OPTIONS]</em> <em>FILE</em></h3>
19001940
<dd>
19011941
<p>see <strong><a href="#common_options">Common Options</a></strong></p>
19021942
</dd>
1943+
<dt class="hdlist1"><strong>--unify-chr-names</strong> <em>0</em>|<em>1</em></dt>
1944+
<dd>
1945+
<p>Automatically detect and unify chromosome naming conventions in the GFF, fasta
1946+
and VCF, such as "chrX" vs "X". The chromosome names in the output VCF will match
1947+
that of the input VCF. The default is to attempt the automatic translation.</p>
1948+
</dd>
19031949
<dt class="hdlist1"><strong>--write-index</strong></dt>
19041950
<dd>
19051951
<p>Automatically index the output file</p>
@@ -5211,7 +5257,7 @@ <h2 id="_copying">COPYING</h2>
52115257
</div>
52125258
<div id="footer">
52135259
<div id="footer-text">
5214-
Last updated 2023-03-28 13:46:18 +0100
5260+
Last updated 2023-05-30 09:18:06 +0100
52155261
</div>
52165262
</div>
52175263
</body>
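
Note on the GFF column list added to the bcftools csq section in this diff: the nine-column layout (columns 2 and 6 ignored, empty and commented lines skipped) can be sketched as a small parser. This is a minimal illustration under stated assumptions, not the parser used by bcftools: the names `GffRecord` and `parse_gff_line` are hypothetical, columns are assumed to be tab-separated, and attributes are assumed to be simple `key=value` pairs.

```python
from typing import Dict, NamedTuple, Optional


class GffRecord(NamedTuple):
    """Hypothetical container for the GFF columns listed in the manual."""
    chrom: str             # column 1
    type: str              # column 3 (CDS, exon, gene, transcript, ...)
    start: int             # column 4, 1-based
    end: int               # column 5, 1-based
    strand: str            # column 7, "+" or "-"
    phase: str             # column 8, "0", "1", "2" or "."
    attrs: Dict[str, str]  # column 9, semicolon-separated attributes


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF line; return None for empty or commented lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None

    cols = line.split("\t")
    if len(cols) < 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")

    # Columns 2 (source) and 6 (score) are read but ignored.
    chrom, _source, ftype, start, end, _score, strand, phase, attr_text = cols[:9]

    attrs = {}
    for item in attr_text.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()

    return GffRecord(chrom, ftype, int(start), int(end), strand, phase, attrs)
```

For example, a made-up gene line such as `1<TAB>src<TAB>gene<TAB>100<TAB>200<TAB>.<TAB>+<TAB>.<TAB>ID=gene:G1` yields a record whose `attrs["ID"]` is `gene:G1`; the identifiers in that line are sample values, not taken from any real annotation.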
