 .\" Title: bcftools
 .\" Author: [see the "AUTHOR(S)" section]
 .\" Generator: Asciidoctor 2.0.15.dev
-.\" Date: 2023-03-28
+.\" Date: 2023-05-30
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "BCFTOOLS" "1" "2023-03-28" "\ \&" "\ \&"
+.TH "BCFTOOLS" "1" "2023-05-30" "\ \&" "\ \&"
 .ie \n(.g .ds Aq \(aq
 .el .ds Aq '
 .ss \n[.ss] 0
@@ -51,7 +51,7 @@ standard input (stdin) and outputs to the standard output (stdout). Several
 commands can thus be combined with Unix pipes.
 .SS "VERSION"
 .sp
-This manual page was last updated \fB2023\-03\-28 13:46 BST\fP and refers to bcftools git version \fB1.17\-15\-gf2d2fdf8+\fP.
+This manual page was last updated \fB2023\-05\-30 09:18 BST\fP and refers to bcftools git version \fB1.17\-50\-ga8249495+\fP.
 .SS "BCF1"
 .sp
 The obsolete BCF1 format output by versions of samtools <= 0.1.19 is \fBnot\fP
@@ -1359,10 +1359,6 @@ Alias for \fB\-d exact\fP
 .RS 4
 Read file names from \fIFILE\fP, one file name per line.
 .RE
-\fB\-G, \-\-drop\-genotypes\fP
-.RS 4
-drop individual genotype information.
-.RE
 .sp
 \fB\-l, \-\-ligate\fP
 .RS 4
@@ -1522,24 +1518,24 @@ include only sites for which \fIEXPRESSION\fP is true. For valid expressions see
 \fB\-I, \-\-iupac\-codes\fP
 .RS 4
 output variants in the form of IUPAC ambiguity codes determined from FORMAT/GT fields. By default all
-samples are used and can be subset with \f(CR\-s, \-\-samples\fP and \f(CR\-S, \-\-samples\-file\fP. Use \f(CR\-s \-\fP to ignore
+samples are used and can be subset with \fB\-s, \-\-samples\fP and \fB\-S, \-\-samples\-file\fP. Use \fB\-s \-\fP to ignore
 samples and use only the REF and ALT columns. NOTE: prior to version 1.17 the IUPAC codes were determined solely
 from REF,ALT columns and sample genotypes were not considered.
 .RE
 .sp
 \fB\-\-mark\-del\fP \fICHAR\fP
 .RS 4
-instead of removing sequence, insert CHAR for deletions
+instead of removing sequence, insert character CHAR for deletions
 .RE
 .sp
-\fB\-\-mark\-ins\fP \fIuc\fP|\fIlc\fP
+\fB\-\-mark\-ins\fP \fIuc\fP|\fIlc\fP|\fICHAR\fP
 .RS 4
-highlight inserted sequence in uppercase (uc) or lowercase (lc), leaving the rest of the sequence as is
+highlight inserted sequence in uppercase (uc), lowercase (lc), or a provided character CHAR, leaving the rest of the sequence as is
 .RE
 .sp
 \fB\-\-mark\-snv\fP \fIuc\fP|\fIlc\fP
 .RS 4
-highlight substitutions in uppercase (uc) or lowercase (lc), leaving the rest of the sequence as is
+highlight substitutions in uppercase (uc), lowercase (lc), or a provided character CHAR, leaving the rest of the sequence as is
 .RE
 .sp
 \fB\-m, \-\-mask\fP \fIFILE\fP
@@ -1567,12 +1563,12 @@ write output to a file
 .sp
 \fB\-s, \-\-samples\fP \fILIST\fP
 .RS 4
-apply variants of the listed samples. See also the option \f(CR\-I, \-\-iupac\-codes\fP
+apply variants of the listed samples. See also the option \fB\-I, \-\-iupac\-codes\fP
 .RE
 .sp
 \fB\-S, \-\-samples\-file\fP \fIFILE\fP
 .RS 4
-apply variants of the samples listed in the file. See also the option \f(CR\-I, \-\-iupac\-codes\fP
+apply variants of the samples listed in the file. See also the option \fB\-I, \-\-iupac\-codes\fP
 .RE
 .sp
 \fBExamples:\fP
@@ -1591,6 +1587,44 @@ apply variants of the samples listed in the file. See also the option \f(CR\-I,
 .fam
 .fi
 .if n .RE
+.sp
+\fBNotes:\fP
+.RS 4
+Masking options are applied in the following order
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 1.\h'+01'\c
+.\}
+.el \{\
+.  sp -1
+.  IP " 1." 4.2
+.\}
+mask regions with \fB\-\-mask\-with\fP character if \fB\-\-mask\fP is given. All overlapping VCF variants are ignored
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 2.\h'+01'\c
+.\}
+.el \{\
+.  sp -1
+.  IP " 2." 4.2
+.\}
+replace sequence not mentioned in the VCF with the requested character if \fB\-\-absent\fP is given
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 3.\h'+01'\c
+.\}
+.el \{\
+.  sp -1
+.  IP " 3." 4.2
+.\}
+finally apply \fB\-\-mark\-del\fP, \fB\-\-mark\-ins\fP, \fB\-\-mark\-snv\fP masks
+.RE
+.RE
 .SS "bcftools convert \fI[OPTIONS]\fP \fIFILE\fP"
 .SS "VCF input options:"
 .sp
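The three-step masking order added in the Notes of this hunk can be illustrated with a toy model. This is a minimal sketch, not the bcftools implementation: the function `toy_consensus`, its parameters, the 0-based half-open region convention, and the restriction to single-base substitutions are all assumptions made for illustration. The man page does not say whether `--absent` overwrites masked positions; the sketch assumes it does not.

```python
def toy_consensus(ref, snvs, mask=None, mask_with="N", absent=None, mark_snv=None):
    """Toy model of the documented masking order (single-base substitutions only).

    ref   -- reference sequence as a string
    snvs  -- {0-based position: alternate base}, standing in for VCF records
    mask  -- list of (start, end) regions, 0-based half-open (sketch convention)
    """
    seq = list(ref)
    masked = set()

    # Step 1: mask regions; variants overlapping a masked region are ignored.
    for start, end in mask or []:
        for i in range(start, end):
            seq[i] = mask_with
            masked.add(i)
    kept = {pos: alt for pos, alt in snvs.items() if pos not in masked}

    # Step 2: replace sequence not mentioned in the VCF.
    # Assumption of this sketch: masked positions keep the mask character.
    if absent is not None:
        for i in range(len(seq)):
            if i not in kept and i not in masked:
                seq[i] = absent

    # Step 3: apply the remaining variants, marking them if requested.
    for pos, alt in kept.items():
        if mark_snv == "uc":
            alt = alt.upper()
        elif mark_snv == "lc":
            alt = alt.lower()
        seq[pos] = alt
    return "".join(seq)
```

In this model, a variant inside a masked region never reaches step 3, which mirrors the statement that overlapping VCF variants are ignored.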
@@ -1920,13 +1954,13 @@ convert from TSV (tab\-separated values) format (such as generated by
 \fB\-c, \-\-columns\fP \fIlist\fP
 .RS 4
 comma\-separated list of fields in the input file. In the current
-version, the fields CHROM, POS, ID, and AA are expected and
-can appear in arbitrary order, columns which should be ignored in the input
+version, the fields CHROM, POS, ID, and AA or REF, ALT are expected and
+can appear in arbitrary order. Columns which should be ignored in the input
 file can be indicated by "\-".
 The AA field lists alleles on the forward reference strand,
 for example "CC" or "CT" for diploid genotypes or "C"
 for haploid genotypes (sex chromosomes). Insertions and deletions
-are not supported yet, missing data can be indicated with "\-\-".
+are supported only with REF and ALT but not with AA. Missing data can be indicated with "\-\-" or ".".
 .RE
 .sp
 \fB\-f, \-\-fasta\-ref\fP \fIfile\fP
@@ -1950,7 +1984,10 @@ file of sample names. See \fBCommon Options\fP
 .nf
 .fam C
 # Convert 23andme results into VCF
-bcftools convert \-c ID,CHROM,POS,AA \-s SampleName \-f 23andme\-ref.fa \-\-tsv2vcf 23andme.txt \-Oz \-o out.vcf.gz
+bcftools convert \-c ID,CHROM,POS,AA \-s SampleName \-f 23andme\-ref.fa \-\-tsv2vcf 23andme.txt \-o out.vcf.gz
+
+# Convert a tab\-delimited file into a sites\-only VCF (no genotypes); in this example the first column is ignored
+bcftools convert \-c \-,CHROM,POS,REF,ALT \-f ref.fa \-\-tsv2vcf calls.txt \-o out.bcf
 .fam
 .fi
 .if n .RE
@@ -1999,6 +2036,12 @@ aminoacids, with \fB\-B 1\fP only an abbreviated version such as \fI25E..329>25G
 written.
 .RE
 .sp
+\fB\-\-dump\-gff\fP \fIFILE\fP
+.RS 4
+dump the parsed GFF into a gzipped FILE. Intended for debugging purposes,
+shows how the program views the input GFF.
+.RE
+.sp
 \fB\-e, \-\-exclude\fP \fIEXPRESSION\fP
 .RS 4
 exclude sites for which \fIEXPRESSION\fP is true. For valid expressions see
@@ -2032,6 +2075,17 @@ An example of a minimal working GFF file:
 # the gene (determined from the transcript\(aqs "Parent=gene:" attribute), and the biotype
 # (the most interesting is "protein_coding").
 #
+# Empty and commented lines are skipped. The following GFF columns are required:
+# 1. chromosome
+# 2. IGNORED
+# 3. type (CDS, exon, three_prime_UTR, five_prime_UTR, gene, transcript, etc.)
+# 4. start of the feature (1\-based)
+# 5. end of the feature (1\-based)
+# 6. IGNORED
+# 7. strand (+ or \-)
+# 8. phase (0, 1, 2 or .)
+# 9. semicolon\-separated attributes (see below)
+#
 # Attributes required for
 # gene lines:
 # \- ID=gene:<gene_id>
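The nine-column layout listed in this hunk can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the names `GffRecord` and `parse_gff_line` are hypothetical, columns are assumed to be tab-separated, and the code does not reproduce how bcftools itself parses or validates the GFF.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    chrom: str        # column 1
    feature: str      # column 3 (CDS, exon, gene, transcript, ...)
    start: int        # column 4, 1-based
    end: int          # column 5, 1-based
    strand: str       # column 7, "+" or "-"
    phase: str        # column 8, "0", "1", "2" or "."
    attributes: dict  # column 9, split on ";" and "="


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Return a record, or None for empty and commented lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) < 9:
        raise ValueError(f"expected 9 columns, found {len(cols)}")
    attrs = {}
    for field in cols[8].split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    # Columns 2 and 6 are ignored, as in the list above.
    return GffRecord(cols[0], cols[2], int(cols[3]), int(cols[4]),
                     cols[6], cols[7], attrs)
```

For a gene line, the required `ID=gene:<gene_id>` attribute would then be available as `record.attributes["ID"]`.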
@@ -2171,6 +2225,13 @@ see \fBCommon Options\fP
 see \fBCommon Options\fP
 .RE
 .sp
+\fB\-\-unify\-chr\-names\fP \fI0\fP|\fI1\fP
+.RS 4
+Automatically detect and unify chromosome naming conventions in the GFF, fasta
+and VCF, such as "chrX" vs "X". The chromosome names in the output VCF will match
+those of the input VCF. The default is to attempt the automatic translation.
+.RE
+.sp
 \fB\-\-write\-index\fP
 .RS 4
 Automatically index the output file
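The idea behind `--unify-chr-names` in this hunk, reconciling names such as "chrX" and "X" while keeping the VCF's own naming in the output, can be sketched as a lookup table. The man page does not describe the detection logic, so everything below is an assumption for illustration: the helper names `strip_chr` and `build_translation` are hypothetical, and only the "chr" prefix convention is handled.

```python
def strip_chr(name):
    """Drop a leading "chr" so that "chrX" and "X" compare equal."""
    return name[3:] if name.startswith("chr") else name


def build_translation(vcf_names, other_names):
    """Map names from the GFF or fasta onto the naming used by the VCF.

    Names without a counterpart in the VCF are left out of the mapping.
    """
    by_core = {strip_chr(name): name for name in vcf_names}
    return {
        name: by_core[strip_chr(name)]
        for name in other_names
        if strip_chr(name) in by_core
    }
```

Because the mapping always points to the VCF's spelling, records translated this way keep the chromosome names of the input VCF, which is the behavior the option text describes for the output.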
@@ -6143,4 +6204,4 @@ BCFtools wiki page: \c
 .SH "COPYING"
 .sp
 The MIT/Expat License or GPL License, see the LICENSE document for details.
-Copyright (c) Genome Research Ltd.
+Copyright (c) Genome Research Ltd.