Commit 983e610

Improvements and changes in gtcheck
- Add output options `-o, --output` and `-O, --output-type`
- Add filtering options `-i, --include` and `-e, --exclude`
- Rename the short option `-e, --error-probability` from lower case to upper case `-E, --error-probability`. This is a backward compatible change, the program can tell when the lower case `-e` is used in the previous meaning
- Changes to the output format, replace the DC section with DCv2:
    - adds a new column for the number of matching genotypes
    - fixes in HWE score calculation plus output average HWE score rather than absolute HWE score
    - better description of fields
1 parent a1b781d commit 983e610
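
The commit message says the program can tell when the lower case `-e` is used in its previous meaning, but the mechanism is not visible in this excerpt. One plausible way to do it, sketched here purely as an assumption, is to treat an argument that parses completely as a non-negative integer as the old `-e INT` form and anything else as a filtering expression. The helper name `is_legacy_error_prob` is hypothetical and is not taken from vcfgtcheck.c.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from vcfgtcheck.c): returns 1 and stores the
 * parsed number in *value when `arg` is a plain non-negative integer, i.e.
 * looks like the old `-e INT` usage. Returns 0 for anything else, such as
 * a `qry:`/`gt:` filtering expression. */
static int is_legacy_error_prob(const char *arg, long *value)
{
    char *end = NULL;
    long v;

    if (arg == NULL || *arg == '\0') return 0;

    errno = 0;
    v = strtol(arg, &end, 10);

    /* Reject overflow, no digits at all, or trailing non-numeric text. */
    if (errno != 0 || end == arg || *end != '\0') return 0;
    if (v < 0) return 0;

    if (value) *value = v;
    return 1;
}
```

Under this sketch, `-e 40` would be classified as the legacy error-probability form, while an illustrative expression such as `-e 'qry:QUAL<20'` would be passed on to the filtering code.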

22 files changed: +361 -143 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -245,7 +245,7 @@ vcfcall.o: vcfcall.c $(htslib_vcf_h) $(htslib_kfunc_h) $(htslib_synced_bcf_reade
 vcfconcat.o: vcfconcat.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_kseq_h) $(htslib_bgzf_h) $(htslib_tbx_h) $(htslib_thread_pool_h) $(bcftools_h)
 vcfconvert.o: vcfconvert.c $(htslib_faidx_h) $(htslib_vcf_h) $(htslib_bgzf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_kseq_h) $(bcftools_h) $(filter_h) $(convert_h) $(tsv2vcf_h)
 vcffilter.o: vcffilter.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(bcftools_h) $(filter_h) rbuf.h regidx.h
-vcfgtcheck.o: vcfgtcheck.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_kbitset_h) $(htslib_hts_os_h) $(bcftools_h) extsort.h
+vcfgtcheck.o: vcfgtcheck.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_kbitset_h) $(htslib_hts_os_h) $(htslib_bgzf_h) $(bcftools_h) extsort.h filter.h
 vcfindex.o: vcfindex.c $(htslib_vcf_h) $(htslib_tbx_h) $(htslib_kstring_h) $(htslib_bgzf_h) $(bcftools_h)
 vcfisec.o: vcfisec.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_hts_os_h) $(bcftools_h) $(filter_h)
 vcfmerge.o: vcfmerge.c $(htslib_vcf_h) $(htslib_synced_bcf_reader_h) $(htslib_vcfutils_h) $(htslib_faidx_h) regidx.h $(bcftools_h) vcmp.h $(htslib_khash_h)

NEWS

Lines changed: 19 additions & 0 deletions
@@ -28,6 +28,25 @@ Changes affecting specific commands:
     - New `-*, --keep-unseen-allele` option to output the unobserved allele <*>,
       intended for gVCF.
 
+* bcftools gtcheck
+
+    - Add output options `-o, --output` and `-O, --output-type`
+
+    - Add filtering options `-i, --include` and `-e, --exclude`
+
+    - Rename the short option `-e, --error-probability` from lower case to upper
+      case `-E, --error-probability`
+
+    - Changes to the output format, replace the DC section with DCv2:
+
+        - adds a new column for the number of matching genotypes
+
+        - fixes in HWE score calculation plus output average HWE score rather
+          than absolute HWE score
+
+        - better description of fields
+
+
 * bcftools mpileup
 
     - Output MIN_DP rather than MinDP in gVCF mode

doc/bcftools.txt

Lines changed: 22 additions & 3 deletions
@@ -1640,25 +1640,38 @@ The discordance score can be interpreted as the number of mismatching genotypes
 *--dry-run*::
     Stop after first record to estimate required time.
 
-*-e, --error-probability* 'INT'::
+*-e, --exclude* ['qry'|'gt']:'EXPRESSION'::
+    Exclude sites from query file ('qry:') or genotype file ('gt:') for which 'EXPRESSION' is true.
+    For valid expressions see *<<expressions,EXPRESSIONS>>*.
+
+*-E, --error-probability* 'INT'::
     Interpret genotypes and genotype likelihoods probabilistically. The value of 'INT'
     represents genotype quality when GT tag is used (e.g. Q=30 represents one error in 1,000 genotypes and
     Q=40 one error in 10,000 genotypes) and is ignored when PL tag is used (in that case an arbitrary
     non-zero integer can be provided).
     {nbsp} +
     {nbsp} +
-    If *-e* is set to 0, the discordance score can be interpreted as the number of mismatching genotypes,
+    If *-E* is set to 0, the discordance score can be interpreted as the number of mismatching genotypes,
     but only in the GT-vs-GT matching mode. See the *-u, --use* option below for additional notes and caveats.
     {nbsp} +
     {nbsp} +
-    If performance is an issue, set *-e 0* for faster run times but less accurate results.
+    If performance is an issue, set *-E 0* for faster run times but less accurate results.
+    {nbsp} +
+    {nbsp} +
+    Note that in previous versions of bcftools (<=1.18), this option used to be a smaller case *-e*. It
+    changed to make room for the filtering option *-e, --exclude* to stay consistent across other
+    commands.
 
 *-g, --genotypes* 'FILE'::
     VCF/BCF file with reference genotypes to compare against
 
 *-H, --homs-only*::
     Homozygous genotypes only, useful with low coverage data (requires *-g, --genotypes*)
 
+*-i, --include* ['qry'|'gt']:'EXPRESSION'::
+    Include sites from query file ('qry:') or genotype file ('gt:') for which 'EXPRESSION' is true.
+    For valid expressions see *<<expressions,EXPRESSIONS>>*.
+
 *--n-matches* 'INT'::
     Print only top INT matches for each sample, 0 for unlimited. Use negative value
     to sort by HWE probability rather than the number of discordant sites. Note
@@ -1668,6 +1681,12 @@ The discordance score can be interpreted as the number of mismatching genotypes
     Disable calculation of HWE probability to reduce memory requirements with
     comparisons between very large number of sample pairs.
 
+*-o, --output* 'FILE'::
+    Write to 'FILE' rather than to standard output, where it is written by default.
+
+*-O, --output-type* 't'|'z'::
+    Write a plain ('t') or compressed ('z') text tab-delimited output.
+
 *-p, --pairs* 'LIST'::
     A comma-separated list of sample pairs to compare. When the *-g* option is given, the first
     sample must be from the query file, the second from the *-g* file, third from the query file
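
The Q=30 and Q=40 figures quoted in the `-E, --error-probability` text above agree with the usual Phred scaling of quality values. Assuming that is the convention meant here (the excerpt does not state the formula explicitly), the relation between the quality Q and the per-genotype error probability can be written as:

```latex
P_{\mathrm{err}} = 10^{-Q/10}, \qquad
Q = 30 \;\Rightarrow\; P_{\mathrm{err}} = 10^{-3}, \qquad
Q = 40 \;\Rightarrow\; P_{\mathrm{err}} = 10^{-4}
```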

test/gtcheck.1.out

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-DC s1 s1 0 2.371900e+01 2
+DCv2 s1 s1 0 3.465736e-01 2 2

test/gtcheck.10.out

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-DC s1 s1 4.002001e-03 2.371900e+01 2
+DCv2 s1 s1 4.002001e-03 3.465736e-01 2 2

test/gtcheck.11.out

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-DC s1 s1 0.000000e+00 2.371900e+01 2
+DCv2 s1 s1 0.000000e+00 3.465736e-01 2 2

test/gtcheck.12.out

Lines changed: 10 additions & 10 deletions
@@ -1,10 +1,10 @@
-DC B A 5.733631e-01 2.253795e+00 2
-DC C A 4.938053e+00 8.675006e-01 2
-DC C B 2.791391e+00 8.675006e-01 2
-DC D A 5.022610e+00 0.000000e+00 2
-DC D B 5.178533e+00 0.000000e+00 2
-DC D C 4.938053e+00 0.000000e+00 2
-DC E A 7.325195e+00 0.000000e+00 2
-DC E B 5.178533e+00 0.000000e+00 2
-DC E C 2.635468e+00 1.386294e+00 2
-DC E D 2.720025e+00 2.407946e+00 2
+DCv2 B A 5.733631e-01 1.126897e+00 2 2
+DCv2 C A 4.938053e+00 4.337503e-01 2 2
+DCv2 C B 2.791391e+00 4.337503e-01 2 2
+DCv2 D A 5.022610e+00 0.000000e+00 2 2
+DCv2 D B 5.178533e+00 0.000000e+00 2 2
+DCv2 D C 4.938053e+00 0.000000e+00 2 2
+DCv2 E A 7.325195e+00 0.000000e+00 2 2
+DCv2 E B 5.178533e+00 0.000000e+00 2 2
+DCv2 E C 2.635468e+00 6.931472e-01 2 2
+DCv2 E D 2.720025e+00 3.566749e-01 2 2

test/gtcheck.2.out

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-DC s1 s1 0 6.931472e-01 1
+DCv2 s1 s1 0 6.931472e-01 1 1

test/gtcheck.3.out

Lines changed: 10 additions & 10 deletions
@@ -1,10 +1,10 @@
-DC B A 0 2.253795e+00 2
-DC C A 1 8.675006e-01 2
-DC C B 1 8.675006e-01 2
-DC D A 2 0.000000e+00 2
-DC D B 2 0.000000e+00 2
-DC D C 2 0.000000e+00 2
-DC E A 2 0.000000e+00 2
-DC E B 2 0.000000e+00 2
-DC E C 1 1.386294e+00 2
-DC E D 1 2.407946e+00 2
+DCv2 B A 0 1.126897e+00 2 2
+DCv2 C A 1 8.675006e-01 2 1
+DCv2 C B 1 8.675006e-01 2 1
+DCv2 D A 2 0.000000e+00 2 0
+DCv2 D B 2 0.000000e+00 2 0
+DCv2 D C 2 0.000000e+00 2 0
+DCv2 E A 2 0.000000e+00 2 0
+DCv2 E B 2 0.000000e+00 2 0
+DCv2 E C 1 1.386294e+00 2 1
+DCv2 E D 1 7.133499e-01 2 1

test/gtcheck.4.out

Lines changed: 3 additions & 3 deletions
@@ -1,3 +1,3 @@
-DC D C 2 0.000000e+00 2
-DC E C 1 1.386294e+00 2
-DC E D 1 2.407946e+00 2
+DCv2 D C 2 0.000000e+00 2 0
+DCv2 E C 1 1.386294e+00 2 1
+DCv2 E D 1 7.133499e-01 2 1
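
Downstream scripts that read the old six-field `DC` lines need to handle the seven-field `DCv2` lines shown in the updated test files. The following is a minimal sketch of such a reader. The struct and function names are hypothetical, and the meaning given to each column (discordance, average HWE score, number of sites, number of matching genotypes) is inferred from the commit message and the test outputs, not copied from the header bcftools prints.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; field meanings are inferred from the commit
 * message and test files, not from bcftools' own column descriptions. */
typedef struct {
    char   query[64];      /* first sample name */
    char   genotyped[64];  /* second sample name */
    double discordance;    /* discordance score */
    double avg_hwe;        /* average HWE score */
    int    nsites;         /* number of sites compared */
    int    nmatch;         /* number of matching genotypes (new in DCv2) */
} dcv2_rec_t;

/* Parse one whitespace-delimited DCv2 line.
 * Returns 0 on success, -1 if the line is not a well-formed DCv2 record
 * (for example an old-style DC line with only six fields). */
static int parse_dcv2(const char *line, dcv2_rec_t *rec)
{
    char tag[8];
    int n = sscanf(line, "%7s %63s %63s %lf %lf %d %d",
                   tag, rec->query, rec->genotyped,
                   &rec->discordance, &rec->avg_hwe,
                   &rec->nsites, &rec->nmatch);
    if (n != 7 || strcmp(tag, "DCv2") != 0) return -1;
    return 0;
}
```

In the test files above, the new score is consistent with the corrected sum divided by the last column: in test/gtcheck.3.out the B/A pair has two matching genotypes and its score halves from 2.253795e+00 to 1.126897e+00, while the C/A pair has one match and keeps 8.675006e-01.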
