
Commit 911cb8e

Release 1.13
2 parents bd133ac + 09255e6 commit 911cb8e

Some content is hidden: this is a large commit, so not every changed file appears below.

84 files changed, +2155 / -428 lines changed

INSTALL

Lines changed: 13 additions & 4 deletions
@@ -49,15 +49,15 @@ Storage is enabled.
 
 Amazon S3 support requires an HMAC function to calculate a message
 authentication code. On MacOS, the CCHmac function from the standard
-library is used. Systems that do not have CChmac will get this from
+library is used. Systems that do not have CCHmac will get this from
 libcrypto. libcrypto is part of OpenSSL or one of its derivatives (LibreSSL
 or BoringSSL).
 
 On Microsoft Windows we recommend use of Mingw64/Msys2. Note that
 currently for the test harness to work you will need to override the
 test temporary directory with e.g.: make check TEST_OPTS="-t C:/msys64/tmp/_"
 Whilst the code may work on Windows with other environments, these have
-not be verified.
+not been verified.
 
 Update htscodecs submodule
 ==========================
@@ -103,7 +103,7 @@ configure and just type 'make; make install' as for previous versions
 of HTSlib. However if the build fails you should run './configure' as
 it can diagnose the common reasons for build failures.
 
-The 'make' command builds the HTSlib library and and various useful
+The 'make' command builds the HTSlib library and various useful
 utilities: bgzip, htsfile, and tabix. If compilation fails you should
 run './configure' as it can diagnose problems with your build environment
 that cause build failures.
@@ -150,7 +150,10 @@ various features and specify further optional external requirements:
 
 --enable-libcurl
 Use libcurl (<http://curl.se/>) to implement network access to
-remote files via FTP, HTTP, HTTPS, etc.
+remote files via FTP, HTTP, HTTPS, etc. By default or with
+--enable-libcurl=check, configure will probe for libcurl and include
+this functionality if libcurl is available. Use --disable-libcurl
+to prevent this.
 
 --enable-gcs
 Implement network access to Google Cloud Storage. By default or with
@@ -176,6 +179,12 @@ various features and specify further optional external requirements:
 By default, ./configure will probe for libdeflate and use it if
 available. To prevent this, use --without-libdeflate.
 
+Each --enable-FEATURE/--disable-FEATURE/--with-PACKAGE/--without-PACKAGE
+option listed also has an opposite, e.g., --without-external-htscodecs
+or --disable-plugins. However, apart from those options for which the
+default is to probe for related facilities, using these opposite options
+is mostly unnecessary as they just select the default configure behaviour.
+
 The configure script also accepts the usual options and environment variables
 for tuning installation locations and compilers: type './configure --help'
 for details. For example,

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ according to the terms of the following MIT/Expat license.]
 
 The MIT/Expat License
 
-Copyright (C) 2012-2020 Genome Research Ltd.
+Copyright (C) 2012-2021 Genome Research Ltd.
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

Makefile

Lines changed: 22 additions & 15 deletions
@@ -113,26 +113,28 @@ htscodecs.mk:
 echo '# Default htscodecs.mk generated by Makefile' > $@
 echo 'include $$(HTSPREFIX)htscodecs_bundled.mk' >> $@
 
+srcdir = .
+srcprefix =
 HTSPREFIX =
 include htslib_vars.mk
 include htscodecs.mk
 
 # If not using GNU make, you need to copy the version number from version.sh
 # into here.
-PACKAGE_VERSION := $(shell ./version.sh)
+PACKAGE_VERSION := $(shell $(srcdir)/version.sh)
 
 LIBHTS_SOVERSION = 3
 
 # Version numbers for the Mac dynamic library. Note that the leading 3
 # is not strictly necessary and should be removed the next time
 # LIBHTS_SOVERSION is bumped (see #1144 and
 # https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html#//apple_ref/doc/uid/TP40002013-SW23)
-MACH_O_COMPATIBILITY_VERSION = 3.1.12
-MACH_O_CURRENT_VERSION = 3.1.12
+MACH_O_COMPATIBILITY_VERSION = 3.1.13
+MACH_O_CURRENT_VERSION = 3.1.13
 
 # $(NUMERIC_VERSION) is for items that must have a numeric X.Y.Z string
 # even if this is a dirty or untagged Git working tree.
-NUMERIC_VERSION := $(shell ./version.sh numeric)
+NUMERIC_VERSION := $(shell $(srcdir)/version.sh numeric)
 
 # Force version.h to be remade if $(PACKAGE_VERSION) has changed.
 version.h: $(if $(wildcard version.h),$(if $(findstring "$(PACKAGE_VERSION)",$(shell cat version.h)),,force))
@@ -254,7 +256,7 @@ config.h:
 # on htslib.pc.in listed, as if that file is newer the usual way to regenerate
 # this target is via configure or config.status rather than this rule.
 htslib.pc.tmp:
-sed -e '/^static_libs=/s/@static_LIBS@/$(htslib_default_libs)/;s#@[^-][^@]*@##g' htslib.pc.in > $@
+sed -e '/^static_libs=/s/@static_LIBS@/$(htslib_default_libs)/;s#@[^-][^@]*@##g' $(srcprefix)htslib.pc.in > $@
 
 # Create a makefile fragment listing the libraries and LDFLAGS needed for
 # static linking. This can be included by projects that want to build
@@ -449,16 +451,15 @@ htscodecs/htscodecs:
 
 # Build the htscodecs/htscodecs/version.h file if necessary
 htscodecs/htscodecs/version.h: force
-@if test -e htscodecs/.git && test -e htscodecs/configure.ac ; then \
-cd htscodecs && \
-vers=`git describe --always --dirty --match 'v[0-9]\.[0-9]*'` && \
+@if test -e $(srcdir)/htscodecs/.git && test -e $(srcdir)/htscodecs/configure.ac ; then \
+vers=`cd $(srcdir)/htscodecs && git describe --always --dirty --match 'v[0-9]\.[0-9]*'` && \
 case "$$vers" in \
 v*) vers=$${vers#v} ;; \
-*) iv=`awk '/^AC_INIT/ { match($$0, /^AC_INIT\(htscodecs, *([0-9](\.[0-9])*)\)/, m); print substr($$0, m[1, "start"], m[1, "length"]) }' configure.ac` ; vers="$$iv$${vers:+-g$$vers}" ;; \
+*) iv=`awk '/^AC_INIT/ { match($$0, /^AC_INIT\(htscodecs, *([0-9](\.[0-9])*)\)/, m); print substr($$0, m[1, "start"], m[1, "length"]) }' $(srcdir)/htscodecs/configure.ac` ; vers="$$iv$${vers:+-g$$vers}" ;; \
 esac ; \
-if ! grep -s -q '"'"$$vers"'"' htscodecs/version.h ; then \
+if ! grep -s -q '"'"$$vers"'"' $@ ; then \
 echo 'Updating $@ : #define HTSCODECS_VERSION_TEXT "'"$$vers"'"' ; \
-echo '#define HTSCODECS_VERSION_TEXT "'"$$vers"'"' > htscodecs/version.h ; \
+echo '#define HTSCODECS_VERSION_TEXT "'"$$vers"'"' > $@ ; \
 fi ; \
 fi
 endif
@@ -470,6 +471,11 @@ maintainer-check:
 test/maintainer/check_copyright.pl .
 test/maintainer/check_spaces.pl .
 
+# Create a shorthand. We use $(SRC) or $(srcprefix) rather than $(srcdir)/
+# for brevity in test and install rules, and so that build logs do not have
+# ./ sprinkled throughout.
+SRC = $(srcprefix)
+
 # For tests that might use it, set $REF_PATH explicitly to use only reference
 # areas within the test suite (or set it to ':' to use no reference areas).
 #
@@ -490,6 +496,7 @@ check test: $(BUILT_PROGRAMS) $(BUILT_TEST_PROGRAMS) $(BUILT_PLUGINS) $(HTSCODEC
 cd test/sam_filter && ./filter.sh filter.tst
 cd test/tabix && ./test-tabix.sh tabix.tst
 cd test/mpileup && ./test-pileup.sh mpileup.tst
+cd test/fastq && ./test-fastq.sh
 REF_PATH=: test/sam test/ce.fa test/faidx.fa test/fastqs.fq
 test/test-regidx
 cd test && REF_PATH=: ./test.pl $${TEST_OPTS:-}
@@ -686,11 +693,11 @@ shlib-exports-dll.txt: hts.dll.a
 install: libhts.a $(BUILT_PROGRAMS) $(BUILT_PLUGINS) installdirs install-$(SHLIB_FLAVOUR) install-pkgconfig
 $(INSTALL_PROGRAM) $(BUILT_PROGRAMS) $(DESTDIR)$(bindir)
 if test -n "$(BUILT_PLUGINS)"; then $(INSTALL_PROGRAM) $(BUILT_PLUGINS) $(DESTDIR)$(plugindir); fi
-$(INSTALL_DATA) htslib/*.h $(DESTDIR)$(includedir)/htslib
+$(INSTALL_DATA) $(SRC)htslib/*.h $(DESTDIR)$(includedir)/htslib
 $(INSTALL_DATA) libhts.a $(DESTDIR)$(libdir)/libhts.a
-$(INSTALL_MAN) bgzip.1 htsfile.1 tabix.1 $(DESTDIR)$(man1dir)
-$(INSTALL_MAN) faidx.5 sam.5 vcf.5 $(DESTDIR)$(man5dir)
-$(INSTALL_MAN) htslib-s3-plugin.7 $(DESTDIR)$(man7dir)
+$(INSTALL_MAN) $(SRC)bgzip.1 $(SRC)htsfile.1 $(SRC)tabix.1 $(DESTDIR)$(man1dir)
+$(INSTALL_MAN) $(SRC)faidx.5 $(SRC)sam.5 $(SRC)vcf.5 $(DESTDIR)$(man5dir)
+$(INSTALL_MAN) $(SRC)htslib-s3-plugin.7 $(DESTDIR)$(man7dir)
 
 installdirs:
 $(INSTALL_DIR) $(DESTDIR)$(bindir) $(DESTDIR)$(includedir) $(DESTDIR)$(includedir)/htslib $(DESTDIR)$(libdir) $(DESTDIR)$(man1dir) $(DESTDIR)$(man5dir) $(DESTDIR)$(man7dir) $(DESTDIR)$(pkgconfigdir)
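The version-string logic in the `htscodecs/htscodecs/version.h` rule above can be tried outside make. What follows is a minimal sketch, assuming a POSIX `sh`: the function name `htscodecs_vers` is hypothetical, its second argument stands in for the `AC_INIT` version that the rule extracts from `configure.ac` with awk, and each `$$` in the recipe (make's escape for a literal `$`) is written here as a single `$`.

```shell
#!/bin/sh
# Hypothetical helper mirroring the case statement in the Makefile rule.
# $1: output of git describe --always --dirty --match 'v[0-9]\.[0-9]*'
# $2: version number from AC_INIT in configure.ac (the rule obtains it with awk)
htscodecs_vers() {
    vers=$1
    case "$vers" in
        # A release tag matched (e.g. v1.1.1 or v1.1.1-3-gabc1234): drop the "v"
        v*) vers=${vers#v} ;;
        # No tag matched, so git describe printed only a commit hash:
        # prefix the configure.ac version and mark the hash with "-g"
        *) vers="$2${vers:+-g$vers}" ;;
    esac
    printf '%s\n' "$vers"
}

htscodecs_vers v1.1.1 1.1.1          # prints 1.1.1
htscodecs_vers abc1234-dirty 1.1.1   # prints 1.1.1-gabc1234-dirty
```

A matching `v`-prefixed release tag takes precedence; otherwise the bare hash that `git describe --always` falls back to is kept, appended to the `configure.ac` version with a `-g` marker.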

NEWS

Lines changed: 126 additions & 0 deletions
@@ -1,3 +1,129 @@
+Noteworthy changes in release 1.13 (7th July 2021)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Features and Updates
+--------------------
+
+* In case a PG header line has multiple ID tags supplied by other applications,
+the header API now selects the first one encountered as the identifying tag
+and issues a warning when detecting subsequent ID tags.
+(#1256; fixed samtools/samtools#1393)
+
+* VCF header reading function (vcf_hdr_read) no longer tries to download a
+remote index file by default.
+(#1266; fixes #380)
+
+* Support reading and writing FASTQ format in the same way as SAM, BAM or CRAM.
+Records read from a FASTQ file will be treated as unmapped data.
+(#1156)
+
+* Added GCP requester pays bucket access. Thanks to @indraniel.
+(#1255)
+
+* Made mpileup's overlap removal choose which copy to remove at random instead
+of always removing the second one. This avoids strand bias in experiments
+where the +ve and -ve strand reads always appear in the same order.
+(#1273; fixes samtools/bcftools#1459)
+
+* It is now possible to use platform specific BAQ parameters. This also
+selects long-read parameters for read lengths bigger than 1kb, which helps
+bcftools mpileup call SNPs on PacBio CCS reads.
+(#1275)
+
+* Improved bcf_remove_allele_set. This fixes a bug that stopped iteration over
+alleles prematurely, marks removed alleles as 'missing' and does automatic
+lazy unpacking.
+(#1288; fixes #1259)
+
+* Improved compression metrics for unsorted CRAM files. This improves the
+choice of codecs when handling unsorted data.
+(#1291)
+
+* Linear index entries for empty intervals are now initialised with the file
+offset in the next non-empty interval instead of the previous one. This
+may reduce the amount of data iterators have to discard before reaching
+the desired region, when the starting location is in a sequence gap.
+Thanks to @carsonh for reporting the issue.
+(#1286; fixes #486)
+
+* A new hts_bin_level API function has been added, to compute the level of a
+given bin in the binning index.
+(#1286)
+
+* Related to the above, a new API method, hts_idx_nseq, now returns the total
+number of contigs from an index.
+(#1295 and #1299)
+
+* Added bracket handling to bcf_hdr_parse_line, for use with ##META lines.
+Thanks to Alberto Casas Ortiz.
+(#1240)
+
+Build changes
+-------------
+
+These are compiler, configuration and makefile based changes.
+
+* HTSlib now uses libhtscodecs release 1.1.1.
+
+* Added a curl/curl.h check to configure and improved INSTALL documentation on
+build options. Thanks to Melanie Kirsche and John Marshall.
+(#1265; fixes #1261)
+
+* Some fixes to address GCC 11.1 warnings.
+(#1280, #1284, #1285; fixes #1283)
+
+* Supports building HTSlib in a separate directory. Thanks to John Marshall.
+(#1277; fixes #231)
+
+* Supports building HTSlib on MinGW 32-bit environments. Thanks to
+John Marshall.
+(#1301)
+
+Bug fixes
+---------
+
+* Fixed hts_itr_query() et al region queries: fixed bug introduced in
+HTSlib 1.12, which led to iterators producing very few reads for some
+queries (especially for larger target regions) when unmapped reads were
+present. HTSlib 1.11 had a related problem in which iterators would omit
+a few unmapped reads that should have been produced; cf #1142.
+Thanks to Daniel Cooke for reporting the issue.
+(#1281; fixes #1279)
+
+* Removed compressBound assertions on opening bgzf files. Thanks to
+Gurt Hulselmans for reporting the issue.
+(#1258; fixed #1257)
+
+* Duplicate sample name error message for a VCF file now only displays the
+duplicated name rather the entire same name list.
+(#1262; fixes samtools/bcftools#1451)
+
+* Fix to make samtools cat work on CRAMs again.
+(#1276; fixes samtools/samtools#1420)
+
+* Fix for a double memory free in SAM header creation. Thanks to @ihsineme.
+(#1274)
+
+* Prevent assert in bcf_sr_set_regions. Thanks to Dr K D Murray.
+(#1270)
+
+* Fixed crash in knet_open() etc stubs. Thanks to John Marshall.
+(#1289)
+
+* Fixed filter expression "cigar" on unmapped reads. Stop treating an empty
+CIGAR string as an error. Thanks to Chang Y for reporting the issue.
+(#1298, fixes samtools/samtools#1445)
+
+* Bug fixes in the bundled copy of htscodecs:
+
+- Fixed an uninitialized access in the name tokeniser decoder.
+(samtools/htscodecs#23)
+
+- Fixed a bug with name tokeniser and variable number of names per slice,
+causing it to incorrectly report an error on certain valid inputs.
+(samtools/htscodecs#24)
+
+
 Noteworthy changes in release 1.12 (17th March 2021)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
README

Lines changed: 22 additions & 0 deletions
@@ -3,3 +3,25 @@ formats, such as SAM, CRAM, VCF, and BCF, used for high-throughput sequencing
 data. It is the core library used by samtools and bcftools.
 
 See INSTALL for building and installation instructions.
+
+Please cite this paper when using HTSlib for your publications:
+
+HTSlib: C library for reading/writing high-throughput sequencing data
+James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies
+GigaScience, Volume 10, Issue 2, February 2021, giab007, https://doi.org/10.1093/gigascience/giab007
+
+@article{10.1093/gigascience/giab007,
+author = {Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng and Ohan, Valeriu and Whitwham, Andrew and Keane, Thomas and Davies, Robert M},
+title = "{HTSlib: C library for reading/writing high-throughput sequencing data}",
+journal = {GigaScience},
+volume = {10},
+number = {2},
+year = {2021},
+month = {02},
+abstract = "{Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files. To facilitate this a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health.We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading.Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded \\&gt;1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.}",
+issn = {2047-217X},
+doi = {10.1093/gigascience/giab007},
+url = {https://doi.org/10.1093/gigascience/giab007},
+note = {giab007},
+eprint = {https://academic.oup.com/gigascience/article-pdf/10/2/giab007/36332285/giab007.pdf},
+}

README.md

Lines changed: 26 additions & 0 deletions
@@ -35,3 +35,29 @@ make install
 ```
 
 [download]: http://www.htslib.org/download/
+
+### Citing
+
+Please cite this paper when using HTSlib for your publications.
+
+> HTSlib: C library for reading/writing high-throughput sequencing data </br>
+> James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies </br>
+> _GigaScience_, Volume 10, Issue 2, February 2021, giab007, https://doi.org/10.1093/gigascience/giab007
+
+```
+@article{10.1093/gigascience/giab007,
+author = {Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng and Ohan, Valeriu and Whitwham, Andrew and Keane, Thomas and Davies, Robert M},
+title = "{HTSlib: C library for reading/writing high-throughput sequencing data}",
+journal = {GigaScience},
+volume = {10},
+number = {2},
+year = {2021},
+month = {02},
+abstract = "{Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files. To facilitate this a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health.We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading.Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded \\&gt;1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.}",
+issn = {2047-217X},
+doi = {10.1093/gigascience/giab007},
+url = {https://doi.org/10.1093/gigascience/giab007},
+note = {giab007},
+eprint = {https://academic.oup.com/gigascience/article-pdf/10/2/giab007/36332285/giab007.pdf},
+}
+```
