Dear All,
Take https://github.com/KirillKryukov/naf/, a pretty straightforward use of zstandard to compress nucleotide data. My test file is GCA_004837865.1.fna, at around 500 MB; it is available in a couple of places across the internet. The issue is only noticeable at high compression levels. I checked out every zstd version between v1.5.0 and v1.5.7, rebuilt the code from scratch each time (the zstd version is the only thing I changed), and timed ennaf/ennaf --temp-dir . -19 GCA_004837865.1.fna in runs left going overnight.
I observed this:
v1.5.0: 85.98s average, 97141591 bytes.
v1.5.1: 85.50s average, 97204512 bytes.
v1.5.2: 86.49s average, 97204512 bytes.
v1.5.4: 87.76s average, 97204987 bytes.
v1.5.5: 87.20s average, 97204987 bytes.
v1.5.6: 88.63s average, 97941622 bytes.
v1.5.7: 88.73s average, 98103951 bytes.
Given the original size (after 4-bit coding) of 249586605 bytes, the compressed output grows steadily from about 3.114 bits per byte at v1.5.0 to about 3.145 bits per byte at v1.5.7 (roughly 1% larger), alongside a statistically significant ~3% slow-down.
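For anyone who wants to re-derive the ratios, here is a minimal Python sketch that recomputes bits per byte and the relative changes from the figures listed above. The variable and function names are my own and purely illustrative; only the numbers come from the measurements.

```python
# Input size in bytes after 4-bit coding, as reported above.
ORIGINAL_BYTES = 249_586_605

# Compressed size in bytes per zstd version, as reported above.
compressed = {
    "v1.5.0": 97_141_591,
    "v1.5.1": 97_204_512,
    "v1.5.2": 97_204_512,
    "v1.5.4": 97_204_987,
    "v1.5.5": 97_204_987,
    "v1.5.6": 97_941_622,
    "v1.5.7": 98_103_951,
}

# Average wall-clock time in seconds per zstd version, as reported above.
seconds = {
    "v1.5.0": 85.98,
    "v1.5.1": 85.50,
    "v1.5.2": 86.49,
    "v1.5.4": 87.76,
    "v1.5.5": 87.20,
    "v1.5.6": 88.63,
    "v1.5.7": 88.73,
}


def bits_per_byte(compressed_bytes, original_bytes=ORIGINAL_BYTES):
    """Compressed bits spent per byte of (4-bit coded) input."""
    return 8 * compressed_bytes / original_bytes


for version, size in compressed.items():
    print(f"{version}: {bits_per_byte(size):.3f} bits/byte, {seconds[version]:.2f}s")

# Relative change between the first and last version tested.
size_growth = compressed["v1.5.7"] / compressed["v1.5.0"] - 1
slowdown = seconds["v1.5.7"] / seconds["v1.5.0"] - 1
print(f"size growth: {size_growth:.2%}, slow-down: {slowdown:.2%}")
```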
I have not identified the sources of these regressions. There appear to be more than one, since the output size changes at several different versions.
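In case it helps with reproducing the timings, below is a rough Python sketch of how repeated runs of a command could be timed and averaged. This is not the exact script I used; `time_command` and `ENNAF_CMD` are hypothetical names, and the sketch assumes ennaf has already been rebuilt against the zstd version under test.

```python
import statistics
import subprocess
import time

# The command from this report; assumes it is run from the naf checkout
# with the test file in the current directory.
ENNAF_CMD = ["ennaf/ennaf", "--temp-dir", ".", "-19", "GCA_004837865.1.fna"]


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return (mean, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build
        # is not silently counted as a fast run.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev


# Example (not executed here):
#   mean, stdev = time_command(ENNAF_CMD, runs=10)
#   print(f"{mean:.2f}s average, stdev {stdev:.2f}s")
```

Reporting the standard deviation next to the mean makes it easier to judge whether a difference of a few percent between versions is outside run-to-run noise.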