Multiple regressions (v1.5.6 -> v1.5.7 most significant) #4548

@iczelia

Dear All,

Take https://github.com/KirillKryukov/naf/, a pretty straightforward use of Zstandard to compress nucleotide data. My test file is GCA_004837865.1.fna, at around 500M; you can find it in a couple of places across the internet. The issue is only noticeable at high compression levels. I ran and timed ennaf/ennaf --temp-dir . -19 GCA_004837865.1.fna overnight, checking out every zstd version between v1.5.0 and v1.5.7 and rebuilding the code from scratch each time. The zstd version is the only thing that I changed.
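For anyone who wants to repeat the timing, the measurement loop can be sketched roughly as below. This is a minimal sketch and not the script used for the numbers in this report: the helper name `time_command` and the default of 3 runs are assumptions made for illustration, and checking out and rebuilding each zstd version is left out.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list of arguments) `runs` times and return the mean wall-clock seconds.

    Hypothetical helper for illustration; the run count is an assumption.
    """
    durations = []
    for i in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        print(f"run {i + 1}: {elapsed:.2f}s", file=sys.stderr)
        durations.append(elapsed)
    return statistics.mean(durations)
```

With the command from this report, the call would look like `time_command(["ennaf/ennaf", "--temp-dir", ".", "-19", "GCA_004837865.1.fna"])`. Cleaning up the output file between runs is not handled here.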

I observed this:

v1.5.0: 85.98s average, 97141591 bytes.
v1.5.1: 85.50s average, 97204512 bytes.
v1.5.2: 86.49s average, 97204512 bytes.
v1.5.4: 87.76s average, 97204987 bytes.
v1.5.5: 87.20s average, 97204987 bytes.
v1.5.6: 88.63s average, 97941622 bytes.
v1.5.7: 88.73s average, 98103951 bytes.

Given the original size (after 4-bit coding) of 249586605 bytes, this is a steady regression from 3.113 to 3.144 bits per byte between v1.5.0 and v1.5.7, along with a statistically significant ~3% slow-down.
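The figures above can be re-derived from the measurements with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and v1.5.0 is assumed to be the baseline for the slow-down.

```python
# 4-bit coded input size in bytes, as stated in the report.
ORIGINAL_BYTES = 249586605

# version -> (average seconds, compressed size in bytes), copied from the list above.
RESULTS = {
    "v1.5.0": (85.98, 97141591),
    "v1.5.1": (85.50, 97204512),
    "v1.5.2": (86.49, 97204512),
    "v1.5.4": (87.76, 97204987),
    "v1.5.5": (87.20, 97204987),
    "v1.5.6": (88.63, 97941622),
    "v1.5.7": (88.73, 98103951),
}


def bits_per_byte(compressed_bytes, original_bytes=ORIGINAL_BYTES):
    """Compressed bits spent per byte of the 4-bit coded input."""
    return 8 * compressed_bytes / original_bytes


def slowdown_percent(seconds, baseline_seconds):
    """Relative increase in run time over the baseline, in percent."""
    return 100 * (seconds / baseline_seconds - 1)


baseline_seconds, _ = RESULTS["v1.5.0"]  # assumption: v1.5.0 is the baseline
for version, (seconds, size) in RESULTS.items():
    print(
        f"{version}: {bits_per_byte(size):.4f} bits/byte, "
        f"{slowdown_percent(seconds, baseline_seconds):+.2f}% time vs v1.5.0"
    )
```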

I have not identified the sources of these regressions, as there are multiple.
