Description
I am doing some compression studies that involve difficult-to-compress (even incompressible) data. Consider the chaotic data generated by the logistic map xᵢ₊₁ = 4 xᵢ (1 − xᵢ):
#include <cstdio>

int main()
{
    double x = 1. / 3;
    for (int i = 0; i < 256 * 256 * 256; i++) {
        fwrite(&x, sizeof(x), 1, stdout);
        x = 4 * x * (1 - x);
    }
    return 0;
}
We wouldn't expect this data to compress at all, but the inherent randomness at least suggests a predictable relationship between L2 error, E, and rate, R. Let σ = 1/√8 denote the standard deviation of the input data and define the accuracy gain as
α = log₂(σ / E) - R.
Then each one-bit increase in rate, R, should halve E, so that α is essentially constant. The limiting behavior differs slightly as R → 0 or E → 0, but over a large range α ought to be constant.
Below is a plot of α(R) for MGARD 1.2.0 and other compressors applied to the above data, interpreted as a 3D array of size 256 × 256 × 256. Here I used a smoothness parameter of 0, which should result in an L2-optimal reconstruction:

mgard compress --smoothness 0 --tolerance tolerance --datatype double --shape 256x256x256 --input input.bin --output output.mgard

The tolerance was halved for each subsequent data point, starting with tolerance = 1.
The plot suggests an odd relationship between R and α, where α is far from stable when R > 17. Is this perhaps a bug in MGARD? Similar behavior is observed for other difficult-to-compress data sets (see rballester/tthresh#7).
