Poor compression & quality for difficult-to-compress data #189

@lindstro

Description

I am doing some compression studies that involve difficult-to-compress (even incompressible) data. Consider the chaotic data generated by the logistic map xᵢ₊₁ = 4 xᵢ (1 − xᵢ):

#include <cstdio>

// Write 256^3 doubles from the chaotic logistic map x(i+1) = 4 x(i) (1 - x(i))
// to stdout in binary.
int main()
{
  double x = 1. / 3;
  for (int i = 0; i < 256 * 256 * 256; i++) {
    fwrite(&x, sizeof(x), 1, stdout);
    x = 4 * x * (1 - x);
  }
  return 0;
}

We wouldn't expect this data to compress at all, but the inherent randomness at least suggests a predictable relationship between L2 error, E, and rate, R. Let σ = 1/√8 denote the standard deviation of the input data and define the accuracy gain as

α = log₂(σ / E) - R.

Then each one-bit increment in rate, R, should halve E, so that α is essentially constant. The limiting behavior differs slightly as R → 0 or E → 0, but over a large range α ought to be constant.

Below is a plot of α(R) for MGARD 1.2.0 and other compressors applied to the above data interpreted as a 3D array of size 256 × 256 × 256. Here I used a smoothness parameter of 0, which should result in an L2 optimal reconstruction: mgard compress --smoothness 0 --tolerance tolerance --datatype double --shape 256x256x256 --input input.bin --output output.mgard. The tolerance was halved for each subsequent data point, starting with tolerance = 1.

The plot suggests an odd relationship between R and α, where α is far from stable when R > 17. Is this perhaps a bug in MGARD? Similar behavior is observed for other difficult-to-compress data sets (see rballester/tthresh#7).

[Figure: accuracy gain α(R) on the logistic-map data for MGARD 1.2.0 and other compressors]

Metadata

Labels: bug (Something isn't working), question (Further information is requested)
