Commit 9b1a135

add some documentation on the histogram representation
1 parent 01b03df commit 9b1a135

Histograms.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
Histograms
==========

MAD primarily computes histograms. This document describes how MAD implements them.

Representation
--------------
A histogram is built by incrementing a per-bucket counter and is stored as a sparse list of
bucket -> count entries. Each bucket is identified by the smallest number stored in that bucket, which is
computed by truncating the mantissa of the sample (an IEEE 754 double) to n bits of precision. The default
in MAD is 7 bits of precision. Additionally, the min, max, and sum of the samples are stored as IEEE 754
double values alongside the histogram.
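
The bucketing scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not MAD's actual code: `bucket_key` and `SparseHistogram` are hypothetical names, and the sketch assumes that truncating the mantissa means zeroing the low-order 52 - n bits of the double's 52-bit mantissa field. Under that assumption the key is the smallest number in the bucket only for non-negative samples, because truncation rounds negative values toward zero.

```python
import math
import struct
from collections import Counter

MANTISSA_BITS = 52  # explicit mantissa bits in an IEEE 754 double


def bucket_key(value: float, precision: int = 7) -> float:
    """Return `value` with all but the top `precision` mantissa bits zeroed.

    Assumed interpretation of "mantissa truncated to n bits"; the sign and
    exponent are left untouched.
    """
    if not 0 <= precision <= MANTISSA_BITS:
        raise ValueError("precision must be between 0 and 52")
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    bits &= ~((1 << (MANTISSA_BITS - precision)) - 1)  # clear low-order bits
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


class SparseHistogram:
    """Hypothetical sparse histogram: only non-empty buckets are stored."""

    def __init__(self, precision: int = 7) -> None:
        self.precision = precision
        self.counts = Counter()  # bucket key -> count
        self.min = math.inf      # min, max and sum are tracked from the raw samples
        self.max = -math.inf
        self.sum = 0.0

    def record(self, value: float) -> None:
        self.counts[bucket_key(value, self.precision)] += 1
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        self.sum += value


hist = SparseHistogram()
for sample in (100.0, 100.3, 100.7):
    hist.record(sample)
print(dict(hist.counts))  # {100.0: 2, 100.5: 1}
```

With 7 bits of precision, values between 64 and 128 fall into buckets that are 0.5 wide, so 100.0 and 100.3 share the bucket keyed 100.0 while 100.7 lands in the bucket keyed 100.5.
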

Statistics
----------

* min: accurate (stored)
* max: accurate (stored)
* sum: accurate (stored)
* quantiles: estimated. Accuracy depends on the precision of the histogram and on the magnitude of the
  value: n bits of precision provides accuracy to within (1 / (2^n)) * value. For example, 7 bits of
  precision gives 1/128 (about 0.78%), i.e. accuracy to within 1% of the value.
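
The quantile accuracy bound can be checked with a small Python sketch. It is an illustration under stated assumptions rather than MAD's implementation: `bucket_key` and `estimate_quantile` are hypothetical names, buckets are assumed to be formed by zeroing the low-order mantissa bits of the sample, and the estimate is assumed to be the lower bound of the bucket containing the nearest-rank sample (MAD may report a different point within the bucket).

```python
import math
import struct
from collections import Counter


def bucket_key(value: float, precision: int = 7) -> float:
    """Zero all but the top `precision` mantissa bits (assumed bucketing scheme)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    bits &= ~((1 << (52 - precision)) - 1)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def estimate_quantile(counts, q: float) -> float:
    """Estimate quantile `q` (0..1) from a bucket key -> count mapping.

    Walks the buckets in ascending order and returns the key of the bucket
    that contains the nearest-rank sample.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram is empty")
    rank = max(1, math.ceil(q * total))  # 1-based rank of the target sample
    seen = 0
    for key in sorted(counts):
        seen += counts[key]
        if seen >= rank:
            return key
    raise AssertionError("unreachable: cumulative count never reached rank")


def max_relative_error(precision: int) -> float:
    """Bound from the text: n bits of precision gives an error below value / 2^n."""
    return 2.0 ** -precision


samples = [float(v) for v in range(1, 1001)]
counts = Counter(bucket_key(v) for v in samples)

print(max_relative_error(7))           # 0.0078125
print(estimate_quantile(counts, 0.5))  # 500.0
```

For non-negative samples the estimate never exceeds the true nearest-rank value, and the gap stays below value / 2^n because every value in a bucket shares the bucket key's exponent and top n mantissa bits.
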