Skip to content

Encoded Histogram format

Lee Campbell edited this page May 10, 2016 · 5 revisions

The Encoded Histogram format is a binary format of a histogram header and its count values. The format is used as part of the V2-Log-Encoding to allow histograms to be persisted or shared.

File format

[cookie][lengthOfCompressedContents][CompressedHeader]

CompressedHeader format

DEFALTE(
	[Cookie][payloadLengthInBytes][normalizingIndexOffset][numberOfSignificantValueDigits][lowestTrackableUnitValue][highestTrackableValue][integerToDoubleValueConversionRatio][CompressedContents]		
)

The Zlib compressed payload of an HdrHistogram. The uncompressed payload is stored in binary in the following format

01234567890123456789012345678901
[A ][B ][C ][D ][E     ][F     ][G....>
  • A. Cookie value. 32bit integer (4bytes) that stores a code for the type of Histogram.
  • B. Payload length as a 32bit integer (4bytes).
  • C. NomalizingIndex offset as an 32bit integer (4bytes). Currently not implemented in HdrHistogram.NET
  • D. Lowest Trackable Unit value as a 64bit integer (8bytes).
  • E. Highest Trackable Unit value as a 64bit integer (8bytes).
  • F. Integer to double conversion ratio as a double floating point number (8bytes)
  • G. Compressed contents

CompressedContents

ZigZaggedEncoding(
	[count]*
)

A ZigZag encoded (LEB128-64b9B-variant format) compression of the counts for an HdrHistogram. The LEB128-64b9B-variant format is

  • Little Endian Base128 Encoding
  • 64bit values store as a maximum of 9Bytes. The standard LEB128 can use 10Bytes.
  • Only supports values up to 2^63 (not 2^64) i.e. HdrHistogram only support positive signed long values for counts.
  • Zig Zagged so that common/small positive and negative values use the least amount of space. As counts are only allowed to be 0 or a positive 64bit number, we can use the negative range of numbers to represent consecutive zero values. As it can be very common for there to be numerous consecutive buckets with counts of zero, this can provide significant space saving.

Consecutive 0 counts will be compressed into a single negative number indicating the span of the zero count sequence. Positive counts will be represented as the actual number. Counts are recorded as 64bit integers.

LEB128's variable length encoding provides for using a smaller number of bytes for smaller values, and the use of ZigZag encoding allows small(closer to zero) negative values to use fewer bytes.

The LEB128-64b9B-variant encoding used here diverges from the "original" LEB128 as it extends to 64 bit values. In the original LEB128, a 64 bit value can take up to 10 bytes in the stream, where this variant's encoding of a 64 bit values will max out at 9 bytes. As such, this encoder/decoder should NOT be used for encoding or decoding "standard" LEB128 formats (e.g.Google Protocol Buffers).

See also

  • ZLIB RCF-1950 compression - [https://en.wikipedia.org/wiki/Zlib], [https://www.ietf.org/rfc/rfc1950.txt]
  • Base64 Encoding - [https://en.wikipedia.org/wiki/Base64], [https://www.base64encode.org/]
  • ZigZag LEB128 compression - [https://en.wikipedia.org/wiki/LEB128], [https://gist.github.com/mfuerstenau/ba870a29e16536fdbaba]

Clone this wiki locally