Variant Binary Encoding

This directory contains binary artifacts encoded using the Parquet Variant binary encoding. These files are not valid Parquet files, but rather raw binary data.

Structure

data_dictionary.json -- contains the JSON representation of each example

Each example consists of 2 files:

.metadata -- the binary contents of the metadata field
.value -- the binary contents of the value field
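As a minimal sketch of how a consumer might read one example pair, assuming the two files sit in the same directory and are named `<name>.metadata` and `<name>.value`. The helper name `load_example` is hypothetical and not part of this repository:

```python
from pathlib import Path


def load_example(directory, name):
    """Return (metadata_bytes, value_bytes) for one example.

    Assumes the pair is stored as <name>.metadata and <name>.value
    inside `directory`; this helper is illustrative only.
    """
    base = Path(directory)
    metadata = (base / f"{name}.metadata").read_bytes()
    value = (base / f"{name}.value").read_bytes()
    return metadata, value
```

For instance, `load_example(".", "primitive_int8")` would return the raw bytes of both files without interpreting them.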

Descriptions

primitive_<type> -- Examples of primitives (basic_type = 0), one for each of the primitive types listed in the spec
short_string -- Example of a short string (basic_type = 1)
object_empty -- Example of an object (basic_type = 2) with no fields
object_primitive -- Example of an object with only primitive fields
object_nested -- Example of an object with other objects in its fields
array_empty -- Example of an array (basic_type = 3) with no elements
array_primitive -- Example of an array with only primitive elements
array_nested -- Example of an array with objects and other arrays in its elements
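The basic_type of any example can be read from the first byte of its .value file. The sketch below assumes the layout described in the Variant encoding spec, where the two least significant bits of that byte hold basic_type and the remaining six bits hold a type-specific header. The function name `split_value_header` is hypothetical:

```python
def split_value_header(value):
    """Split the first byte of a .value file into (basic_type, header_bits).

    Assumes basic_type is stored in the two least significant bits and the
    type-specific header in the upper six bits of the first byte.
    """
    if not value:
        raise ValueError("value is empty")
    first = value[0]
    basic_type = first & 0b11   # two least significant bits
    header_bits = first >> 2    # remaining six bits; meaning depends on basic_type
    return basic_type, header_bits
```

Calling it on the contents of each .value file is a quick way to check that a file belongs to the category its name suggests.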

Regenerating these files

The files in this directory were initially generated by running the regen.py script, which uses Apache Spark to produce them. Some files were subsequently modified where necessary to ensure that they conform to the Parquet spec.

Modification 1: Created metadata and value for `primitive_null`

Per #81, Spark did not generate any metadata for null and left primitive_null.metadata empty. The metadata for primitive_null should be the same 3 bytes as for the other primitive types:

header = 0x01
dictionary_size = 0x00
dictionary_size + 1 = 1 byte values: 0x00

cp primitive_int8.metadata primitive_null.metadata
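The three metadata bytes described above can be checked with a small parser. This is a sketch that assumes the header layout from the Variant encoding spec (version in the low four bits, a sorted_strings flag in bit 4, and offset_size minus one in the top two bits), with little-endian integers. The function name `parse_metadata` is hypothetical, and the bit positions should be verified against the spec:

```python
def parse_metadata(metadata):
    """Decode a Variant metadata buffer into its fields (illustrative only)."""
    header = metadata[0]
    version = header & 0x0F                    # low four bits
    sorted_strings = bool((header >> 4) & 0x1)  # bit 4
    offset_size = ((header >> 6) & 0x3) + 1     # top two bits store size - 1

    pos = 1
    dictionary_size = int.from_bytes(metadata[pos:pos + offset_size], "little")
    pos += offset_size

    # There are dictionary_size + 1 offsets, each offset_size bytes wide.
    offsets = []
    for _ in range(dictionary_size + 1):
        offsets.append(int.from_bytes(metadata[pos:pos + offset_size], "little"))
        pos += offset_size

    # The dictionary strings follow the offsets and are sliced by offset pairs.
    string_bytes = metadata[pos:]
    strings = [
        string_bytes[start:end].decode("utf-8")
        for start, end in zip(offsets, offsets[1:])
    ]
    return {
        "version": version,
        "sorted_strings": sorted_strings,
        "offset_size": offset_size,
        "dictionary_size": dictionary_size,
        "offsets": offsets,
        "strings": strings,
    }
```

Applied to the bytes `01 00 00`, this yields version 1, a 1-byte offset size, an empty dictionary, and a single offset of 0.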

The value for a primitive null should be a value_header with no value_data, resulting in a single 0 byte:

echo -n 'a' | tr a '\0' > primitive_null.value
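Where a POSIX shell is not available, the same file can be produced from Python. This is only a sketch of an equivalent step, not the command that was actually used, and the helper names are hypothetical:

```python
from pathlib import Path


def write_null_value(path):
    """Write the primitive null value: exactly one zero byte."""
    Path(path).write_bytes(b"\x00")


def is_null_value(path):
    """Return True if the file holds exactly one zero byte."""
    return Path(path).read_bytes() == b"\x00"
```

Running `write_null_value("primitive_null.value")` followed by `is_null_value("primitive_null.value")` confirms that the file contains the expected single byte.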

Modification 2: Created `TimeNTZ/Timestamp with timezone nanos/Timestamp without timezone nanos/UUID` with Iceberg test code

Currently, Spark does not support Variant values containing UUID, Time, or nanosecond-precision Timestamp. The primitive_time.[metadata/value], primitive_timestamp_nanos.[metadata/value], primitive_timestampntz_nanos.[metadata/value], and primitive_uuid.[metadata/value] files were generated by Iceberg test code.
