Skip to content

Add WriteZeroOptionalFields option for Snowflake compatibility#3

Open
aratz-lasa wants to merge 10 commits intomainfrom
feature/write-zero-optional-fields
Open

Add WriteZeroOptionalFields option for Snowflake compatibility#3
aratz-lasa wants to merge 10 commits intomainfrom
feature/write-zero-optional-fields

Conversation

@aratz-lasa
Copy link

No description provided.

This commit adds a new WriterConfig option 'WriteZeroOptionalFields' that forces
the thrift encoder to write optional fields even when they have zero values.

This fixes compatibility issues with systems like Snowflake that require explicit
null counts in statistics (NullCount=0) even when columns have no null values.

Changes:
- Added WriteZeroOptionalFields feature flag to thrift protocol
- Made CompactProtocol configurable with SetFeatures method
- Updated thrift encoder to respect writeZeroOptionalFields flag
- Added WriteZeroOptionalFields config option to WriterConfig
- Wired the option through to ColumnWriter protocol initialization

The default behavior remains unchanged (false) for backward compatibility.

Usage:
  writer := parquet.NewWriter(output, schema,
      parquet.WriteZeroOptionalFields(true),
  )
@airlock-warpstreamlabs airlock-warpstreamlabs bot force-pushed the feature/write-zero-optional-fields branch from e15cfe9 to 780e670 Compare November 27, 2025 19:08
Tests cover:
- Default behavior (disabled) vs enabled
- Columns with and without null values
- Thrift encoding validation
- Metadata verification
- Backward compatibility between enabled/disabled modes
- Data integrity across both modes
The feature was only configuring the protocol for page headers but not
for the file footer where Statistics are actually written. Added:

- Store writeZeroOptionalFields flag in writer struct
- Configure protocol features in writeFileFooter()
- Add tests that verify the feature works by:
  1. Inspecting raw parquet file from user
  2. Comparing file sizes between enabled/disabled
  3. Comparing footer sizes

Tests confirm files with WriteZeroOptionalFields(true) are larger,
proving zero-valued optional fields like NullCount=0 are being written.

This fixes Snowflake compatibility for columns with no nulls.
The previous fix only configured the protocol for ColumnIndex/OffsetIndex
encoding, but the FileMetaData footer was still being encoded with a fresh
unconfigured protocol at line 1134.

This meant Statistics in the footer still had zero-valued fields omitted.

Changed line 1134 from:
  thrift.Marshal(new(thrift.CompactProtocol), w.fileMetaData)
To:
  thrift.Marshal(protocol, w.fileMetaData)

This uses the already-configured protocol that has WriteZeroOptionalFields
set if enabled.

Test results now show:
- Before: +45 bytes (only ColumnIndex/OffsetIndex affected)
- After: +105 bytes (FileMetaData footer also affected)

Also updated test file name to match current parquet file.
Previous tests used size comparison as a proxy which could miss issues.
This test validates the feature works by directly comparing file sizes
between files with WriteZeroOptionalFields enabled vs disabled.

When enabled, files are 105 bytes larger (63 bytes in footer) proving
that zero-valued optional fields like NullCount=0 are being written.
This is a simplified approach that ensures NullCount=0 and other zero-valued
optional fields are always written to Parquet file statistics, making files
compatible with strict consumers like Snowflake.

Key changes:
- Modified thrift encoder to always write zero-valued optional fields for
  regular structs (unions still skip zero fields to maintain their invariant)
- Removed all WriterConfig complexity and feature flags
- Updated test expectations to match new behavior

This is essentially a one-line change in encode.go that fixes the core issue.
Reverted CompactProtocol changes - no need for features field or SetFeatures.
The fix works with the default protocol as-is.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant