Description
Apache Iceberg version
main (development)
Please describe the bug
No usable warning on timestamp precision loss or invalid compression codec
Bug 1: No warning when nanosecond timestamps are downcast to microsecond
Summary
When downcastTimestamp is enabled and a nanosecond timestamp column is encountered, the code silently converts it to microsecond precision, potentially losing up to 999 nanoseconds per value, with no notification to the caller.
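To make the size of the loss concrete, here is a minimal sketch of the arithmetic. It assumes the downcast behaves like plain integer division on a non-negative epoch value; truncateToMicros is a hypothetical helper written for this report, not a function in iceberg-go or Arrow.

```go
package main

import "fmt"

// truncateToMicros is a hypothetical helper (not part of iceberg-go).
// It assumes the ns -> us downcast is plain integer division and returns
// the microsecond value plus the nanoseconds that were discarded.
func truncateToMicros(ns int64) (us int64, lostNs int64) {
	us = ns / 1000
	lostNs = ns - us*1000
	return us, lostNs
}

func main() {
	// A nanosecond timestamp whose last three digits are 999.
	us, lost := truncateToMicros(1_700_000_000_123_456_999)
	fmt.Println(us, lost) // prints "1700000000123456 999"
}
```

Under that assumption, the worst case per value is exactly 999 ns, which is the figure quoted above.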
Code
// table/arrow_utils.go:335-341
case *arrow.TimestampType:
	if dt.Unit == arrow.Nanosecond {
		if !c.downcastTimestamp {
			panic(fmt.Errorf("%w: 'ns' timestamp precision not supported", iceberg.ErrType))
		}
		// TODO: log something
	}
Problems
- No warning at all. The TODO has not been implemented. There is zero notification that precision loss is occurring.
- Silent data loss. Nanosecond to microsecond truncation can lose up to 999 nanoseconds per value. The caller has no programmatic signal that this happened.
- Inconsistent error handling. Every other exceptional condition in Primitive() uses panic (caught by a recover higher up and converted to an error). This path silently continues with no signal of any kind.
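One possible shape for a fix is sketched below. It is not a patch against iceberg-go: TimeUnit, errType, and checkTimestampUnit are hypothetical stand-ins so the example compiles without Arrow, and it returns an error instead of using the panic/recover pattern in Primitive(). The point is only that the downcast branch emits a signal through a caller-supplied hook.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// TimeUnit is a stand-in for arrow.TimeUnit (assumption for this sketch).
type TimeUnit int

const (
	Microsecond TimeUnit = iota
	Nanosecond
)

// errType is a stand-in for iceberg.ErrType (assumption for this sketch).
var errType = errors.New("invalid type")

// checkTimestampUnit is hypothetical. It rejects nanosecond timestamps
// unless downcast is true, and calls warn when precision will be lost.
func checkTimestampUnit(unit TimeUnit, downcast bool, warn func(msg string)) error {
	if unit != Nanosecond {
		return nil
	}
	if !downcast {
		return fmt.Errorf("%w: 'ns' timestamp precision not supported", errType)
	}
	if warn != nil {
		warn("downcasting 'ns' timestamp to 'us'; sub-microsecond precision will be lost")
	}
	return nil
}

func main() {
	// Route the warning to the standard structured logger.
	warn := func(msg string) { slog.Warn(msg) }
	if err := checkTimestampUnit(Nanosecond, true, warn); err != nil {
		fmt.Println("error:", err)
	}
}
```

Passing the warning sink as a function keeps the check free of any particular logging dependency; whether iceberg-go would prefer a logger, a returned warning value, or something else is a design decision for the maintainers.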
Bug 2: Unrecognized compression codec silently falls through to uncompressed
Summary
table/internal/parquet_files.go:231-232 has a bare // warn placeholder where actual error handling should exist. When a user configures an unrecognized compression codec string, the default branch silently falls through. The codec variable retains its zero value (uncompressed), so data files are written without compression with zero notification to the user.
Code
// table/internal/parquet_files.go:215-236
switch strings.ToLower(props.Codec) {
case "snappy":
	codec = compress.Codecs.Snappy
case "zstd":
	codec = compress.Codecs.Zstd
case "uncompressed":
	codec = compress.Codecs.Uncompressed
case "gzip":
	codec = compress.Codecs.Gzip
case "brotli":
	codec = compress.Codecs.Brotli
case "lz4":
	codec = compress.Codecs.Lz4
case "lz4raw":
	codec = compress.Codecs.Lz4Raw
case "lzo":
	codec = compress.Codecs.Lzo
default:
	// warn
}
return append(writerProps, parquet.WithCompression(codec),
	parquet.WithCompressionLevel(compressionLevel))
Problems
- No warning, no error, no log. The // warn comment is a placeholder where someone intended to add a warning but never did. An unrecognized codec string (e.g., a typo like "zsdt") is completely ignored.
- Silent behavior change. The codec variable defaults to compress.Codecs.Uncompressed. A user who configures "zsdt" expecting zstd compression gets uncompressed output, potentially orders of magnitude larger data files, with no indication.
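A minimal sketch of stricter handling follows. It assumes that rejecting an unknown codec with an error is acceptable (a logged warning plus an explicit fallback would be the softer alternative). knownCodecs and validateCodec are hypothetical names, and the sketch works on plain strings rather than the Arrow compress types so that it stays self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// knownCodecs mirrors the case labels of the switch shown in this report.
var knownCodecs = map[string]struct{}{
	"snappy": {}, "zstd": {}, "uncompressed": {}, "gzip": {},
	"brotli": {}, "lz4": {}, "lz4raw": {}, "lzo": {},
}

// validateCodec is hypothetical. It returns the lower-cased codec name,
// or an error for any name the switch would not recognize.
func validateCodec(name string) (string, error) {
	normalized := strings.ToLower(name)
	if _, ok := knownCodecs[normalized]; !ok {
		return "", fmt.Errorf("unrecognized compression codec %q", name)
	}
	return normalized, nil
}

func main() {
	for _, name := range []string{"ZSTD", "zsdt"} {
		codec, err := validateCodec(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("codec:", codec)
	}
}
```

With a check like this in the default branch, the "zsdt" typo would surface at configuration time instead of producing uncompressed files.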
Environment
- iceberg-go at current main branch
- Go 1.22+