Missing logging for silent data-affecting behavior #797

@akarpovspl

Apache Iceberg version

main (development)

Please describe the bug 🐞

No usable warning on timestamp precision loss or invalid compression codec

Bug 1: No warning when nanosecond timestamps are downcast to microsecond

Summary

When downcastTimestamp is enabled and a nanosecond timestamp column is encountered, the code silently converts it to microsecond precision, potentially losing up to 999 nanoseconds per value, with no notification to the caller.

Code

// table/arrow_utils.go:335-341

case *arrow.TimestampType:
    if dt.Unit == arrow.Nanosecond {
        if !c.downcastTimestamp {
            panic(fmt.Errorf("%w: 'ns' timestamp precision not supported", iceberg.ErrType))
        }
        // TODO: log something
    }

Problems

  1. No warning at all. The TODO has not been implemented. There is zero notification that precision loss is occurring.

  2. Silent data loss. Nanosecond to microsecond truncation can lose up to 999 nanoseconds per value. The caller has no programmatic signal that this happened.

  3. Inconsistent error handling. Every other exceptional condition in Primitive() uses panic (caught by a recover higher up and converted to an error). This path silently continues with no signal of any kind.


Bug 2: Unrecognized compression codec silently falls through to uncompressed

Summary

table/internal/parquet_files.go:231-232 has a bare // warn placeholder where actual error handling should exist. When a user configures an unrecognized compression codec string, the default branch silently falls through. The codec variable retains its zero value (uncompressed), so data files are written without compression with zero notification to the user.

Code

// table/internal/parquet_files.go:215-236

switch strings.ToLower(props.Codec) {
case "snappy":
    codec = compress.Codecs.Snappy
case "zstd":
    codec = compress.Codecs.Zstd
case "uncompressed":
    codec = compress.Codecs.Uncompressed
case "gzip":
    codec = compress.Codecs.Gzip
case "brotli":
    codec = compress.Codecs.Brotli
case "lz4":
    codec = compress.Codecs.Lz4
case "lz4raw":
    codec = compress.Codecs.Lz4Raw
case "lzo":
    codec = compress.Codecs.Lzo
default:
    // warn
}

return append(writerProps, parquet.WithCompression(codec),
    parquet.WithCompressionLevel(compressionLevel))

Problems

  1. No warning, no error, no log. The // warn comment is a placeholder where someone intended to add a warning but never did. An unrecognized codec string (e.g., a typo like "zsdt") is completely ignored.

  2. Silent behavior change. The codec variable defaults to compress.Codecs.Uncompressed. A user who configures "zsdt" expecting zstd compression gets uncompressed output, with potentially orders of magnitude larger data files, and no indication that anything went wrong.
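A minimal sketch of what the default branch could do instead, assuming the preferred behavior is to surface an error rather than fall back silently. The codec type and parseCodec function are local, hypothetical stand-ins so the example runs on its own; they are not the compress package's API, and only a subset of the codecs from the switch above is listed.

```go
package main

import (
	"fmt"
	"strings"
)

// codec is a local stand-in for the Parquet compression enum referenced in
// the issue. As in the issue, the zero value means "uncompressed".
type codec int

const (
	uncompressed codec = iota
	snappy
	gzip
	zstd
)

// parseCodec (hypothetical helper) maps a configured codec name to a codec
// value. Unlike the switch quoted above, an unrecognized name produces an
// error instead of silently keeping the zero value.
func parseCodec(name string) (codec, error) {
	switch strings.ToLower(name) {
	case "uncompressed":
		return uncompressed, nil
	case "snappy":
		return snappy, nil
	case "gzip":
		return gzip, nil
	case "zstd":
		return zstd, nil
	default:
		return uncompressed, fmt.Errorf("unrecognized compression codec %q", name)
	}
}

func main() {
	// "ZSTD" resolves case-insensitively; the typo "zsdt" is reported.
	for _, name := range []string{"ZSTD", "zsdt"} {
		c, err := parseCodec(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "->", c)
	}
}
```

If failing the write is considered too strict, the same default branch could log a warning and keep the uncompressed fallback; either way the user learns that the configured codec was not applied.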


Environment

  • iceberg-go at current main branch
  • Go 1.22+
