parquet-cli cannot display data containing int96 type #3345

@qian0817

Describe the bug, including details regarding any error messages, version, and platform.

parquet-cli cannot display the contents of a Parquet file that contains an INT96 column. Running cat on such a file fails with an argument error:

parquet cat part-00000-e4f30ffc-fdfa-465d-8205-23562b26616f-c000.zstd.parquet
Argument error: INT96 is deprecated. As interim enable READ_INT96_AS_FIXED flag to read as byte array.

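Setting parquet.avro.readInt96AsFixed=true does not help. The read still fails with a schema mismatch: the requested schema declares timestamp_col as fixed_len_byte_array(12), but the file schema declares it as int96.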

parquet -Dparquet.avro.readInt96AsFixed=true cat ~/Downloads/part-00000-e4f30ffc-fdfa-465d-8205-23562b26616f-c000.zstd.parquet
Unknown error
java.lang.RuntimeException: Failed on record 0 in file /Users/admin/Downloads/part-00000-e4f30ffc-fdfa-465d-8205-23562b26616f-c000.zstd.parquet
	at org.apache.parquet.cli.commands.CatCommand.runWithAvroSchema(CatCommand.java:101)
	at org.apache.parquet.cli.commands.CatCommand.run(CatCommand.java:67)
	at org.apache.parquet.cli.Main.run(Main.java:217)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.parquet.cli.Main.main(Main.java:245)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/Users/admin/Downloads/part-00000-e4f30ffc-fdfa-465d-8205-23562b26616f-c000.zstd.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:280)
	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:140)
	at org.apache.parquet.cli.BaseCommand$2$1.advance(BaseCommand.java:505)
	at org.apache.parquet.cli.BaseCommand$2$1.<init>(BaseCommand.java:486)
	at org.apache.parquet.cli.BaseCommand$2.iterator(BaseCommand.java:484)
	at org.apache.parquet.cli.commands.CatCommand.runWithAvroSchema(CatCommand.java:88)
	... 4 more
Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required fixed_len_byte_array(12) timestamp_col != required int96 timestamp_col
	at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:104)
	at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:95)
	at org.apache.parquet.schema.PrimitiveType.accept(PrimitiveType.java:693)
	at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:83)
	at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57)
	at org.apache.parquet.schema.MessageType.accept(MessageType.java:52)
	at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:167)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:155)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:245)
	... 10 more
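
For context on what the fixed_len_byte_array(12) in the error represents, here is a minimal sketch of how a 12-byte INT96 timestamp is conventionally decoded. It assumes the layout used by Impala, Hive and Spark writers (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and treats the value as UTC. The class and method names are hypothetical; this is not code from parquet-cli or parquet-avro.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of parquet-java.
class Int96TimestampSketch {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    // Decodes a 12-byte INT96 value, assuming the Impala/Hive/Spark layout
    // and assuming the stored value is in UTC.
    static Instant decode(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("expected 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7: nanoseconds since midnight
        int julianDay = buf.getInt();    // bytes 8..11: Julian day number
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }

    public static void main(String[] args) {
        // One day and one hour after the Unix epoch.
        byte[] raw = ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(3_600_000_000_000L)
                .putInt(2_440_589)
                .array();
        System.out.println(decode(raw)); // prints 1970-01-02T01:00:00Z
    }
}
```

A CLI fix would presumably need something along these lines only for display; the schema mismatch above occurs earlier, before any value is decoded.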

Component(s)

CLI
