
Corrupt Parquet Schema when loading Parquet file into Databricks #174

@characat0


While loading a file produced with Parquet.jl into Databricks, I ran into the following error:

Corrupt Parquet Schema: Only one of num_children and type should be set in SchemaElement

According to the Parquet format Thrift definition at https://github.com/apache/parquet-format/blob/4701809cb65373b4404b46b6f01110d020f4d1c8/src/main/thrift/parquet.thrift#L437:

  /** Nested fields.  Since thrift does not support nested fields,
   * the nesting is flattened to a single list by a depth-first traversal.
   * The children count is used to construct the nested relationship.
   * This field is not set when the element is a primitive type
   */
  5: optional i32 num_children;

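the num_children field should not be set when the element is a leaf (primitive) node. The error above suggests that the schema written by Parquet.jl sets both num_children and type on leaf elements.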
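To illustrate the rule, here is a minimal Python sketch of the depth-first flattening described in the Thrift comment, plus a check that each element sets exactly one of type and num_children. The SchemaElement and Node classes and the flatten/validate helpers are hypothetical simplifications, not the Parquet.jl or Thrift-generated API, and the sketch assumes the spec's converse rule that type is left unset on group (non-leaf) elements. The check treats num_children = 0 as "set", because for a Thrift optional field it is presence, not the value, that matters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical, simplified stand-in for the Thrift SchemaElement struct:
# only the two fields relevant to this rule are modelled.
@dataclass
class SchemaElement:
    name: str
    type: Optional[str] = None          # physical type; leaves only
    num_children: Optional[int] = None  # child count; groups only


# Hypothetical in-memory schema tree: a group if it has children, else a leaf.
@dataclass
class Node:
    name: str
    type: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[SchemaElement]:
    """Flatten the tree depth-first, as the parquet.thrift comment describes."""
    if node.children:
        # Group node: record how many children follow; leave `type` unset.
        out = [SchemaElement(node.name, num_children=len(node.children))]
        for child in node.children:
            out.extend(flatten(child))
        return out
    # Leaf node: set `type`; leave `num_children` unset (None, not 0).
    return [SchemaElement(node.name, type=node.type)]


def validate(elements: List[SchemaElement]) -> None:
    """Raise if any element sets both, or neither, of type and num_children."""
    for el in elements:
        if (el.type is None) == (el.num_children is None):
            raise ValueError(
                f"{el.name}: only one of num_children and type should be set"
            )


if __name__ == "__main__":
    schema = Node("schema", children=[
        Node("id", type="INT64"),
        Node("address", children=[Node("city", type="BYTE_ARRAY")]),
    ])
    validate(flatten(schema))  # well-formed: passes silently

    # A leaf that also carries num_children = 0 is rejected.
    try:
        validate([SchemaElement("id", type="INT64", num_children=0)])
    except ValueError as err:
        print(err)
```

The key point is that a leaf written with num_children = 0 is still a leaf with num_children set, which is the combination the error message complains about.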
