Potential malformed output when using CARQUET_REPETITION_OPTIONAL #9

@jerome-fmad

Reproduction steps:

  1. Build the latest master at time of writing (a303659) with examples and tests.
  2. Run build/example_nullable /tmp/example_nullable.parquet.
  3. Install the DuckDB CLI (https://duckdb.org/install/?platform=linux&environment=cli).
  4. Start the DuckDB CLI and run SELECT * FROM '/tmp/example_nullable.parquet';

Expected output: an ASCII representation of the table contents.

Actual output:

D SELECT * FROM '/tmp/example_nullable.parquet';
Invalid Error:
TProtocolException: Invalid data

Another hand-made example:

#include <assert.h>
#include <stdint.h>

#include <carquet/carquet.h>

int main(int argc, char* argv[]) {
        (void)argc;
        (void)argv;

        /* Schema with a single OPTIONAL INT32 column named "nullable". */
        carquet_error_t err = CARQUET_ERROR_INIT;
        carquet_schema_t* schema = carquet_schema_create(&err);
        carquet_schema_add_column(schema, "nullable", CARQUET_PHYSICAL_INT32, NULL, CARQUET_REPETITION_OPTIONAL, 0, 0);
        int32_t col_idx = carquet_schema_find_column(schema, "nullable");

        /* Default writer options. */
        carquet_writer_options_t opts;
        carquet_writer_options_init(&opts);

        /* Write one non-null value; the last two arguments are passed as NULL. */
        carquet_writer_t* writer = carquet_writer_create("/tmp/repro.parquet", schema, &opts, &err);
        int32_t value = 1337;
        carquet_writer_write_batch(writer, col_idx, (const void*)&value, 1, NULL, NULL);

        /* Closing succeeds, yet DuckDB rejects the resulting file. */
        carquet_status_t status = carquet_writer_close(writer);
        assert(status == CARQUET_OK);

        return 0;
}

Reading the resulting file in the DuckDB CLI produces the following:

D SELECT * FROM '/tmp/repro.parquet';
Invalid Error:
Out of buffer

This isn't so much a bug report as a "request for comment", since I acknowledge the possibility that DuckDB is in the wrong here.
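
To help narrow down which side is at fault, here is a minimal sketch of a reader-independent sanity check. It relies only on the Parquet file layout: the file starts with the 4-byte magic "PAR1" and ends with a 4-byte little-endian footer length followed by "PAR1" again. The helper name check_parquet_framing and its return codes are made up for illustration; they are not part of carquet or DuckDB.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of carquet or DuckDB.
 * Checks only the outer Parquet framing:
 *   "PAR1" | ... | footer | 4-byte LE footer length | "PAR1"
 * Returns 0 on success and stores the footer length in *footer_len.
 * Returns a negative code (arbitrary, for illustration) on failure. */
static int check_parquet_framing(const char *path, uint32_t *footer_len) {
    unsigned char head[4];
    unsigned char tail[8];
    long size = 0;
    int rc = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;                       /* cannot open file */

    if (fread(head, 1, sizeof head, f) != sizeof head ||
        memcmp(head, "PAR1", 4) != 0) {
        rc = -2;                                    /* bad leading magic */
    } else if (fseek(f, 0L, SEEK_END) != 0 || (size = ftell(f)) < 12) {
        rc = -3;                                    /* too short to be Parquet */
    } else if (fseek(f, -8L, SEEK_END) != 0 ||
               fread(tail, 1, sizeof tail, f) != sizeof tail ||
               memcmp(tail + 4, "PAR1", 4) != 0) {
        rc = -4;                                    /* bad trailing magic */
    } else {
        /* Footer length is stored little-endian just before the magic. */
        uint32_t len = (uint32_t)tail[0] |
                       ((uint32_t)tail[1] << 8) |
                       ((uint32_t)tail[2] << 16) |
                       ((uint32_t)tail[3] << 24);
        /* The footer must fit between the two magics and the length field. */
        if ((unsigned long)len > (unsigned long)size - 12UL) {
            rc = -5;                                /* footer length exceeds file */
        } else {
            *footer_len = len;
        }
    }

    fclose(f);
    return rc;
}
```

Calling check_parquet_framing("/tmp/repro.parquet", &len) and getting 0 would only rule out truncation or a bad footer length. It says nothing about whether the Thrift-encoded footer or the page contents are valid, so a second Parquet reader is still the better cross-check.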
