[Geneva Exporter] Eliminate schema_id collision during schema grouping #440

@lalitb

Problem
We currently use a hash-derived schema_id and rely on equality of that id to decide whether a schema already exists in a batch. Hashes can collide, so two different schemas that hash to the same id would be grouped as one. Also, metadata.schema_ids is emitted as a semicolon-separated list of MD5 hex values, which isn’t what we want going forward.

What we want

  1. Use a local, per-batch auto-incrementing schema_id (0,1,2,…) for each unique schema shape inside the batch.
  2. Deduplicate schemas by exact schema equality, not by hash.
  3. Emit metadata.schema_ids as a comma-separated list of those local ids, sorted ascending. Example: 0,1,2.
  4. Keep the CentralBlob wire format the same otherwise.
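
A minimal sketch of points 1 to 3, not the exporter's actual code. `SchemaShape`, `BatchSchemas`, `id_for`, and `schema_ids_metadata` are hypothetical names chosen for illustration, and the field representation (name plus a numeric type tag) is an assumption. The map uses hashing only to pick a bucket; the final match is by `Eq`, so two distinct schemas can never share an id even if their hashes collide.

```rust
use std::collections::HashMap;

/// Hypothetical schema shape: ordered (field name, type tag) pairs.
/// Deriving `Eq` gives exact structural comparison.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SchemaShape {
    fields: Vec<(String, u8)>,
}

/// Hypothetical per-batch registry. Create a fresh one for every batch
/// so that ids restart at 0.
#[derive(Default)]
struct BatchSchemas {
    ids: HashMap<SchemaShape, u64>,
}

impl BatchSchemas {
    /// Returns the local id for `shape`, assigning the next
    /// auto-incrementing id if this exact shape has not been seen yet.
    fn id_for(&mut self, shape: &SchemaShape) -> u64 {
        if let Some(&id) = self.ids.get(shape) {
            return id;
        }
        // Ids are dense (0, 1, 2, ...), so the next id equals the current count.
        let id = self.ids.len() as u64;
        self.ids.insert(shape.clone(), id);
        id
    }

    /// Renders the value for metadata.schema_ids: local ids,
    /// sorted ascending and comma-separated.
    fn schema_ids_metadata(&self) -> String {
        let mut ids: Vec<u64> = self.ids.values().copied().collect();
        ids.sort_unstable();
        ids.iter()
            .map(|id| id.to_string())
            .collect::<Vec<_>>()
            .join(",")
    }
}
```

With this sketch, looking up the same shape twice returns the same id, a shape that differs in any field gets the next id, and the rendered metadata for three distinct shapes is the string 0,1,2.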
