Commit 567f441
Add array/map/fixed schema resolution and default value support to arrow-avro codec (#8292)
# Which issue does this PR close?
This work continues arrow-avro schema resolution support and aligns
behavior with the Avro spec.
- **Related to**: #4886 (“Add Avro Support”): ongoing work to round out
the reader/decoder, including schema resolution and type promotion.
- **Follow-ups/Context**: #8124 (schema resolution & type promotion for
the decoder), #8223 (enum mapping for schema resolution). These previous
efforts established the foundations that this PR extends to default
values and additional resolvable types.
# Rationale for this change
Avro’s **schema resolution** requires readers to reconcile differences
between the writer and reader schemas, including:
- Using record-field **default values** when the writer lacks a field
present in the reader; defaults must be type-correct (e.g., union
defaults match the first union member; bytes/fixed defaults are JSON
strings).
- Recursively resolving **arrays** (by item schema) and **maps** (by
value schema).
- Resolving **fixed** types (size and unqualified name must match) and
erroring when they do not.
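
The recursive rules above can be sketched with a toy schema model. This is only an illustration under stated assumptions: `Schema`, `Resolved`, `unqualified`, and `resolve` are hypothetical names invented for this example and are not arrow-avro's actual types or API. Alias handling and most primitive promotions are left out.

```rust
/// Toy Avro schema model (hypothetical; not arrow-avro's types).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Int,
    Long,
    String,
    Array(Box<Schema>),
    Map(Box<Schema>),
    Fixed { name: String, size: usize },
}

/// Outcome of resolving a writer schema against a reader schema.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// Writer and reader agree; decode using the reader schema.
    Same(Schema),
    /// Writer wrote `int`, reader expects `long`.
    IntToLong,
    /// Array whose item schema was resolved recursively.
    Array(Box<Resolved>),
    /// Map whose value schema was resolved recursively.
    Map(Box<Resolved>),
}

/// Strip the namespace: "com.example.Md5" becomes "Md5".
fn unqualified(name: &str) -> &str {
    name.rsplit('.').next().unwrap_or(name)
}

fn resolve(writer: &Schema, reader: &Schema) -> Result<Resolved, String> {
    match (writer, reader) {
        // Type promotion (only int -> long is modelled here).
        (Schema::Int, Schema::Long) => Ok(Resolved::IntToLong),
        // Arrays resolve by item schema, maps by value schema.
        (Schema::Array(w), Schema::Array(r)) => Ok(Resolved::Array(Box::new(resolve(w, r)?))),
        (Schema::Map(w), Schema::Map(r)) => Ok(Resolved::Map(Box::new(resolve(w, r)?))),
        // Fixed requires equal size and equal unqualified name.
        (Schema::Fixed { name: wn, size: ws }, Schema::Fixed { name: rn, size: rs }) => {
            if ws != rs {
                return Err(format!("fixed size mismatch: writer {}, reader {}", ws, rs));
            }
            if unqualified(wn) != unqualified(rn) {
                return Err(format!("fixed name mismatch: writer {}, reader {}", wn, rn));
            }
            Ok(Resolved::Same(reader.clone()))
        }
        (w, r) if w == r => Ok(Resolved::Same(r.clone())),
        (w, r) => Err(format!("cannot resolve writer {:?} against reader {:?}", w, r)),
    }
}
```

With this model, resolving a writer `Array(Int)` against a reader `Array(Long)` yields `Array(IntToLong)`, while two `Fixed` schemas with different sizes produce an error.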
Prior to this change, arrow-avro’s resolution handled some cases but
lacked full Codec support for **default values** and for resolving
**array/map/fixed** shapes between writer and reader. This led to gaps
when reading evolved data or datasets produced by heterogeneous systems.
This PR implements these missing pieces so the Arrow reader behaves per
the spec in common evolution scenarios.
# What changes are included in this PR?
This PR modifies **`arrow-avro/src/codec.rs`** to extend the
schema-resolution path with the following:
- **Default value handling** for record fields
  - Reads and applies default values when the reader expects a field
    absent from the writer, including **nested defaults**.
  - Validates defaults per the Avro spec (e.g., union defaults match the
    first schema; bytes/fixed defaults are JSON strings).
- **Array / Map / Fixed schema resolution**
  - **Array**: recursively resolves item schemas (writer↔reader).
  - **Map**: recursively resolves value schemas.
  - **Fixed**: enforces matching size and (unqualified) name; otherwise
    signals an error, consistent with the spec.
- **Codec updates**
  - Refactors internal codec logic to support the above during decoding,
    including resolution for **record fields** and **nested defaults**. (See
    commit message for the high-level summary.)
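
The default-value behavior listed above might look roughly like the following. This is a minimal sketch, not the code in `codec.rs`: `Value`, `Ty`, `ReaderField`, `default_matches`, and `resolve_record` are hypothetical names, and the writer side is simplified to a map of already-decoded values.

```rust
use std::collections::HashMap;

/// Toy decoded value (hypothetical; not arrow-avro's types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i32),
    Str(String),
}

/// Toy reader-side type.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Int,
    Str,
    Union(Vec<Ty>),
}

struct ReaderField {
    name: &'static str,
    ty: Ty,
    default: Option<Value>,
}

/// A default is valid if it matches the field type; for a union it
/// must match the first branch, as described above.
fn default_matches(ty: &Ty, v: &Value) -> bool {
    match (ty, v) {
        (Ty::Null, Value::Null) | (Ty::Int, Value::Int(_)) | (Ty::Str, Value::Str(_)) => true,
        (Ty::Union(branches), v) => branches.first().map_or(false, |b| default_matches(b, v)),
        _ => false,
    }
}

/// Build the reader's view of a record: take the writer's value when the
/// field exists, otherwise fall back to a validated default.
fn resolve_record(
    writer: &HashMap<&str, Value>,
    reader: &[ReaderField],
) -> Result<Vec<(String, Value)>, String> {
    reader
        .iter()
        .map(|f| {
            if let Some(v) = writer.get(f.name) {
                return Ok((f.name.to_string(), v.clone()));
            }
            match &f.default {
                Some(d) if default_matches(&f.ty, d) => Ok((f.name.to_string(), d.clone())),
                Some(d) => Err(format!("default {:?} does not match type of field '{}'", d, f.name)),
                None => Err(format!("field '{}' is absent from the writer and has no default", f.name)),
            }
        })
        .collect()
}
```

In this sketch, a reader field of type `["null", "string"]` with default `null` is filled in when the writer lacks it, while a string default on the same union is rejected because it does not match the first branch. Fields present only in the writer are simply never looked up, which loosely mirrors skipping them.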
# Are these changes tested?
**Yes.** This PR includes new unit tests in `arrow-avro/src/codec.rs`
covering:
1) **Default validation & persistence**
   - `Null`/union‑nullability rules; metadata persistence of defaults
     (`AVRO_FIELD_DEFAULT_METADATA_KEY`).
2) **`AvroLiteral` Parsing**
   - Range checks for `i32`/`f32`; correct literals for `i64`/`f64`;
     `Utf8`/`Utf8View`; `uuid` strings (RFC‑4122).
   - Byte‑range mapping for `bytes`/`fixed` defaults; `Fixed(n)` length
     enforcement; `decimal` on `fixed` vs `bytes`; `duration`/interval fixed
     **12**‑byte enforcement.
3) **Collections & records**
   - Array/map defaults shape; enum symbol validity; record defaults for
     missing fields, required‑field errors, and honoring field‑level
     defaults; skip‑fields retained for writer‑only fields.
4) **Resolution mechanics**
   - Element **promotion** (`int` to `long`) for arrays; **reader metadata
     precedence** for colliding attributes; `fixed` name/size match including
     **alias**.
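
To make the byte-range mapping and `Fixed(n)` length checks concrete, here is a small sketch. It assumes the Avro convention that a `bytes`/`fixed` default is a JSON string whose code points 0 to 255 map to byte values 0 to 255. The function names `bytes_from_default` and `fixed_from_default` are hypothetical and do not come from `codec.rs`.

```rust
/// Convert an already-unescaped JSON string default into raw bytes.
/// Each character must be a code point in 0..=255; anything larger is an error.
fn bytes_from_default(s: &str) -> Result<Vec<u8>, String> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF {
                Ok(cp as u8)
            } else {
                Err(format!("code point U+{:04X} is outside the byte range 0..=255", cp))
            }
        })
        .collect()
}

/// Same mapping, plus the length check a `fixed` of `size` bytes requires.
fn fixed_from_default(s: &str, size: usize) -> Result<Vec<u8>, String> {
    let bytes = bytes_from_default(s)?;
    if bytes.len() != size {
        return Err(format!(
            "fixed default has {} bytes but the schema requires {}",
            bytes.len(),
            size
        ));
    }
    Ok(bytes)
}
```

For example, `fixed_from_default("abcd", 4)` succeeds, while a three-character default for the same schema fails the length check. A `duration` default would go through the same check with `size` set to 12.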
# Are there any user-facing changes?
N/A
---------
Co-authored-by: Andrew Lamb <[email protected]>

Parent commit: 226a425
1 file changed (under arrow-avro/src): +810 -53 lines changed