Commit 6834f70

docs: add avro upgrade notes for arrow-avro migration
1 parent c3bd1c8 commit 6834f70

1 file changed: +43 -0 lines changed

docs/source/library-user-guide/upgrading/53.0.0.md

@@ -472,3 +472,46 @@ Now:
+-------+
0 row(s) fetched.
```

### Avro API, feature, and decoding changes

As part of the Avro reader migration (see [#17861]), DataFusion now delegates
Avro-to-Arrow type decoding to `arrow-avro` and follows its semantics. As a result,
several Avro-related APIs and Cargo features changed:

- `DataFusionError::AvroError` has been removed.
- `From<apache_avro::Error> for DataFusionError` has been removed.
- The Avro crate re-export changed:
  - Before: `datafusion::apache_avro`
  - After: `datafusion::arrow_avro`
- Cargo feature wiring changed:
  - The `datafusion` crate's `avro` feature no longer enables `datafusion-common/avro`
  - The `datafusion-proto` crate's `avro` feature no longer enables `datafusion-common/avro`
- **Avro datatype interpretation now follows `arrow-avro` behavior.** Notable effects:
  - Avro `string` values are read as Arrow `Binary` in DataFusion Avro scans
  - Avro `timestamp-*` logical types are read as UTC timezone-aware Arrow timestamps
    (`Timestamp(..., Some("+00:00"))`)
  - Avro `local-timestamp-*` logical types remain timezone-naive
    (`Timestamp(..., None)`)
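
The decoding effects listed above can be summarized as a small lookup. This is only a sketch for checking expectations against a schema: the `Decoded` enum and `decoded_type` function are hypothetical names invented for illustration, not part of `arrow-avro` or DataFusion, and the timestamp unit is deliberately left out.

```rust
/// Hypothetical summary type, not an `arrow-avro` or DataFusion API.
#[derive(Debug, PartialEq)]
enum Decoded {
    /// Arrow `Binary`.
    Binary,
    /// Arrow `Timestamp` with an optional timezone; the time unit is omitted.
    Timestamp { tz: Option<&'static str> },
    /// Any type this sketch does not cover.
    Other,
}

/// Maps an Avro type or logical type name to the Arrow type described in
/// these upgrade notes.
fn decoded_type(avro_type: &str) -> Decoded {
    if avro_type == "string" {
        Decoded::Binary
    } else if avro_type.starts_with("local-timestamp-") {
        // Timezone-naive: no timezone is attached.
        Decoded::Timestamp { tz: None }
    } else if avro_type.starts_with("timestamp-") {
        // Timezone-aware, pinned to UTC.
        Decoded::Timestamp { tz: Some("+00:00") }
    } else {
        Decoded::Other
    }
}

fn main() {
    for name in ["string", "timestamp-millis", "local-timestamp-micros"] {
        println!("{} -> {:?}", name, decoded_type(name));
    }
}
```

The `local-timestamp-` prefix is checked first only for readability; the two prefixes do not overlap, so the order does not change the result.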

**Who is affected:**

- Users matching on `DataFusionError::AvroError`
- Users importing `datafusion::apache_avro`
- Users depending on the old `datafusion-common/avro` feature wiring
- Users relying on DataFusion-specific Avro decoding behavior (especially `string`
  values and timestamp logical types)

**Migration guide:**

- Replace `datafusion::apache_avro` imports with `datafusion::arrow_avro`.
- Update error handling code that matches on `DataFusionError::AvroError`, which no
  longer compiles, to use the current error surface.
- If you depend on Avro feature propagation, update Cargo feature expectations:
  the `avro` feature of `datafusion` and `datafusion-proto` no longer enables
  `datafusion-common/avro`.
- Review Avro table schemas and add explicit casts where needed for binary-backed
  string values (for example, `CAST(binary_col AS VARCHAR)`).
- Validate timestamp handling where timezone semantics matter:
  `timestamp-*` is UTC timezone-aware, while `local-timestamp-*` is timezone-naive.
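
For code that consumes binary-backed string values directly rather than through SQL, the equivalent of the cast is a UTF-8 decode of the raw bytes. The sketch below uses only the Rust standard library; `decode_avro_string` is a hypothetical helper, and how DataFusion itself handles invalid UTF-8 in such a cast is not covered here.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: interprets bytes read from a binary column as UTF-8
/// text, returning an error instead of silently replacing invalid bytes.
fn decode_avro_string(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

fn main() {
    let valid: &[u8] = b"hello";
    let invalid: &[u8] = &[0xff, 0xfe];

    // Valid UTF-8 decodes to the original text.
    println!("{:?}", decode_avro_string(valid));
    // Invalid UTF-8 surfaces as an error the caller must handle.
    println!("{:?}", decode_avro_string(invalid));
}
```

Returning a `Result` keeps malformed data visible; `String::from_utf8_lossy` is an alternative when replacement characters are acceptable.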

[#17861]: https://github.com/apache/datafusion/pull/17861