[Spark] Fix a bug that prevents altering array/map/struct<varchar> column (delta-io#4499)
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
## Description
### Context
Previously, there was a bug where char/varchar to string conversion
didn't work correctly because of how these types are represented internally.
For example:
- An `array<varchar<10>>` is represented as `(dataType = array<string>,
char_varchar_metadata = "array<varchar<10>>")`.
- If we convert `array<varchar<10>>` into `array<string>` by setting only the
`dataType` part to `array<string>`, nothing changes: `dataType` is already
`array<string>` and the metadata still says `array<varchar<10>>`.
This was fixed in delta-io#3346. The
idea is to first convert the representation into one without the
metadata part. In the example above, `(dataType = array<string>,
char_varchar_metadata = "array<varchar<10>>")` is first converted
into `(dataType = array<varchar<10>>, char_varchar_metadata = "")`
before the `dataType` part is set to `array<string>`.
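
The two-step conversion can be sketched with a toy model. This is only an illustration: `Field`, `effective_type`, and `to_raw` are hypothetical names invented here, not Spark or Delta classes, and the rule that the metadata takes precedence over `data_type` is an assumption of the sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a schema field; not a Spark or Delta class.
    data_type: str
    char_varchar_metadata: str = ""

    def effective_type(self) -> str:
        # Assumption of this sketch: metadata, when present, wins over data_type.
        return self.char_varchar_metadata or self.data_type


def to_raw(field: Field) -> Field:
    """Move the real type into data_type and clear the metadata."""
    return Field(data_type=field.effective_type())


col = Field("array<string>", "array<varchar<10>>")

# Naive change: overwrite data_type only. The field is left untouched.
naive = replace(col, data_type="array<string>")
print(naive == col)            # True
print(naive.effective_type())  # array<varchar<10>>

# Two-step change: drop the metadata first, then set the new type.
fixed = replace(to_raw(col), data_type="array<string>")
print(fixed.effective_type())  # array<string>
```

The naive path is a no-op because `data_type` already holds `array<string>`; only the metadata-free form lets the new type take effect.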
### Problem
In the previous fix, the non-metadata representation is kept after the
conversion and is converted back to the metadata-based representation
only where necessary. This causes an issue: the converting-back step is
handled incorrectly, which breaks ALTER COLUMN on any column of type
`container<varchar>`, where `container` can be `array`/`map`/`struct`.
Specifically, `verifyColumnChange` receives `change`, which uses the
non-metadata-based representation, and `oldColumnForVerification`,
which uses the metadata-based representation. This method has a check that
prevents any change to complex data types (`array`/`map`/`struct`). Because
it's a simple equality check, the check fails when it compares the
non-metadata-based and metadata-based representations of the same type.
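
A rough sketch of this failure mode, using a toy model rather than Delta's code: `Field` and this `verify_column_change` are hypothetical simplifications, and the guard shown is only a guess at the shape of the real check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a schema field; not a Spark or Delta class.
    data_type: str
    char_varchar_metadata: str = ""


COMPLEX_PREFIXES = ("array<", "map<", "struct<")


def verify_column_change(old: Field, change: Field) -> None:
    # Simplified guard: complex types must be left unchanged (plain equality).
    if old.data_type.startswith(COMPLEX_PREFIXES) and old.data_type != change.data_type:
        raise ValueError(f"cannot change {old.data_type} to {change.data_type}")


old = Field("array<string>", "array<varchar<10>>")  # metadata-based
change = Field("array<varchar<10>>")                # non-metadata-based, same logical type

try:
    verify_column_change(old, change)
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True: the same logical type is rejected

# With both sides in the same representation, the guard passes.
verify_column_change(old, old)
```

Both fields describe `array<varchar<10>>`, yet the equality check sees `array<string>` on one side and `array<varchar<10>>` on the other.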
### New Approach
To avoid having to reason about which representation to use where, this
PR takes a more targeted approach. It always defaults to the
metadata-based representation of char/varchar and only does the
representation conversion when setting the data type.
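
One way to picture this approach is a toy model in which the conversion lives inside a single helper and every caller only ever sees metadata-based fields. The names `Field`, `to_metadata_based`, and `set_data_type` are hypothetical and do not correspond to functions in the PR; the `varchar<N>` spelling follows the notation used in this description.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a schema field; not a Spark or Delta class.
    data_type: str
    char_varchar_metadata: str = ""


# Matches char<N> and varchar<N> in the notation used by this description.
CHAR_VARCHAR = re.compile(r"\b(?:var)?char<\d+>")


def to_metadata_based(raw: Field) -> Field:
    """Replace char/varchar with string and record the original type as metadata."""
    erased = CHAR_VARCHAR.sub("string", raw.data_type)
    if erased == raw.data_type:
        return raw
    return Field(erased, raw.data_type)


def set_data_type(field: Field, new_type: str) -> Field:
    """The only place that touches the non-metadata representation."""
    raw = Field(field.char_varchar_metadata or field.data_type)  # drop metadata
    changed = Field(new_type, raw.char_varchar_metadata)         # set the new type
    return to_metadata_based(changed)                            # convert back at once


col = Field("array<string>", "array<varchar<10>>")

same = set_data_type(col, "array<varchar<10>>")
print(same == col)  # True: a no-op change compares equal to the old column

widened = set_data_type(col, "array<string>")
print(widened)      # Field(data_type='array<string>', char_varchar_metadata='')
```

Because the helper returns a metadata-based field, later equality checks compare like with like.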
## How was this patch tested?
New and existing unit tests
## Does this PR introduce _any_ user-facing changes?
No