Conversation

@richardc-db richardc-db commented Mar 12, 2024

relies on #1 and #4

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

Does this PR introduce any user-facing changes?

@richardc-db richardc-db force-pushed the kernel_read_variant branch from a296c41 to 1d20a35 Compare March 20, 2024 01:57
@richardc-db richardc-db changed the base branch from add_variant_type to fix_kernel_tests_java17 March 20, 2024 01:59
@richardc-db richardc-db force-pushed the kernel_read_variant branch from 1d20a35 to 1c897a5 Compare March 20, 2024 02:01
@richardc-db richardc-db changed the base branch from fix_kernel_tests_java17 to add_variant_type March 20, 2024 02:01
@richardc-db richardc-db force-pushed the add_variant_type branch 2 times, most recently from 4dc4cb9 to 7a30942 Compare March 25, 2024 19:51
@richardc-db richardc-db force-pushed the kernel_read_variant branch from 843995c to a9a1c5b Compare March 25, 2024 21:45
* Abstraction to represent a single Variant value in a {@link ColumnVector}.
*/
public interface VariantValue {
byte[] getValue();

This may need a design decision. Is there a better interface we could provide that hides the complexity of knowing the metadata format?

(... rough idea ... - haven't thought about all details)

  • API to expose the available paths (and potentially the type of each path as well)
  • API to get the value at a path, e.g. if the value at a path is an int, VariantValue.getInt(String path) returns the integer (see the rough sketch below).

}
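A rough sketch of what such a higher-level interface could look like (method names and shapes here are hypothetical, not a settled API):

```java
import java.util.Set;

/**
 * Hypothetical alternative surface: the engine resolves the variant metadata
 * internally and exposes path-based, typed accessors instead of raw bytes.
 */
public interface VariantValue {
  /** Paths (e.g. "$.a.b") present in this variant value. */
  Set<String> getPaths();

  /** Value at the given path as an int; throws if the value there is not an int. */
  int getInt(String path);

  /** Value at the given path as a String; throws if the value there is not a string. */
  String getString(String path);

  /** Raw encoded value, for engines that prefer to decode it themselves. */
  byte[] getValue();
}
```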

@Override
public String toString() {

TODO: change this to match OSS Spark's implementation.

toString is currently required to accurately test Variant values because the test infra calls toString to order the results.

@richardc-db richardc-db force-pushed the kernel_read_variant branch from a9a1c5b to 844cb0b Compare April 12, 2024 01:07
@richardc-db richardc-db changed the base branch from add_variant_type to fix_kernel_tests_java17 April 12, 2024 01:07
@richardc-db richardc-db force-pushed the fix_kernel_tests_java17 branch 3 times, most recently from 6fdc752 to d24a0b1 Compare April 12, 2024 06:38
@richardc-db richardc-db force-pushed the kernel_read_variant branch from 844cb0b to 5722e8d Compare April 16, 2024 01:17
@richardc-db richardc-db changed the title [KERNEL][VARIANT] Add basic read in delta kernel [WIP][KERNEL][VARIANT] Add basic read in delta kernel Apr 16, 2024
@richardc-db richardc-db force-pushed the kernel_read_variant branch from 3860375 to ddc75dc Compare May 2, 2024 05:18
@richardc-db richardc-db changed the base branch from fix_kernel_tests_java17 to master May 2, 2024 05:19
@richardc-db richardc-db closed this May 2, 2024
richardc-db pushed a commit that referenced this pull request Dec 20, 2024
…rClient` API (delta-io#3797)

This is a stacked PR. Please view this PR's diff here:
- scottsand-db/delta@delta_kernel_cc_1...delta_kernel_cc_2

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Adds the new `TableDescriptor` and `CommitCoordinatorClient` APIs, and adds a new
`getCommitCoordinatorClient` API to the `Engine` (with a default implementation
that throws an exception).
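As a rough sketch of the shape of such a default-throwing `Engine` method (the parameter list and exception type below are assumptions, not the actual signature):

```java
import java.util.Map;

/** Placeholder stand-in for the CommitCoordinatorClient interface added by this PR. */
interface CommitCoordinatorClient {}

interface Engine {
  /**
   * Sketch only: the default implementation throws, so existing Engine
   * implementations keep compiling without having to support coordinated commits.
   */
  default CommitCoordinatorClient getCommitCoordinatorClient(
      String commitCoordinatorName, Map<String, String> commitCoordinatorConf) {
    throw new UnsupportedOperationException(
        "getCommitCoordinatorClient is not implemented by this Engine");
  }
}
```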

## How was this patch tested?

N/A; trivial change.

## Does this PR introduce _any_ user-facing changes?

Yes. See the above.
richardc-db pushed a commit that referenced this pull request Dec 20, 2024
…inatorClient API" (delta-io#3917)

This reverts commit 6ae4b62

We seem to be rethinking our Coordinated Commits CUJ / APIs, and we don't want
these APIs to leak into Delta 3.3.
richardc-db pushed a commit that referenced this pull request May 13, 2025
…ColumnMapping (delta-io#4319)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

Split out from the main PR delta-io#4265 for faster review.

Adds a util function `convertToPhysicalColumnNames` in ColumnMapping to get the
corresponding physical column name for a logical column.
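As a rough illustration of the idea only (the kernel's actual signature may differ; the field-metadata key and method shape below are assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: resolve logical column names to their column-mapping physical names. */
final class ColumnMappingSketch {
  // Assumed per-field metadata key used by Delta column mapping for the physical name.
  private static final String PHYSICAL_NAME_KEY = "delta.columnMapping.physicalName";

  static List<String> convertToPhysicalColumnNames(
      List<String> logicalNames, Map<String, Map<String, String>> fieldMetadataByLogicalName) {
    List<String> physicalNames = new ArrayList<>();
    for (String logical : logicalNames) {
      Map<String, String> fieldMetadata = fieldMetadataByLogicalName.get(logical);
      if (fieldMetadata == null || !fieldMetadata.containsKey(PHYSICAL_NAME_KEY)) {
        throw new IllegalArgumentException("No physical name found for column: " + logical);
      }
      physicalNames.add(fieldMetadata.get(PHYSICAL_NAME_KEY));
    }
    return physicalNames;
  }
}
```
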
## How was this patch tested?

Added unit test cases in ColumnMappingSuite.scala.

## Does this PR introduce _any_ user-facing changes?

richardc-db pushed a commit that referenced this pull request May 13, 2025
…sage with replace table (delta-io#4520)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Updates SchemaUtils and ColumnMapping, with unit tests, in order to support
REPLACE TABLE with column mapping + fieldId re-use in PR #2.
Specifically, this involves the following changes (not necessarily related, but
combined in this PR):

1) When a connector provides its own column mapping info pre-populated in the
schema, we require that it is complete (i.e. both fieldId AND physicalName must
be present).
2) We add an argument `allowNewNonNullableFields` to our schema validation
checks. This is useful in cases where we can be sure the table state has been
completely cleared, and thus new non-nullable fields are valid (as in REPLACE).
3) We don't allow adding a new column with a fieldId less than the maxColId.
For now we do this proactively for safety; in the future, for something like
RESTORE, we will likely need a config to bypass this check (a rough sketch of
this check follows below).
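A minimal sketch of the third check above (names and shapes are illustrative stand-ins, not the kernel's actual code):

```java
import java.util.Map;

/** Illustrative only: reject new columns whose fieldId falls below the table's current max column id. */
final class FieldIdCheckSketch {
  static void validateNewFieldIds(Map<String, Long> newFieldIdsByColumnName, long maxColId) {
    for (Map.Entry<String, Long> entry : newFieldIdsByColumnName.entrySet()) {
      if (entry.getValue() < maxColId) {
        throw new IllegalArgumentException(
            String.format(
                "Cannot add column '%s' with fieldId %d: new fieldIds must not be less than "
                    + "the current max column id %d",
                entry.getKey(), entry.getValue(), maxColId));
      }
    }
  }
}
```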

## How was this patch tested?

Updates unit tests.

Also, all the changes in this PR are used by
delta-io#4520 which adds a lot more E2E
tests with multiple schema scenarios.

## Does this PR introduce _any_ user-facing changes?

No.
richardc-db pushed a commit that referenced this pull request Sep 8, 2025
…elta-io#4732)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR is refactoring only: it moves all the IcebergCompatChecks from
`IcebergCompatV2MetadataValidatorAndUpdater.java` to its base class
`IcebergCompatMetadataValidatorAndUpdater.java`, so that the later-added
`IcebergCompatV3MetadataValidatorAndUpdater.java` can reuse them.
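Roughly, the shape of the refactor (simplified stand-in classes, not the actual kernel code):

```java
/** Simplified stand-in: shared IcebergCompat checks live in the base class. */
abstract class IcebergCompatMetadataValidatorAndUpdaterSketch {
  // Previously duplicated in the V2 validator; now shared by V2, V3, and future versions.
  protected void runCommonIcebergCompatChecks() {
    // e.g. validate column mapping mode, supported data types, required table features, etc.
  }

  abstract void validateAndUpdateMetadata();
}

final class IcebergCompatV2Sketch extends IcebergCompatMetadataValidatorAndUpdaterSketch {
  @Override
  void validateAndUpdateMetadata() {
    runCommonIcebergCompatChecks();
    // V2-specific validation and metadata updates go here.
  }
}

final class IcebergCompatV3Sketch extends IcebergCompatMetadataValidatorAndUpdaterSketch {
  @Override
  void validateAndUpdateMetadata() {
    runCommonIcebergCompatChecks();
    // V3-specific validation and metadata updates go here.
  }
}
```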

## How was this patch tested?

Existing unit tests, since this is refactoring only.

## Does this PR introduce _any_ user-facing changes?

richardc-db pushed a commit that referenced this pull request Sep 8, 2025
…lta-io#4734)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR is refactoring only: it moves all the validation checks from
`IcebergWriterCompatV1MetadataValidatorAndUpdater.java` to a new base class
`IcebergWriterCompatMetadataValidatorAndUpdater.java`, so that the later-added
`IcebergWriterCompatV3MetadataValidatorAndUpdater.java` can reuse them.



## How was this patch tested?

Existing unit tests, since this is refactoring only.



## Does this PR introduce _any_ user-facing changes?
