[WIP][KERNEL][VARIANT] Add basic read in delta kernel #2
Conversation
```java
/**
 * Abstraction to represent a single Variant value in a {@link ColumnVector}.
 */
public interface VariantValue {
  byte[] getValue();
```
This may need a design decision. Is there a better interface we could provide that hides the complexity of knowing the metadata format?

(Rough idea; haven't thought through all the details:)
- An API to expose the paths (and potentially the type of each path as well).
- Typed accessors to get a value by path, e.g. if the value at a path is an int, `VariantValue.getInt(String path)` returns the integer.
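A minimal sketch of that rough idea, assuming a flat path-to-value model. Every name here (`VariantPathAccessor`, `MapVariant`, `getPaths`, `getInt`) is hypothetical and not part of Kernel's actual API:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical path-based accessor API, per the rough idea above.
interface VariantPathAccessor {
  // Expose the paths present in the Variant (types could be exposed similarly).
  Set<String> getPaths();

  // Typed accessor: return the integer stored at the given path.
  int getInt(String path);
}

// Toy in-memory implementation backed by a Map, for illustration only;
// a real implementation would decode the Variant metadata/value bytes.
class MapVariant implements VariantPathAccessor {
  private final Map<String, ?> values;

  MapVariant(Map<String, ?> values) {
    this.values = values;
  }

  @Override
  public Set<String> getPaths() {
    return values.keySet();
  }

  @Override
  public int getInt(String path) {
    return (Integer) values.get(path);
  }
}
```

Such an interface would let callers stay ignorant of the underlying metadata/value encoding, at the cost of committing to a path addressing scheme up front.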
```java
}

@Override
public String toString() {
```
TODO: change this to match OSS Spark's implementation.

`toString` is currently required to accurately test Variant values because the test infra calls `toString` to order the results.
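Until that TODO lands, the ordering requirement can be illustrated with a deterministic `toString`. This is a toy, assumed implementation (hex-encoding the raw value bytes), not Spark's actual Variant rendering, which produces JSON:

```java
// Illustrative only: a deterministic toString so test infra can order results.
// Not Spark-compatible; OSS Spark renders Variant values as JSON instead.
class VariantValueSketch {
  private final byte[] value;

  VariantValueSketch(byte[] value) {
    this.value = value;
  }

  public byte[] getValue() {
    return value;
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder();
    for (byte b : value) {
      sb.append(String.format("%02x", b)); // two hex digits per byte
    }
    return sb.toString();
  }
}
```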
…rClient` API (delta-io#3797)

This is a stacked PR. Please view this PR's diff here: scottsand-db/delta@delta_kernel_cc_1...delta_kernel_cc_2

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description
Adds new `TableDescriptor` and `CommitCoordinatorClient` APIs. Adds a new `getCommitCoordinatorClient` API to the `Engine` (with a default implementation that throws an exception).

## How was this patch tested?
N/A; trivial.

## Does this PR introduce _any_ user-facing changes?
Yes. See the above.
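The "default implementation that throws" pattern described above can be sketched as follows; the interface name and method signature here are assumptions for illustration, not the actual `Engine` API:

```java
// Sketch of an interface method with a throwing default, as described above.
// The interface name and signature are illustrative, not Kernel's actual API.
interface EngineSketch {
  default Object getCommitCoordinatorClient(String coordinatorName) {
    throw new UnsupportedOperationException(
        "getCommitCoordinatorClient is not supported by this engine");
  }
}
```

A throwing default keeps the change binary-compatible: existing `Engine` implementations still compile, and only engines that opt into commit coordination override the method.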
…inatorClient API" (delta-io#3917)

This reverts commit 6ae4b62. We seem to be rethinking our Coordinated Commits CUJ / APIs, and we don't want these APIs leaked in Delta 3.3.
…ColumnMapping (delta-io#4319)

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description
Split out of the main PR delta-io#4265 for faster review. Adds a util function `convertToPhysicalColumnNames` in ColumnMapping to get the corresponding physical column name for a logical column.

## How was this patch tested?
Added unit test cases in ColumnMappingSuite.scala.

## Does this PR introduce _any_ user-facing changes?
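The logical-to-physical lookup that `convertToPhysicalColumnNames` provides can be sketched as below; the class, field, and method names are hypothetical stand-ins, not the actual ColumnMapping code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of logical -> physical column-name conversion under column
// mapping. All names here are illustrative, not Kernel's actual API.
class ColumnMappingSketch {
  private final Map<String, String> logicalToPhysical = new HashMap<>();

  void register(String logicalName, String physicalName) {
    logicalToPhysical.put(logicalName, physicalName);
  }

  // Return the physical name for a logical column, falling back to the
  // logical name when the table has no mapping for it.
  String toPhysical(String logicalName) {
    return logicalToPhysical.getOrDefault(logicalName, logicalName);
  }
}
```

The fallback matches column-mapping mode `none`, where logical and physical names coincide; under `name` or `id` modes the mapping comes from field metadata in the table schema.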
…sage with replace table (delta-io#4520)

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description
Updates SchemaUtils and ColumnMapping, with unit tests, in order to support REPLACE TABLE with column mapping + fieldId re-use in PR #2. Specifically, this involves the following changes (not necessarily related, but combined in this PR):
1) When a connector provides its own column mapping info pre-populated in the schema, we require that it is complete (i.e. both fieldId AND physicalName must be present).
2) We add an argument `allowNewNonNullableFields` to our schema validation checks. This is useful in cases where we can be sure the table state has been completely cleared, and thus new non-null fields are valid (as in REPLACE).
3) We don't allow adding a new column with a fieldId less than the maxColId. For now, we do this proactively for safety; in the future, for something like RESTORE, we will likely need a config to bypass this check.

## How was this patch tested?
Updated unit tests. Also, all the changes in this PR are used by delta-io#4520, which adds many more E2E tests with multiple schema scenarios.

## Does this PR introduce _any_ user-facing changes?
No.
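Change (3) above, rejecting a new column whose fieldId is below the current maxColId, can be sketched as a standalone check. This is an illustrative guess at the check's shape, not the actual SchemaUtils code:

```java
// Illustrative check, not Kernel's actual code: reject a new column whose
// fieldId is less than the current maximum column id, per change (3) above.
class FieldIdCheckSketch {
  static void validateNewFieldId(long newFieldId, long maxColId) {
    if (newFieldId < maxColId) {
      throw new IllegalArgumentException(
          "New column fieldId " + newFieldId
              + " is less than the current maxColumnId " + maxColId);
    }
  }
}
```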
…elta-io#4732)

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description
This PR is refactoring only: move all the IcebergCompatChecks from `IcebergCompatV2MetadataValidatorAndUpdater.java` to its base class `IcebergCompatMetadataValidatorAndUpdater.java`, so that the later-added `IcebergCompatV3MetadataValidatorAndUpdater.java` can use them.

## How was this patch tested?
Existing unit tests, since this is only a refactoring.

## Does this PR introduce _any_ user-facing changes?
…lta-io#4734)

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description
This PR is refactoring only: move all the validation checks from `IcebergWriterCompatV1MetadataValidatorAndUpdater.java` to a new base class `IcebergWriterCompatMetadataValidatorAndUpdater.java`, so that the later-added `IcebergWriterCompatV3MetadataValidatorAndUpdater.java` can use them.

## How was this patch tested?
Existing unit tests, since this is only a refactoring.

## Does this PR introduce _any_ user-facing changes?
Relies on #1 and #4.

#### Which Delta project/connector is this regarding?

## Description

## How was this patch tested?

## Does this PR introduce _any_ user-facing changes?