Added new WriteStrategy ItemPatchIfExists #47034
base: main
Pull Request Overview
This PR adds a new write strategy `ItemPatchIfExists` to the Azure Cosmos Spark connector, which allows patch operations to gracefully skip documents that don't exist instead of failing the job.
- Added `ItemPatchIfExists` enum value to the `ItemWriteStrategy` enumeration
- Updated point and bulk writers to handle the new strategy by ignoring 404/Not Found errors
- Added comprehensive test coverage for the new functionality
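
The intended semantics can be illustrated with a small in-memory model. This is only a sketch, not the connector's code: `PatchError`, `apply_patches`, `patch_one`, and the `store` dictionary are hypothetical names invented for illustration, and the real point and bulk writers are implemented in Scala against the Cosmos SDK.

```python
class PatchError(Exception):
    """Hypothetical stand-in for a failed patch call carrying an HTTP status."""

    def __init__(self, status_code):
        super().__init__(f"patch failed with status {status_code}")
        self.status_code = status_code


def apply_patches(item_ids, patch_one, strategy):
    """Patch each id; return (patched, skipped) lists of ids.

    With strategy "ItemPatchIfExists" a 404 is treated as a no-op.
    With any other strategy (e.g. "ItemPatch") the error propagates.
    """
    patched, skipped = [], []
    for item_id in item_ids:
        try:
            patch_one(item_id)
            patched.append(item_id)
        except PatchError as err:
            if strategy == "ItemPatchIfExists" and err.status_code == 404:
                skipped.append(item_id)  # document is gone: skip it
                continue
            raise  # every other failure still fails the job
    return patched, skipped


# In-memory stand-in for a container (assumption, for illustration only).
store = {"a": {"n": 1}, "b": {"n": 2}}


def patch_one(item_id):
    if item_id not in store:
        raise PatchError(404)
    store[item_id]["n"] += 1
```

Calling `apply_patches(["a", "missing", "b"], patch_one, "ItemPatchIfExists")` patches `a` and `b` and reports `missing` as skipped, while the same call with `"ItemPatch"` raises on the missing id.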
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
File | Description |
---|---|
SparkE2EWriteITest.scala | Added integration test for the new ItemPatchIfExists strategy |
PointWriter.scala | Updated patch operations to support ignoring not found errors |
CosmosConfig.scala | Added ItemPatchIfExists to the enum and configuration parsing |
BulkWriter.scala | Updated bulk writer to handle the new strategy |
configuration-reference.md | Updated documentation to describe the new strategy |
CHANGELOG.md | Added changelog entry for the new feature |

/azp run java - cosmos - spark
Azure Pipelines successfully started running 1 pipeline(s).
LGTM
Description
This PR adds a new WriteStrategy for the Spark connector that works like ItemPatch but skips errors when documents targeted by patch instructions in the DataFrame being written don't exist (anymore). The existing strategy "ItemPatch" fails the Spark job in that case because 404/NotFound is a non-transient error. There are several use cases where customers want to patch documents and it is acceptable that some of them no longer exist; the patch should then be a no-op instead of failing the entire job. The new write strategy `ItemPatchIfExists` allows that.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines
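
For reference, selecting the strategy described above from PySpark might look like the sketch below. The helper `cosmos_patch_write_options` is hypothetical. The option keys are the connector's documented `spark.cosmos.*` settings as best understood here and should be checked against configuration-reference.md. Patch-specific settings (operation type, column configs) are omitted.

```python
def cosmos_patch_write_options(endpoint, key, database, container, skip_missing=True):
    """Build write options for the Cosmos Spark connector (illustrative helper).

    skip_missing=True selects the new ItemPatchIfExists strategy;
    skip_missing=False keeps the existing ItemPatch behaviour, which fails on 404.
    """
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
        "spark.cosmos.write.strategy": "ItemPatchIfExists" if skip_missing else "ItemPatch",
    }


# Placeholder values only; substitute real account details.
opts = cosmos_patch_write_options(
    "https://<account>.documents.azure.com:443/", "<key>", "<database>", "<container>"
)

# With a SparkSession and a DataFrame `df` of patch instructions (not created here):
# df.write.format("cosmos.oltp").options(**opts).mode("APPEND").save()
```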