@FabianMeiswinkel FabianMeiswinkel commented Oct 16, 2025

Description

This PR adds a new write strategy, ItemPatchIfExists, to the Spark connector. It applies patch operations like ItemPatch, but skips the error when a document targeted by patch instructions in the DataFrame being written doesn't exist (anymore). The existing "ItemPatch" strategy fails the Spark job in that case, because 404/NotFound is a non-transient error. There are several use cases where customers want to patch documents and it is acceptable that some of them no longer exist; the patch should then be a no-op instead of failing the entire job. The new write strategy ItemPatchIfExists allows that.
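
For illustration, selecting the new strategy from PySpark might look roughly like the sketch below. It assumes the connector's usual `spark.cosmos.*` option names and the `cosmos.oltp` format; the endpoint, key, database, and container values are placeholders, and both helper functions are hypothetical names that are not part of this PR.

```python
# Sketch only: all account/database/container values are placeholders.
def build_patch_if_exists_options(endpoint, key, database, container):
    """Return connector write options that select the ItemPatchIfExists strategy."""
    return {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.accountKey": key,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
        # Strategy added by this PR; with "ItemPatch" a 404 fails the job.
        "spark.cosmos.write.strategy": "ItemPatchIfExists",
    }


def write_patches(df, options):
    """Write a DataFrame of patch instructions (expects a PySpark DataFrame)."""
    df.write.format("cosmos.oltp").options(**options).mode("append").save()


options = build_patch_if_exists_options(
    "https://<account>.documents.azure.com:443/",
    "<key>",
    "<database>",
    "<container>",
)
print(options["spark.cosmos.write.strategy"])
```

`write_patches` is defined but not called here, since it needs a live Spark session and a Cosmos DB account.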

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@Copilot Copilot AI review requested due to automatic review settings October 16, 2025 22:03
@FabianMeiswinkel FabianMeiswinkel requested a review from a team as a code owner October 16, 2025 22:03

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new write strategy ItemPatchIfExists to the Azure Cosmos Spark connector, which allows patch operations to gracefully skip documents that don't exist instead of failing the job.

  • Added ItemPatchIfExists enum value to the ItemWriteStrategy enumeration
  • Updated point and bulk writers to handle the new strategy by ignoring 404/Not Found errors
  • Added comprehensive test coverage for the new functionality
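
The difference between the two strategies can be sketched with a small in-memory model. This is not the connector's implementation: `NotFoundError`, `InMemoryContainer`, and `apply_patches` are hypothetical stand-ins used only to show a 404 being treated as a no-op under ItemPatchIfExists and as a failure under ItemPatch.

```python
class NotFoundError(Exception):
    """Stands in for a 404/Not Found response from the service."""
    status_code = 404


class InMemoryContainer:
    """Hypothetical stand-in for a container; not the Cosmos SDK API."""

    def __init__(self, items):
        self._items = {item["id"]: dict(item) for item in items}

    def patch_item(self, item_id, changes):
        if item_id not in self._items:
            raise NotFoundError(item_id)
        self._items[item_id].update(changes)

    def read_item(self, item_id):
        return dict(self._items[item_id])


def apply_patches(container, patches, strategy):
    """Apply (id, changes) pairs; return the ids skipped because they were missing."""
    skipped = []
    for item_id, changes in patches:
        try:
            container.patch_item(item_id, changes)
        except NotFoundError:
            if strategy != "ItemPatchIfExists":
                raise  # ItemPatch: 404 is non-transient, so the write fails
            skipped.append(item_id)  # ItemPatchIfExists: treat as a no-op
    return skipped


container = InMemoryContainer([{"id": "a", "count": 1}])
patches = [("a", {"count": 2}), ("gone", {"count": 5})]
skipped = apply_patches(container, patches, "ItemPatchIfExists")
```

With `strategy="ItemPatch"`, the same call would raise `NotFoundError` on the missing document instead of recording it as skipped.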

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| SparkE2EWriteITest.scala | Added integration test for the new ItemPatchIfExists strategy |
| PointWriter.scala | Updated patch operations to support ignoring not found errors |
| CosmosConfig.scala | Added ItemPatchIfExists to the enum and configuration parsing |
| BulkWriter.scala | Updated bulk writer to handle the new strategy |
| configuration-reference.md | Updated documentation to describe the new strategy |
| CHANGELOG.md | Added changelog entry for the new feature |

@FabianMeiswinkel

/azp run java - cosmos - spark

Azure Pipelines successfully started running 1 pipeline(s).

@aayush3011 aayush3011 left a comment

LGTM
