[Cosmos] Session container fixes new branch #41678

Merged
simorenoh merged 103 commits into Azure:main from bambriz:session_token_fixes_bambriz_branch
Jul 28, 2025

Conversation

@bambriz (Member) commented Jun 20, 2025

Description

This PR was moved from PR 40366 due to pipeline issues. Please see the description of that PR for information on the session container fixes.

Original PR Description:

This PR initially aimed to close several gaps in the session token handling logic of the Python SDK's session container, specifically the fact that the entire compound session token for a container was sent with every request. It has since grown beyond that scope, and now also addresses and closes the following issues:

Current state

Several of the Python SDK's current session consistency behaviors should be improved:

  • The SDK sends a session token with every single request as long as the default account consistency is Session, which is undesirable for write operations in single-write region scenarios.
  • The session token sent with each request is a compound session token that includes every partition in the container. This is infeasible for large accounts, since the token can become large enough to cause request size issues.
  • The SDK has no partition key range cache refresh logic for partition split scenarios. Normal requests send a partition key value rather than a partition key range id, so they never receive the 410/1002 status codes that the SDK could react to.
  • The SDK does not update session tokens after read requests, which allows stale reads when other clients are interacting with the same container resource.

Changes introduced

To address the issues above, this PR makes the following changes:

  • Session tokens are now sent only for the relevant requests under session consistency: read operations, batch operations, or requests sent by multi-write configured accounts.
  • The session token that gets sent now contains only the information relevant to its partition, the same way the .NET and Java SDKs send only the minimum information: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/SessionTokenHelper.java#L45
  • When the response headers contain a partition key range id that is not present in the partition key range cache, the SDK now forces a cache refresh to obtain all the new ids, which are then used to compute session tokens for subsequent requests.
  • Session tokens are now updated on read requests as well, ensuring all requests use the newest available session token.
  • Session tokens are now updated on exceptions as well.
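
The changes above can be illustrated with a minimal, self-contained sketch. This is not the SDK's implementation: `SessionContainerSketch`, `should_send_session_token`, `fetch_range_ids`, and the operation names are hypothetical stand-ins, and session tokens are treated as opaque strings.

```python
from typing import Callable, Dict, Optional, Set

# Assumption: operation kinds that carry a session token on single-write accounts.
TOKEN_OPERATIONS = {"read", "batch"}


def should_send_session_token(operation: str, multi_write: bool) -> bool:
    """Reads, batches, and any request on a multi-write account carry a token."""
    return multi_write or operation in TOKEN_OPERATIONS


class SessionContainerSketch:
    """Hypothetical stand-in that tracks one session token per partition key range."""

    def __init__(self, fetch_range_ids: Callable[[], Set[str]]) -> None:
        # fetch_range_ids stands in for a call that reloads the partition key range cache.
        self._fetch_range_ids = fetch_range_ids
        self.known_range_ids: Set[str] = fetch_range_ids()
        self.tokens: Dict[str, str] = {}
        self.refresh_count = 0

    def token_for_request(
        self, operation: str, multi_write: bool, range_id: str
    ) -> Optional[str]:
        """Return only the token for the target range, or None if none should be sent."""
        if not should_send_session_token(operation, multi_write):
            return None
        return self.tokens.get(range_id)

    def record_response(self, range_id: str, token: str) -> None:
        """Store the token from any response (read, write, or error)."""
        if range_id not in self.known_range_ids:
            # Unknown range id, for example after a split: force a cache refresh.
            self.known_range_ids = self._fetch_range_ids()
            self.refresh_count += 1
        self.tokens[range_id] = token


# Usage: the set of ranges changes after a simulated partition split.
ranges = {"0"}
container = SessionContainerSketch(lambda: set(ranges))
container.record_response("0", "token-a")
ranges.clear()
ranges.update({"1", "2"})
container.record_response("1", "token-b")  # unknown id triggers one refresh
```

In this sketch a write on a single-write account gets no token at all, while a read gets only the token for its own partition key range instead of the whole compound token.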

Caveats

We currently only initialize the session container within a client if the user properly initializes their client. While this is not a problem for the sync client, it means that users who do not directly initialize their asynchronous clients as outlined in our README will not be able to leverage the session container, and will have to implement their own session token handling logic to achieve session consistency.

Follow-ups

  • [Cosmos] make queries fetch query plan in every query #38577: ensuring we send query plan calls for every cross-partition query. This is also the first follow-up item marked in the PPCB PR (Per Partition Circuit Breaker #40302), and would be needed to ensure we are fully covered on that front. This is pending because pagination logic seemed to break completely after splitting this up, and further work will be needed to ensure our query pipeline can handle this scenario properly. For now, we maintain the current behavior and send the container's compound session token for the request.
  • We should initialize the session container regardless of consistency level within the SDK, since users on consistency levels stronger than Session can always downgrade to Session. Issue: Session Container not created for other consistency levels #41920.
  • The Java SDK has an enhancement for compound session tokens passed in by the user: it parses the compound token and attempts to extract only the relevant partition token, so that only that token is passed to the service. We should consider implementing that as well, so that customers who pass in compound tokens only have the relevant one used.
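
As a rough illustration of the last follow-up item, the sketch below extracts a single partition's token from a user-supplied compound token. It assumes a compound token is a comma-separated list of `<partition key range id>:<token>` segments; the function names are hypothetical, and the handling of parent ranges after a split is not covered here.

```python
from typing import Dict, Optional


def parse_compound_token(compound: str) -> Dict[str, str]:
    """Split 'rangeId:token,rangeId:token' into a {rangeId: token} mapping."""
    tokens: Dict[str, str] = {}
    for part in compound.split(","):
        part = part.strip()
        if not part:
            continue
        # Split on the first ':' only, since the token body may contain other symbols.
        range_id, sep, value = part.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed session token segment: {part!r}")
        tokens[range_id] = value
    return tokens


def token_for_range(compound: str, range_id: str) -> Optional[str]:
    """Return only the segment for range_id, or None if it is not present."""
    value = parse_compound_token(compound).get(range_id)
    return f"{range_id}:{value}" if value is not None else None


# Usage with a made-up compound token covering two partition key ranges.
compound_token = "0:1#100#3=50,1:1#200#3=75"
single = token_for_range(compound_token, "1")
```

Returning `None` for an unknown range id leaves the caller free to decide whether to fall back to the full compound token or to send no token.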

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Azure deleted a comment from azure-pipelines bot on Jul 23, 2025
@simorenoh (Member)

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 (Member) left a comment

LGTM, thanks

@Azure deleted a comment from azure-pipelines bot on Jul 24, 2025
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh (Member)

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh (Member)

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh merged commit 0497631 into Azure:main on Jul 28, 2025
36 checks passed
Projects

Status: Done

5 participants