Adding Location Cache #2801


Open: wants to merge 21 commits into main

Conversation

@gsa9989 (Contributor) commented Jul 17, 2025

PR Summary:

  • The location cache organizes the database account data gathered by the global endpoint manager. It holds the client's preferred locations as well as lists of write and read regions, grouped by location and availability. Its purpose is to provide an available endpoint based on the client's preferred locations.
  • Added location cache functionality and tests.

Implementation details:

  • APIs include (see the sketch after this list):
    • update
    • refresh_endpoints
    • refresh_stale_endpoints
    • get_endpoints_by_location
    • get_preferred_available_endpoints
    • mark_endpoint_unavailable
    • is_endpoint_unavailable
    • resolve_service_endpoint
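
A minimal, self-contained sketch of the behavior these APIs describe is below. It is illustrative only: the struct layout, field names, and signatures here are assumptions made for the example, not the PR's actual implementation in location_cache.rs.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

#[derive(Clone, Copy, Debug, PartialEq)]
enum RequestOperation {
    Read,
    #[allow(dead_code)]
    Write,
}

struct LocationCache {
    // Read endpoints ordered by the client's preferred locations.
    read_endpoints: Vec<String>,
    // endpoint -> (operation it is unavailable for, time it was last marked).
    unavailable: HashMap<String, (RequestOperation, SystemTime)>,
    // How long an "unavailable" entry stays in effect before it is considered stale.
    stale_after: Duration,
}

impl LocationCache {
    fn mark_endpoint_unavailable(&mut self, endpoint: &str, operation: RequestOperation) {
        self.unavailable
            .insert(endpoint.to_string(), (operation, SystemTime::now()));
    }

    fn is_endpoint_unavailable(&self, endpoint: &str, operation: RequestOperation) -> bool {
        match self.unavailable.get(endpoint) {
            Some((op, marked_at)) if *op == operation => {
                // A stale entry no longer counts as unavailable.
                marked_at
                    .elapsed()
                    .map(|age| age < self.stale_after)
                    .unwrap_or(false)
            }
            _ => false,
        }
    }

    fn resolve_service_endpoint(&self, index: usize, _operation: RequestOperation) -> &str {
        // Wrap the index so any value maps onto a known endpoint.
        &self.read_endpoints[index % self.read_endpoints.len()]
    }
}

fn main() {
    let mut cache = LocationCache {
        read_endpoints: vec![
            "https://location1.documents.example.com".to_string(),
            "https://location2.documents.example.com".to_string(),
        ],
        unavailable: HashMap::new(),
        stale_after: Duration::from_secs(300),
    };

    cache.mark_endpoint_unavailable("https://location1.documents.example.com", RequestOperation::Read);
    assert!(cache.is_endpoint_unavailable("https://location1.documents.example.com", RequestOperation::Read));

    // An out-of-range index still resolves to a valid endpoint.
    println!("{}", cache.resolve_service_endpoint(5, RequestOperation::Read));
}
```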

Current limitations:

  • Multi-write accounts are not yet supported; support is planned in a follow-up PR together with the global endpoint manager.

gsa9989 and others added 8 commits June 3, 2025 11:42
  • if_match_etag changes with tests
  • read me change
  • merge upstream/main
@github-actions github-actions bot added the Cosmos (The azure_cosmos crate) label Jul 17, 2025
@gsa9989 gsa9989 changed the title from "Adding Global Endpoint Manager and Location Cache" to "Adding Location Cache" Aug 7, 2025
@gsa9989 gsa9989 marked this pull request as ready for review August 7, 2025 18:03
@Copilot Copilot AI review requested due to automatic review settings August 7, 2025 18:03
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds location cache functionality to the Azure Cosmos SDK, providing a system to organize database account data and manage endpoint availability. The location cache tracks client preferred locations, available read/write regions, and endpoint availability to provide optimal endpoint selection for database operations.

Key Changes:

  • Added comprehensive location cache implementation with endpoint management and availability tracking
  • Implemented API methods for updating, refreshing, and resolving service endpoints
  • Added extensive test coverage for all location cache functionality

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Reviewed files:
  • sdk/cosmos/azure_data_cosmos/src/location_cache.rs: Complete implementation of LocationCache with availability tracking, endpoint management, and a comprehensive test suite
  • sdk/cosmos/azure_data_cosmos/src/lib.rs: Added module declarations for location_cache and global_endpoint_manager
  • sdk/cosmos/azure_data_cosmos/src/global_endpoint_manager.rs: Added empty GlobalEndpointManager struct placeholder

@analogrelay (Member) left a comment

Some initial stuff, but nothing significant! Looking good!

let now = std::time::SystemTime::now();

{
let mut location_unavailability_info_map =
Member

Would it be possible to log a message when an endpoint is marked unavailable?

Contributor Author

As in return a message to the user that the function worked?

@tvaron3 (Member) commented Aug 11, 2025

Something like the examples here https://docs.rs/log/latest/log/ and the message can be "<endpoint> was marked unavailable for <write/read>". It could help debugging if a customer asks why their traffic switched from one region to another.

Contributor Author

I added info!("Endpoint {} was marked as unavailable for {:?}", endpoint, operation); without any issues other than importing tracing::info. However, if I wanted to test it I'd have to add the log crate to Cargo.toml. Should I do that, or is adding the message sufficient?

Member

tracing is the correct API to use (we don't actually use the log crate in the Rust SDK, we use tracing instead). I don't think we need a test for this case. We do test some of our diagnostics/telemetry, but I don't think individual log messages need to be tested.

Contributor Author

Got it, thank you!

Member

Small nit on the phrasing you had above: Our log messages generally start with lowercase letters. For example:

2025-08-12T19:40:21.225780Z DEBUG azure_data_cosmos::pipeline::signature_target: generating Cosmos auth signature signature_payload="get\ndocs\ndbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D\ntue, 12 aug 2025 19:40:21 gmt\n\n"
2025-08-12T19:40:21.225858Z DEBUG typespec_client_core::http::policies::transport: sending request 'https://localhost:8081/dbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D' request=Request { url: "https://localhost:8081/dbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D", method: Get, headers: {"user-agent": "*****", "x-ms-documentdb-partitionkey": "*****", "x-ms-client-request-id": "*****", "x-ms-date": "*****", "x-ms-version": "*****", "authorization": "*****"}, body: Bytes {} }
2025-08-12T19:40:21.226064Z DEBUG typespec_client_core::http::clients::reqwest: performing request GET 'https://localhost:8081/dbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D' with `reqwest`
2025-08-12T19:40:21.226185Z DEBUG reqwest::connect: starting new connection: https://localhost:8081/
2025-08-12T19:40:21.226863Z DEBUG hyper_util::client::legacy::connect::http: connecting to [::1]:8081
2025-08-12T19:40:21.227041Z DEBUG hyper_util::client::legacy::connect::http: connected to [::1]:8081

Also, "was marked" is passive voice, which we try to avoid where possible (we're not perfect ;)). We try to use active statements as much as possible, so "endpoint marked unavailable" (describing what the application is doing right now) rather than "endpoint was marked unavailable" (describing what the application did sometime in the past).

So it should be endpoint {} marked as unavailable for {:?}
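
Taken together with the earlier thread, the resulting call inside mark_endpoint_unavailable would look roughly like the fragment below (a sketch only; `endpoint` and `operation` are the variables from the PR's function, whose exact names may differ):

```rust
use tracing::info;

// Inside mark_endpoint_unavailable, after recording the unavailability entry.
// Lowercase start and active voice, per the review feedback.
info!("endpoint {} marked as unavailable for {:?}", endpoint, operation);
```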

@analogrelay (Member) left a comment

Looking good! I think we're close! The current build failure isn't your responsibility; it's an issue in a different crate that's also failing in main. We'll have to wait for it to be resolved before merging, but it shouldn't be an issue.

fn resolve_service_endpoint_default() {
let cache = create_test_location_cache();

let endpoint = cache.resolve_service_endpoint(5, RequestOperation::Read);
Member

nit: We should be able to access the read_endpoints list, so rather than picking a magic number we know is currently more than the number of locations in the test data, we can use the length itself:

Suggested change:
- let endpoint = cache.resolve_service_endpoint(5, RequestOperation::Read);
+ let endpoint = cache.resolve_service_endpoint(cache.read_endpoints.len() + 1, RequestOperation::Read);


let endpoint1 = "https://location1.documents.example.com";

let before_marked_unavailable_time = SystemTime::now();
Member

I like asserting this, but there's a potential problem. There's no guarantee that SystemTime::now() will have advanced between when you call it, and when mark_endpoint_unavailable runs. The SystemTime type in Rust is not guaranteed to be monotonic (i.e. it is not guaranteed to consistently increase). On most modern OSes, it will be, but that's not a guarantee.

Also, it's possible that the CPU instructions between here and when you capture SystemTime::now again inside mark_endpoint_unavailable all execute before the clock moves forward. In most cases this will work fine: there are enough instructions and CPU cycles between each call to get the time that the clock will likely have moved forward, but there's no guarantee of that, and slower/faster processors may disrupt the test.

All that is to say that asserting the passage of time in tests is tricky ;). Instead of capturing now here, pick a time several seconds before now to use as before_marked_unavailable_time (SystemTime::now() - Duration::from_secs(10)) and then set the last_check_time of the endpoint to that value (similar to what you're doing with the staleness check). Then, when you assert below that last_check_time is greater than before_marked_unavailable_time, you have a much higher chance of it being true.
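
A standalone illustration of the suggested pattern, using only std and a stand-in timestamp rather than the PR's actual test or cache fields:

```rust
use std::time::{Duration, SystemTime};

fn main() {
    // Anchor the baseline well in the past instead of calling now() twice in a row,
    // so the assertion does not depend on the clock advancing between two calls.
    let before_marked_unavailable_time = SystemTime::now() - Duration::from_secs(10);

    // Stand-in for the timestamp that mark_endpoint_unavailable would record.
    let last_check_time = SystemTime::now();

    // With a 10-second margin, this comparison is robust even if the clock
    // barely moves between the two now() calls.
    assert!(last_check_time > before_marked_unavailable_time);
}
```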

Labels: Cosmos (The azure_cosmos crate)
Projects: None yet
3 participants