Adding Location Cache #2801
Conversation
Commits:
- read me change
- ItemOptions-changes 41f2acf: if_match_etag changes with tests
- main b1d2f9f [origin/main]: Merge pull request #1 from gsa9989/read-me-change
- read-me-change 653b308: read me change
- merge upstream/main
Pull Request Overview
This PR adds location cache functionality to the Azure Cosmos SDK, providing a system to organize database account data and manage endpoint availability. The location cache tracks client preferred locations, available read/write regions, and endpoint availability to provide optimal endpoint selection for database operations.
Key Changes:
- Added comprehensive location cache implementation with endpoint management and availability tracking
- Implemented API methods for updating, refreshing, and resolving service endpoints
- Added extensive test coverage for all location cache functionality
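For readers without the diff open, the sketch below is a rough illustration of the shape described above, assembled from the overview rather than taken from the PR: the field names (`default_endpoint`, `preferred_locations`, the unavailability map), the use of `url::Url`, and the fallback rule in `resolve_service_endpoint` are all assumptions, not the PR's actual API.

```rust
use std::collections::HashMap;
use std::time::SystemTime;
use url::Url;

/// The kind of operation an endpoint is being resolved for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestOperation {
    Read,
    Write,
}

/// When an endpoint was last checked, and for which operation it failed.
struct LocationUnavailabilityInfo {
    last_check_time: SystemTime,
    unavailable_operation: RequestOperation,
}

/// Illustrative shape only: preferred locations, per-operation endpoint
/// lists, and a map of endpoints currently considered unavailable.
pub struct LocationCache {
    default_endpoint: Url,
    preferred_locations: Vec<String>,
    read_endpoints: Vec<Url>,
    write_endpoints: Vec<Url>,
    location_unavailability_info_map: HashMap<Url, LocationUnavailabilityInfo>,
}

impl LocationCache {
    /// Picks an endpoint for the given index, falling back to the default
    /// endpoint when the index runs past the known locations (the PR's
    /// actual fallback rules may differ).
    pub fn resolve_service_endpoint(&self, index: usize, operation: RequestOperation) -> Url {
        let endpoints = match operation {
            RequestOperation::Read => &self.read_endpoints,
            RequestOperation::Write => &self.write_endpoints,
        };
        endpoints
            .get(index)
            .cloned()
            .unwrap_or_else(|| self.default_endpoint.clone())
    }
}
```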
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos/src/location_cache.rs | Complete implementation of LocationCache with availability tracking, endpoint management, and comprehensive test suite |
| sdk/cosmos/azure_data_cosmos/src/lib.rs | Added module declarations for location_cache and global_endpoint_manager |
| sdk/cosmos/azure_data_cosmos/src/global_endpoint_manager.rs | Added empty GlobalEndpointManager struct placeholder |
Some initial stuff, but nothing significant! Looking good!
```rust
let now = std::time::SystemTime::now();

{
    let mut location_unavailability_info_map =
```
Would it be possible to log a message when an endpoint is marked unavailable?
As in return a message to the user that the function worked?
Something like the examples here: https://docs.rs/log/latest/log/. The message could be "<endpoint> was marked unavailable for <write/read>". It could help debugging if a customer asks why their traffic switched from one region to another.
I added `info!("Endpoint {} was marked as unavailable for {:?}", endpoint, operation);` without any issues, other than importing `tracing::info`. However, if I wanted to test it I'd have to add the log crate to Cargo.toml. Should I do that, or is adding the message sufficient?
`tracing` is the correct API to use (we don't actually use the `log` crate in the Rust SDK, we use `tracing` instead). I don't think we need a test for this case. We do test some of our diagnostics/telemetry, but I don't think individual log messages need to be tested.
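Concretely, and continuing the illustrative types sketched after the overview above (not the PR's exact code), the log call would sit in the mark-unavailable path roughly like this:

```rust
use tracing::info;
use url::Url;

impl LocationCache {
    /// Records that `endpoint` failed for `operation` so later resolution can
    /// route around it, and logs the transition for diagnostics.
    pub fn mark_endpoint_unavailable(&mut self, endpoint: &Url, operation: RequestOperation) {
        let now = std::time::SystemTime::now();
        self.location_unavailability_info_map.insert(
            endpoint.clone(),
            LocationUnavailabilityInfo {
                last_check_time: now,
                unavailable_operation: operation,
            },
        );
        // Emitted through the `tracing` facade, which is what the Rust SDK uses.
        info!("Endpoint {} was marked as unavailable for {:?}", endpoint, operation);
    }
}
```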
Got it, thank you!
Small nit on the phrasing you had above: Our log messages generally start with lowercase letters. For example:
```text
2025-08-12T19:40:21.225780Z DEBUG azure_data_cosmos::pipeline::signature_target: generating Cosmos auth signature signature_payload="get\ndocs\ndbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D\ntue, 12 aug 2025 19:40:21 gmt\n\n"
2025-08-12T19:40:21.225858Z DEBUG typespec_client_core::http::policies::transport: sending request 'https://localhost:8081/dbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D' request=Request { url: "https://localhost:8081/dbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D", method: Get, headers: {"user-agent": "*****", "x-ms-documentdb-partitionkey": "*****", "x-ms-client-request-id": "*****", "x-ms-date": "*****", "x-ms-version": "*****", "authorization": "*****"}, body: Bytes {} }
2025-08-12T19:40:21.226064Z DEBUG typespec_client_core::http::clients::reqwest: performing request GET 'https://localhost:8081/dbs/SampleDB/colls/SampleContainer/docs/14174164-F6C0-47FC-83FB-604C6A63408D' with `reqwest`
2025-08-12T19:40:21.226185Z DEBUG reqwest::connect: starting new connection: https://localhost:8081/
2025-08-12T19:40:21.226863Z DEBUG hyper_util::client::legacy::connect::http: connecting to [::1]:8081
2025-08-12T19:40:21.227041Z DEBUG hyper_util::client::legacy::connect::http: connected to [::1]:8081
```
Also, "was marked" is passive voice, which we try to avoid where possible (we're not perfect ;)). We try to use active statements as much as possible, so "endpoint marked unavailable" (describing what the application is doing right now) rather than "endpoint was marked unavailable" (describing what the application did sometime in the past).
So it should be `endpoint {} marked as unavailable for {:?}`.
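Applied to the earlier sketch, the adjusted call would read (same illustrative caveats as above):

```rust
// lowercase, active-voice phrasing per the SDK's logging conventions
info!("endpoint {} marked as unavailable for {:?}", endpoint, operation);
```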
Updates done via pair programming (I couldn't merge directly 😭 )
… gaby/global-endpoint-manager
…re/azure-sdk-for-rust into gaby/global-endpoint-manage
Looking good! I think we're close! The current build failure isn't your responsibility; it's an issue in a different crate that's also failing in `main`. We'll have to wait for it to be resolved before merging, but it shouldn't be an issue.
```rust
fn resolve_service_endpoint_default() {
    let cache = create_test_location_cache();

    let endpoint = cache.resolve_service_endpoint(5, RequestOperation::Read);
```
nit: We should be able to access the `read_endpoints` list, so rather than picking a magic number we know is currently more than the number of locations in the test data, we can use the length itself:
```diff
-let endpoint = cache.resolve_service_endpoint(5, RequestOperation::Read);
+let endpoint = cache.resolve_service_endpoint(cache.read_endpoints.len() + 1, RequestOperation::Read);
```
```rust
let endpoint1 = "https://location1.documents.example.com";

let before_marked_unavailable_time = SystemTime::now();
```
I like asserting this, but there's a potential problem. There's no guarantee that `SystemTime::now()` will have advanced between when you call it and when `mark_endpoint_unavailable` runs. The `SystemTime` type in Rust is not guaranteed to be monotonic (i.e. it is not guaranteed to consistently increase). On most modern OSes it will be, but that's not a guarantee.

Also, it's possible the CPU instructions between here and when you capture `SystemTime::now` again inside `mark_endpoint_unavailable` all execute before the clock moves forward. In most cases this will work fine; there are enough instructions and CPU cycles between each call to get the time that the clock will likely have moved forward, but there's no guarantee of that, and slower/faster processors may disrupt the test.

All that is to say that asserting the passage of time in tests is tricky ;). Instead of capturing `now` here, pick a time several seconds before now to use as `before_marked_unavailable_time` (`SystemTime::now() - Duration::from_secs(10)`) and then set the `last_check_time` of the endpoint to that value (similar to what you're doing with the staleness check). Then, when you assert below that `last_check_time` is greater than `before_marked_unavailable_time`, you have a much higher chance of it being true.
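A rough sketch of that arrangement, continuing the quoted test above; `last_check_time_for` is a hypothetical accessor standing in for however the test reads the recorded time back, and the seeding of `last_check_time` described above is elided:

```rust
use std::time::{Duration, SystemTime};

// Back-date the baseline by several seconds so the assertion doesn't depend
// on the wall clock advancing between two nearby SystemTime::now() calls.
let before_marked_unavailable_time = SystemTime::now() - Duration::from_secs(10);

// Marking the endpoint unavailable stamps it with the current time, which is
// comfortably ahead of the back-dated baseline.
cache.mark_endpoint_unavailable(endpoint1, RequestOperation::Read);

assert!(
    cache.last_check_time_for(endpoint1) > before_marked_unavailable_time,
    "expected mark_endpoint_unavailable to refresh last_check_time"
);
```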
PR Summary:
Implementation details:
Current limitations: