High latencies observed at the SDK entry point during Cloud operations #6817

@NaveenSai1605

Description

Ask: We are investigating high latencies observed at the SDK entry point during Cloud operations. Our measurements show that these latencies occur when performing Put() operations via the Azure SDK. One suggestion is that the elevated latencies are related to CPU utilization within the I/O stack, which appears to be actively engaged during Cloud Put() activity.

This observation seems counterintuitive because our understanding is that Put() calls should primarily wait for server-side acknowledgment of successful completion and therefore should not consume significant CPU resources on the client side. However, profiling consistently shows that I/O threads are on CPU, executing within the SDK, specifically in Put() calls, with the top of the stack being:

Azure::Core::Http::CurlConnection::SendBuffer()

Integration Details

SDK Version: azure-identity-1.5.1
SDK is using the C++ interfaces, as described in https://learn.microsoft.com/en-us/azure/storage/blobs/quickstart-blobs-c-plus-plus?tabs=managed-ide…

  • The main entry point from Rubrik code into the SDK for putting a BlockBlob into a container, and the one we see occupying the CPU, is:

        - Azure::Response<Models::UploadBlockBlobResult> Upload(Azure::Core::IO::BodyStream& content, const UploadBlockBlobOptions& options = UploadBlockBlobOptions(), const Azure::Core::Context& context = Azure::Core::Context()) const
    
  • We have attached a detailed thread stack showing the path from Upload() all the way to Azure::Core::Http::CurlConnection::SendBuffer(), in case it is helpful. Please see the attached file:

thread_stack_bt_upload_sendbuffer.txt

  • For completeness, the corresponding Get() interface we use is:
    - Azure::Response<Models::DownloadBlobToResult> DownloadTo(uint8_t* buffer, size_t bufferSize, const DownloadBlobToOptions& options = DownloadBlobToOptions(), const Azure::Core::Context& context = Azure::Core::Context()) const

Key Observations

  • Threads executing SendBuffer() are consistently in the ‘R’ state (as seen in top -H output), indicating active CPU usage.
  • This behavior raises questions about why these threads are spinning on CPU rather than being idle while awaiting server acknowledgment.

Request to SDK Team

We seek clarity on the following points:

  • Root Cause Analysis: Why are threads in SendBuffer() consuming CPU cycles during Put() operations? Is this expected behavior or indicative of an underlying issue?
  • Latency Attribution: How can we measure and differentiate latencies within the SDK layer versus lower-level factors (e.g., network delays) that may contribute to the overall high latencies observed at the SDK entry point?

The goal is to disambiguate SDK-level delays from external factors so that we can better understand and optimize performance.
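As a possible first step toward that attribution, the call at the SDK entry point could be wrapped so that wall-clock time and the calling thread's CPU time are recorded side by side. The sketch below is a minimal illustration, not part of the Azure SDK: `CallTiming`, `ThreadCpuMs`, `MeasureCall`, and `CpuFraction` are hypothetical names, and it assumes a POSIX system (e.g. Linux) where `clock_gettime` supports `CLOCK_THREAD_CPUTIME_ID`. It only accounts for CPU consumed on the calling thread, so it assumes the work of interest runs synchronously inside the wrapped call.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <utility>

// Hypothetical helper type for illustration; not part of the Azure SDK.
struct CallTiming {
  double wall_ms;  // elapsed wall-clock time of the call
  double cpu_ms;   // CPU time consumed by the calling thread during the call
};

// CPU time consumed so far by the calling thread, in milliseconds.
// Assumption: POSIX clock_gettime with CLOCK_THREAD_CPUTIME_ID is available.
inline double ThreadCpuMs() {
  timespec ts{};
  const int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  return static_cast<double>(ts.tv_sec) * 1e3 +
         static_cast<double>(ts.tv_nsec) / 1e6;
}

// Runs fn once on the current thread and reports wall time and thread CPU time.
template <typename Fn>
CallTiming MeasureCall(Fn&& fn) {
  const auto wall_start = std::chrono::steady_clock::now();
  const double cpu_start = ThreadCpuMs();
  std::forward<Fn>(fn)();  // e.g. a lambda that invokes Upload(); not shown here
  const double cpu_end = ThreadCpuMs();
  const auto wall_end = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> wall = wall_end - wall_start;
  return CallTiming{wall.count(), cpu_end - cpu_start};
}

// Fraction of the call's wall time during which the calling thread was on CPU.
inline double CpuFraction(const CallTiming& t) {
  return t.wall_ms > 0.0 ? t.cpu_ms / t.wall_ms : 0.0;
}
```

A fraction close to 1 would be consistent with the thread spinning or doing CPU work inside the call, while a fraction close to 0 would suggest the thread is mostly blocked, for example waiting on the network. This does not by itself split SDK time from network time; it only indicates whether the time spent inside the call is CPU-bound or wait-bound.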
