Skip to content

Perform blob download-batch concurrently #32235

@ryancasburn-KAI

Description

@ryancasburn-KAI

Related command
az storage blob download-batch

Is your feature request related to a problem? Please describe.
I have a need to download a large number of small blobs. I set the "--max-connections" property thinking this would allow for multiple files to download at once, but this does not appear to be the case. I'm downloading these blobs to an Azure VM (with more than 20 cores) in the same region and network receive rate is hovering around 1 MiB/second. I'd expect that rate to feasibly be well over 100 MiB/second or more with concurrency.

Describe the solution you'd like
My understanding reading through the code repository is that this functionality is handled with this function:

def storage_blob_download_batch(client, source, destination, source_container_name, pattern=None, dryrun=False,

At a minimum, I'd like to see the for loop which loops over the blobs_to_download dictionary be converted into an asynchronous process. This either means that _download_blob function (and everything lower in the stack) would need to be async functions. Or an async run in thread.

It would be even more awesome if the collect_blobs function could return an async generator, which could allow the downloads to start before the full list of blobs has been collected (though I realize this has more side effects). My current download took several hours to accumulate the list of blobs to download (since the API list call is paginated).

Describe alternatives you've considered
There isn't really any great alternatives other than rewriting my own version of this api which is undesirable.

Additional context
The sample download-batch I am running is 12,024,850 blobs with an average size of 10KiB. The download is running on an Azure VM of size: NC24ads_A100_v4 in the same region (East US 2) as the storage account. My expectation is that this would feasibly be able to be complete in under 2 hours, yet we are at about 14 hours and not yet to 30% complete yet.

Metadata

Metadata

Labels

Auto-AssignAuto assign by botAzure CLI TeamThe command of the issue is owned by Azure CLI teamPossible-SolutionSimilar-IssueStorageaz storagecustomer-reportedIssues that are reported by GitHub users external to the Azure organization.questionThe issue doesn't require a change to the product in order to be resolved. Most issues start as that

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions