@yuanjieding-db (Collaborator) commented Oct 14, 2025

What changes are proposed in this pull request?

WHAT

  • Add a new method, upload_from, to databricks.sdk.mixins.FilesExt to support uploading from a file on the local filesystem.
  • Improve databricks.sdk.mixins.FilesExt upload throughput by uploading data in parallel by default.
  • Add a new method, download_to, to databricks.sdk.mixins.FilesExt to support downloading to a file on the local filesystem. It downloads the file in parallel to reduce end-to-end download latency. Parallel downloading is temporarily unavailable on Windows.
  • Improve databricks.sdk.mixins.FilesExt.upload so that uploads still work when presigned URLs are not enabled for the workspace, by falling back to single-part upload.
  • Add use_parallel, parallelism, and part_size parameters to databricks.sdk.mixins.FilesExt.upload.
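
For reviewers who want a mental model of what uploading in parts involves, here is a minimal standard-library sketch. It is not the SDK's implementation: `split_into_parts`, `upload_in_parts`, and the `upload_part` callback are hypothetical names, and only the keyword names `use_parallel`, `parallelism`, and `part_size` are taken from this PR. The default values shown are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_into_parts(total_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover total_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (offset, min(part_size, total_size - offset))
        for offset in range(0, total_size, part_size)
    ]


def upload_in_parts(
    data: bytes,
    upload_part: Callable[[int, bytes], str],  # hypothetical: sends one part, returns its ETag
    *,
    use_parallel: bool = True,
    parallelism: int = 4,  # illustrative default, not the SDK's
    part_size: int = 8,    # tiny value so the example is easy to follow
) -> List[str]:
    """Send data part by part and return the ETags in part order."""
    numbered = list(enumerate(split_into_parts(len(data), part_size), start=1))

    def send(item: Tuple[int, Tuple[int, int]]) -> str:
        number, (offset, length) = item
        return upload_part(number, data[offset:offset + length])

    if not use_parallel:
        return [send(item) for item in numbered]

    # Executor.map yields results in input order, even if parts finish out of order.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(send, numbered))
```

In this sketch, `use_parallel=False` sends the same parts sequentially, which is a convenient baseline when comparing throughput.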

WHY

  • upload_from and download_to are added for two purposes:
    • Free users from having to open the file themselves when uploading and downloading.
    • Allow the client to upload and download in parallel to reduce the end-to-end latency of these operations.
  • The new function parameters let users fine-tune the performance of the upload operation. They are set automatically to give good-enough performance, but users can override them if they have specific requirements.
  • The configuration options for databricks.sdk.mixins.FilesExt now carry a files_ext prefix to keep them organized.
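
The "automatic defaults, explicit overrides" pattern described above could look roughly like the sketch below. Everything here is an assumption for illustration: `UploadSettings`, `resolve_settings`, and the default numbers are hypothetical, and the real defaults are not stated in this description. Only the parameter names come from the PR.

```python
from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass(frozen=True)
class UploadSettings:
    use_parallel: bool
    parallelism: int
    part_size: int


def resolve_settings(
    *,
    use_parallel: Optional[bool] = None,
    parallelism: Optional[int] = None,
    part_size: Optional[int] = None,
) -> UploadSettings:
    """Fill in defaults for any option the caller left as None."""
    # Hypothetical defaults, chosen only to make the example concrete.
    chosen_parallel = True if use_parallel is None else use_parallel
    chosen_parallelism = 4 if parallelism is None else parallelism
    chosen_part_size = 10 * MIB if part_size is None else part_size

    if chosen_parallelism <= 0 or chosen_part_size <= 0:
        raise ValueError("parallelism and part_size must be positive")

    # A sequential upload uses a single worker regardless of the requested parallelism.
    if not chosen_parallel:
        chosen_parallelism = 1

    return UploadSettings(chosen_parallel, chosen_parallelism, chosen_part_size)
```

Calling `resolve_settings()` with no arguments returns the illustrative defaults, while `resolve_settings(parallelism=16, part_size=32 * MIB)` overrides only those two values.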

How is this tested?

The functionality is tested with unit tests and with manual tests that run benchmarking scripts on a local laptop and in notebooks in several real workspaces.

@yuanjieding-db force-pushed the yuanjie/large_file_upload_pupr branch from 345dd69 to cb773f1 on October 17, 2025 09:25
@yuanjieding-db force-pushed the yuanjie/large_file_upload_pupr branch from 3f38293 to 770e0cd on October 17, 2025 11:57
@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1075
  • Commit SHA: 770e0cd1d5f35cc0c871cf78fa3ec6d40fea030b

Checks will be approved automatically on success.

@parthban-db changed the title from "Public Preview of Large File Upload through Files API" to "Add Parallel Large File Upload and Download in FilesAPI" on Oct 17, 2025
@parthban-db added this pull request to the merge queue on Oct 20, 2025
Merged via the queue into databricks:main with commit b4eff2f on Oct 20, 2025
17 checks passed