Add Parallel Large File Upload and Download in FilesAPI (#1075)
## What changes are proposed in this pull request?
**WHAT**
* Add a new interface, `upload_from`, to `databricks.sdk.mixins.FilesExt` to support uploading from a file in the local filesystem.
* Improve `databricks.sdk.mixins.FilesExt` upload throughput by uploading data in parallel by default.
* Add a new interface, `download_to`, to `databricks.sdk.mixins.FilesExt` to support downloading to a file in the local filesystem. This interface downloads the file in parallel to reduce the end-to-end latency of the download. Parallel downloading is temporarily unavailable on Windows.
* Improve `databricks.sdk.mixins.FilesExt.upload` to support uploading when presigned URLs are not enabled for the workspace, by falling back to a single-part upload.
* Add `use_parallel`, `parallelism`, and `part_size` parameters to `databricks.sdk.mixins.FilesExt.upload`.
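To illustrate how `use_parallel`, `parallelism`, and `part_size` could interact with the single-part fallback, here is a minimal sketch. It does not call the SDK: `upload_stream`, `upload_part`, `upload_single`, and `PresignedUrlsDisabled` are hypothetical names, and the real multipart flow (initiating the upload, fetching presigned URLs in batches, completing the upload) is collapsed into the `upload_part` callback.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable


class PresignedUrlsDisabled(Exception):
    """Hypothetical error signalling that presigned URLs are not enabled."""


def upload_stream(
    contents: BinaryIO,
    upload_part: Callable[[int, bytes], None],
    upload_single: Callable[[bytes], None],
    *,
    part_size: int = 8 * 1024 * 1024,
    parallelism: int = 4,
    use_parallel: bool = True,
) -> str:
    """Upload `contents` in numbered parts, falling back to one single-part request.

    Returns "multipart" or "single" so the caller can see which path was taken.
    """
    workers = parallelism if use_parallel else 1
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = []
            part_number = 1
            while True:
                chunk = contents.read(part_size)
                if not chunk:
                    break
                futures.append(pool.submit(upload_part, part_number, chunk))
                part_number += 1
            for future in futures:
                future.result()  # re-raises any exception raised by a worker
        if futures:
            return "multipart"
    except PresignedUrlsDisabled:
        pass
    # Fallback (or empty input): rewind and send everything in one request.
    contents.seek(0)
    upload_single(contents.read())
    return "single"
```

This sketch reads every part into memory before the workers finish; a production version would cap the number of buffered parts so memory stays bounded for very large files. The fallback also assumes a seekable source, which holds for a local file opened by something like `upload_from`.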
**WHY**
* `upload_from` and `download_to` were added for two purposes:
  * Free users from having to open the file themselves when uploading or downloading.
  * Allow the client to upload and download in parallel, reducing the end-to-end latency of these operations.
* The new parameters let users fine-tune upload performance easily. Their defaults are set automatically to give good performance, but users can override them if they have specific requirements.
* The configuration options for `databricks.sdk.mixins.FilesExt` were renamed with a `files_ext` prefix to keep them organized.
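Parallel downloading to a local file can be sketched as ranged reads written at their own offsets in a pre-sized file. This is an illustration of the general technique, not the SDK's implementation: `download_to_file` and the `fetch_range(start, end)` callback (end exclusive, unlike an HTTP `Range` header, whose end is inclusive) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def download_to_file(
    fetch_range: Callable[[int, int], bytes],
    total_size: int,
    destination: str,
    *,
    part_size: int = 8 * 1024 * 1024,
    parallelism: int = 4,
) -> None:
    """Download `total_size` bytes into `destination` using parallel ranged reads."""
    # Pre-size the file so every worker can write at its own offset.
    with open(destination, "wb") as f:
        f.truncate(total_size)

    def fetch_and_write(start: int) -> None:
        end = min(start + part_size, total_size)  # exclusive end
        chunk = fetch_range(start, end)
        # Each worker opens its own handle, so seek/write pairs never interleave.
        with open(destination, "r+b") as f:
            f.seek(start)
            f.write(chunk)

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = [pool.submit(fetch_and_write, s) for s in range(0, total_size, part_size)]
        for future in futures:
            future.result()  # surface any worker exception to the caller
```

Writing each range at a fixed offset means parts can complete in any order, which is what lets the download parallelize without reassembly in memory.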
## How is this tested?
The functionality is covered by unit tests and by manual tests that ran benchmarking scripts on a local laptop and in notebooks across several real workspaces.
Changes to `NEXT_CHANGELOG.md` (16 additions, 0 deletions):

```diff
@@ -4,10 +4,26 @@
 
 ### New Features and Improvements
 
+* Add a new interface `upload_from` to `databricks.sdk.mixins.FilesExt` to support upload from a file in local filesystem.
+* Improve `databricks.sdk.mixins.FilesExt` upload throughput by uploading data in parallel by default.
+* Add a new interface `download_to` to `databricks.sdk.mixins.FilesExt` to support download to a file in local filesystem. This interface will also download the file in parallel by default. Parallel downloading is currently unavailable on Windows.
+* Improve `databricks.sdk.mixins.FilesExt.upload` to support uploading when Presigned URL is not enabled for the Workspace by introducing a fallback to Single Part Upload.
+
 ### Bug Fixes
 
 ### Documentation
 
 ### Internal Changes
 
 ### API Changes
+
+* Add `upload_from()`, `download_to()` method for `databricks.sdk.mixins.FilesExt`.
+* Add `use_parallel`, `parallelism`, `part_size` field for `databricks.sdk.mixins.FilesExt.upload`.
+*[Breaking] Change `files_api_client_download_max_total_recovers` to `files_ext_client_download_max_total_recovers` for `databricks.sdk.Config`
+*[Breaking] Change `files_api_client_download_max_total_recovers_without_progressing` to `files_ext_client_download_max_total_recovers_without_progressing` for `databricks.sdk.Config`
+*[Breaking] Change `multipart_upload_min_stream_size` to `files_ext_multipart_upload_min_stream_size` for `databricks.sdk.Config`
+*[Breaking] Change `multipart_upload_batch_url_count` to `files_ext_multipart_upload_batch_url_count` for `databricks.sdk.Config`
+*[Breaking] Change `multipart_upload_chunk_size` to `files_ext_multipart_upload_default_part_size` for `databricks.sdk.Config`
+*[Breaking] Change `multipart_upload_url_expiration_duration` to `files_ext_multipart_upload_url_expiration_duration` for `databricks.sdk.Config`
+*[Breaking] Change `multipart_upload_max_retries` to `files_ext_multipart_upload_max_retries` for `databricks.sdk.Config`
```
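Because the `databricks.sdk.Config` renames above are breaking, callers that pass the old names need updating. The sketch below maps old names to new ones using exactly the pairs listed in the changelog; `RENAMED_CONFIG_KEYS` and `migrate_config_kwargs` are hypothetical helpers, and the assumption that these settings arrive as a plain dict of keyword arguments is illustrative only.

```python
from typing import Any, Dict

# Old -> new `databricks.sdk.Config` names, taken from the changelog entries.
RENAMED_CONFIG_KEYS: Dict[str, str] = {
    "files_api_client_download_max_total_recovers": "files_ext_client_download_max_total_recovers",
    "files_api_client_download_max_total_recovers_without_progressing": "files_ext_client_download_max_total_recovers_without_progressing",
    "multipart_upload_min_stream_size": "files_ext_multipart_upload_min_stream_size",
    "multipart_upload_batch_url_count": "files_ext_multipart_upload_batch_url_count",
    "multipart_upload_chunk_size": "files_ext_multipart_upload_default_part_size",
    "multipart_upload_url_expiration_duration": "files_ext_multipart_upload_url_expiration_duration",
    "multipart_upload_max_retries": "files_ext_multipart_upload_max_retries",
}


def migrate_config_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename old Files config keys; reject old/new pairs that disagree."""
    migrated: Dict[str, Any] = {}
    for key, value in kwargs.items():
        new_key = RENAMED_CONFIG_KEYS.get(key, key)  # unrelated keys pass through
        if new_key in migrated and migrated[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated
```

Note that `multipart_upload_chunk_size` is the one entry whose suffix also changes (to `default_part_size`), so a simple prefix substitution would not be enough.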