
Commit 5660796

Updating FilesExt documentation (#1189)
## What changes are proposed in this pull request?

**WHAT**

- Update the documentation of `FilesExt` interfaces:
  - `upload`
  - `upload_from`
  - `download`
  - `download_to`

**WHY**

Rewording some of the documentation to avoid confusion and increase clarity for users.

## How is this tested?

N/A

NO_CHANGELOG=true
1 parent cc0e2ec commit 5660796


1 file changed: +25 -15 lines changed


databricks/sdk/mixins/files.py

Lines changed: 25 additions & 15 deletions
@@ -784,12 +784,11 @@ def download(
  ) -> DownloadResponse:
  """Download a file.
 
- Downloads a file of any size. The file contents are the response body.
- This is a standard HTTP file download, not a JSON RPC.
+ Downloads a file as a stream into memory.
 
- It is strongly recommended, for fault tolerance reasons,
- to iteratively consume from the stream with a maximum read(size)
- defined instead of using indefinite-size reads.
+ Use this when you want to process the downloaded file in memory or pipe it into another system. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
+
+ If the download is successful, the function returns the downloaded file result. If the download is unsuccessful, the function raises an exception.
 
  :param file_path: str
  The remote path of the file, e.g. /Volumes/path/to/your/file
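The lines removed from `download` advised consuming the returned stream with bounded `read(size)` calls instead of one unbounded read, and that advice still matters when piping a large download into another system. The sketch below shows the pattern using only the standard library. `copy_in_chunks` and its 1 MiB default chunk size are illustrative assumptions, not part of the SDK; the commented usage assumes the stream is exposed as `DownloadResponse.contents` on the result of `WorkspaceClient().files.download`.

```python
import io
from typing import BinaryIO


def copy_in_chunks(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Copy src into dst with bounded reads; return the number of bytes copied.

    copy_in_chunks is a hypothetical helper, not an SDK function.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    while True:
        chunk = src.read(chunk_size)  # never asks for more than chunk_size bytes
        if not chunk:  # an empty result means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Assumed usage against a workspace (needs credentials, not run here):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# resp = w.files.download("/Volumes/path/to/your/file")
# with open("local_copy.bin", "wb") as dst:
#     copy_in_chunks(resp.contents, dst)

if __name__ == "__main__":
    source = io.BytesIO(b"x" * 2500)
    sink = io.BytesIO()
    print(copy_in_chunks(source, sink, chunk_size=1000))  # 2500 (1000 + 1000 + 500)
```

The loop treats only an empty read as end of stream, so short reads from a network-backed stream are handled correctly. When the goal is simply to land the file on disk, the documented `download_to` avoids this manual loop.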
@@ -817,14 +816,18 @@ def download_to(
  use_parallel: bool = False,
  parallelism: Optional[int] = None,
  ) -> DownloadFileResult:
- """Download a file to a local path. There would be no responses returned if the download is successful.
+ """Downloads a file directly to a local file path.
+
+ Use this when you want to write the file straight to disk instead of holding it in memory. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
+
+ Supports parallel download (use_parallel=True), which may improve performance for large files. This is available on all operating systems except Windows.
 
  :param file_path: str
  The remote path of the file, e.g. /Volumes/path/to/your/file
  :param destination: str
  The local path where the file will be saved.
  :param overwrite: bool
- If true, an existing file will be overwritten. When not specified, assumed True.
+ If true, an existing file will be overwritten. When not specified, defaults to True.
  :param use_parallel: bool
  If true, the download will be performed using multiple threads.
  :param parallelism: int
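The new `download_to` docstring states that `use_parallel=True` is not available on Windows. Code that runs on several platforms could choose the flag at runtime, as in this sketch. `download_to_kwargs` is a hypothetical helper; this diff does not say whether the SDK raises or silently falls back when parallel download is requested on Windows, so the sketch simply avoids requesting it there.

```python
import platform
from typing import Any, Dict, Optional


def download_to_kwargs(parallelism: Optional[int] = None,
                       system: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for download_to based on the operating system.

    download_to_kwargs is a hypothetical helper, not an SDK function.
    """
    # platform.system() returns names such as "Windows", "Linux", or "Darwin".
    system = platform.system() if system is None else system
    # Parallel download is documented as unavailable on Windows.
    use_parallel = system != "Windows"
    kwargs: Dict[str, Any] = {"use_parallel": use_parallel}
    if use_parallel and parallelism is not None:
        kwargs["parallelism"] = parallelism
    return kwargs


# Assumed usage (needs a configured WorkspaceClient `w`, not run here):
# w.files.download_to(
#     file_path="/Volumes/path/to/your/file",
#     destination="/tmp/file",
#     overwrite=True,
#     **download_to_kwargs(parallelism=8),
# )
```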
@@ -1078,18 +1081,22 @@ def upload(
  parallelism: Optional[int] = None,
  ) -> UploadStreamResult:
  """
- Upload a file with stream interface.
+ Uploads a file from memory or a stream interface.
+
+ Use this when you want to upload data already in memory or piped from another system. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
+
+ Limitations: If the storage account is on Azure and has a firewall enabled, the maximum file size is 5 GB.
 
  :param file_path: str
  The absolute remote path of the target file, e.g. /Volumes/path/to/your/file
  :param contents: BinaryIO
  The contents of the file to upload. This must be a BinaryIO stream.
  :param overwrite: bool (optional)
- If true, an existing file will be overwritten. When not specified, assumed True.
+ If true, an existing file will be overwritten. When not specified, defaults to True.
  :param part_size: int (optional)
- If set, multipart upload will use the value as its size per uploading part.
+ If set, multipart upload will use the value as its size per uploading part. If not set, an appropriate value will be automatically used.
  :param use_parallel: bool (optional)
- If true, the upload will be performed using multiple threads. Be aware that this will consume more memory
+ If true, the upload will be performed using multiple threads. Note that this will consume more memory
  because multiple parts will be buffered in memory before being uploaded. The amount of memory used is proportional
  to `parallelism * part_size`.
  If false, the upload will be performed in a single thread.
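The `use_parallel` description says buffered memory is proportional to `parallelism * part_size`; for example, 8 threads with 64 MiB parts buffer on the order of 8 * 64 MiB = 512 MiB. The sketch below inverts that relation to pick a parallelism for a given memory budget. It treats the product as an estimate of peak buffering, which is an assumption since the docstring only says "proportional", and `parallel_upload_settings` is a hypothetical helper.

```python
from typing import Any, Dict

MiB = 1024 * 1024


def parallel_upload_settings(memory_budget_bytes: int, part_size_bytes: int) -> Dict[str, Any]:
    """Pick upload keyword arguments so that parallelism * part_size fits the budget.

    parallel_upload_settings is a hypothetical helper, not an SDK function.
    """
    if part_size_bytes <= 0:
        raise ValueError("part_size_bytes must be positive")
    # Largest n such that n * part_size_bytes <= memory_budget_bytes.
    parallelism = memory_budget_bytes // part_size_bytes
    if parallelism < 2:
        # Fewer than two parts fit in the budget; fall back to the single-threaded path.
        return {"use_parallel": False}
    return {"use_parallel": True, "parallelism": parallelism, "part_size": part_size_bytes}


# Assumed usage (needs a configured WorkspaceClient `w`, not run here):
# with open("big.bin", "rb") as f:
#     w.files.upload("/Volumes/path/to/your/file", f, overwrite=True,
#                    **parallel_upload_settings(1024 * MiB, 64 * MiB))
```

Fixing the part size and deriving the thread count keeps the memory bound explicit; the alternative of fixing the thread count and shrinking parts would produce more, smaller requests for the same budget.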
@@ -1166,16 +1173,19 @@ def upload_from(
  use_parallel: bool = True,
  parallelism: Optional[int] = None,
  ) -> UploadFileResult:
- """Upload a file directly from a local path.
+ """
+ Uploads a file from a local file path.
+
+ Use this when your data already exists on disk and you want to upload it directly without manually opening it yourself. Supports files of any size in SDK v0.72.0+. Earlier versions have a 5 GB file size limit.
 
  :param file_path: str
  The absolute remote path of the target file.
  :param source_path: str
  The local path of the file to upload. This must be a path to a local file.
- :param part_size: int
- The size of each part in bytes for multipart upload. This is a required parameter for multipart uploads.
+ :param part_size: int (optional)
+ If set, multipart upload will use the value as its size per uploading part. If not set, an appropriate default value will be automatically used.
  :param overwrite: bool (optional)
- If true, an existing file will be overwritten. When not specified, assumed True.
+ If true, an existing file will be overwritten. When not specified, defaults to True.
  :param use_parallel: bool (optional)
  If true, the upload will be performed using multiple threads. Default is True.
  :param parallelism: int (optional)
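Each of the four updated docstrings states that files of any size are supported from SDK v0.72.0, with a 5 GB limit on earlier versions. A caller that may run against older releases could check before a large transfer, as sketched here. The helper names, the simple version parsing (which ignores pre-release tags), and reading "5 GB" as 5 * 10**9 bytes (the smaller, more conservative interpretation) are all assumptions; the separate 5 GB limit for firewalled Azure storage accounts is not modeled.

```python
import re
from typing import Tuple

# Assumption: "5 GB" read as decimal gigabytes, the more conservative reading.
LEGACY_LIMIT_BYTES = 5 * 10**9
FIRST_UNLIMITED_VERSION = (0, 72, 0)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '0.72.0' (or '0.72') into (0, 72, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def exceeds_legacy_limit(sdk_version: str, size_bytes: int) -> bool:
    """True if the SDK predates v0.72.0 and the file is larger than the 5 GB limit.

    exceeds_legacy_limit is a hypothetical helper, not an SDK function.
    """
    too_old = parse_version(sdk_version) < FIRST_UNLIMITED_VERSION
    return too_old and size_bytes > LEGACY_LIMIT_BYTES


# Assumed usage (not run here):
# import os
# from importlib.metadata import version
# if exceeds_legacy_limit(version("databricks-sdk"), os.path.getsize("big.bin")):
#     raise RuntimeError("upgrade databricks-sdk to v0.72.0+ for files over 5 GB")
```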
