Commit 69682cd

jabbera and amoeba authored
GH-45758 [Python] Add AzureFileSystem documentation (#45759)
### Rationale for this change

Missing documentation on AzureFileSystem

### What changes are included in this PR?

Added AzureFileSystem documentation

### Are these changes tested?

Not yet

### Are there any user-facing changes?

Documentation

* GitHub Issue: #45758

Lead-authored-by: Mike Barry <[email protected]>
Co-authored-by: Mike Barry <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
1 parent dc84232 commit 69682cd

2 files changed: 42 additions, 0 deletions

docs/source/python/api/filesystems.rst

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ Filesystem Implementations
    GcsFileSystem
    HadoopFileSystem
    SubTreeFileSystem
+   AzureFileSystem
 
 To define filesystems with behavior implemented in Python:
 

docs/source/python/filesystems.rst

Lines changed: 41 additions & 0 deletions
@@ -42,6 +42,7 @@ Pyarrow implements natively the following filesystem subclasses:
 * :ref:`filesystem-s3` (:class:`S3FileSystem`)
 * :ref:`filesystem-gcs` (:class:`GcsFileSystem`)
 * :ref:`filesystem-hdfs` (:class:`HadoopFileSystem`)
+* :ref:`filesystem-azurefs` (:class:`AzureFileSystem`)
 
 It is also possible to use your own fsspec-compliant filesystem with pyarrow functionalities as described in the section :ref:`filesystem-fsspec`.
 
@@ -295,6 +296,46 @@ some environment variables.
 In contrast to the legacy HDFS filesystem with ``pa.hdfs.connect``, setting
 ``CLASSPATH`` is not optional (pyarrow will not attempt to infer it).
 
+.. _filesystem-azurefs:
+
+Azure Storage File System
+-------------------------
+
+PyArrow natively implements an Azure filesystem for Azure Blob Storage with or
+without hierarchical namespace enabled.
+
+The :class:`AzureFileSystem` constructor has several options to configure the
+Azure Blob Storage connection (e.g. account name, account key, SAS token, etc.).
+
+If neither ``account_key`` nor ``sas_token`` is specified, a `DefaultAzureCredential <https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/identity/azure-identity/README.md#defaultazurecredential>`__
+is used for authentication. This means it will try several types of authentication
+and go with the first one that works. If any authentication parameters are provided when
+initialising the FileSystem, they will be used instead of the default credential.
+
+Example showing how you can read contents from an Azure Blob Storage account::
+
+   >>> from pyarrow import fs
+   >>> azure_fs = fs.AzureFileSystem(account_name='myaccount')
+
+   # List all contents in a container, recursively
+   >>> azure_fs.get_file_info(fs.FileSelector('my-container', recursive=True))
+   [<FileInfo for 'my-container/File1': type=FileType.File, size=10>,
+    <FileInfo for 'my-container/File2': type=FileType.File, size=20>,
+    <FileInfo for 'my-container/Dir1': type=FileType.Directory>,
+    <FileInfo for 'my-container/Dir1/File3': type=FileType.File, size=30>]
+
+   # Open a file for reading and download its contents
+   >>> f = azure_fs.open_input_stream('my-container/File1')
+   >>> f.readall()
+   b'some data'
+
+For more details on the parameters and usage, refer to the :class:`AzureFileSystem` class documentation.
+
+.. seealso::
+
+   See the `Azure SDK for C++ documentation <https://github.com/Azure/azure-sdk-for-cpp>`__
+   for more information on authentication and configuration options.
+
 .. _filesystem-fsspec:
 
 Using fsspec-compatible filesystems with Arrow

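The added documentation says that explicit credentials (``account_key`` or ``sas_token``) take the place of the default credential when they are supplied. A minimal sketch of that selection logic follows, assuming only the three constructor options named in the diff. The helpers `azure_fs_kwargs`, `describe_auth`, and `make_azure_fs` are hypothetical names introduced for illustration and are not part of pyarrow. Rejecting the case where both credentials are passed is a choice made by this sketch, not a statement about how pyarrow behaves.

```python
def azure_fs_kwargs(account_name, account_key=None, sas_token=None):
    """Build keyword arguments for AzureFileSystem (hypothetical helper).

    Only explicitly supplied credentials are forwarded. With neither one
    given, the documentation says a DefaultAzureCredential is used.
    """
    if not account_name:
        raise ValueError("account_name is required")
    # Design choice of this sketch: refuse ambiguous input up front.
    if account_key is not None and sas_token is not None:
        raise ValueError("pass either account_key or sas_token, not both")

    kwargs = {"account_name": account_name}
    if account_key is not None:
        kwargs["account_key"] = account_key
    elif sas_token is not None:
        kwargs["sas_token"] = sas_token
    return kwargs


def describe_auth(kwargs):
    """Name the authentication path that the given kwargs would select."""
    if "account_key" in kwargs:
        return "account key"
    if "sas_token" in kwargs:
        return "SAS token"
    return "DefaultAzureCredential"


def make_azure_fs(account_name, account_key=None, sas_token=None):
    """Create the filesystem. Assumes pyarrow was built with Azure support."""
    from pyarrow import fs  # imported lazily so the helpers above need no pyarrow

    return fs.AzureFileSystem(
        **azure_fs_kwargs(account_name, account_key, sas_token)
    )


if __name__ == "__main__":
    # 'myaccount' is the placeholder account name used in the documentation.
    for extra in ({}, {"account_key": "<key>"}, {"sas_token": "<token>"}):
        kwargs = azure_fs_kwargs("myaccount", **extra)
        print(sorted(kwargs), "->", describe_auth(kwargs))
```

Keeping the argument selection separate from the constructor call lets the credential logic be checked without network access or an Azure-enabled pyarrow build. `make_azure_fs` is the only function that touches pyarrow.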