Commit 0365c32

add comprehensive documentation on retries
1 parent d1bb2cc commit 0365c32

2 files changed: 81 additions, 0 deletions

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -239,6 +239,7 @@ Contents
 developer
 hns_buckets
 rapid_storage_support
+retries
 fuse
 changelog
 code-of-conduct

docs/source/retries.rst

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
Retries
=======

``gcsfs`` implements retry logic to handle transient errors and improve the reliability of operations against Google Cloud Storage.

Default Retry Implementation (Standard Buckets)
-----------------------------------------------

For standard buckets, ``gcsfs`` wraps most HTTP requests in a custom retry decorator (``retry_request``). Most high-level operations go through this decorator internally, so they inherit its retry behavior.

- **Applicable Methods:**
  - ``ls`` / ``_ls``: Listing objects and prefixes.
  - ``info`` / ``_info``: Retrieving object metadata.
  - ``cat`` / ``_cat_file``: Reading object contents.
  - ``get`` / ``_get_file``: Downloading objects.
  - ``put`` / ``_put_file``: Uploading objects (including resumable uploads).
  - ``mkdir`` / ``_mkdir``: Creating buckets.
  - ``rm`` / ``_rm_file``: Deleting objects.
  - ``mv`` / ``_mv_file``: Moving/renaming objects.
  - ``cp`` / ``_cp_file``: Copying objects.
- **Number of Retries:** The default number of retries is **6**. This is defined as a class attribute ``retries = 6`` in ``GCSFileSystem``.
- **Backoff Strategy:** Exponential backoff with jitter. The wait time between retries, in seconds, is calculated as ``min(random.random() + 2 ** (retry - 1), 32)``, so it is capped at 32 seconds.
- **Retriable Errors:**
  - ``requests.exceptions.ChunkedEncodingError``
  - ``requests.exceptions.ConnectionError``
  - ``requests.exceptions.ReadTimeout``
  - ``requests.exceptions.Timeout``
  - ``requests.exceptions.ProxyError``
  - ``requests.exceptions.SSLError``
  - ``requests.exceptions.ContentDecodingError``
  - ``google.auth.exceptions.RefreshError``
  - ``aiohttp.client_exceptions.ClientError``
  - ``ChecksumError``
  - HTTP status codes: 500-504, 408, 429.
  - HTTP status code 401 with "Invalid Credentials" message (auth expiration).
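
The backoff formula and status-code list above can be illustrated with a minimal, standalone sketch. This is not the ``retry_request`` implementation: ``backoff_delay``, ``is_retriable_status``, and ``call_with_retries`` are hypothetical names, the helper is synchronous for brevity, and treating the count of 6 as the total number of attempts is an assumption.

```python
import random
import time

# Status codes listed as retriable above: 500-504, 408, 429.
RETRIABLE_STATUS = set(range(500, 505)) | {408, 429}


def is_retriable_status(status):
    """Return True if an HTTP status code is in the retriable set above."""
    return status in RETRIABLE_STATUS


def backoff_delay(retry, cap=32.0):
    """Seconds to wait before retry number `retry` (1-based), capped at `cap`."""
    return min(random.random() + 2 ** (retry - 1), cap)


def call_with_retries(func, retries=6, is_retriable=lambda exc: True, sleep=time.sleep):
    """Hypothetical helper: call func() at most `retries` times in total.

    Assumption: `retries` counts total attempts, not retries after the first.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on a non-retriable error.
            if attempt == retries or not is_retriable(exc):
                raise
            sleep(backoff_delay(attempt))
```

Under these assumptions the waits grow roughly as 1, 2, 4, 8, and 16 seconds, each with up to one second of random jitter added, and never exceed 32 seconds.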

Hierarchical Namespace (HNS) Buckets
------------------------------------

For HNS buckets, ``ExtendedGcsFileSystem`` uses the specialized Storage Control client (``StorageControlAsyncClient``) for folder-level operations (e.g., ``mkdir``, ``rename``).

- These calls rely on the underlying Google Cloud Python SDK's default retry behavior. The standard ``gcsfs`` retry logic (``retry_request``) is not applied to these control plane calls.
- **Applicable Methods:**
  - ``get_storage_layout``: Used to determine bucket type.
  - ``create_folder``: Used for ``mkdir``.
  - ``get_folder``: Used for directory metadata and existence checks.
  - ``list_folders``: Used for directory listings (``ls``).
  - ``rename_folder``: Used for moving/renaming directories (``mv``).
- **Non-Retried Methods:** Methods like ``delete_folder`` (used for ``rmdir``) are not retried by default.
- **Retriable Errors:**
  - ``google.api_core.exceptions.DeadlineExceeded``
  - ``google.api_core.exceptions.InternalServerError``
  - ``google.api_core.exceptions.ResourceExhausted``
  - ``google.api_core.exceptions.ServiceUnavailable``
  - ``google.api_core.exceptions.Unknown``
- **Backoff Strategy:** Exponential backoff with ``initial=1.0s``, ``maximum=60.0s``, and ``multiplier=2.0``.
- **Overall Timeout (Deadline):** 60.0s.
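
To get a feel for how these backoff parameters interact with the deadline, the following sketch computes the nominal sleep schedule. ``nominal_delays`` is a hypothetical helper, not part of any SDK. It assumes no jitter and ignores the time spent inside each request, whereas real clients typically randomize the delays, so actual retry counts may differ.

```python
def nominal_delays(initial=1.0, maximum=60.0, multiplier=2.0, deadline=60.0):
    """Return the un-jittered sleeps (in seconds) that fit within `deadline`.

    Assumption: only sleep time counts against the deadline.
    """
    delays = []
    delay, elapsed = initial, 0.0
    while elapsed + delay <= deadline:
        delays.append(delay)
        elapsed += delay
        # Grow the delay geometrically, but never beyond `maximum`.
        delay = min(delay * multiplier, maximum)
    return delays


print(nominal_delays())                # [1.0, 2.0, 4.0, 8.0, 16.0]
print(nominal_delays(deadline=120.0))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

Under these simplifications a 60-second deadline leaves room for about five sleeps, because the sixth (32 seconds) would push the cumulative wait to 63 seconds.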

Rapid Storage (Zonal Buckets)
-----------------------------

For Zonal buckets, ``ZonalFile`` uses specialized gRPC clients (``AsyncMultiRangeDownloader`` for reads and ``AsyncAppendableObjectWriter`` for writes).

- As with HNS buckets, control plane operations for Zonal buckets (such as ``get_storage_layout`` or folder operations) use the same ``StorageControlAsyncClient`` retry mechanism described in the HNS section above.
- File read/write operations (data plane) for Zonal buckets rely on the underlying Google Cloud Python SDK's default retry behavior for gRPC streams. The standard ``gcsfs`` retry logic (``retry_request``) is not applied to these data plane calls.
- **AsyncMultiRangeDownloader (MRD) Retries (Reads):**
  - **Applicable Methods:**
    - ``open``: Establishes the initial gRPC stream.
    - ``download_ranges``: Fetches multiple byte ranges in a single request.
  - **Retriable Errors:** ``InternalServerError``, ``ServiceUnavailable``, ``DeadlineExceeded``, ``TooManyRequests`` (429), and ``Aborted`` (allowing the download to resume from the last successful byte offset without re-transferring data).
  - **Backoff Strategy:** Exponential backoff with ``initial=1.0s``, ``maximum=60.0s``, and ``multiplier=2.0``.
  - **Overall Timeout (Deadline):** 120.0s.
- **AsyncAppendableObjectWriter (AAOW) Retries (Writes):**
  - **Applicable Methods:**
    - ``open``: Establishes the initial bidirectional gRPC stream.
    - ``append``: Streams data to the object.
  - **Methods without Automatic Retries:** ``flush`` and ``finalize`` do not have automatic retry logic in the underlying client.
  - **Retriable Errors:** ``InternalServerError``, ``ServiceUnavailable``, ``DeadlineExceeded``, ``TooManyRequests`` (429), and ``BidiWriteObjectRedirectedError`` (handled by re-opening the stream and resuming from the last persisted offset).
  - **Backoff Strategy:** Exponential backoff with ``initial=1.0s``, ``maximum=60.0s``, and ``multiplier=2.0``.
  - **Overall Timeout (Deadline):** 120.0s.
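
The idea of resuming from the last successful byte offset, rather than re-transferring data, can be sketched in plain Python. Everything here is illustrative: ``TransientError``, ``download_with_resume``, and ``make_flaky_source`` are hypothetical names, the reader is a synchronous stand-in for a gRPC stream, and backoff between attempts is omitted for brevity.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a retriable stream error."""


def download_with_resume(read_from, total_size, max_attempts=5):
    """Collect `total_size` bytes, resuming after the last received byte on error.

    `read_from(offset)` is a hypothetical callable that yields chunks starting
    at `offset` and may raise TransientError partway through.
    """
    received = bytearray()
    for _ in range(max_attempts):
        try:
            for chunk in read_from(len(received)):
                received.extend(chunk)
        except TransientError:
            continue  # keep the bytes we have; the next attempt starts after them
        if len(received) >= total_size:
            return bytes(received)
    raise TransientError("download did not complete within max_attempts")


def make_flaky_source(data, chunk_size, fail_at):
    """Build a reader that fails once when it reaches a chunk at or past `fail_at`."""
    state = {"failed": False, "offsets": []}

    def read_from(offset):
        state["offsets"].append(offset)  # record where each attempt started
        for start in range(offset, len(data), chunk_size):
            if not state["failed"] and start >= fail_at:
                state["failed"] = True
                raise TransientError("stream broke")
            yield data[start:start + chunk_size]

    return read_from, state


data = bytes(range(100))
reader, state = make_flaky_source(data, chunk_size=16, fail_at=40)
assert download_with_resume(reader, len(data)) == data
print(state["offsets"])  # [0, 48]
```

The second attempt starts at offset 48 because three 16-byte chunks arrived before the simulated failure, so none of those bytes are requested again.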
