| 1 | +Retries |
| 2 | +======= |
| 3 | + |
| 4 | +``gcsfs`` implements retry logic to handle transient errors and improve the reliability of operations against Google Cloud Storage. |
| 5 | + |
| 6 | +Default Retry Implementation (Standard Buckets) |
| 7 | +----------------------------------------------- |
| 8 | + |
| 9 | +For standard buckets, ``gcsfs`` wraps most HTTP requests in a custom retry decorator (``retry_request``). Most high-level operations call this decorator internally, so they are retried automatically. |
| 10 | + |
| 11 | +- **Applicable Methods:** |
| 12 | + - ``ls`` / ``_ls``: Listing objects and prefixes. |
| 13 | + - ``info`` / ``_info``: Retrieving object metadata. |
| 14 | + - ``cat`` / ``_cat_file``: Reading object contents. |
| 15 | + - ``get`` / ``_get_file``: Downloading objects. |
| 16 | + - ``put`` / ``_put_file``: Uploading objects (including resumable uploads). |
| 17 | + - ``mkdir`` / ``_mkdir``: Creating buckets. |
| 18 | + - ``rm`` / ``_rm_file``: Deleting objects. |
| 19 | + - ``mv`` / ``_mv_file``: Moving/renaming objects. |
| 20 | + - ``cp`` / ``_cp_file``: Copying objects. |
| 21 | +- **Number of Retries:** The default number of retries is **6**. This is defined as a class attribute ``retries = 6`` in ``GCSFileSystem``. |
| 22 | +- **Backoff Strategy:** Exponential backoff with jitter. The wait time in seconds between retries is calculated as ``min(random.random() + 2 ** (retry - 1), 32)``: a power-of-two delay plus up to one second of random jitter, capped at 32 seconds. |
| 23 | +- **Retriable Errors:** |
| 24 | + - ``requests.exceptions.ChunkedEncodingError`` |
| 25 | + - ``requests.exceptions.ConnectionError`` |
| 26 | + - ``requests.exceptions.ReadTimeout`` |
| 27 | + - ``requests.exceptions.Timeout`` |
| 28 | + - ``requests.exceptions.ProxyError`` |
| 29 | + - ``requests.exceptions.SSLError`` |
| 30 | + - ``requests.exceptions.ContentDecodingError`` |
| 31 | + - ``google.auth.exceptions.RefreshError`` |
| 32 | + - ``aiohttp.client_exceptions.ClientError`` |
| 33 | + - ``ChecksumError`` |
| 34 | + - HTTP status codes: 500-504, 408, 429. |
| 35 | + - HTTP status code 401 with "Invalid Credentials" message (auth expiration). |
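
The backoff formula and the status-code rules listed above can be sketched in a few lines of plain Python. This is an illustration only, not the ``gcsfs`` source: ``backoff_delay`` and ``is_retriable_status`` are hypothetical names, and counting ``retry`` from 1 is an assumption.

```python
import random

MAX_DELAY_SECONDS = 32  # cap taken from the formula quoted above


def backoff_delay(retry: int) -> float:
    """Delay in seconds before retry number `retry` (assumed to count from 1)."""
    return min(random.random() + 2 ** (retry - 1), MAX_DELAY_SECONDS)


def is_retriable_status(status: int, message: str = "") -> bool:
    """Hypothetical helper mirroring the HTTP status rules listed above."""
    if status in (408, 429) or 500 <= status <= 504:
        return True
    # 401 is only treated as retriable when it signals expired credentials.
    return status == 401 and "Invalid Credentials" in message


# Print the range each delay can fall in for the first six retries.
for retry in range(1, 7):
    low = min(2 ** (retry - 1), MAX_DELAY_SECONDS)
    high = min(2 ** (retry - 1) + 1, MAX_DELAY_SECONDS)
    print(f"retry {retry}: between {low}s and {high}s")
```

Because the cap is applied after the jitter is added, no single wait exceeds 32 seconds.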
| 36 | + |
| 37 | +Hierarchical Namespace (HNS) Buckets |
| 38 | +------------------------------------ |
| 39 | + |
| 40 | +For HNS buckets, ``ExtendedGcsFileSystem`` uses the Storage Control client (``StorageControlAsyncClient``) for folder-level operations (e.g., ``mkdir``, ``rename``). |
| 41 | + |
| 42 | +- These calls rely on the default retry behavior of the underlying Google Cloud Python SDK. The standard ``gcsfs`` retry logic (``retry_request``) is not applied to these control plane calls. |
| 43 | +- **Applicable Methods:** |
| 44 | + - ``get_storage_layout``: Used to determine bucket type. |
| 45 | + - ``create_folder``: Used for ``mkdir``. |
| 46 | + - ``get_folder``: Used for directory metadata and existence checks. |
| 47 | + - ``list_folders``: Used for directory listings (``ls``). |
| 48 | + - ``rename_folder``: Used for moving/renaming directories (``mv``). |
| 49 | +- **Non-Retried Methods:** Methods like ``delete_folder`` (used for ``rmdir``) are not retried by default. |
| 50 | +- **Retriable Errors:** |
| 51 | + - ``google.api_core.exceptions.DeadlineExceeded`` |
| 52 | + - ``google.api_core.exceptions.InternalServerError`` |
| 53 | + - ``google.api_core.exceptions.ResourceExhausted`` |
| 54 | + - ``google.api_core.exceptions.ServiceUnavailable`` |
| 55 | + - ``google.api_core.exceptions.Unknown`` |
| 56 | +- **Backoff Strategy:** Exponential backoff with ``initial=1.0s``, ``maximum=60.0s``, and ``multiplier=2.0``. |
| 57 | +- **Overall Timeout (Deadline):** 60.0s. |
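
To get a feel for how the backoff parameters and the deadline above interact, the nominal delay schedule can be computed by hand. The sketch below is not the SDK's implementation: ``backoff_ceilings`` and ``waits_within_deadline`` are hypothetical helpers, and the sketch assumes each wait uses its full nominal value and ignores the time spent in the requests themselves. A client that applies jitter would sleep for less than these ceilings.

```python
def backoff_ceilings(initial=1.0, maximum=60.0, multiplier=2.0, count=8):
    """Return the first `count` nominal delays of an exponential backoff."""
    delays = []
    delay = initial
    for _ in range(count):
        delays.append(min(delay, maximum))
        delay *= multiplier
    return delays


def waits_within_deadline(deadline, **kwargs):
    """Count how many full-length waits fit before `deadline` seconds elapse."""
    total = 0.0
    fitted = 0
    for delay in backoff_ceilings(count=64, **kwargs):
        if total + delay > deadline:
            break
        total += delay
        fitted += 1
    return fitted


print(backoff_ceilings())           # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(waits_within_deadline(60.0))  # 5
```

Under these assumptions, five waits (1 + 2 + 4 + 8 + 16 = 31 seconds) fit inside the 60-second deadline, and a sixth wait of 32 seconds would exceed it.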
| 58 | + |
| 59 | +Rapid Storage (Zonal Buckets) |
| 60 | +----------------------------- |
| 61 | + |
| 62 | +For Zonal buckets, ``ZonalFile`` utilizes the specialized gRPC clients (``AsyncMultiRangeDownloader`` for reads and ``AsyncAppendableObjectWriter`` for writes). |
| 63 | + |
| 64 | +- As with HNS buckets, control plane operations for Zonal buckets (such as ``get_storage_layout`` or folder operations) use the ``StorageControlAsyncClient`` retry mechanism described in the HNS section above. |
| 65 | +- File read/write operations (data plane) for Zonal buckets rely on the default retry behavior of the underlying Google Cloud Python SDK for gRPC streams. The standard ``gcsfs`` retry logic (``retry_request``) is not applied to these data plane calls. |
| 66 | +- **AsyncMultiRangeDownloader (MRD) Retries (Reads):** |
| 67 | + - **Applicable Methods:** |
| 68 | + - ``open``: Establishes the initial gRPC stream. |
| 69 | + - ``download_ranges``: Fetches multiple byte ranges in a single request. |
| 70 | + - **Retriable Errors:** ``InternalServerError``, ``ServiceUnavailable``, ``DeadlineExceeded``, ``TooManyRequests`` (429), and ``Aborted`` (allowing the download to resume from the last successful byte offset without re-transferring data). |
| 71 | + - **Backoff Strategy:** Exponential backoff with ``initial=1.0s``, ``maximum=60.0s``, and ``multiplier=2.0``. |
| 72 | + - **Overall Timeout (Deadline):** 120.0s. |
| 73 | +- **AsyncAppendableObjectWriter (AAOW) Retries (Writes):** |
| 74 | + - **Applicable Methods:** |
| 75 | + - ``open``: Establishes the initial bidirectional gRPC stream. |
| 76 | + - ``append``: Streams data to the object. |
| 77 | + - **Methods without Automatic Retries:** ``flush`` and ``finalize`` do not have automatic retry logic in the underlying client. |
| 78 | + - **Retriable Errors:** ``InternalServerError``, ``ServiceUnavailable``, ``DeadlineExceeded``, ``TooManyRequests`` (429), and ``BidiWriteObjectRedirectedError`` (handled by re-opening the stream and resuming from the last persisted offset). |
| 79 | + - **Backoff Strategy:** Exponential backoff with ``initial=1.0s``, ``maximum=60.0s``, and ``multiplier=2.0``. |
| 80 | + - **Overall Timeout (Deadline):** 120.0s. |
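
Since some calls listed above have no automatic retries, callers who need them may choose to add their own. The following is a minimal caller-side sketch, not part of ``gcsfs`` or the Google Cloud SDK: ``call_with_retries`` and ``flaky_flush`` are hypothetical, the exception type is a placeholder, and whether a given operation is safe to repeat is an assumption the caller has to verify.

```python
import asyncio
import random


async def call_with_retries(op, attempts=3, initial=1.0, maximum=60.0,
                            multiplier=2.0, retriable=(ConnectionError,)):
    """Await op() up to `attempts` times, sleeping between failures."""
    delay = initial
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Sleep a random fraction of the current ceiling, then grow it.
            await asyncio.sleep(random.uniform(0.0, delay))
            delay = min(delay * multiplier, maximum)


async def demo():
    calls = {"count": 0}

    async def flaky_flush():  # hypothetical stand-in for a non-retried call
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("transient failure")
        return "flushed"

    result = await call_with_retries(flaky_flush, initial=0.01)
    return result, calls["count"]


print(asyncio.run(demo()))
```

The demo fails twice and succeeds on the third attempt; in real use, ``retriable`` would list whatever transient exception types the wrapped call is known to raise.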