Document the default garbage_collection_threshold value and improve the organization of cuda docs (pytorch#155341)

ParagEkbote · pytorchmergebot · commit 2908c10259ba · 2025-06-08T22:09:35.000Z
Fixes pytorch#150917

As mentioned in the issue, I've documented the default value of `garbage_collection_threshold` and improved the organization by turning the `pinned_*` option descriptions into bullet items. Could you please review?

Pull Request resolved: pytorch#155341
Approved by: https://github.com/AlannaBurke, https://github.com/ngimel
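The options touched by this change are passed to PyTorch through the `PYTORCH_CUDA_ALLOC_CONF` environment variable, using the documented `<option>:<value>,<option2>:<value2>` format. Below is a minimal sketch of composing that string. The helper `build_alloc_conf` is hypothetical and not part of PyTorch. It checks only the constraints stated in the notes: the threshold must lie strictly between 0.0 and 1.0, it is only meaningful with `backend:native`, and `pinned_num_register_threads` requires `pinned_use_cuda_host_register` to be `True`. PyTorch ignores the threshold under `cudaMallocAsync`. This sketch rejects that combination instead, which is a stricter choice than the library makes.

```python
import os


def build_alloc_conf(options: dict) -> str:
    """Hypothetical helper: render options as a PYTORCH_CUDA_ALLOC_CONF string.

    Only the constraints described in the CUDA notes are checked here;
    PyTorch performs its own parsing and validation when it reads the variable.
    """
    backend = options.get("backend", "native")
    threshold = options.get("garbage_collection_threshold")
    if threshold is not None:
        # The notes require a value strictly greater than 0.0 and less than 1.0.
        if not 0.0 < threshold < 1.0:
            raise ValueError("garbage_collection_threshold must be in (0.0, 1.0)")
        # Only meaningful with the native allocator (ignored by cudaMallocAsync);
        # this sketch rejects the combination rather than silently accepting it.
        if backend != "native":
            raise ValueError("garbage_collection_threshold requires backend:native")
    # pinned_num_register_threads is only valid when pinned_use_cuda_host_register is True.
    if "pinned_num_register_threads" in options and options.get("pinned_use_cuda_host_register") is not True:
        raise ValueError("pinned_num_register_threads requires pinned_use_cuda_host_register:True")
    # Python renders booleans as True/False and numbers as plain literals.
    return ",".join(f"{key}:{value}" for key, value in options.items())


conf = build_alloc_conf({
    "backend": "native",
    "garbage_collection_threshold": 0.6,
    "pinned_use_cuda_host_register": True,
    "pinned_num_register_threads": 8,
})
# Set the variable before PyTorch initializes CUDA; doing it before `import torch` is the safe choice.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = conf
print(conf)
# backend:native,garbage_collection_threshold:0.6,pinned_use_cuda_host_register:True,pinned_num_register_threads:8
```

The value 8 for `pinned_num_register_threads` mirrors the benchmark-based suggestion in the notes. Setting the variable in the shell that launches the process works equally well.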
diff --git a/docs/source/notes/cuda.rst b/docs/source/notes/cuda.rst
@@ -511,6 +511,8 @@ Available options:
   80% of the total memory allocated to the GPU application). The algorithm prefers
   to free old & unused blocks first to avoid freeing blocks that are actively being
   reused. The threshold value should be between greater than 0.0 and less than 1.0.
+  The default value is set at 1.0.
+
   ``garbage_collection_threshold`` is only meaningful with ``backend:native``.
   With ``backend:cudaMallocAsync``, ``garbage_collection_threshold`` is ignored.
 * ``expandable_segments`` (experimental, default: `False`) If set to `True`, this setting instructs
@@ -546,20 +548,20 @@ Available options:
   appended to the end of the segment. This process does not create as many slivers
   of unusable memory, so it is more likely to succeed at finding this memory.
 
-  `pinned_use_cuda_host_register` option is a boolean flag that determines whether to
+* `pinned_use_cuda_host_register` option is a boolean flag that determines whether to
   use the CUDA API's cudaHostRegister function for allocating pinned memory instead
   of the default cudaHostAlloc. When set to True, the memory is allocated using regular
   malloc and then pages are mapped to the memory before calling cudaHostRegister.
   This pre-mapping of pages helps reduce the lock time during the execution
   of cudaHostRegister.
 
-  `pinned_num_register_threads` option is only valid when pinned_use_cuda_host_register
+* `pinned_num_register_threads` option is only valid when pinned_use_cuda_host_register
   is set to True. By default, one thread is used to map the pages. This option allows
   using more threads to parallelize the page mapping operations to reduce the overall
   allocation time of pinned memory. A good value for this option is 8 based on
   benchmarking results.
 
-  `pinned_use_background_threads` option is a boolean flag to enable background thread
+* `pinned_use_background_threads` option is a boolean flag to enable background thread
   for processing events. This avoids any slow path associated with querying/processing of
   events in the fast allocation path. This feature is disabled by default.