Description
Hello,
Thank you for this library! I just started researching RL; single-file implementations are easy to understand, and your library is extremely fast.
I am running on my laptop's 3070 GPU with 8 GB of VRAM, and when I try to use a high number of total timesteps (> 2e8), I get the following error:
2025-09-29 17:11:57.621557: E external/xla/xla/service/gpu/gpu_hlo_schedule.cc:795] The byte size of input/output arguments (8501100448) exceeds the base limit (6244368384). This indicates an error in the calculation!
2025-09-29 17:11:57.785503: W external/xla/xla/hlo/transforms/simplifiers/hlo_rematerialization.cc:3023] Can't reduce memory use below 0B (0 bytes) by rematerialization; only reduced to 8.00GiB (8595834552 bytes), down from 8.00GiB (8595644652 bytes) originally
2025-09-29 17:12:11.556856: W external/xla/xla/tsl/framework/bfc_allocator.cc:501] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.86GiB (rounded to 1999872000)requested by op
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2025-09-29 17:12:11.557028: W external/xla/xla/tsl/framework/bfc_allocator.cc:512] *************************************************************************************************___
E0929 17:12:11.557053 22681 pjrt_stream_executor_client.cc:2916] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 1999872000 bytes. [tf-allocator-allocation-error='']
What is the best way to chunk the training into multiple pieces, so I can train for longer durations?
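For context, here is a rough sketch of the pattern I have in mind: instead of one call with the full timestep budget, run several smaller chunks in an outer Python loop and carry the training state between them. `train_chunk` below is only a placeholder for the library's compiled train function, not its actual API.

```python
# Hypothetical sketch of chunked training; `train_chunk` is a placeholder
# standing in for the library's real (JIT-compiled) train function.

def train_chunk(state, num_timesteps):
    # Placeholder: a real implementation would run `num_timesteps` of
    # environment steps/updates and return the new state plus metrics.
    steps_done, params = state
    return (steps_done + num_timesteps, params), {"steps": steps_done + num_timesteps}

def train_in_chunks(total_timesteps, chunk_size, init_state):
    # Split one huge run into several smaller compiled runs so buffers
    # sized by the timestep budget stay within GPU memory.
    state = init_state
    num_chunks = int(total_timesteps // chunk_size)
    for _ in range(num_chunks):
        state, metrics = train_chunk(state, chunk_size)
        # Optionally checkpoint `state` to disk here so the run can resume.
    return state

final_state = train_in_chunks(total_timesteps=2e8, chunk_size=2e7, init_state=(0, None))
print(final_state[0])  # 200000000.0 environment steps in total
```

Is something like this the recommended approach, or is there built-in support for resuming/continuing a run?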
Regards