Max GPU Memory Limitation? #38

@MustafaAlpAutonomy

Description

Hello,

Thank you for this library. I have just started researching RL, and single-file implementations are easy to understand; your library is also extremely fast!

I am running on a laptop 3070 GPU with 8 GB of memory, and when I use a high total timestep count (> 2e8), I get the following error:

2025-09-29 17:11:57.621557: E external/xla/xla/service/gpu/gpu_hlo_schedule.cc:795] The byte size of input/output arguments (8501100448) exceeds the base limit (6244368384). This indicates an error in the calculation!
2025-09-29 17:11:57.785503: W external/xla/xla/hlo/transforms/simplifiers/hlo_rematerialization.cc:3023] Can't reduce memory use below 0B (0 bytes) by rematerialization; only reduced to 8.00GiB (8595834552 bytes), down from 8.00GiB (8595644652 bytes) originally
2025-09-29 17:12:11.556856: W external/xla/xla/tsl/framework/bfc_allocator.cc:501] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.86GiB (rounded to 1999872000)requested by op
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2025-09-29 17:12:11.557028: W external/xla/xla/tsl/framework/bfc_allocator.cc:512] *************************************************************************************************___
E0929 17:12:11.557053 22681 pjrt_stream_executor_client.cc:2916] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 1999872000 bytes. [tf-allocator-allocation-error='']

What is the best way to chunk the training into multiple pieces, so that I can train for longer durations?
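One common workaround, sketched below, is to split the total timestep budget into smaller chunks and carry the training state between calls, optionally checkpointing it to disk after each chunk. The `train` function here is a hypothetical stand-in (this library's actual entry point and state format will differ), so treat this as a pattern rather than working code for the library:

```python
# Sketch: split one long run into memory-sized chunks, carrying state forward.
# `train` is a hypothetical placeholder for the library's training function;
# a real version would return updated network parameters and optimizer state.

def train(state, num_timesteps):
    """Placeholder trainer: advances a step counter in place of real RL training."""
    return {"timesteps": state["timesteps"] + num_timesteps}

TOTAL_TIMESTEPS = int(2e8)
CHUNK_SIZE = int(2e7)  # chosen small enough to fit in 8 GB of GPU memory

state = {"timesteps": 0}
while state["timesteps"] < TOTAL_TIMESTEPS:
    steps = min(CHUNK_SIZE, TOTAL_TIMESTEPS - state["timesteps"])
    state = train(state, steps)
    # e.g. save a checkpoint here, so a crash or OOM only loses one chunk

print(state["timesteps"])
```

The chunk size is the tunable knob: it should be large enough to amortize compilation overhead but small enough that the buffers XLA allocates per training call stay within GPU memory.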

Regards
