Search before asking
Bug
Training RFDETRSegSmall completes Epoch 0 (2512 iterations) successfully; then an evaluation phase labeled "Test:" starts and crashes with:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB
The stack trace points to mask postprocessing:
rfdetr/models/lwdetr.py, line 774: res_i['masks'] = masks_i > 0.0
Despite passing run_test=False (and the printed args showing eval=False), evaluation still runs at the end of the epoch and triggers the OOM.
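A possible mitigation I have not tested (sketched below; the helper name, chunk size, and CPU offload are my assumptions, not rfdetr code) would be to threshold the per-image mask logits in chunks on the CPU, so the boolean result never needs one large GPU allocation during postprocessing:

import torch

def threshold_masks_low_memory(masks_i: torch.Tensor, chunk_size: int = 10) -> torch.Tensor:
    # Equivalent to `masks_i > 0.0`, computed chunk by chunk on the CPU so the
    # peak GPU allocation stays bounded by one chunk instead of the full tensor.
    if masks_i.shape[0] == 0:
        return masks_i.cpu() > 0.0
    out = []
    for start in range(0, masks_i.shape[0], chunk_size):
        out.append(masks_i[start:start + chunk_size].cpu() > 0.0)
    return torch.cat(out, dim=0)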
Environment
GPU / Driver
From nvidia-smi:
GPU: NVIDIA A100-SXM4-80GB
Driver: 565.57.01
CUDA: 12.7
nvidia-smi snapshot (from the run) showed:
Memory usage: 72415 MiB / 81920 MiB
GPU-Util: 0%
Processes table: empty (this looked odd because memory was still reported as used)
Timestamp in that output:
Fri Feb 6 21:09:49 2026
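To check whether that memory is held by the training process's caching allocator or is left over from an earlier run, I can drop a small diagnostic in before the evaluation phase (the helper below is mine, not part of rfdetr):

import torch

def report_cuda_memory(tag: str) -> None:
    # "allocated" counts live tensors; "reserved" is what the caching allocator
    # holds on to, and is roughly what nvidia-smi attributes to the process.
    gib = 1024 ** 3
    print(f"[{tag}] allocated={torch.cuda.memory_allocated() / gib:.2f} GiB "
          f"reserved={torch.cuda.memory_reserved() / gib:.2f} GiB")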
Python
From stack paths:
Python 3.11 (/usr/local/lib/python3.11/dist-packages/...)
CPU / RAM
From top snapshot:
load average: 33.87, 37.98, 49.11
CPU: 13.6 us, 1.4 sy, 85.0 id (mostly idle overall)
RAM: 1031853 MiB total, 67125 MiB free, 860969 MiB buff/cache
Swap: 0
Multiple pt_main_thread processes were active; one showed very high CPU% (e.g. 2523%), others ~19–22%.
Minimal Reproducible Example
# Setup lines added here for completeness (the reported call starts at model.train);
# the import, constructor, and path placeholders are assumptions, not from the log.
from rfdetr import RFDETRSegSmall

data_dir = "/path/to/dataset"    # placeholder
out_dir = "/path/to/output"      # placeholder
model = RFDETRSegSmall()

model.train(
    dataset_dir=data_dir,
    output_dir=out_dir,
    epochs=20,
    batch_size=16,
    grad_accum_steps=2,
    amp=True,
    run_test=False,
    fp16_eval=True,
    eval_max_dets=100,
    do_benchmark=False,
    num_workers=12,
    early_stopping=False,
    tensorboard=True,
)
Additional
Also printed during training:
Grad accum steps: 2
Total batch size: 32
LENGTH OF DATA LOADER: 2512
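For reference, the arithmetic I expect behind those numbers (my own sanity check; the dataset-size estimate assumes the loader batches at the per-step batch_size):

batch_size = 16                                    # from the train() call
grad_accum_steps = 2
effective_batch = batch_size * grad_accum_steps    # 32, matches "Total batch size: 32"

iters_per_epoch = 2512                             # "LENGTH OF DATA LOADER: 2512"
approx_images = iters_per_epoch * batch_size       # ~40,192 images per epoch (assumption)
print(effective_batch, approx_images)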

Are you willing to submit a PR?