Search before asking
Bug
Training RFDETRSegSmall completes Epoch 0 (2512 iterations) successfully; then an evaluation phase labeled "Test:" starts and crashes with:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB
The stack trace points to mask postprocessing:
rfdetr/models/lwdetr.py, line 774: res_i['masks'] = masks_i > 0.0
Despite passing run_test=False (and the printed args showing eval=False), evaluation still runs at the end of the epoch and triggers the OOM.
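A possible mitigation I have not tested (sketched below; the helper name, chunk size, and CPU offload are my assumptions, not rfdetr code) would be to threshold the per-image mask logits in chunks on the CPU, so the boolean result never needs one large GPU allocation during postprocessing:

import torch

def threshold_masks_low_memory(masks_i: torch.Tensor, chunk_size: int = 10) -> torch.Tensor:
    # Equivalent to `masks_i > 0.0`, computed chunk by chunk on the CPU so the
    # peak GPU allocation stays bounded by one chunk instead of the full tensor.
    if masks_i.shape[0] == 0:
        return masks_i.cpu() > 0.0
    out = []
    for start in range(0, masks_i.shape[0], chunk_size):
        out.append(masks_i[start:start + chunk_size].cpu() > 0.0)
    return torch.cat(out, dim=0)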
Environment
GPU / Driver
From nvidia-smi:
GPU: NVIDIA A100-SXM4-80GB
Driver: 565.57.01
CUDA: 12.7
nvidia-smi snapshot (from the run) showed:
Memory usage: 72415 MiB / 81920 MiB
GPU-Util: 0%
Processes table: empty (this looked odd because memory was still reported as used)
Timestamp in that output:
Fri Feb 6 21:09:49 2026
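To check whether that memory is held by the training process's caching allocator or is left over from an earlier run, I can drop a small diagnostic in before the evaluation phase (the helper below is mine, not part of rfdetr):

import torch

def report_cuda_memory(tag: str) -> None:
    # "allocated" counts live tensors; "reserved" is what the caching allocator
    # holds on to, and is roughly what nvidia-smi attributes to the process.
    gib = 1024 ** 3
    print(f"[{tag}] allocated={torch.cuda.memory_allocated() / gib:.2f} GiB "
          f"reserved={torch.cuda.memory_reserved() / gib:.2f} GiB")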
Python
From stack paths:
Python 3.11 (/usr/local/lib/python3.11/dist-packages/...)
CPU / RAM
From top snapshot:
load average: 33.87, 37.98, 49.11
CPU: 13.6 us, 1.4 sy, 85.0 id (mostly idle overall)
RAM: 1031853 MiB total, 67125 MiB free, 860969 MiB buff/cache
Swap: 0
Multiple pt_main_thread processes were active; one showed very high CPU% (e.g. 2523%), others ~19–22%.
Minimal Reproducible Example
# Setup lines added here for completeness (the reported call starts at model.train);
# the import, constructor, and path placeholders are assumptions, not from the log.
from rfdetr import RFDETRSegSmall

data_dir = "/path/to/dataset"    # placeholder
out_dir = "/path/to/output"      # placeholder
model = RFDETRSegSmall()

model.train(
    dataset_dir=data_dir,
    output_dir=out_dir,
    epochs=20,
    batch_size=16,
    grad_accum_steps=2,
    amp=True,
    run_test=False,
    fp16_eval=True,
    eval_max_dets=100,
    do_benchmark=False,
    num_workers=12,
    early_stopping=False,
    tensorboard=True,
)
Additional
Also printed during training:
Grad accum steps: 2
Total batch size: 32
LENGTH OF DATA LOADER: 2512
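For reference, the arithmetic I expect behind those numbers (my own sanity check; the dataset-size estimate assumes the loader batches at the per-step batch_size):

batch_size = 16                                    # from the train() call
grad_accum_steps = 2
effective_batch = batch_size * grad_accum_steps    # 32, matches "Total batch size: 32"

iters_per_epoch = 2512                             # "LENGTH OF DATA LOADER: 2512"
approx_images = iters_per_epoch * batch_size       # ~40,192 images per epoch (assumption)
print(effective_batch, approx_images)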

Are you willing to submit a PR?