fix: remove hardcoded cuda:0 in OTA loss for multi-GPU support by Mr-Neutr0n · Pull Request #2146 · WongKinYiu/yolov7

Mr-Neutr0n · 2026-02-11T18:18:33Z

Bug

The OTA loss computation hardcodes cuda:0 for tensor allocation, causing failures during multi-GPU (DDP) training when the model runs on other GPUs.

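In utils/loss.py, several locations create tensors with .cuda() or device='cuda:0'. The former places the tensor on the current CUDA device (GPU 0 unless changed with torch.cuda.set_device), and the latter always targets GPU 0, so neither follows the device where the model and input data actually reside. This causes device mismatch errors during distributed training.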

Fix

Replaced hardcoded device references with dynamic device inference from input tensors:

.cuda() → .to(logits.device) in RankSort, aLRPLoss, and APLoss forward methods
device='cuda:0' → device=targets.device in build_targets and build_targets2 methods

This is consistent with how device handling is already done elsewhere in the same file (e.g., torch.ones(7, device=targets.device)).
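
As a rough illustration of the pattern described above, the sketch below derives the device from an input tensor rather than naming a GPU. The helper names (make_rank_grads, empty_match_indices) and the tensor shapes are hypothetical and are not taken from utils/loss.py; only the .to(logits.device) and device=targets.device idioms come from this PR.

```python
import torch


def make_rank_grads(logits: torch.Tensor) -> torch.Tensor:
    """Allocate a zero buffer on the same device as `logits` (hypothetical helper)."""
    # Pattern the PR describes replacing: torch.zeros(...).cuda()
    return torch.zeros(logits.shape).to(logits.device)


def empty_match_indices(targets: torch.Tensor) -> torch.Tensor:
    """Create an empty int64 tensor on the same device as `targets` (hypothetical helper)."""
    # Pattern the PR describes replacing: device='cuda:0'
    return torch.tensor([], device=targets.device, dtype=torch.int64)


# Falls back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
logits = torch.randn(4, 3, device=device)   # illustrative shape only
targets = torch.zeros(2, 6, device=device)  # illustrative shape only

grads = make_rank_grads(logits)
idx = empty_match_indices(targets)
```

Because both helpers read the device from their argument, the same code works unchanged on any GPU rank or on CPU. Where the new tensor should also match the input's shape and dtype, torch.zeros_like(logits) would be an alternative that inherits the device as well.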

fix: use input tensor device instead of hardcoded cuda:0 in OTA loss

7cfb8fd

Mr-Neutr0n wants to merge 1 commit into WongKinYiu:main from Mr-Neutr0n:fix/ota-loss-multi-gpu-device
