Distillation loss does not work in DDP #13

### Search before asking

- [x] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.


### Ultralytics YOLO Component

_No response_

### Bug

Hello @danielsyahputra:

Thank you for your distillation code! However, I found that when I use it in DDP mode, **the distillation loss is always 0**. I think a small change is needed [here](https://github.com/danielsyahputra/yolo-distiller/blob/a2f004c6227ffb9e3c8fe172c824f96e51c20999/ultralytics/engine/trainer.py#L250):
```Python
for name, ml in self.modelt.module.named_modules():
```
and also [here](https://github.com/danielsyahputra/yolo-distiller/blob/a2f004c6227ffb9e3c8fe172c824f96e51c20999/ultralytics/engine/trainer.py#L264)
```Python
for name, ml in self.models.module.named_modules():
```
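This is because, in DDP mode, the module names returned by `named_modules()` on the wrapped model carry a `module.` prefix, so they no longer match the layer names the distillation code looks for.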
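To illustrate the prefix issue, here is a minimal sketch. The `unwrap` helper is hypothetical (it is not part of the repository), and the toy `nn.Sequential` network only stands in for the real teacher/student models. `DataParallel` is used instead of DDP because it needs no process group, but it exposes the wrapped model under `.module` in the same way.

```python
import torch.nn as nn
from torch.nn.parallel import DataParallel, DistributedDataParallel


def unwrap(model: nn.Module) -> nn.Module:
    """Hypothetical helper: return the underlying model if wrapped by DP/DDP."""
    if isinstance(model, (DataParallel, DistributedDataParallel)):
        return model.module
    return model


# Toy stand-in for a teacher/student network (assumption, not the real YOLO model).
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=1), nn.ReLU())

# Wrapping stores the original model under `.module`, as DDP does.
wrapped = DataParallel(net)

# Names from the wrapper carry the `module.` prefix; names from the
# unwrapped model do not.
wrapped_names = [name for name, _ in wrapped.named_modules()]
plain_names = [name for name, _ in unwrap(wrapped).named_modules()]
print(wrapped_names)
print(plain_names)
```

A helper like this may be preferable to accessing `.module` directly, since a model that is not wrapped (for example in a single-GPU run) has no `.module` attribute.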

### Environment

OS                  Linux-5.15.0-142-generic-x86_64-with-glibc2.35
Environment         Linux
Python              3.12.2
Install             git
Path                /home/xuexufeng/project/ultralytics/ultralytics
RAM                 503.70 GB
Disk                2796.8/3518.7 GB
CPU                 AMD EPYC 7543 32-Core Processor
CPU count           128
GPU                 NVIDIA GeForce RTX 3090, 24253MiB
GPU count           8
CUDA                12.4

numpy               ✅ 2.1.1<=2.1.1,>=1.23.0
matplotlib          ✅ 3.10.0>=3.3.0
opencv-python       ✅ 4.11.0.86>=4.6.0
pillow              ✅ 11.1.0>=7.1.2
pyyaml              ✅ 6.0.2>=5.3.1
requests            ✅ 2.32.3>=2.23.0
scipy               ✅ 1.15.1>=1.4.1
torch               ✅ 2.6.0>=1.8.0
torch               ✅ 2.6.0!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision         ✅ 0.21.0>=0.9.0
tqdm                ✅ 4.67.1>=4.64.0
psutil              ✅ 6.1.1
py-cpuinfo          ✅ 9.0.0
pandas              ✅ 2.2.3>=1.1.4
seaborn             ✅ 0.13.2>=0.11.0
ultralytics-thop    ✅ 2.0.14>=2.0.0

### Minimal Reproducible Example

Here is `distill.py`:
```Python
from ultralytics import YOLO

teacher_model = YOLO("yolov5xu.pt")
student_model = YOLO("yolov5su.pt")

student_model.train(
    data="coco.yaml",
    teacher=teacher_model.model,  # set to None to disable knowledge distillation
    distillation_loss="cwd",
    epochs=300,
    workers=8,
    exist_ok=False,
    device=[0,1,2,3,4,5,6,7],
    batch=128
)
```
Then run:

```Bash
python distill.py
```


### Additional

_No response_

### Are you willing to submit a PR?

- [x] Yes I'd like to help by submitting a PR!
