
Memory usage keeps growing without limit when training PP-OCRv5 rec model with PaddleOCR v3.2.0 on PaddlePaddle 3.1.0/3.2.0 #16613

Description

@qyhou

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

๐Ÿ› Bug (้—ฎ้ข˜ๆ่ฟฐ)

When training the PP-OCRv5 rec model with PaddleOCR v3.2.0 inside Docker, I noticed that host memory (RAM) usage keeps increasing without limit on the following PaddlePaddle versions:

  • PaddlePaddle 3.1.0 (paddlepaddle/paddle:3.1.0-gpu-cuda12.9-cudnn9.9, paddlepaddle/paddle:3.1.0-gpu-cuda12.6-cudnn9.5)
  • PaddlePaddle 3.2.0 (paddlepaddle/paddle:3.2.0-gpu-cuda12.9-cudnn9.9, paddlepaddle/paddle:3.2.0-gpu-cuda12.6-cudnn9.5)

The same training setup works normally (no memory leak) on PaddlePaddle 3.0.0 (paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5).
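
For what it's worth, the growth is easy to quantify from inside the training process. Below is a minimal sketch, assuming psutil is installed in the container (pip install psutil); the helper name and logging interval are illustrative, not part of PaddleOCR:

```python
# Minimal sketch for quantifying the leak, assuming psutil is available.
# log_rss and the logging interval are illustrative, not PaddleOCR APIs.
import os

import psutil

_proc = psutil.Process(os.getpid())

def log_rss(step, every=1000):
    # Print this worker's resident set size (RSS) in GB every `every` steps.
    if step % every == 0:
        rss_gb = _proc.memory_info().rss / 1024**3
        print(f"[step {step}] RSS: {rss_gb:.2f} GB")
```

Calling something like this from the training loop should show RSS climbing steadily across epochs on an affected version, instead of plateauing as reported on 3.0.0.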

๐Ÿƒโ€โ™‚๏ธ Environment (่ฟ่กŒ็Žฏๅขƒ)

PaddleOCR v3.2.0 release
PaddlePaddle 3.1.0/3.2.0 docker images
OS: Ubuntu 24.04.3 LTS
CPU: Intel(R) Xeon(R) Platinum 8469C
GPU: H20 (96 GB)
Memory: 1.0 TB

🌰 Minimal Reproducible Example

python -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c xxx.yml

Training data: ~10M samples (3×48×320)
Memory: eventually occupies the full 1 TB of RAM after 15 epochs
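
To help narrow down where the leak lives, one diagnostic (a sketch under assumptions, not code from this repo) is to iterate the data pipeline alone, with no model, and watch whether RSS still grows. The synthetic dataset below is a placeholder that just mirrors the 3×48×320 sample shape from this report:

```python
# Hypothetical isolation test: run only the data pipeline (no model, no
# optimizer) and watch RSS. If memory still grows here, the leak is likely
# in the DataLoader path; if it stays flat, suspect the training step.
import os

import numpy as np
import psutil
from paddle.io import DataLoader, Dataset


class SyntheticRecDataset(Dataset):
    # Placeholder stand-in for the real ~10M-sample rec dataset.
    def __init__(self, num_samples=200_000):
        super().__init__()
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        img = np.random.rand(3, 48, 320).astype("float32")  # 3x48x320, as in the report
        label = np.zeros([25], dtype="int64")  # dummy label
        return img, label


proc = psutil.Process(os.getpid())
loader = DataLoader(SyntheticRecDataset(), batch_size=128, num_workers=4)

for step, (img, label) in enumerate(loader):
    if step % 100 == 0:
        print(f"step {step}: RSS {proc.memory_info().rss / 1024**3:.2f} GB")
```

If RSS stays flat in this loop but grows during real training, the training step itself, rather than the data readers, would be the next thing to bisect between 3.0.0 and 3.1.0.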
