Training a dataset with ch_PP-OCRv4_rec fails with: Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB. #12284

lili-changjiang · 2024-04-23T14:22:03Z

System Environment: Linux
Version: Paddle 2.4.2.post112
Command: python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml
Complete Error Message:
Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB.

Please check whether there is any other process using GPU 0.

If yes, please stop them, or start PaddlePaddle on another GPU.
If no, please decrease the batch size of your model.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is export FLAGS_use_cuda_managed_memory=false.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)

My ch_PP-OCRv4_rec.yml settings:

Global:
  debug: false
  use_gpu: true
  epoch_num: 20
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 3
  eval_batch_step: [0, 100]
  cal_metric_during_train: true
  pretrained_model: ./pretrained_models/ch_PP-OCRv4_rec_train/student
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./train_data/train
    ext_op_transform_idx: 1
    label_file_list:
    - ./train_data/rec/train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 192
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: 2
    drop_last: true
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/val
    label_file_list:
    - ./train_data/rec/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 4

Why does my 24 GB of GPU memory fill up immediately? Training cannot run at all.
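One thing that may be worth checking in the config above: the loader sets batch_size_per_card to 2, but the sampler still has first_bs set to 192. The sketch below works through the batch sizes that a multi-scale sampler would produce from those sampler values. It rests on an assumption not confirmed anywhere in this thread: that MultiScaleSampler takes its batch size from first_bs (not from the loader) and, when fix_bs is false, keeps the pixel count per batch roughly constant across scales. The function name batch_sizes_per_scale is hypothetical and is not part of PaddleOCR.

```python
def batch_sizes_per_scale(scales, first_bs, fix_bs=False):
    """Estimate the batch size used for each [width, height] scale.

    Assumption (not verified against the PaddleOCR source in this thread):
    the first scale together with first_bs defines a pixel budget, and every
    other scale gets budget // (w * h) samples when fix_bs is false.
    Rounding to divided_factor is ignored because the scales used here are
    already multiples of 8 (width) and 16 (height).
    """
    base_w, base_h = scales[0]
    budget = base_w * base_h * first_bs
    sizes = []
    for w, h in scales:
        bs = first_bs if fix_bs else max(1, budget // (w * h))
        sizes.append((w, h, bs))
    return sizes


# Values copied from the sampler section of the config in this post.
scales = [[320, 32], [320, 48], [320, 64]]
for w, h, bs in batch_sizes_per_scale(scales, first_bs=192):
    print(f"{w}x{h}: batch size {bs}")
```

If that assumption holds, each batch would contain between 96 and 192 images regardless of the loader value, so reducing first_bs would be the setting to experiment with when following the error message's advice to decrease the batch size.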

UserWangZz (Collaborator) · 2024-04-24T02:19:04Z

Was anything else running on the GPU before you started the job?
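A minimal sketch of one way to answer that question from Python, assuming the nvidia-smi command-line tool is installed and on PATH. The helper names parse_gpu_memory and query_gpu_memory are hypothetical, not part of Paddle or PaddleOCR. Parsing is kept separate from the subprocess call so it can be tried on a sample string without a GPU.

```python
import csv
import io
import subprocess

# nvidia-smi query that prints one CSV line per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV text into a list of dicts (values in MiB)."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        index, used, total = (int(f.strip()) for f in fields)
        rows.append({"gpu": index, "used_mib": used, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(out.stdout)
```

Calling query_gpu_memory() just before launching tools/train.py shows whether GPU 0 already has memory in use. If used_mib is close to zero at that point, the allocation reported in the error is coming from the training process itself.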


lili-changjiang (Author) · 2024-04-24T04:13:26Z

"Was anything else running on the GPU before you started the job?"

No, nothing else was running. I have tried many times and it is always the same.

UserWangZz (Collaborator) · 2024-04-24T08:25:02Z

Try Paddle version 2.5.2.

zhengmeng · 2024-05-14T16:15:11Z

Hi, has this been resolved? I am hitting the same problem, and I have two 24 GB cards.

gocse · 2024-07-25T07:47:16Z

The same problem occurs with version 2.8.1 using ch_PP-OCRv4_det_teacher.yml, and it reproduces on every attempt. The environment is the GPU T4 x2 provided by Kaggle.

RoyYMS01A · 2024-07-29T05:43:15Z

I get this problem on a 3090 as well.

Kagi217 · 2024-08-15T07:34:27Z

I get this problem on a 1660 too.

Kyo1234567 · 2025-05-30T05:56:51Z

Folks, has this been solved? I ran into the same problem today.
