PPStructureV3 loads GeneralOCR (det + rec models) twice when use_table_recognition=True #5033

## Checklist:

- [✓] Search [related past issues](https://github.com/PaddlePaddle/PaddleX/issues) for an answer
- [✓] Read the [FAQ](https://paddlepaddle.github.io/PaddleX/main/FAQ.html)
- [✓] Read the [PaddleX documentation](https://paddlepaddle.github.io/PaddleX/main/index.html)
- [✓] Confirm that the bug is still unfixed in the latest version

## Describe the problem

When using PPStructureV3 with `lang='korean'` and `use_table_recognition=True` (the default), the **GeneralOCR sub-pipeline (text detection + text recognition models) is loaded into memory twice**: once by the parent `_LayoutParsingPipelineV2` at init, and once by the child `_TableRecognitionPipelineV2` on the first `predict()` call.

In our case, `lang='korean'` auto-selects `korean_PP-OCRv5_mobile_rec` as the recognition model (instead of the default Chinese `PP-OCRv5_server_rec`). Both `PP-OCRv5_server_det` and `korean_PP-OCRv5_mobile_rec` are loaded twice, which needlessly doubles the GPU/CPU memory these two models occupy. The issue likely affects all non-default language configurations, not just Korean.
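To put a number on the extra memory, one option is to record the process's resident set size at the three points this report distinguishes: before constructing the pipeline, after init, and after the first `predict()`. The following is a minimal sketch, not a measurement we have run. It assumes the third-party `psutil` package is installed (it reports `n/a` otherwise), it only covers host RAM (GPU memory would need a separate tool), and `rss_mib` and `checkpoint` are hypothetical helper names.

```python
import os

try:
    import psutil  # third-party; assumed to be installed
except ImportError:
    psutil = None


def rss_mib():
    """Return this process's resident set size in MiB, or None without psutil."""
    if psutil is None:
        return None
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)


def checkpoint(label, log):
    """Record (label, rss) in `log`, print it, and return the reading."""
    value = rss_mib()
    log.append((label, value))
    shown = "n/a" if value is None else f"{value:.1f} MiB"
    print(f"{label}: {shown}")
    return value


# Intended use around the reproduction script:
#   readings = []
#   checkpoint("before init", readings)
#   pipeline = PPStructureV3(...)
#   checkpoint("after init", readings)
#   list(pipeline.predict(test_img))
#   checkpoint("after first predict", readings)
```

A jump between "after init" and "after first predict" would be consistent with lazily created models, although inference buffers also contribute to that difference.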

### Related issue

- **PaddleOCR#17266**: Under certain conditions, `general_ocr_pipeline` remains `None` inside TableRecognition, causing `AttributeError: 'NoneType' object has no attribute 'text_rec_model'`. Same root cause (TableRecognition does not share parent's GeneralOCR), different symptom (crash vs memory waste).
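
The sharing problem behind both symptoms can be illustrated with a toy model. This is not PaddleX code: `ToyOCR`, `ToyTableRecognizer` and `ToyLayoutParser` are hypothetical stand-ins, and the real internals of `_LayoutParsingPipelineV2` and `_TableRecognitionPipelineV2` may differ. The sketch only shows that a child which lazily builds its own OCR sub-pipeline causes a second load, while a child that receives the parent's instance does not.

```python
class ToyOCR:
    """Stand-in for a det + rec sub-pipeline; counts how often it is built."""

    instances = 0

    def __init__(self):
        ToyOCR.instances += 1


class ToyTableRecognizer:
    """Child pipeline that either reuses a shared OCR or builds its own."""

    def __init__(self, shared_ocr=None):
        self._ocr = shared_ocr  # None means "build my own later"

    def predict(self):
        if self._ocr is None:
            self._ocr = ToyOCR()  # lazy load on the first predict()
        return self._ocr


class ToyLayoutParser:
    """Parent pipeline that always builds one OCR instance at init."""

    def __init__(self, share_ocr):
        self.ocr = ToyOCR()
        self.table = ToyTableRecognizer(self.ocr if share_ocr else None)

    def predict(self):
        return self.table.predict()


def count_ocr_loads(share_ocr):
    """Build a parser, run one predict(), and return the number of OCR loads."""
    ToyOCR.instances = 0
    parser = ToyLayoutParser(share_ocr)
    parser.predict()
    return ToyOCR.instances


print(count_ocr_loads(share_ocr=False))  # 2: parent at init, child on predict()
print(count_ocr_loads(share_ocr=True))   # 1: child reuses the parent's instance
```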

## Reproduction

1. Have you successfully run the [tutorial](https://paddlepaddle.github.io/PaddleX/main/index.html) we provide?

Yes. The PPStructureV3 tutorial runs successfully.

2. Did you modify the tutorial code? If so, please provide the code you ran.

```python
"""
Test script to reproduce duplicate GeneralOCR model loading in PPStructureV3.

Reproduces the exact configuration used in our project:
  - backend/app/core/ocr_engine.py (KoreanPPOCRv5Engine)
  - lang='korean' → auto-selects korean_PP-OCRv5_mobile_rec
  - All unnecessary features disabled
"""

from paddleocr import PPStructureV3
import numpy as np

print("=" * 60)
print("Initializing PPStructureV3 with Korean PP-OCRv5 model...")
print("  - lang: korean (auto-selects korean_PP-OCRv5_mobile_rec)")
print("  - Table recognition: Enabled")
print("  - Formula recognition: Disabled")
print("  - Region detection: Disabled")
print("=" * 60)

pipeline = PPStructureV3(
    lang='korean',
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    use_table_recognition=True,
    use_formula_recognition=False,
    use_region_detection=False,
    format_block_content=True,
)

print("=" * 60)
print("Initialization done. Running predict()...")
print("=" * 60)

# Blank white image (height 100, width 300); the duplicate load does not depend on input content.
test_img = np.full((100, 300, 3), 255, dtype=np.uint8)
try:
    result = list(pipeline.predict(
        test_img,
        use_table_recognition=True,
        format_block_content=True,
    ))
    print("Predict done.")
except Exception as e:
    print(f"Predict error: {e}")

print("=" * 60)
print("Test complete.")
print("=" * 60)
```

3. Which dataset are you using?

Not dataset-dependent. Any input triggers this issue since the duplicate loading occurs on the first `predict()` call regardless of input content.

4. Please provide the error message and relevant logs.

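No error: the pipeline works correctly, but the log shows the det + rec models being created twice. Four extra models are lazily loaded on the first `predict()` call (after "Initialization done" in the log): the duplicate det and rec models, plus the doc-orientation and textline-orientation classifiers, both of which were disabled in the constructor: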

```
============================================================
Initializing PPStructureV3 with Korean PP-OCRv5 model...
  - lang: korean (auto-selects korean_PP-OCRv5_mobile_rec)
  - Table recognition: Enabled
  - Formula recognition: Disabled
  - Region detection: Disabled
============================================================
Creating model: ('PP-DocLayout_plus-L', None)
Creating model: ('PP-OCRv5_server_det', None)              ← 1st load
Creating model: ('korean_PP-OCRv5_mobile_rec', None)       ← 1st load
Creating model: ('PP-LCNet_x1_0_table_cls', None)
Creating model: ('SLANeXt_wired', None)
Creating model: ('SLANet_plus', None)
Creating model: ('RT-DETR-L_wired_table_cell_det', None)
Creating model: ('RT-DETR-L_wireless_table_cell_det', None)
Creating model: ('PP-Chart2Table', None)
============================================================
Initialization done. Running predict()...
============================================================
Creating model: ('PP-LCNet_x1_0_doc_ori', None)            ← UNEXPECTED (disabled by user!)
Creating model: ('PP-LCNet_x1_0_textline_ori', None)       ← UNEXPECTED (disabled by user!)
Creating model: ('PP-OCRv5_server_det', None)               ← DUPLICATE (2nd load)
Creating model: ('korean_PP-OCRv5_mobile_rec', None)        ← DUPLICATE (2nd load)
Predict done.
============================================================
Test complete.
============================================================
```
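
Duplicates like these can be spotted mechanically by counting the `Creating model:` lines in a captured log. The snippet below is a small sketch under that assumption. `duplicate_models` is a hypothetical helper, the regular expression matches the line format shown in the log above, and `sample` is a shortened excerpt of that log with the annotations removed.

```python
import re
from collections import Counter

# Matches lines such as: Creating model: ('PP-OCRv5_server_det', None)
CREATE_RE = re.compile(r"Creating model: \('([^']+)'")


def duplicate_models(log_text):
    """Return {model_name: count} for every model created more than once."""
    counts = Counter(CREATE_RE.findall(log_text))
    return {name: n for name, n in counts.items() if n > 1}


sample = """\
Creating model: ('PP-DocLayout_plus-L', None)
Creating model: ('PP-OCRv5_server_det', None)
Creating model: ('korean_PP-OCRv5_mobile_rec', None)
Creating model: ('PP-LCNet_x1_0_doc_ori', None)
Creating model: ('PP-OCRv5_server_det', None)
Creating model: ('korean_PP-OCRv5_mobile_rec', None)
"""

print(duplicate_models(sample))
# {'PP-OCRv5_server_det': 2, 'korean_PP-OCRv5_mobile_rec': 2}
```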

## Environment

1. Please provide the PaddlePaddle and PaddleX versions you are using.

- PaddlePaddle-GPU 3.3.0
- PaddleX 3.4.2
- PaddleOCR 3.4.0

2. Please provide your operating system, e.g. Linux/Windows/MacOS.

Windows 11

3. Which Python version are you using?

Python 3.11.14 (Anaconda)

4. Which CUDA/cuDNN version are you using?

CUDA 12.6 / cuDNN 9.9.0 (compiled), GPU Compute Capability 8.9, Driver 561.17
