huggingface · XingweiDeng · Jan 19, 2026 · Jan 19, 2026
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -1151,6 +1151,8 @@
         title: Pix2Struct
       - local: model_doc/pixtral
         title: Pixtral
+      - local: model_doc/pp_lcnet
+        title: PPLCNet
       - local: model_doc/qwen2_5_omni
         title: Qwen2.5-Omni
       - local: model_doc/qwen2_5_vl

diff --git a/docs/source/en/model_doc/pp_lcnet.md b/docs/source/en/model_doc/pp_lcnet.md
@@ -0,0 +1,131 @@
+# PP-LCNet
+
+<div class="flex flex-wrap space-x-1">
+<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+</div>
+
+## Overview
+
+**PP-LCNet** is a family of efficient, lightweight convolutional neural networks designed for real-world document understanding and OCR tasks. It balances accuracy, speed, and model size, making it suitable for both server-side and edge deployment. To address different document processing requirements, PP-LCNet has three main variants, each optimized for a specific task.
+
+## Model Architecture
+
+1. The Document Image Orientation Classification Module determines the orientation of a document image so that it can be corrected in post-processing. During document scanning or ID photo capture, the device may be rotated to obtain a clearer image, producing images in various orientations that standard OCR pipelines may not handle effectively. Classifying the orientation of documents or IDs that contain text regions up front, and rotating them accordingly, improves the accuracy of OCR processing.
+
+2. The Table Classification Module classifies input table images into predefined categories, such as wired and wireless tables (tables with and without visible ruling lines), based on the characteristics and content of the image. Its performance directly affects the accuracy and efficiency of the entire table recognition process. The predicted category is passed on to downstream table recognition pipelines.
+
+3. The Text Line Orientation Classification Module determines the orientation of individual text lines so that they can be corrected in post-processing. During document scanning or license/certificate photography, the capture device may be rotated to obtain a clearer image, producing text lines in various orientations that standard OCR pipelines do not handle well. Classifying the orientation of each text line up front, and adjusting it accordingly, improves the accuracy of OCR processing.
+
+
+## Usage
+
+### Single input inference
+
+The example below demonstrates how to classify an image with PP-LCNet using [`Pipeline`] or the [`AutoModel`] class.
+
+<hfoptions id="usage">
+<hfoption id="Pipeline">
+
+```py
+import requests
+from PIL import Image
+from transformers import pipeline
+model_path = "PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors"
+image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg", stream=True).raw)
+image_classifier = pipeline("image-classification", model=model_path, function_to_apply="none")
+result = image_classifier(image)
+print(result)  # scores are raw logits because function_to_apply="none"
+```
+
+</hfoption>
+
+<hfoption id="AutoModel">
+
+```py
+import requests
+from PIL import Image
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+
+model_path = "PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors"
+model = AutoModelForImageClassification.from_pretrained(model_path)
+image_processor = AutoImageProcessor.from_pretrained(model_path)
+
+image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg", stream=True).raw)
+
+inputs = image_processor(images=image, return_tensors="pt")
+outputs = model(**inputs)
+print(outputs)
+predicted_label = outputs.logits.argmax(-1).item()
+print(model.config.id2label[predicted_label])
+```
+
+</hfoption>
+</hfoptions>
+
+### Batched inference
+
+The example below demonstrates how to classify a batch of images with PP-LCNet using [`Pipeline`] or the [`AutoModel`] class.
+
+<hfoptions id="usage">
+<hfoption id="Pipeline">
+
+```py
+import requests
+from PIL import Image
+from transformers import pipeline
+model_path = "PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors"
+image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg", stream=True).raw)
+image_classifier = pipeline("image-classification", model=model_path, function_to_apply="none")
+result = image_classifier([image, image])
+print(result)  # one list of predictions per input image
+
+```
+
+</hfoption>
+
+<hfoption id="AutoModel">
+
+```py
+import requests
+from PIL import Image
+from transformers import AutoImageProcessor, AutoModelForImageClassification
+
+model_path = "PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors"
+model = AutoModelForImageClassification.from_pretrained(model_path)
+image_processor = AutoImageProcessor.from_pretrained(model_path)
+
+image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg", stream=True).raw)
+
+inputs = image_processor(images=[image, image], return_tensors="pt")
+outputs = model(**inputs)
+
+predicted_labels = outputs.logits.argmax(-1)
+
+for label_id in predicted_labels:
+    label_id_scalar = label_id.item()
+    label = model.config.id2label[label_id_scalar]
+    print(label)
+```
+
+</hfoption>
+</hfoptions>
+
+## PPLCNetForImageClassification
+
+[[autodoc]] PPLCNetForImageClassification
+
+## PPLCNetConfig
+
+[[autodoc]] PPLCNetConfig
+
+## PPLCNetModel
+
+[[autodoc]] PPLCNetModel
+
+## PPLCNetImageProcessorFast
+
+[[autodoc]] PPLCNetImageProcessorFast
+
+## PPLCNetImageProcessor
+
+[[autodoc]] PPLCNetImageProcessor
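The usage examples in `pp_lcnet.md` call the pipeline with `function_to_apply="none"`, so the scores they print are raw logits, and the overview mentions correcting the orientation in post-processing without showing that step. The snippet below is a rough, framework-free sketch of it: it turns four logits into probabilities and derives the rotation that would undo the detected orientation. The `ID2LABEL` mapping, the example logits, and the helper names `softmax` and `correction_angle` are assumptions made for illustration; the real label names come from the checkpoint's `config.id2label`.

```python
import math

# Assumption: class indices map to rotation angles in degrees.
# The actual mapping lives in the checkpoint's config.id2label.
ID2LABEL = {0: "0", 1: "90", 2: "180", 3: "270"}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def correction_angle(logits, id2label=ID2LABEL):
    """Return (predicted label, confidence, degrees that complete a full turn)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = id2label[best]
    # Rotating further by (360 - angle) in the same direction restores upright.
    undo = (360 - int(label)) % 360
    return label, probs[best], undo


# Illustrative logits for a document detected as rotated by 180 degrees.
label, confidence, undo = correction_angle([-1.2, 0.3, 4.1, -0.5])
print(label, round(confidence, 3), undo)
```

Whether `undo` should be applied clockwise or counterclockwise depends on the convention used when the labels were annotated, which the documentation in this diff does not state.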
diff --git a/src/transformers/models/__init__.py b/src/transformers/models/__init__.py
@@ -292,6 +292,7 @@
     from .plbart import *
     from .poolformer import *
     from .pop2piano import *
+    from .pp_lcnet import *
     from .prompt_depth_anything import *
     from .prophetnet import *
     from .pvt import *

diff --git a/src/transformers/models/auto/configuration_auto.py b/src/transformers/models/auto/configuration_auto.py
@@ -331,6 +331,7 @@
         ("plbart", "PLBartConfig"),
         ("poolformer", "PoolFormerConfig"),
         ("pop2piano", "Pop2PianoConfig"),
+        ("pp_lcnet", "PPLCNetConfig"),
         ("prompt_depth_anything", "PromptDepthAnythingConfig"),
         ("prophetnet", "ProphetNetConfig"),
         ("pvt", "PvtConfig"),
@@ -799,6 +800,7 @@
         ("plbart", "PLBart"),
         ("poolformer", "PoolFormer"),
         ("pop2piano", "Pop2Piano"),
+        ("pp_lcnet", "PPLCNet"),
         ("prompt_depth_anything", "PromptDepthAnything"),
         ("prophetnet", "ProphetNet"),
         ("pvt", "PVT"),

diff --git a/src/transformers/models/auto/image_processing_auto.py b/src/transformers/models/auto/image_processing_auto.py
@@ -163,6 +163,7 @@
             ("pixio", ("BitImageProcessor", "BitImageProcessorFast")),
             ("pixtral", ("PixtralImageProcessor", "PixtralImageProcessorFast")),
             ("poolformer", ("PoolFormerImageProcessor", "PoolFormerImageProcessorFast")),
+            ("pp_lcnet", ("PPLCNetImageProcessor", "PPLCNetImageProcessorFast")),
             ("prompt_depth_anything", ("PromptDepthAnythingImageProcessor", "PromptDepthAnythingImageProcessorFast")),
             ("pvt", ("PvtImageProcessor", "PvtImageProcessorFast")),
             ("pvt_v2", ("PvtImageProcessor", "PvtImageProcessorFast")),

diff --git a/src/transformers/models/auto/modeling_auto.py b/src/transformers/models/auto/modeling_auto.py
@@ -896,6 +896,7 @@ class _BaseModelWithGenerate(PreTrainedModel, GenerationMixin):
             ),
         ),
         ("poolformer", "PoolFormerForImageClassification"),
+        ("pp_lcnet", "PPLCNetForImageClassification"),
         ("pvt", "PvtForImageClassification"),
         ("pvt_v2", "PvtV2ForImageClassification"),
         ("regnet", "RegNetForImageClassification"),

diff --git a/src/transformers/models/pp_lcnet/__init__.py b/src/transformers/models/pp_lcnet/__init__.py
@@ -0,0 +1,27 @@
+# Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import TYPE_CHECKING
+
+from ...utils import _LazyModule
+from ...utils.import_utils import define_import_structure
+
+
+if TYPE_CHECKING:
+    from .configuration_pp_lcnet import *
+    from .modeling_pp_lcnet import *
+else:
+    import sys
+
+    _file = globals()["__file__"]
+    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
diff --git a/src/transformers/models/pp_lcnet/configuration_pp_lcnet.py b/src/transformers/models/pp_lcnet/configuration_pp_lcnet.py
@@ -0,0 +1,88 @@
+#                🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+#           This file was automatically generated from src/transformers/models/pp_lcnet/modular_pp_lcnet.py.
+#               Do NOT edit this file manually as any edits will be overwritten by the generation of
+#             the file from the modular. If any change should be done, please apply the change to the
+#                          modular_pp_lcnet.py file directly. One of our CI enforces this.
+#                🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
+
+from ...configuration_utils import PreTrainedConfig
+
+
+class PPLCNetConfig(PreTrainedConfig):
+    """
+    This is the configuration class to store the configuration of a [`PPLCNetModel`]. It is used to instantiate a
+    PP-LCNet model according to the specified arguments, defining the model architecture.
+    Instantiating a configuration with the defaults will yield a similar configuration to that of the PP-LCNet
+    [PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors](https://huggingface.co/PaddlePaddle/PP-LCNet_x1_0_doc_ori_safetensors) architecture.
+    Configuration objects inherit from [`PreTrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PreTrainedConfig`] for more information.
+    Args:
+        scale (`float`, *optional*, defaults to 1.0):
+            The scaling factor for the model's channel dimensions, used to adjust the model size and computational cost
+            without changing the overall architecture (e.g., 0.25, 0.5, 1.0, 1.5).
+        class_num (`int`, *optional*, defaults to 4):
+            The number of output classes for the classification task. Typical values are 2 (binary classification) or
+            4 (document orientation classification: 0°, 90°, 180°, 270°).
+        stride_list (`List[int]`, *optional*, defaults to `[2, 2, 2, 2, 2]`):
+            The list of stride values for convolutional layers in the backbone network, controlling the downsampling
+            rate of feature maps at each stage to capture multi-scale visual information.
+        reduction (`int`, *optional*, defaults to 4):
+            The reduction factor for feature channel dimensions in the squeeze-and-excitation (SE) blocks, used to
+            reduce the number of model parameters and computational complexity while maintaining feature representability.
+        dropout_prob (`float`, *optional*, defaults to 0.2):
+            Dropout probability of the classification head, applied during training to reduce overfitting.
+        class_expand (`int`, *optional*, defaults to 1280):
+            The number of hidden units in the expansion layer of the classification head, used to enhance the model's
+            feature representation capability before the final classification layer.
+        use_last_conv (`bool`, *optional*, defaults to `True`):
+            Whether to use the final convolutional layer in the classification head. Setting this to `True` helps
+            extract more discriminative features for the classification task.
+        act (`str`, *optional*, defaults to `"hardswish"`):
+            The non-linear activation function used in the model's hidden layers. Supported functions include
+            `"hardswish"`, `"relu"`, `"silu"`, and `"gelu"`. `"hardswish"` is preferred for lightweight and efficient
+            inference on edge devices.
+        backbone_config (`Union[dict, PreTrainedConfig]`, *optional*, defaults to `None`):
+            The configuration of the backbone model. If `None`, the default backbone configuration for PP-LCNet
+            will be used, which includes the standard block settings for feature extraction.
+
+    Examples:
+    ```python
+    >>> from transformers import PPLCNetConfig, PPLCNetForImageClassification
+    >>> # Initializing a PP-LCNet configuration
+    >>> configuration = PPLCNetConfig()
+    >>> # Initializing a model (with random weights) from the configuration
+    >>> model = PPLCNetForImageClassification(configuration)
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```
+    """
+
+    model_type = "pp_lcnet"
+
+    def __init__(
+        self,
+        scale: float = 1.0,
+        class_num: int = 4,
+        stride_list: list[int] = [2, 2, 2, 2, 2],
+        reduction: int = 4,
+        dropout_prob: float = 0.2,
+        class_expand: int = 1280,
+        use_last_conv: bool = True,
+        act: str = "hardswish",
+        backbone_config: dict | None = None,
+        **kwargs,
+    ):
+        super().__init__(**kwargs)
+
+        self.scale = scale
+        self.class_num = class_num
+        self.stride_list = list(stride_list)  # copy so instances never share the mutable default list
+        self.reduction = reduction
+        self.dropout_prob = dropout_prob
+        self.class_expand = class_expand
+        self.use_last_conv = use_last_conv
+        self.act = act
+        self.backbone_config = backbone_config
+
+
+__all__ = ["PPLCNetConfig"]
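The `PPLCNetConfig` docstring describes `scale`, `reduction`, and `stride_list` in words only. The sketch below illustrates the arithmetic those parameters usually imply in lightweight CNNs of this kind. It is not taken from the modeling code in this PR. The base channel widths in `BASE_CHANNELS`, the round-to-a-multiple-of-8 helper `make_divisible`, the 224-pixel input size, and the `summarize` function are all assumptions chosen for illustration.

```python
import math

# Hypothetical stage widths for the x1.0 model, chosen for illustration only.
BASE_CHANNELS = [16, 32, 64, 128, 256, 512]


def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to a multiple of `divisor` without shrinking it by more than 10%."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded


def summarize(scale=1.0, reduction=4, stride_list=(2, 2, 2, 2, 2), image_size=224):
    """Derive illustrative layer sizes from the configuration values."""
    channels = [make_divisible(c * scale) for c in BASE_CHANNELS]
    total_stride = math.prod(stride_list)
    return {
        # Channel widths after applying the scale factor.
        "channels": channels,
        # Bottleneck width of an SE block placed on the last stage.
        "se_hidden": channels[-1] // reduction,
        # Overall downsampling factor of the backbone.
        "total_stride": total_stride,
        # Assumes each strided layer divides the spatial size exactly.
        "final_feature_size": image_size // total_stride,
    }


for scale in (0.25, 0.5, 1.0):
    print(scale, summarize(scale=scale))
```

Under these assumptions, halving `scale` halves every channel width, while the default `stride_list` fixes the overall downsampling factor at 32 regardless of `scale`.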