How to fully control each module in TableRecognitionPipelineV2 pipeline #15996
I’m using the General Table Recognition v2 Pipeline, and I’d like to understand how to fully control each module in the pipeline. From what I know, the pipeline includes these mandatory modules:
And these optional modules:
What input does each module expect?
Hello. The inputs for Table Structure Recognition, Table Classification, Table Cell Localization, Text Detection, and Text Recognition are cropped table regions or standalone table images. For Layout Area Detection, Document Image Orientation Classification, and Text Image Correction, the inputs are full document images containing tables. For more specific details, you can refer to the respective module documentation linked in section "4. Secondary Development".

The outputs are as follows:

- Table Structure Recognition outputs the HTML structure of the table.
- Table Classification outputs the category of the table, i.e. whether it is a bordered or borderless table.
- Table Cell Localization outputs the detection results for each cell in the table.
- Text Detection and Text Recognition together form the OCR sub-pipeline and output the table's OCR information (text detection boxes and recognized content).
- Document Image Orientation Classification outputs the image rotated to the correct orientation, and Text Image Correction outputs the unwarped (corrected) image.
- Layout Area Detection outputs the layout information (detection boxes and category IDs).

The input/output format for chaining the modules together, and how to control the internal image-processing steps, can be found in the PaddleX TableRecognitionPipelineV2 pipeline code: https://github.com/PaddlePaddle/PaddleX/blob/3d1dda6684caf2d295fb04854b01110db1e2c1c7/paddlex/inference/pipelines/table_recognition/pipeline_v2.py#L1100. If you need to modify the inference details, it is recommended to use PaddleX; otherwise the changes have to be made inside the site-packages path, which is quite cumbersome. In PaddleOCR, the essential modules cannot be skipped unless you directly modify the module-combination logic in the PaddleX pipeline code linked above.
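To make the "use PaddleX to control the modules" advice concrete, here is a minimal sketch of driving the pipeline and toggling the optional modules from Python. The pipeline name `table_recognition_v2` and the `use_*` keyword arguments are taken from the PaddleX pipeline documentation, not from this thread, so treat them as assumptions and verify them against the PaddleX version you have installed.

```python
# Hedged sketch: running the PaddleX table_recognition_v2 pipeline while
# disabling its optional modules. Parameter names (use_doc_orientation_classify,
# use_doc_unwarping, use_layout_detection) are assumptions based on the PaddleX
# docs; check them against your installed version.

def run_table_recognition(image_path):
    """Run the table recognition v2 pipeline with optional modules disabled."""
    # Imported lazily so this sketch can be defined without PaddleX installed.
    from paddlex import create_pipeline

    pipeline = create_pipeline(pipeline="table_recognition_v2")
    results = pipeline.predict(
        image_path,
        use_doc_orientation_classify=False,  # skip Document Image Orientation Classification
        use_doc_unwarping=False,             # skip Text Image Correction
        use_layout_detection=True,           # keep Layout Area Detection to locate tables
    )
    for res in results:
        res.print()                 # per-module results, including the table structure
        res.save_to_html("output")  # write the reconstructed table as HTML
    return results
```

For deeper control (e.g. swapping a sub-module's model or changing its combination logic), edit the pipeline class at the GitHub link above rather than patching files under site-packages.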