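Based on the problem description and error log you provided, this looks like a problem caused by calling ppStructure concurrently. Here is an analysis and some possible solutions:

  1. Cause:
  • When the same ppStructure instance is called at the same time to process multiple PDFs, concurrency problems occur
  • The error is raised in the table recognition module:
InvalidArgumentError: The shape of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [], input[1]'s shape = [1].
  • This indicates that the model's input and output dimensions no longer match during concurrent processing
  2. Solutions:
  • Option 1: Use multiple processes instead of multiple threads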
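from multiprocessing import Pool

from paddleocr import PPStructure

def process_pdf(file_path):
    # Build the engine inside the worker so no instance is shared between processes.
    ocr_engine = PPStructure(...)
    result = ocr_engine(file_path)
    return result

def parallel_process_pdfs(file_paths):
    # Each PDF is handled by a separate worker process from the pool.
    with Pool() as pool:
        results = pool.map(process_pdf, file_paths)
    return results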
  • Option 2: Create a new ppStructure instance for each request
@app.post("/ocr")
def ocr(file_path: str = Form(...)):
    try:
  …

Replies: 2 comments

Answer selected by IeohMingChan
This discussion was converted from issue #14516 on January 10, 2025 05:41.