Description
Before you submit
- I have searched existing issues
- I spent at least 5 minutes investigating and preparing this report
- I confirmed this is not caused by a network issue
- I have fully read and understood the README
- I am certain that this issue is with BabelDOC itself and can be reproduced through the BabelDOC cli
Environment
- OS: windows 11 24H2 26100.3775
- Python: 3.13.3
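- BabelDOC: 0.3.49
Describe the bug
I have an English file that I cannot translate into Chinese, either with the command line or with the Python SDK interface. In the translated output, both the dual (side-by-side) file and the mono file still contain the original text, unchanged.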
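One way to state the symptom objectively is to measure how much of the text extracted from an output file is Chinese. The sketch below is only an illustration: `cjk_ratio`, `looks_untranslated`, and the 5% threshold are hypothetical helpers, not part of BabelDOC, and it assumes the text has already been extracted from the output PDF with a PDF text extractor of your choice.

```python
def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Count characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def looks_untranslated(text: str, threshold: float = 0.05) -> bool:
    """Heuristic: treat text as untranslated if almost none of it is Chinese.

    The threshold is an arbitrary assumption. Empty text also returns True,
    which is what you would see if the page had no extractable text layer.
    """
    return cjk_ratio(text) < threshold
```

Running `looks_untranslated` on the text of each output page would show whether any page was translated at all, or whether the whole document came back unchanged.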
Steps to Reproduce
My code is as follows:
import asyncio

from babeldoc.document_il.translator.translator import OpenAITranslator
from babeldoc.docvision.doclayout import DocLayoutModel
from babeldoc.docvision.table_detection.rapidocr import RapidOCRModel
from babeldoc.translation_config import WatermarkOutputMode, TranslationConfig, TranslateResult
from babeldoc.main import create_progress_handler
from babeldoc.high_level import async_translate

# Load the layout and table detection models
doc_layout_model = DocLayoutModel.load_onnx()
table_model = RapidOCRModel()
watermark_output_mode = WatermarkOutputMode.NoWatermark

lang_in = "en"
lang_out = "zh"

translator = OpenAITranslator(
    lang_in=lang_in,
    lang_out=lang_out,
    model="gpt-4o-mini-2024-07-18",
    base_url="https://openai-xxxx",
    api_key="sk-xxxx",
    ignore_cache=True,
)

tmpdir = "."

config = TranslationConfig(
    input_file=r"xxxx.pdf",
    font=None,
    pages=None,
    output_dir=tmpdir,
    translator=translator,
    debug=False,
    lang_in=lang_in,
    lang_out=lang_out,
    no_dual=False,
    no_mono=True,
    qps=5,
    doc_layout_model=doc_layout_model,
    skip_clean=False,
    dual_translate_first=False,
    disable_rich_text_translate=False,
    enhance_compatibility=True,
    report_interval=0.5,
    min_text_length=5,
    watermark_output_mode=watermark_output_mode,
    split_strategy=None,
    table_model=table_model,
)


async def main():
    progress_context, progress_handler = create_progress_handler(config)
    overall_progress = 0
    with progress_context:
        # Consume translation events until the "finish" event arrives
        async for event in async_translate(config):
            progress_handler(event)
            if event["type"] == "progress_update":
                new_overall_progress = event["overall_progress"]
                print("overall_progress:", overall_progress)
                if new_overall_progress > overall_progress:
                    overall_progress = new_overall_progress
            if event["type"] == "finish":
                result: TranslateResult = event.get("translate_result")
                print(str(result))
                break


if __name__ == '__main__':
    asyncio.run(main())

My CLI command is as follows:
uv run babeldoc --files "E:\...\xxx.pdf" --openai --openai-model "gpt-4o-mini" --openai-base-url "https://openai-xxx" --openai-api-key "sk-xxxx"
Expected Behavior
No response
Relevant Log Output or Screenshots
Original PDF File
The PDF file that triggers the problem is attached below:
TB2-SDC.VP124-00HSJ-M-M1A-PFD-0011 Rev1 Worst Coal - BMCR.pdf
Additional Context
No response