docs/practical_tutorials/document_scene_information_extraction(deepseek)_tutorial.en.md (3 additions & 3 deletions)
@@ -54,7 +54,7 @@ After executing the above code, you can obtain the following result:
The result shows that PP-ChatOCRv3 extracts the text information from the image and passes it to the DeepSeek-V3 large model, which understands the question and returns the required extraction result.
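For context, the kind of call that produces such a result might be wired up as in the following sketch. It assumes the PaddleX 3.x Python API (`create_pipeline`, `visual_predict`, `chat`) and a Qianfan-hosted, OpenAI-compatible DeepSeek-V3 endpoint; the config keys, model name, URL, input path, and extraction key are illustrative assumptions rather than the tutorial's exact code:

```python
from paddlex import create_pipeline

# Hypothetical chat config: point the pipeline's LLM module at DeepSeek-V3
# served through a Qianfan OpenAI-compatible endpoint (values are placeholders).
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "deepseek-v3",
    "base_url": "https://qianfan.baidubce.com/v2",
    "api_type": "openai",
    "api_key": "YOUR_API_KEY",
}

pipeline = create_pipeline("PP-ChatOCRv3-doc")

# Visual stage: layout analysis + OCR over a sample image.
visual_results = list(pipeline.visual_predict(input="document.png"))
visual_info_list = [res["visual_info"] for res in visual_results]

# LLM stage: ask DeepSeek-V3 to extract the requested key from the OCR text.
chat_result = pipeline.chat(
    key_list=["announcement date"],  # illustrative key
    visual_info=visual_info_list,
    chat_bot_config=chat_bot_config,
)
print(chat_result)
```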
-## 2. New Model Can Quickly Adapt to Multi-page PDF Files for Efficient Information Extraction
+## 3. New Model Can Quickly Adapt to Multi-page PDF Files for Efficient Information Extraction
In practical application scenarios, besides large numbers of image files, many document information extraction tasks involve multi-page PDF files. Multi-page PDFs often contain a large amount of text, and passing all of it to a large language model at once not only increases the invocation cost but also reduces extraction accuracy. To address this, the PP-ChatOCRv3 pipeline integrates vector retrieval: the text from the PDF is stored in a vector database, and only the most relevant fragments are retrieved and passed to the large language model, significantly reducing invocation cost and improving extraction accuracy. The Baidu Cloud Qianfan platform provides four vector models for building such vector databases; for the supported model list and their characteristics, refer to the vector model section of the [API List](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Nlks5zkzu_en). Next, we will use the `embedding-v1` model to build a vector database of the text and, via vector retrieval, pass the most relevant fragments to the `DeepSeek-V3` large language model, efficiently extracting key information from multi-page PDF files.
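Building on the sketch above, the retrieval step might look as follows. The `retriever_config` keys and the `build_vector` signature are assumptions modeled on the pipeline's documented usage pattern, so verify them against your installed PaddleX version:

```python
# Hypothetical retriever config for the Qianfan embedding-v1 vector model.
retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
    "api_type": "qianfan",
    "api_key": "YOUR_API_KEY",  # placeholder
}

# Visual stage over a multi-page PDF: one visual_info entry per page.
visual_results = list(pipeline.visual_predict(input="report.pdf"))
visual_info_list = [res["visual_info"] for res in visual_results]

# Build the vector library once; at question time only the top-matching
# fragments are retrieved and sent to the LLM, keeping invocation cost low.
vector_info = pipeline.build_vector(visual_info_list, retriever_config=retriever_config)

chat_result = pipeline.chat(
    key_list=["announcement date"],  # illustrative key
    visual_info=visual_info_list,
    vector_info=vector_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)
print(chat_result)
```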
@@ -156,7 +156,7 @@ Total Time: 6.9693s
By comparing the two runs, we can see that on the first execution the PP-ChatOCRv3 Pipeline extracts all the text from the multi-page PDF and builds the vector library, which takes longer. On subsequent executions, the Pipeline only needs to load and query the vector library, which greatly reduces the overall time. Combined with vector retrieval, the PP-ChatOCRv3 Pipeline effectively reduces the number of large language model calls when extracting from very long text, achieving faster extraction and more accurate localization of key information, and providing a more efficient solution for real-world multi-page PDF information extraction scenarios.
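One way to realize this first-run/subsequent-run split is to cache the vector library on disk, as in the sketch below. It assumes the `vector_info` object returned by `build_vector` is picklable; the cache path and timing scaffolding are illustrative, not part of the pipeline API:

```python
import pickle
import time
from pathlib import Path

VECTOR_CACHE = Path("vector_info.pkl")  # hypothetical cache location

start = time.perf_counter()
if VECTOR_CACHE.exists():
    # Subsequent runs: reload the vector library instead of rebuilding it.
    vector_info = pickle.loads(VECTOR_CACHE.read_bytes())
else:
    # First run: embed every page's text and persist the vector library.
    vector_info = pipeline.build_vector(visual_info_list, retriever_config=retriever_config)
    VECTOR_CACHE.write_bytes(pickle.dumps(vector_info))
print(f"Vector library ready in {time.perf_counter() - start:.4f}s")
```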
-## 3. Exploring the Thinking Mode of Large Models in Text and Image Information Extraction
+## 4. Exploring the Thinking Mode of Large Models in Text and Image Information Extraction
DeepSeek-R1 impresses with its exceptional dialogue capabilities and in-depth problem-solving. When executing complex tasks or processing user instructions, in addition to completing the dialogue task itself, the model can expose its thinking process. The PP-ChatOCRv3 Pipeline adaptively supports such reasoning models: for models that return a thinking process, PP-ChatOCRv3 exposes it through an additional `reasoning_content` output field. This field is a list containing the thinking results from each of the Pipeline's calls to the large language model. By inspecting these thinking results, we can see how the model gradually derives the answer from the given text, and they can also suggest directions for prompt optimization. Next, we take a specific legal document information extraction task as an example, using the `DeepSeek-R1` model as the large language model called by PP-ChatOCRv3 for key information extraction, and briefly explore its thinking process.
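Inspecting the trace might look like the sketch below. It assumes the chat result can be read like a dict and that `reasoning_content`, when present, is a list with one entry per LLM call, as described above; switching the model by editing `chat_bot_config` is likewise an assumption:

```python
# Hypothetical: reuse the earlier config but switch to the reasoning model.
chat_bot_config["model_name"] = "deepseek-r1"

chat_result = pipeline.chat(
    key_list=["When was this regulation announced?"],
    visual_info=visual_info_list,
    vector_info=vector_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
)

# One reasoning trace per LLM call made by the pipeline.
for i, thought in enumerate(chat_result.get("reasoning_content", [])):
    print(f"--- reasoning for LLM call {i} ---")
    print(thought)
```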
@@ -200,7 +200,7 @@ print(chat_result)
The result shows that the `DeepSeek-R1` model not only completes the information extraction task for the question 'When was this regulation announced?' but also returns, in the `reasoning_content` field, its thinking process while solving it. For example, during its reasoning the model carefully distinguishes between the regulation's publication date and its implementation date, and double-checks the result before returning it.
-## 4. Supporting Custom Prompt Engineering to Expand the Functional Boundaries of Large Language Models
+## 5. Supporting Custom Prompt Engineering to Expand the Functional Boundaries of Large Language Models
In document information extraction tasks, besides directly extracting key information from the text, we can also expand the functional boundaries of large language models through custom prompt engineering. For example, we can design new prompt rules that have the large language model summarize the text, helping us quickly locate the key information we need within a large body of text, or that have it reason about the user's question based on the document content and give suggestions. The PP-ChatOCRv3 Pipeline already supports custom prompts, and the default prompts it uses can be found in the Pipeline's [configuration file](../../paddlex/configs/pipelines/PP-ChatOCRv3-doc.yaml). Following the prompt logic in the default configuration, we can customize the prompts passed to the chat interface. Below is a brief introduction to the meaning of the prompt parameters related to text content:
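As one illustration, the sketch below overrides the text-related prompt parameters in `chat` to turn key-value extraction into summarization. The parameter names (`text_task_description`, `text_output_format`, `text_rules_str`) mirror prompt entries in the default configuration file but are assumptions to check against your PaddleX version:

```python
# Hypothetical custom prompts: have the LLM summarize instead of extract.
chat_result = pipeline.chat(
    key_list=["summary"],  # illustrative key the summary is returned under
    visual_info=visual_info_list,
    vector_info=vector_info,
    chat_bot_config=chat_bot_config,
    retriever_config=retriever_config,
    text_task_description="Summarize the key points of the document text below.",
    text_output_format="Return a JSON object mapping each requested key to a concise summary string.",
    text_rules_str="Base the summary only on the given text; do not invent facts.",
)
print(chat_result)
```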