Commit 7b81731

update math question extract quickstart
1 parent 69e5426 commit 7b81731

2 files changed, 37 additions and 9 deletions


docs/en/notes/guide/quickstart/mathquestion_extract.md

Lines changed: 19 additions & 5 deletions
@@ -50,9 +50,23 @@ The example repository includes a test PDF:
 ```
 You can also replace it with any math textbook or exercise collection PDF.
 
-## 4 Write the Execution Script
+## 4 Initialize and Modify the Script
 
-In the project’s root directory, create `generate_question_extract_api.py` with the following content as an example:
+First, create a new `run_dataflow` folder anywhere, enter that directory, and then execute Dataflow project initialization:
+
+```shell
+mkdir run_dataflow
+cd run_dataflow
+dataflow init
+```
+
+After initialization is complete, the following file will appear in the project directory:
+
+```shell
+run_dataflow/playground/mathbook_extract.py
+```
+
+The contents of that script are as follows:
 
 ```python
 from dataflow.operators.generate import MathBookQuestionExtract
@@ -61,7 +75,7 @@ from dataflow.serving.APIVLMServing_openai import APIVLMServing_openai
 class QuestionExtractPipeline:
     def __init__(self, llm_serving: APIVLMServing_openai):
         self.extractor = MathBookQuestionExtract(llm_serving)
-        self.test_pdf = "./dataflow/example/KBCleaningPipeline/questionextract_test.pdf"
+        self.test_pdf = "../example/KBCleaningPipeline/questionextract_test.pdf"
 
     def forward(
         self,
@@ -87,11 +101,11 @@ if __name__ == "__main__":
     # 1. Initialize LLM Serving
     llm_serving = APIVLMServing_openai(
         api_url="https://api.openai.com/v1/chat/completions",
-        model_name="o4-mini", # Strong reasoning model recommended
+        model_name="o4-mini", # It is recommended to use a strong reasoning model
         max_workers=20 # Number of concurrent requests
     )
 
-    # 2. Build and run the pipeline
+    # 2. Construct and run the extraction pipeline
     pipeline = QuestionExtractPipeline(llm_serving)
     pipeline.forward(
         pdf_path=pipeline.test_pdf,

docs/zh/notes/guide/quickstart/mathquestion_extract.md

Lines changed: 18 additions & 4 deletions
@@ -50,9 +50,23 @@ permalink: /zh/guide/zchbl7uk/
 ```
 You can also replace it with any math textbook or exercise collection PDF.
 
-## 4 Write the Execution Script
+## 4 Initialize and Modify the Script
 
-In the project root directory, create `generate_question_extract_api.py`; example content:
+First, create a new `run_dataflow` folder anywhere, enter that directory, and then run Dataflow project initialization:
+
+```shell
+mkdir run_dataflow
+cd run_dataflow
+dataflow init
+```
+
+After initialization completes, the following file appears in the project directory:
+
+```shell
+run_dataflow/playground/mathbook_extract.py
+```
+
+The contents of this script are as follows:
 
 ```python
 from dataflow.operators.generate import MathBookQuestionExtract
@@ -61,7 +75,7 @@ from dataflow.serving.APIVLMServing_openai import APIVLMServing_openai
6175
class QuestionExtractPipeline:
6276
def __init__(self, llm_serving: APIVLMServing_openai):
6377
self.extractor = MathBookQuestionExtract(llm_serving)
64-
self.test_pdf = "./dataflow/example/KBCleaningPipeline/questionextract_test.pdf"
78+
self.test_pdf = "../example/KBCleaningPipeline/questionextract_test.pdf"
6579

6680
def forward(
6781
self,
@@ -91,7 +105,7 @@ if __name__ == "__main__":
         max_workers=20 # Number of concurrent requests
     )
 
-    # 2. Construct and run the pipeline
+    # 2. Construct and run the extraction pipeline
     pipeline = QuestionExtractPipeline(llm_serving)
     pipeline.forward(
         pdf_path=pipeline.test_pdf,
