
Commit 8cb3a61

Reasoning ops API doc, en & zh version (#123)
* reasoning op API, zh version
* reasoning op API, en version
* remove useless op doc
1 parent b28458d commit 8cb3a61

35 files changed: +1316 -385 lines changed

docs/en/notes/api/operators/reasoning/eval/ReasoningCategoryDatasetEvaluator.md

Lines changed: 46 additions & 19 deletions
@@ -30,32 +30,59 @@ Executes the main logic of the operator. It reads a DataFrame from storage, calc
## 🧠 Example Usage

```python
from dataflow.operators.reasoning import ReasoningCategoryDatasetEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC


class ReasoningCategoryDatasetEvaluatorTest():
    def __init__(self, llm_serving: LLMServingABC = None):

        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )

        self.evaluator = ReasoningCategoryDatasetEvaluator()

    def forward(self):
        self.evaluator.run(
            storage=self.storage.step(),
            input_primary_category_key="primary_category",
            input_secondary_category_key="secondary_category",
        )


if __name__ == "__main__":
    pl = ReasoningCategoryDatasetEvaluatorTest()
    pl.forward()
```

#### 🧾 Default Output Format

| Field | Type | Description |
| :---- | :--- | :---------- |
| key   | str  | Primary category name. |
| value | dict | Dictionary containing the total number of samples for this primary category (`primary_num`) and the number of samples for each secondary category. |

Example input (dataframe rows stored in `storage`):

```json
{ "primary_category": "Science", "secondary_category": "Physics" }
{ "primary_category": "Science", "secondary_category": "Chemistry" }
{ "primary_category": "Science", "secondary_category": "Physics" }
{ "primary_category": "Humanities", "secondary_category": "History" }
```

Example output:

```json
{
  "Science": {
    "primary_num": 3,
    "Physics": 2,
    "Chemistry": 1
  },
  "Humanities": {
    "primary_num": 1,
    "History": 1
  }
}
```
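
The nested counts above can be reproduced outside the operator with a few lines of pandas. The snippet below is a minimal sketch assuming the default column names; it illustrates the counting logic only and is not the operator's actual implementation.

```python
import pandas as pd

# Minimal sketch of the category statistics, assuming the default column names.
# Illustration only; this is not the operator's implementation.
df = pd.DataFrame([
    {"primary_category": "Science", "secondary_category": "Physics"},
    {"primary_category": "Science", "secondary_category": "Chemistry"},
    {"primary_category": "Science", "secondary_category": "Physics"},
    {"primary_category": "Humanities", "secondary_category": "History"},
])

stats = {}
for primary, group in df.groupby("primary_category"):
    entry = {"primary_num": len(group)}  # total samples in this primary category
    entry.update(group["secondary_category"].value_counts().to_dict())  # per-secondary counts
    stats[primary] = entry

print(stats)
# {'Humanities': {'primary_num': 1, 'History': 1},
#  'Science': {'primary_num': 3, 'Physics': 2, 'Chemistry': 1}}
```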

docs/en/notes/api/operators/reasoning/eval/ReasoningDifficultyDatasetEvaluator.md

Lines changed: 30 additions & 17 deletions
@@ -33,29 +33,42 @@ def run(self, storage: DataFlowStorage, input_diffulty_key: str = "difficulty_sc
## 🧠 Example Usage

```python
from dataflow.operators.reasoning import ReasoningDifficultyDatasetEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC


class ReasoningDifficultyDatasetEvaluatorTest():
    def __init__(self, llm_serving: LLMServingABC = None):

        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )

        self.evaluator = ReasoningDifficultyDatasetEvaluator()

    def forward(self):
        self.evaluator.run(
            storage=self.storage.step(),
            input_diffulty_key="difficulty_score",
        )


if __name__ == "__main__":
    pl = ReasoningDifficultyDatasetEvaluatorTest()
    pl.forward()
```

#### 🧾 Return Value

This operator returns a dictionary where the keys are the difficulty levels found in the dataset and the values are the corresponding sample counts for each difficulty level.

Example return value:

```json
{
  "Easy": 150,
  "Medium": 200,
  "Hard": 80
}
```
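
The return value is simply a frequency count over the difficulty column. As an illustration (assuming pandas and the default `difficulty_score` column; this is not the operator's actual code), the same distribution can be computed as follows:

```python
import pandas as pd

# Illustrative frequency count over the difficulty column; not the operator's actual code.
df = pd.DataFrame({"difficulty_score": ["Easy", "Medium", "Easy", "Hard", "Medium"]})

distribution = df["difficulty_score"].value_counts().to_dict()
print(distribution)  # e.g. {'Easy': 2, 'Medium': 2, 'Hard': 1}
```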

docs/en/notes/api/operators/reasoning/eval/ReasoningQuestionCategorySampleEvaluator.md

Lines changed: 35 additions & 2 deletions
@@ -22,7 +22,7 @@ def __init__(self, llm_serving: LLMServingABC = None)
| Prompt Template Name | Main Purpose | Applicable Scenarios | Feature Description |
| :--- | :--- | :--- | :--- |
| MathQuestionCategoryPrompt | Multi-level question classification | Classifying user questions into primary and secondary categories | Takes input questions and outputs primary and secondary classifications |

## `run` function

@@ -39,7 +39,40 @@ def run(self, storage: DataFlowStorage, input_key:str = "instruction", output_ke
## 🧠 Example Usage

```python
from dataflow.operators.reasoning import ReasoningQuestionCategorySampleEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving import APILLMServing_request

class ReasoningQuestionCategorySampleEvaluatorTest():
    def __init__(self, llm_serving: LLMServingABC = None):

        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )

        # use API server as LLM serving
        self.llm_serving = APILLMServing_request(
            api_url="",
            model_name="gpt-4o",
            max_workers=30
        )

        self.evaluator = ReasoningQuestionCategorySampleEvaluator(llm_serving=self.llm_serving)

    def forward(self):
        self.evaluator.run(
            storage=self.storage.step(),
            input_key="instruction",
            output_key="category",
        )


if __name__ == "__main__":
    pl = ReasoningQuestionCategorySampleEvaluatorTest()
    pl.forward()
```

#### 🧾 Default Output Format

docs/en/notes/api/operators/reasoning/eval/ReasoningQuestionDifficultySampleEvaluator.md

Lines changed: 87 additions & 17 deletions
@@ -5,35 +5,105 @@ permalink: /en/api/operators/reasoning/eval/reasoningquestiondifficultysampleeva
---

## 📘 Overview

[ReasoningQuestionDifficultySampleEvaluator](https://github.com/OpenDCAI/DataFlow/blob/main/dataflow/operators/reasoning/evaluate/reasoning_question_difficulty_sample_evaluator.py)
is a question difficulty evaluation operator. It analyzes the complexity of questions by calling a Large Language Model (LLM) and generates a difficulty score from 1 to 10 for each question.

## `__init__` function

```python
@prompt_restrict(
    MathQuestionDifficultyPrompt
)
@OPERATOR_REGISTRY.register()
class ReasoningQuestionDifficultySampleEvaluator(OperatorABC):
    def __init__(self, llm_serving: LLMServingABC = None):
```

### init Parameter Description

| Parameter Name | Type | Default | Description |
| :-------------- | :------------ | :------- | :----------------------------- |
| **llm_serving** | LLMServingABC | Required | Large language model service instance for executing inference and generation. |

### Prompt Template Descriptions

| Prompt Template Name | Primary Use | Applicable Scenarios | Features |
| :--- | :--- | :--- | :--- |
| MathQuestionDifficultyPrompt | Question difficulty evaluation | Evaluating the difficulty of user questions | Takes an input question and outputs a difficulty score from 1 to 10 |

## `run` function

```python
def run(self, storage: DataFlowStorage, input_key: str, output_key: str = "difficulty_score")
```

#### Parameters

| Name | Type | Default | Description |
| :------------- | :-------------- | :----------------- | :----------------------------- |
| **storage**    | DataFlowStorage | Required           | DataFlow storage instance for reading and writing data. |
| **input_key**  | str             | Required           | Input column name corresponding to the question field. |
| **output_key** | str             | "difficulty_score" | Output column name corresponding to the generated difficulty score field. |

## 🧠 Example Usage

```python
from dataflow.operators.reasoning import ReasoningQuestionDifficultySampleEvaluator
from dataflow.utils.storage import FileStorage
from dataflow.core import LLMServingABC
from dataflow.serving import APILLMServing_request

class ReasoningQuestionDifficultySampleEvaluatorTest():
    def __init__(self, llm_serving: LLMServingABC = None):

        self.storage = FileStorage(
            first_entry_file_name="example.json",
            cache_path="./cache_local",
            file_name_prefix="dataflow_cache_step",
            cache_type="jsonl",
        )

        # use API server as LLM serving
        self.llm_serving = APILLMServing_request(
            api_url="",
            model_name="gpt-4o",
            max_workers=30
        )

        self.evaluator = ReasoningQuestionDifficultySampleEvaluator(llm_serving=self.llm_serving)

    def forward(self):
        self.evaluator.run(
            storage=self.storage.step(),
            input_key="instruction",
            output_key="difficulty_score",
        )


if __name__ == "__main__":
    pl = ReasoningQuestionDifficultySampleEvaluatorTest()
    pl.forward()
```

#### 🧾 Default Output Format

| Field | Type | Description |
| :--- | :--- | :--- |
| **difficulty_score** | int | The difficulty score of the question, from 1 to 10. |

Example input:

```json
{
  "instruction": "Calculate 2 to the power of 5."
}
```

Example output:

```json
{
  "difficulty_score": 3
}
```
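
The score is extracted from the model's free-form reply. As a rough sketch of that kind of post-processing, one might write something like the snippet below; the helper name, regex, and -1 fallback are illustrative assumptions, not the operator's actual parsing code.

```python
import re

def parse_difficulty(reply: str) -> int:
    """Illustrative only: pull the first integer out of an LLM reply and clamp it
    to the documented 1-10 range, falling back to -1 (assumed) if no number is found."""
    match = re.search(r"\d+", reply)
    if match is None:
        return -1
    return max(1, min(10, int(match.group())))

print(parse_difficulty("Difficulty score: 3/10"))  # 3
print(parse_difficulty("I cannot rate this."))     # -1
```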
