examples/prompt/prompt_guide.md
+7 -7 (7 additions & 7 deletions)
@@ -2,7 +2,7 @@
## 1. Goal

-Your objective is to modify the `PROMPT_TEMPLATE` string within the `optimize.py` file to improve the `accuracy` metric when solving AIME math problems. The modifications should leverage the capabilities of the target model, **GPT-4.1**.
+Your objective is to modify the `optimize.py` file to improve the `accuracy` metric when solving AIME math problems. The modifications should leverage the capabilities of the target model, **GPT-4.1**.

## 2. Files and Workflow
@@ -25,21 +25,21 @@ You are optimizing the prompt for `gpt-4.1`. Based on its characteristics, consi
## 4. Optimization Strategies (Focus on `PROMPT_TEMPLATE` in `optimize.py`)

-The primary goal is to enhance the model's reasoning process for these challenging math problems. Focus on **complex Chain-of-Thought (CoT)** designs within the `PROMPT_TEMPLATE`.
+The primary goal is to enhance the model's reasoning process for these challenging math problems. Focus on Chain-of-Thought (CoT) designs within the `PROMPT_TEMPLATE`.

**Ideas to Explore:**
You don't have to implement all of them, but the following ideas might be helpful:
-* **Workflow Patterns:**
-    * **Linear**: A step-by-step thinking process could be a good starting point. E.g., "1. Understand the problem constraints. 2. Identify relevant theorems/formulas. 3. Formulate a plan. 4. Execute calculations step-by-step. 5. Verify intermediate results. 6. State the final answer in the required format."
+* **Workflow Patterns:** Try some of the following patterns:
+    * **Linear**: A linear workflow, i.e. standard CoT. For example, consider the following thinking steps (you don't have to include all of them): "1. Understand the problem constraints. 2. Identify relevant theorems/formulas. 3. Formulate a plan. 4. Execute calculations step-by-step. 5. Verify intermediate results. 6. State the final answer in the required format."
    * **List Candidates**: You can ask the model to propose a few solutions at a particular step and pick the best one. You can also set the selection criteria in the prompt.
-    * **Code**: Write pseudocode to define even more complex workflows with loops, conditional statements, or goto statements.
+    * **Code**: Use pseudocode to define even more complex workflows with loops, conditional statements, or goto statements.
* **Other CoT Techniques:**
    * Self-Correction/Reflection
    * Plan Generation
    * Debate, simulating multiple characters
    * Tree of thought
-* **Few-Shot Examples:** You *could* experiment with adding 1-2 high-quality AIME problem/solution examples directly into the `PROMPT_TEMPLATE` string (similar to how Weco attempted in one of the runs). Ensure the examples clearly show the desired reasoning style and the final `\boxed{XXX}` format. *Caution: This significantly increases prompt length and cost.*
+* **Few-Shot Examples:** You *could* experiment with adding 1-2 high-quality AIME problem/solution examples directly into the `PROMPT_TEMPLATE` string (similar to how Weco attempted in one of the runs). Ensure the examples clearly show the desired reasoning style and the final `\boxed{XXX}` format.
* **Play with format:** Experiment with how you format the prompt: Markdown, XML, JSON, code, or natural language. Similarly, you can try different formats for the thinking tokens themselves.
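
As a rough illustration of the **Linear** pattern, a template might look like the sketch below. The real contents of `optimize.py` are not shown in this diff, so the `{problem}` placeholder and the `build_prompt` helper are assumptions made for the example, not the project's actual interface.

```python
# Hypothetical sketch: the `{problem}` placeholder and `build_prompt`
# are assumptions, not taken from the real optimize.py.
PROMPT_TEMPLATE = """You are solving an AIME problem. Work through these steps:
1. Understand the problem constraints.
2. Identify relevant theorems/formulas.
3. Formulate a plan.
4. Execute calculations step-by-step.
5. Verify intermediate results.
6. State the final answer in the required format.

Problem:
{problem}

End with the final answer as \\boxed{{XXX}}, where XXX is an integer from 0 to 999.
"""


def build_prompt(problem: str) -> str:
    """Fill the template with a single problem statement."""
    # Doubled braces in the template survive str.format as literal braces.
    return PROMPT_TEMPLATE.format(problem=problem)
```

The doubled braces around `XXX` matter if the template is filled with `str.format`; with a different substitution mechanism they would need to be single.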

## 5. Constraints
-* **Ensure the final output reliably contains `\boxed{XXX}` as the evaluation script depends on it.**
+* **Ensure the final output reliably contains `\boxed{XXX}` as the evaluation script depends on it.**
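
To sanity-check this constraint locally, a small helper along the following lines could be used. The evaluation script itself is not part of this diff, so this is only a sketch of one plausible check: `extract_boxed` is a hypothetical name, and the regular expression assumes the boxed content has no nested braces.

```python
import re
from typing import Optional

# Matches \boxed{...} whose content contains no nested braces (an assumption).
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last boxed answer in `text`, or None."""
    matches = BOXED_RE.findall(text)
    # Take the last match so earlier intermediate boxes are ignored.
    return matches[-1].strip() if matches else None
```

A response for which `extract_boxed` returns `None` would likely be scored as wrong whatever its reasoning, which is why the prompt should state the output format explicitly.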