Commit a6079f6

allow updating additional instructions from a file after the run started
1 parent 97f3a71 commit a6079f6

4 files changed: +19 -15 lines changed

examples/prompt/eval.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 TOTAL_SAMPLES = 30 # how many problems to load
 NUM_WORKERS = 30 # concurrent LLM calls
 LOG_EVERY = 5 # print progress after this many
-MODEL_TO_USE = "gpt-4.1-mini" # Define the model to use HERE
+MODEL_TO_USE = "gpt-4.1" # Define the model to use HERE
 TASK_TIMEOUT = 300 # seconds per LLM call
 # ---------------------------------------------------------------------
 

examples/prompt/optimize.py

Lines changed: 0 additions & 1 deletion
@@ -22,7 +22,6 @@
 Solution:
 """
 
-# Modify the function signature to accept model_name
 def solve(problem: str, model_name: str) -> str:
     """Return the model's raw text answer for one problem using the specified model."""
     prompt = PROMPT_TEMPLATE.format(problem=problem)

examples/prompt/prompt_guide.md

Lines changed: 7 additions & 7 deletions
@@ -2,7 +2,7 @@
 
 ## 1. Goal
 
-Your objective is to modify the `PROMPT_TEMPLATE` string within the `optimize.py` file to improve the `accuracy` metric when solving AIME math problems. The modifications should leverage the capabilities of the target model, **GPT-4.1**.
+Your objective is to modify the the `optimize.py` file to improve the `accuracy` metric when solving AIME math problems. The modifications should leverage the capabilities of the target model, **GPT-4.1**.
 
 ## 2. Files and Workflow
 

@@ -25,21 +25,21 @@ You are optimizing the prompt for `gpt-4.1`. Based on its characteristics, consi
 
 ## 4. Optimization Strategies (Focus on `PROMPT_TEMPLATE` in `optimize.py`)
 
-The primary goal is to enhance the model's reasoning process for these challenging math problems. Focus on **complex Chain-of-Thought (CoT)** designs within the `PROMPT_TEMPLATE`.
+The primary goal is to enhance the model's reasoning process for these challenging math problems. Focus on Chain-of-Thought (CoT) designs within the `PROMPT_TEMPLATE`.
 
 **Ideas to Explore:**
 You don't have to implement all of them, but the following ideas might be helpful:
-* **Workflow Patterns:**
-    * **Linear**: step-by-step thinking process could be a good starting point E.g., "1. Understand the problem constraints. 2. Identify relevant theorems/formulas. 3. Formulate a plan. 4. Execute calculations step-by-step. 5. Verify intermediate results. 6. State the final answer in the required format."
+* **Workflow Patterns** try to use some of the following patterns:
+    * **Linear**: Linear workflow, standarded CoT E.g. considering the following thinking steps (you don't have to include all of them), "1. Understand the problem constraints. 2. Identify relevant theorems/formulas. 3. Formulate a plan. 4. Execute calculations step-by-step. 5. Verify intermediate results. 6. State the final answer in the required format."
     * **List Candidates**: You can ask the model to propose a few solutions in a particular step and pick the best solution. You can potentially also set the criterias in the prompt.
-    * **Code** Write pesudo code to define even more complex workflows with loops, conditional statement, or go to statement.
+    * **Code** Use pesudo code to define even more complex workflows with loops, conditional statement, or go to statement.
 * **Other CoT Techniques:**
     * Self-Correction/Reflection
     * Plan Generation
     * Debate, simulating multiple characters
     * Tree of thought
-* **Few-Shot Examples:** You *could* experiment with adding 1-2 high-quality AIME problem/solution examples directly into the `PROMPT_TEMPLATE` string (similar to how Weco attempted in one of the runs). Ensure the examples clearly show the desired reasoning style and the final `\boxed{XXX}` format. *Caution: This significantly increases prompt length and cost.*
+* **Few-Shot Examples:** You *could* experiment with adding 1-2 high-quality AIME problem/solution examples directly into the `PROMPT_TEMPLATE` string (similar to how Weco attempted in one of the runs). Ensure the examples clearly show the desired reasoning style and the final `\boxed{XXX}` format.
 * **Play with format:** The way you format the prompt. Markdown, xml, json, code or natural language. Similarly for the thinking tokens themselves you can also try out different formats.
 
 ## 5. Constraints
-* **Ensure the final output reliably contains `\boxed{XXX}` as the evaluation script depends on it.**
+* **Ensure the final output reliably contains `\boxed{XXX}` as the evaluation script depends on it.**
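
Note on the guide above: the "Linear" pattern and the `\boxed{XXX}` constraint can be illustrated with a small sketch. The template text, `BOXED_RE`, and `extract_boxed` below are hypothetical and are not taken from `optimize.py` or `eval.py`; the sketch assumes the evaluation reads the last `\boxed{...}` integer in the reply, which this diff does not confirm.

```python
import re

# Hypothetical template showing the "Linear" workflow; NOT the repository's PROMPT_TEMPLATE.
# Doubled braces survive str.format() as literal braces.
PROMPT_TEMPLATE = """Solve the following AIME problem.
1. Understand the problem constraints.
2. Formulate a plan.
3. Execute calculations step-by-step.
4. Verify intermediate results.
Finish with the final answer as \\boxed{{XXX}}.

Problem: {problem}

Solution:
"""

# AIME answers are integers from 0 to 999, so at most three digits are accepted.
BOXED_RE = re.compile(r"\\boxed\{(\d{1,3})\}")


def extract_boxed(text):
    """Return the digits of the last \\boxed{...} in text, or None if absent (assumed parsing rule)."""
    matches = BOXED_RE.findall(text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    print(PROMPT_TEMPLATE.format(problem="Find the remainder when 2^10 is divided by 7."))
    print(extract_boxed("Checking: 1024 = 7 * 146 + 2, so the answer is \\boxed{2}."))
```

Taking the last match rather than the first is a design choice: a reply that boxes an intermediate value during self-correction still yields its final answer.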

weco/cli.py

Lines changed: 11 additions & 6 deletions
@@ -74,15 +74,16 @@ def main() -> None:
         "debug_prob": 0.5,
         "max_debug_depth": max(1, math.ceil(0.1 * steps)), # 10% of steps
     }
+    # Read API keys
+    api_keys = read_api_keys_from_env()
+    # API request timeout
+    timeout = 800
+
     # Read additional instructions
     additional_instructions = read_additional_instructions(additional_instructions=args.additional_instructions)
     # Read source code
     source_fp = pathlib.Path(args.source)
     source_code = read_from_path(fp=source_fp, is_json=False)
-    # Read API keys
-    api_keys = read_api_keys_from_env()
-    # API request timeout
-    timeout = 800
 
     # Initialize panels
     summary_panel = SummaryPanel(
summary_panel = SummaryPanel(
@@ -193,12 +194,14 @@ def main() -> None:
     )
 
     for step in range(1, steps):
+        # Re-read instructions from the original source (file path or string) BEFORE each suggest call
+        current_additional_instructions = read_additional_instructions(additional_instructions=args.additional_instructions)
         # Evaluate the current output and get the next solution
         eval_and_next_solution_response = evaluate_feedback_then_suggest_next_solution(
             console=console,
             session_id=session_id,
             execution_output=term_out,
-            additional_instructions=additional_instructions,
+            additional_instructions=current_additional_instructions,
             api_keys=api_keys,
             timeout=timeout,
         )
@@ -286,12 +289,14 @@ def main() -> None:
         transition_delay=0.1, # Slightly longer delay for evaluation results
     )
 
+    # Re-read instructions before the final feedback step
+    current_additional_instructions = read_additional_instructions(additional_instructions=args.additional_instructions)
     # Ensure we pass evaluation results for the last step's generated solution
     eval_and_next_solution_response = evaluate_feedback_then_suggest_next_solution(
         console=console,
         session_id=session_id,
         execution_output=term_out,
-        additional_instructions=additional_instructions,
+        additional_instructions=current_additional_instructions,
         api_keys=api_keys,
         timeout=timeout,
     )
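
Note on the change above: the effect of re-reading the instructions on every step can be sketched in isolation. The body of `read_additional_instructions` is not part of this diff, so `load_instructions` and `run_steps` below are hypothetical stand-ins. They assume the argument is either a path to an existing file or a literal instruction string, as the added comment ("file path or string") suggests.

```python
import pathlib
import tempfile


def load_instructions(value):
    """Hypothetical stand-in: return the file's text if value is an existing file, else value itself."""
    if value is None:
        return None
    try:
        candidate = pathlib.Path(value)
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8").strip()
    except (OSError, ValueError):
        # A long literal string may be rejected by the OS when probed as a path
        pass
    return value


def run_steps(instructions_arg, steps, on_step=None):
    """Re-read the instructions before every step and record what each step saw."""
    seen = []
    for step in range(steps):
        seen.append(load_instructions(instructions_arg))
        if on_step is not None:
            on_step(step)  # stands in for the work done during a step
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "instructions.md"
        path.write_text("focus on accuracy", encoding="utf-8")

        def edit_after_first_step(step):
            # Simulates a user editing the file while the run is in progress
            if step == 0:
                path.write_text("focus on speed", encoding="utf-8")

        print(run_steps(str(path), 2, edit_after_first_step))
```

Reading the file once before the loop, as the code did before this commit, would hand the same text to every step; moving the read inside the loop is what lets a mid-run edit take effect on the next step.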
