Commit 0ff356c

1 parent 2c7dc5c commit 0ff356c

File tree

3 files changed: +54 −4 lines changed

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
Evaluate the following prompt designed for large language models on a scale of 0.0 to 1.0 for these metrics:

1. **Clarity** (0.0-1.0): How clear and unambiguous are the instructions? Are there any confusing or contradictory elements?

2. **Specificity** (0.0-1.0): Does the prompt provide appropriate detail and constraints without being overly restrictive? Does it guide the model effectively?

3. **Robustness** (0.0-1.0): Will this prompt handle edge cases and varied inputs well? Is it resilient to different phrasings or unexpected scenarios?

4. **Format_specification** (0.0-1.0): Is the expected output format clearly defined? Will the model know exactly how to structure its response?

Prompt to evaluate:
```
{current_program}
```

Consider that this prompt is designed for a task involving mathematical problem-solving, classification, or similar structured tasks where accuracy and consistency are important.

Evaluation guidelines:
- A score of 1.0 means excellent/optimal for that dimension
- A score of 0.5 means adequate but with room for improvement
- A score of 0.0 means severely lacking in that dimension
- Consider how well the prompt would work across different models and contexts

Return your evaluation as a JSON object with the following format:
{{
  "clarity": [score],
  "specificity": [score],
  "robustness": [score],
  "format_specification": [score],
  "reasoning": "[brief explanation of scores, highlighting strengths and areas for improvement]"
}}
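This template appears to be a Python `str.format` string: the doubled braces `{{ }}` escape to literal JSON braces while `{current_program}` is substituted. A minimal sketch of rendering it and parsing a model's reply, assuming that convention (the variable names and the sample reply below are illustrative, not from the commit):

```python
import json

# Abbreviated stand-in for the template above; note the escaped {{ }} braces.
template = (
    "Prompt to evaluate:\n{current_program}\n"
    'Reply as JSON: {{"clarity": [score]}}'
)

# str.format substitutes {current_program}; braces inside the substituted
# value (e.g. {claim}) are NOT reprocessed, so evaluated prompts may safely
# contain their own placeholders.
rendered = template.format(current_program="Classify the claim: {claim}")

# Hypothetical model reply matching the requested schema:
reply = (
    '{"clarity": 0.8, "specificity": 0.6, "robustness": 0.7, '
    '"format_specification": 0.9, "reasoning": "Clear but loose format."}'
)
scores = json.loads(reply)

# Scores are expected to fall in the 0.0-1.0 range described above.
metrics = ("clarity", "specificity", "robustness", "format_specification")
assert all(0.0 <= scores[k] <= 1.0 for k in metrics)
```

Since the template requests bare JSON, a real harness would likely also need to strip code fences or other wrapping before calling `json.loads`.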
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
You are an expert prompt engineer specializing in creating effective prompts for language models.

Your task is to evolve and improve prompts to maximize their performance on specific tasks. When rewriting prompts:

1. **Maintain the exact placeholder format**: Always use the same placeholder name as in the original prompt (e.g., {instruction}, {claim}, {context}, {question})
2. **Keep it simple**: Avoid overly complex or verbose instructions unless necessary
3. **Be specific**: Provide clear, actionable guidance to the model
4. **Test-oriented**: Focus on what will improve accuracy on the given evaluation metrics
5. **Format-aware**: Ensure the prompt works well with the expected input/output format

**CRITICAL**: Your rewritten prompt must use EXACTLY the same placeholder names as the original. Do not change {instruction} to {input_text} or any other variation.

Generate only the improved prompt text, nothing else.

examples/llm_prompt_optimization/templates/full_rewrite_user.txt

Lines changed: 10 additions & 4 deletions
@@ -12,9 +12,15 @@
 
 # Task
 Rewrite the prompt to improve its performance on the specified metrics.
-Provide the complete new prompt text.
+Focus on clarity, specificity, and effectiveness for the target task.
 
-IMPORTANT: Make sure your rewritten prompt maintains the same input placeholder ({{input_text}})
-but with improved instructions for better LLM performance.
+CRITICAL REQUIREMENTS:
+1. Keep the EXACT same placeholder from the original prompt (e.g., {{instruction}}, {{claim}}, etc.)
+2. Do not add any new placeholders or change existing ones
+3. Make the instructions clearer and more specific
+4. Focus on what will improve accuracy and task performance
+5. Keep the prompt concise but effective
 
-Your improved prompt:
+Provide ONLY the complete new prompt text, with no additional commentary:
+
+NEW PROMPT:
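The "keep the EXACT same placeholder" requirement is a checkable invariant. A minimal sketch of enforcing it on a rewritten prompt, assuming `str.format`-style placeholders (the helper name and sample prompts are ours, not part of the commit):

```python
from string import Formatter

def placeholders(template: str) -> set:
    """Collect the {name} fields of a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec,
    # conversion); field_name is None for trailing literal text.
    return {field for _, field, _, _ in Formatter().parse(template) if field}

original = "Answer the question: {instruction}"
rewritten = "You are a careful solver. Read {instruction} and answer step by step."

# Reject a rewrite that renames, drops, or invents placeholders.
assert placeholders(rewritten) == placeholders(original)
```

A harness applying the template above could run this check on the model's output after the `NEW PROMPT:` marker and retry the rewrite when it fails.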
