add example [online judge programming] #72
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # Please save this file as .kattisrc in your home directory. | ||
| # This file includes a secret token that allows you to log in. | ||
| # DO NOT SHARE IT WITH ANYONE ELSE. | ||
| # If someone gets access to this token, please revoke it by changing your KATTIS password. | ||
|
|
||
| [user] | ||
| username: YOUR_USERNAME | ||
| token: YOUR_TOKEN | ||
|
|
||
| [kattis] | ||
| hostname: open.kattis.com | ||
| loginurl: https://open.kattis.com/login | ||
| submissionurl: https://open.kattis.com/submit | ||
| submissionsurl: https://open.kattis.com/submissions | ||
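For readers wiring this file into their own tooling, the sketch below shows one way to read a `.kattisrc`-style file with Python's standard `configparser`, which accepts the `key: value` syntax used above. The helper name `load_kattis_config` and the inline sample text are illustrative assumptions; they are not part of this PR or of the Kattis submit script.

```python
import configparser

# Sample text mirroring the file above; the credential values are placeholders.
SAMPLE_KATTISRC = """\
[user]
username: YOUR_USERNAME
token: YOUR_TOKEN

[kattis]
hostname: open.kattis.com
loginurl: https://open.kattis.com/login
submissionurl: https://open.kattis.com/submit
submissionsurl: https://open.kattis.com/submissions
"""


def load_kattis_config(text):
    """Parse .kattisrc-style text and return the fields a client would need.

    Hypothetical helper, for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "username": parser.get("user", "username"),
        "token": parser.get("user", "token"),
        "loginurl": parser.get("kattis", "loginurl"),
        "submissionurl": parser.get("kattis", "submissionurl"),
    }


if __name__ == "__main__":
    config = load_kattis_config(SAMPLE_KATTISRC)
    print(config["username"], config["loginurl"])
```

Only the first `:` on each line acts as the delimiter, so the URL values survive intact.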
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| # Online Judge Programming Example | ||
|
|
||
| This example demonstrates how OpenEvolve can solve programming problems and pass all test cases on [Kattis online judge](https://open.kattis.com/) starting from scratch. | ||
|
|
||
| ## Problem Description | ||
|
|
||
| We take the [Alphabet](https://open.kattis.com/problems/alphabet) problem from [Kattis](https://open.kattis.com/), which reads as follows: | ||
| ```markdown | ||
| A string of lowercase letters is called **alphabetical** if some of the letters can be deleted so that the only letters that remain are the letters from 'a' to 'z' in order. Given a string s, determine the minimum number of letters to add anywhere in the string to make it alphabetical. | ||
|
|
||
| Input: | ||
| Each input will consist of a single test case. Note that your program may be run multiple times on different inputs. The only line of input contains a string s (1 ≤ |s| ≤ 50) which contains only lowercase letters. | ||
|
|
||
| Output: | ||
| Output a single integer, which is the smallest number of letters needed to add to `s` to make it alphabetical. | ||
|
|
||
| Sample Input 1: | ||
| xyzabcdefghijklmnopqrstuvw | ||
|
|
||
| Sample Output 1: | ||
| 3 | ||
|
|
||
| Sample Input 2: | ||
| aiemckgobjfndlhp | ||
|
|
||
| Sample Output 2: | ||
| 20 | ||
| ``` | ||
|
|
||
| ## Getting Started | ||
|
|
||
| First, download your personal configuration file (you must be logged in) from [Kattis](https://open.kattis.com/download/kattisrc) and save it as `.kattisrc` in your home directory. | ||
|
|
||
| Then, to run this example: | ||
|
|
||
| ```bash | ||
| cd examples/online_judge_programming | ||
| python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml | ||
| ``` | ||
|
|
||
| ## Algorithm Evolution | ||
|
|
||
| ### Initial Algorithm (dummy output) | ||
|
|
||
| The initial implementation was a simple dummy output that returned 0 directly. | ||
|
|
||
| ```python | ||
| import sys | ||
| for line in sys.stdin: | ||
| s = line.strip() | ||
|
|
||
| ans = 0 | ||
| print(ans) | ||
| ``` | ||
|
|
||
| ### Evolved Algorithm (Dynamic Programming) | ||
|
|
||
| After running OpenEvolve for just 4 iterations, it discovered a dynamic programming algorithm that passes all test cases on Kattis: | ||
|
|
||
| ```python | ||
| import sys | ||
|
|
||
| for line in sys.stdin: | ||
| s = line.strip() | ||
|
|
||
| n = len(s) | ||
| dp = [1] * n | ||
|
|
||
| for i in range(1, n): | ||
| for j in range(i): | ||
| if s[i] > s[j]: | ||
| dp[i] = max(dp[i], dp[j] + 1) | ||
|
|
||
| longest_alphabetical_subsequence_length = max(dp) | ||
| ans = 26 - longest_alphabetical_subsequence_length | ||
| print(ans) | ||
| ``` | ||
|
|
||
| ## Next Steps | ||
|
|
||
| Try modifying the config.yaml file to: | ||
| - Change the programming problem in the system prompt | ||
| - Change the LLM model configuration |
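The evolved program in the README above can be checked locally against the two sample cases before anything is submitted. The sketch below wraps the same quadratic dynamic programme (longest strictly increasing subsequence, then `26 - length`) in a function. The name `min_letters_to_add` and the empty-string guard are additions for illustration and do not appear in the PR.

```python
def min_letters_to_add(s):
    """Return 26 minus the length of the longest strictly increasing subsequence of s."""
    n = len(s)
    if n == 0:
        # Not reachable under the problem constraints (1 <= |s| <= 50);
        # kept so the helper is safe to call on arbitrary input.
        return 26

    # dp[i] = length of the longest strictly increasing subsequence ending at s[i]
    dp = [1] * n
    for i in range(1, n):
        for j in range(i):
            if s[i] > s[j]:
                dp[i] = max(dp[i], dp[j] + 1)

    return 26 - max(dp)


if __name__ == "__main__":
    # The two samples from the problem statement.
    assert min_letters_to_add("xyzabcdefghijklmnopqrstuvw") == 3
    assert min_letters_to_add("aiemckgobjfndlhp") == 20
    print("both samples pass")
```

In the first sample the longest increasing run is `abcdefghijklmnopqrstuvw` (23 letters), so 3 letters must be added. In the second it has length 6 (for example `a, e, g, j, n, p`), giving 20.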
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # Configuration for the online judge programming example | ||
| max_iterations: 20 | ||
| checkpoint_interval: 1 | ||
| log_level: "INFO" | ||
|
|
||
| # LLM configuration | ||
| llm: | ||
| primary_model: "gemini-2.0-flash" | ||
| primary_model_weight: 0.6 | ||
| secondary_model: "gemini-2.5-flash-preview-05-20" | ||
| secondary_model_weight: 0.4 | ||
| api_base: "https://generativelanguage.googleapis.com/v1beta/openai/" | ||
| api_key: YOUR_API_KEY | ||
| temperature: 0.7 | ||
| top_p: 0.95 | ||
| max_tokens: 4096 | ||
|
|
||
| # Prompt configuration | ||
| prompt: | ||
| system_message: | | ||
| You are an expert programmer. Your task is to implement an algorithm in Python to pass all the test cases. The problem is as follows: | ||
|
|
||
| A string of lowercase letters is called alphabetical if some of the letters can be deleted so that the only letters that remain are the letters from a to z in order. Given a string s, determine the minimum number of letters to add anywhere in the string to make it alphabetical. | ||
|
|
||
| Input: | ||
| Each input will consist of a single test case. Note that your program may be run multiple times on different inputs. The only line of input contains a string s (1 ≤ |s| ≤ 50) which contains only lowercase letters. | ||
| Output: | ||
| Output a single integer, which is the smallest number of letters needed to add to s to make it alphabetical. | ||
|
|
||
| Sample Input 1: | ||
| xyzabcdefghijklmnopqrstuvw | ||
| Sample Output 1: | ||
| 3 | ||
|
|
||
| Sample Input 2: | ||
| aiemckgobjfndlhp | ||
| Sample Output 2: | ||
| 20 | ||
|
|
||
| Your program should always read/write to STDIN/STDOUT. For example, to handle integer input, use the following format: | ||
| ``` | ||
| import sys | ||
| for line in sys.stdin: | ||
| data = int(line) | ||
| ``` | ||
| Use print() for output. For example: | ||
| ``` | ||
| print("Hello, World!") | ||
| ``` | ||
| num_top_programs: 3 | ||
| use_template_stochasticity: true | ||
|
|
||
| # Database configuration | ||
| database: | ||
| population_size: 50 | ||
| archive_size: 20 | ||
| num_islands: 3 | ||
| elite_selection_ratio: 0.2 | ||
| exploitation_ratio: 0.7 | ||
|
|
||
| # Evaluator configuration | ||
| evaluator: | ||
| timeout: 60 | ||
| cascade_evaluation: false | ||
| cascade_thresholds: [1.0] | ||
| parallel_evaluations: 4 | ||
| use_llm_feedback: false | ||
|
|
||
| # Evolution settings | ||
| diff_based_evolution: true | ||
| allow_full_rewrites: false |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| """ | ||
| Evaluator for the online judge programming example | ||
| """ | ||
|
|
||
| import re | ||
| import subprocess | ||
| import time | ||
| import traceback | ||
|
|
||
|
|
||
| def run_with_timeout(program_path, timeout_seconds=60): | ||
| """ | ||
| Submit a program to Kattis via submit.py and parse the result summary. | ||
|
|
||
| Args: | ||
| program_path: Path to the program file to submit | ||
| timeout_seconds: Timeout in seconds | ||
|
|
||
| Returns: | ||
| Tuple of (score, done, correct, total); raises TimeoutError on timeout | ||
| """ | ||
| cmd = ["python", "submit.py", program_path, "-p", "alphabet", "-l", "Python 3", "-f"] | ||
|
|
||
| try: | ||
| # Run the command and grab its output using subprocess.Popen | ||
| proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) | ||
| stdout, stderr = proc.communicate(timeout=timeout_seconds) | ||
| exit_code = proc.returncode | ||
| if exit_code != 0: | ||
| print(stderr) # Print the error output if the command failed | ||
| raise RuntimeError(f"Process exited with code {exit_code}") | ||
| except subprocess.TimeoutExpired: | ||
| # Kill the process if it times out | ||
| proc.kill() | ||
| raise TimeoutError(f"Process timed out after {timeout_seconds} seconds") | ||
|
|
||
| pattern = ( | ||
| r"Score:\s*(\d+)\s*" | ||
| r"Test cases done:\s*(\d+)\s*" | ||
| r"Test cases correct:\s*(\d+)\s*" | ||
| r"Test cases total:\s*(\d+)" | ||
| ) | ||
| match = re.search(pattern, stdout) | ||
| if not match: | ||
| raise ValueError("Expected summary lines not found") | ||
|
|
||
| score, done, correct, total = map(int, match.groups()) | ||
| return score, done, correct, total | ||
|
|
||
|
|
||
| def evaluate(program_path): | ||
| """ | ||
| Evaluate the program by submitting it to OJ and fetching metrics based on how well it performs. | ||
|
|
||
| Args: | ||
| program_path: Path to the program file | ||
|
|
||
| Returns: | ||
| Dictionary of metrics | ||
| """ | ||
| try: | ||
| # Submit the program once and measure the round-trip time | ||
| start_time = time.time() | ||
|
|
||
| # Use subprocess to run with timeout | ||
| score, done, correct, total = run_with_timeout( | ||
| program_path, timeout_seconds=60 # Single timeout | ||
| ) | ||
|
|
||
| end_time = time.time() | ||
| eval_time = end_time - start_time | ||
|
|
||
| # Combined score - higher is better | ||
| combined_score = correct / total if total > 0 else 0.0 | ||
|
|
||
| print( | ||
| f"Evaluation: Score={score}, Done={done}, Correct={correct}, Total={total}, Combined={combined_score:.2f}" | ||
| ) | ||
|
|
||
| return { | ||
| "score": score, | ||
| "done": done, | ||
| "correct": correct, | ||
| "total": total, | ||
| "eval_time": eval_time, | ||
| "combined_score": float(combined_score), | ||
| } | ||
|
|
||
| except Exception as e: | ||
| print(f"Evaluation failed completely: {str(e)}") | ||
| traceback.print_exc() | ||
| return { | ||
| "score": 0, | ||
| "done": 0, | ||
| "correct": 0, | ||
| "total": 0, | ||
| "eval_time": 0.0, | ||
| "combined_score": 0.0, | ||
| } |
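The parsing step in `run_with_timeout` can be exercised without contacting Kattis. The sketch below reuses the regular expression from the evaluator above and applies it to a made-up output string. The helper name `parse_summary`, the sample text, and its numbers are invented for illustration; the exact text printed by `submit.py` may differ.

```python
import re

# Same pattern as in the evaluator: four labelled integers in a fixed order.
SUMMARY_PATTERN = re.compile(
    r"Score:\s*(\d+)\s*"
    r"Test cases done:\s*(\d+)\s*"
    r"Test cases correct:\s*(\d+)\s*"
    r"Test cases total:\s*(\d+)"
)

# Hypothetical output; the real submit.py output may be formatted differently.
SAMPLE_STDOUT = """\
Submission received.
Score: 0
Test cases done: 3
Test cases correct: 2
Test cases total: 4
"""


def parse_summary(stdout):
    """Extract the summary counters and derive combined_score = correct / total."""
    match = SUMMARY_PATTERN.search(stdout)
    if match is None:
        raise ValueError("Expected summary lines not found")
    score, done, correct, total = map(int, match.groups())
    combined_score = correct / total if total > 0 else 0.0
    return {
        "score": score,
        "done": done,
        "correct": correct,
        "total": total,
        "combined_score": combined_score,
    }


if __name__ == "__main__":
    print(parse_summary(SAMPLE_STDOUT))
```

Keeping the parser separate from the subprocess call makes it easy to unit-test the metric logic with canned output.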
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| """Online judge programming example for OpenEvolve""" | ||
|
|
||
| # EVOLVE-BLOCK-START | ||
| import sys | ||
|
|
||
| for line in sys.stdin: | ||
| s = line.strip() | ||
|
|
||
| ans = 0 | ||
| print(ans) | ||
|
|
||
| # EVOLVE-BLOCK-END |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| lxml | ||
| requests |
We can rename this file to example.kattisrc and add a note in the README telling users to fill it in and save it as .kattisrc. If the filename starts with a dot, it may not even show up in the folder, and users may not realize they need to set it up.
Done! I've renamed the file to example.kattisrc and added a note to the README explaining that users should fill it out and save it as .kattisrc.