Commit 36cf197

Update readme
1 parent de308a4 commit 36cf197

1 file changed: +17 -0 lines changed

README.md

Lines changed: 17 additions & 0 deletions
@@ -490,12 +490,16 @@ optillm supports various command-line arguments for configuration. When using Do
 | `--cepo_planning_m` | Number of attempts to generate n plans in planning stage | 6 |
 | `--cepo_planning_temperature_step1` | Temperature for generator in step 1 of planning stage | 0.55 |
 | `--cepo_planning_temperature_step2` | Temperature for generator in step 2 of planning stage | 0.25 |
+| `--cepo_planning_temperature_direct_resp` | Temperature for generator after step 2 if planning fails and answer directly | 0.1 |
 | `--cepo_planning_temperature_step3` | Temperature for generator in step 3 of planning stage | 0.1 |
 | `--cepo_planning_temperature_step4` | Temperature for generator in step 4 of planning stage | 0 |
 | `--cepo_planning_max_tokens_step1` | Maximum number of tokens in step 1 of planning stage | 4096 |
 | `--cepo_planning_max_tokens_step2` | Maximum number of tokens in step 2 of planning stage | 4096 |
+| `--cepo_planning_max_tokens_direct_resp` | Maximum number of tokens after step 2 if planning fails and answer directly | 4096 |
 | `--cepo_planning_max_tokens_step3` | Maximum number of tokens in step 3 of planning stage | 4096 |
 | `--cepo_planning_max_tokens_step4` | Maximum number of tokens in step 4 of planning stage | 4096 |
+| `--cepo_use_reasoning_fallback` | Whether to fallback to lower levels of reasoning when higher level fails | False |
+| `--cepo_num_of_retries` | Number of retries if llm call fails, 0 for no retries | 0 |
 | `--cepo_print_output` | Whether to print the output of each stage | `False` |
 | `--cepo_config_file` | Path to CePO configuration file | `None` |
 | `--cepo_use_plan_diversity` | Use additional plan diversity step | `False` |
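The `--cepo_num_of_retries` row says that 0 means no retries. This diff does not show how optillm implements retrying, so the following is only a minimal sketch of that semantics: one initial attempt plus up to N retries. The names `call_with_retries` and `make_flaky` are hypothetical helpers written for this illustration and are not part of optillm.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], num_of_retries: int = 0) -> T:
    """Run `call` once, then retry up to `num_of_retries` more times on failure.

    Hypothetical helper: mirrors the documented meaning of the flag
    (0 = no retries), not optillm's actual code.
    """
    if num_of_retries < 0:
        raise ValueError("num_of_retries must be >= 0")
    last_error: Exception | None = None
    for _ in range(num_of_retries + 1):  # 0 retries still means one attempt
        try:
            return call()
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
    assert last_error is not None
    raise last_error


def make_flaky(failures: int) -> Callable[[], str]:
    """Build a stand-in for an LLM call that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def flaky() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated LLM call failure")
        return "ok"

    return flaky


if __name__ == "__main__":
    # Two failures are absorbed by two retries; the third attempt succeeds.
    print(call_with_retries(make_flaky(2), num_of_retries=2))
```

With the default of 0, the first failure propagates immediately, which matches the "0 for no retries" wording in the table.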
@@ -584,6 +588,19 @@ Authorization: Bearer your_secret_api_key

 ¹ Numbers in parentheses for LongCePO indicate accuracy of majority voting from 5 runs.

+### CePO on math and code benchmarks (Sep 2025)
+
+| Method | AIME 2024 | AIME 2025 | GPQA | LiveCodeBench |
+| ----------------------: | :-------: | :-------: | :----: | :-----------: |
+| Qwen3 8B | 74.0 | 68.3 | 59.3 | 55.7 |
+| CePO (using Qwen3 8B) | 86.7 | 80.0 | 62.5 | 60.5 |
+| Qwen3 32B | 81.4 | 72.9 | 66.8 | 65.7 |
+| CePO (using Qwen3 32B) | **90.7** | **83.3** | 70.0 | **71.9** |
+| Qwen3 235B | 85.7 | 81.5 | 71.1 | 70.7 |
+| DeepSeek R1 | 79.8 | 70.0 | 71.5 | 64.3 |
+| OpenAI o3-mini | 79.6 | 74.8 | 76.8 | 66.3 |
+| Grok3 Think | 83.9 | 77.3 |**80.2**| 70.6 |
+
 ### CePO on math and code benchmarks (Mar 2025)

 | Method | Math-L5 | MMLU-Pro (Math) | CRUX | LiveCodeBench (pass@1) | Simple QA |
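
To make the Sep 2025 table added in this commit easier to read, the sketch below computes the per-benchmark difference between each CePO row and its base model. The scores are copied from the table. The `gains` helper is a hypothetical name used only for this illustration, and the differences are in the table's own score units.

```python
# Columns of the Sep 2025 table, in order.
BENCHMARKS = ("AIME 2024", "AIME 2025", "GPQA", "LiveCodeBench")

# Scores copied verbatim from the Sep 2025 table.
SCORES = {
    "Qwen3 8B": (74.0, 68.3, 59.3, 55.7),
    "CePO (using Qwen3 8B)": (86.7, 80.0, 62.5, 60.5),
    "Qwen3 32B": (81.4, 72.9, 66.8, 65.7),
    "CePO (using Qwen3 32B)": (90.7, 83.3, 70.0, 71.9),
}


def gains(base: str, cepo: str) -> dict[str, float]:
    """Return CePO score minus base score for each benchmark, rounded to 0.1."""
    return {
        name: round(c - b, 1)
        for name, b, c in zip(BENCHMARKS, SCORES[base], SCORES[cepo])
    }


if __name__ == "__main__":
    for base in ("Qwen3 8B", "Qwen3 32B"):
        print(base, gains(base, f"CePO (using {base})"))
```

On these numbers the differences on the two AIME columns are larger than those on GPQA and LiveCodeBench for both model sizes.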
