Introduces the GenSelect plugin implementing generative solution selection based on the AIMO-2 winning solution paper. Updates README with plugin documentation and reference, bumps version to 0.1.23, and adds a GenSelect math test case.
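For reference, GenSelect would be exercised through optillm's OpenAI-compatible proxy like other plugins. A minimal sketch, assuming the usual slug-prefix convention for routing requests to a plugin; the endpoint, API key, and model name here are illustrative, not taken from this PR:

```python
from openai import OpenAI

# Point the client at a locally running optillm proxy (illustrative endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

# Prefixing the model name with the plugin slug routes the request through
# GenSelect: several candidates are generated and the best one is returned.
response = client.chat.completions.create(
    model="genselect-gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the sum of the first 100 positive integers?"}],
)
print(response.choices[0].message.content)
```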
README.md — 2 additions & 0 deletions
@@ -377,6 +377,7 @@ Check this log file for connection issues, tool execution errors, and other diag
 | Read URLs |`readurls`| Reads all URLs found in the request, fetches the content at the URL and adds it to the context |
 | Execute Code |`executecode`| Enables use of code interpreter to execute python code in requests and LLM generated responses |
 | JSON |`json`| Enables structured outputs using the outlines library, supports pydantic types and JSON schema |
+| GenSelect |`genselect`| Generative Solution Selection - generates multiple candidates and selects the best based on quality criteria |

 ## Available parameters
@@ -587,6 +588,7 @@ called patchflows. We saw huge performance gains across all the supported patchf
 - [Unsupervised Evaluation of Code LLMs with Round-Trip Correctness](https://arxiv.org/abs/2402.08699) - [Inspired the implementation of rto](optillm/rto.py)
 - [Patched MOA: optimizing inference for diverse software development tasks](https://arxiv.org/abs/2407.18521) - [Implementation](optillm/moa.py)
 - [Patched RTC: evaluating LLMs for diverse software development tasks](https://arxiv.org/abs/2407.16557) - [Implementation](optillm/rto.py)
+- [AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset](https://arxiv.org/abs/2504.16891) - [Implementation](optillm/plugins/genselect_plugin.py)
optillm/plugins/genselect_plugin.py — excerpt of the batch comparison-prompt builder. The hunk opens mid-docstring, so the `def` line below is reconstructed from the documented arguments; the function name is inferred, not confirmed by the diff.

def create_comparison_prompt(candidates, query, comparison_mode="batch"):  # signature inferred from the docstring
    """
    Create a prompt for comparing candidate solutions.

    Args:
        candidates: List of candidate responses
        query: The original user query
        comparison_mode: "batch" for all at once, "tournament" for pairwise

    Returns:
        The comparison prompt
    """
    if comparison_mode == "batch":
        prompt = f"""You are an expert evaluator tasked with selecting the best response to the following query:

Query: {query}

I will provide you with {len(candidates)} different candidate responses. Please analyze each one carefully and select the best response based on the following criteria:

1. **Correctness and Accuracy**: Is the response factually correct and accurate?
2. **Completeness**: Does it fully address all aspects of the query?
3. **Clarity**: Is the explanation clear and easy to understand?
4. **Logical Coherence**: Is the reasoning sound and well-structured?
5. **Practical Value**: Does it provide useful, actionable information?

For coding problems, also consider:
- Code correctness and efficiency
- Best practices and style
- Error handling

Here are the candidate responses:

"""
        for i, candidate in enumerate(candidates, 1):
            prompt += f"=== Candidate {i} ===\n{candidate}\n\n"

        prompt += """Please analyze all candidates and provide:
1. A brief comparison highlighting the strengths and weaknesses of each candidate
2. Your selection of the best candidate (specify the number)
3. A clear explanation of why you selected that candidate"""
        # The diff excerpt is truncated here; the closing quotes and return are
        # added so the batch branch is self-contained. The "tournament"
        # (pairwise) branch mentioned in the docstring is not shown in this hunk.
        return prompt