Commit 25d56bc

update
1 parent 91834b9 commit 25d56bc

2 files changed, +10 -10 lines changed

README.md

Lines changed: 5 additions & 5 deletions
@@ -97,10 +97,10 @@ In the model names, the "+" after indicates a prompt, while the rest are unpromp
 | 4o | 30 | 9 | 39 | 67% |
 | 4omini | 30 | 9 | 39 | 67% |
 | sonnet | 30 | 12 | 42 | 72% |
-| sonnet + so1 | 35 | 10 | 45 | 77%🥉 |
+| **sonnet + so1** | 35 | 10 | 45 | **77%🥉** |
 | sonnet + g1 * | 30 | 5 | 35 | 60% |
-| o1 mini | 37 | 16 | 53 | 91%🥇 |
-| o1 preview | 38 | 12 | 50 | 86%🥈|
+| **o1 mini** | 37 | 16 | 53 | **91%🥇** |
+| **o1 preview** | 38 | 12 | 50 | **86%🥈**|

 > Note: sonnet+g1 tends to stop after giving only the first step of reasoning, marked as ⚠️. In scoring, it is simply counted as incorrect, but its actual performance is similar to so1.
@@ -110,7 +110,7 @@ In the model names, the "+" after indicates a prompt, while the rest are unpromp
 | 4o | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ | ❌✅❌ | ✅❌❌ |
 | 4omini | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌✅✅ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
 | sonnet | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
-| sonnet + so1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
+| **sonnet + so1** | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
 | sonnet + g1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌✅ | ✅✅⚠️ | ⚠️✅❌ | ✅✅✅ | ❌✅❌ |
 | o1 mini | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ |
 | o1 preview | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ✅✅✅ |
@@ -121,7 +121,7 @@ In the model names, the "+" after indicates a prompt, while the rest are unpromp
 | 4o | ✅✅✅ | 👍👍❌ | ❌❌👍 |
 | 4omini | ✅✅✅ | ❌👍👍 | ❌❌👍 |
 | sonnet | ✅✅✅ | 👍👍❌ | 👍✅👍 |
-| sonnet + so1 | ✅✅✅ | ❌❌👍 | 👍👍👍 |
+| **sonnet + so1** | ✅✅✅ | ❌❌👍 | 👍👍👍 |
 | sonnet + g1 | ✅❌⚠️ | ⚠️❌✅ | ⚠️❌👍 |
 | o1 mini | ✅✅✅ | ✅✅✅ | ❌✅✅ |
 | o1 preview | ✅✅✅ | ✅✅✅ | ❌❌❌ |

README.zh.md

Lines changed: 5 additions & 5 deletions
@@ -99,10 +99,10 @@ Prompt reference: [g1](https://github.com/bklieger-groq/g1)
 | 4o | 30 | 9 | 39 | 67% |
 | 4omini | 30 | 9 | 39 | 67% |
 | sonnet | 30 | 12 | 42 | 72% |
-| sonnet + so1 | 35 | 10 | 45 | 77%🥉 |
+| **sonnet + so1** | 35 | 10 | 45 | **77%🥉** |
 | sonnet + g1 * | 30 | 5 | 35 | 60% |
-| o1 mini | 37 | 16 | 53 | 91%🥇 |
-| o1 preview | 38 | 12 | 50 | 86%🥈|
+| **o1 mini** | 37 | 16 | 53 | **91%🥇** |
+| **o1 preview** | 38 | 12 | 50 | **86%🥈**|

 > Note: sonnet+g1 tends to stop after giving only the first step of reasoning; such answers are marked ⚠️ and simply counted as incorrect in scoring, although its actual performance is close to so1.
@@ -112,7 +112,7 @@ Prompt reference: [g1](https://github.com/bklieger-groq/g1)
 | 4o | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ | ❌✅❌ | ✅❌❌ |
 | 4omini | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌✅✅ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
 | sonnet | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅❌✅ | ✅✅✅ | ❌❌❌ |
-| sonnet + so1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
+| **sonnet + so1** | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ❌❌✅ |
 | sonnet + g1 | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌✅ | ✅✅⚠️ | ⚠️✅❌ | ✅✅✅ | ❌✅❌ |
 | o1 mini | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅❌❌ | ✅✅✅ | ✅✅✅ |
 | o1 preview | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅✅ | ✅✅❌ | ✅✅✅ | ✅✅✅ |
@@ -123,7 +123,7 @@ Prompt reference: [g1](https://github.com/bklieger-groq/g1)
 | 4o | ✅✅✅ | 👍👍❌ | ❌❌👍 |
 | 4omini | ✅✅✅ | ❌👍👍 | ❌❌👍 |
 | sonnet | ✅✅✅ | 👍👍❌ | 👍✅👍 |
-| sonnet + so1 | ✅✅✅ | ❌❌👍 | 👍👍👍 |
+| **sonnet + so1** | ✅✅✅ | ❌❌👍 | 👍👍👍 |
 | sonnet + g1 | ✅❌⚠️ | ⚠️❌✅ | ⚠️❌👍 |
 | o1 mini | ✅✅✅ | ✅✅✅ | ❌✅✅ |
 | o1 preview | ✅✅✅ | ✅✅✅ | ❌❌❌ |
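
The summary table touched by this commit can be sanity-checked with a little arithmetic. In every row the third number equals the sum of the first two, and the percentage matches that sum divided by 58 and truncated to a whole percent. The Python sketch below reproduces the figures under those assumptions: the denominator 58 is inferred from the numbers and is not stated anywhere in the diff, the column headers are not visible in the hunks, and the names `ROWS`, `MAX_TOTAL`, `summarize`, and `medalists` are hypothetical.

```python
# Rows copied from the summary table in the diff: (model, first score, second score).
# The column headers are not visible in the hunks, so the scores are left unnamed.
ROWS = [
    ("4o", 30, 9),
    ("4omini", 30, 9),
    ("sonnet", 30, 12),
    ("sonnet + so1", 35, 10),
    ("sonnet + g1", 30, 5),
    ("o1 mini", 37, 16),
    ("o1 preview", 38, 12),
]

# Assumption: 58 is inferred from the published percentages, not stated in the commit.
MAX_TOTAL = 58


def summarize(rows, max_total=MAX_TOTAL):
    """Return (model, total, percent) with the percentage truncated, not rounded."""
    summary = []
    for model, first, second in rows:
        total = first + second
        percent = total * 100 // max_total  # integer division truncates
        summary.append((model, total, percent))
    return summary


def medalists(summary, places=3):
    """Return the model names with the highest totals, best first."""
    ranked = sorted(summary, key=lambda row: row[1], reverse=True)
    return [model for model, _, _ in ranked[:places]]


if __name__ == "__main__":
    table = summarize(ROWS)
    for model, total, percent in table:
        print(f"{model:<14} {total:>3} {percent:>3}%")
    print("medals:", ", ".join(medalists(table)))
```

Truncation rather than rounding is what makes the `sonnet + so1` row come out at 77%: 45/58 is about 77.6%, which would round up to 78%. The three top totals also line up with the rows that the commit sets in bold.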
