 # Math Evaluation Leaderboard
 
-Last updated: 2025-04-09 09:53:17
-
-| Model | Score | Tokens Used | System Prompt | Evaluation Time | Details | Model Info |
-|-------|--------|-------------|---------------|----------------|----------|------------|
-| o3-mini-2025-01-31 | 0.442 | 0 | You are a helpful math assi... | 595.1s | [Details](details/o3-mini-2025-01-31/details_20250408_121611.md) | |
-| meta-llama/llama-4-scout | 0.374 | 0 | You are a helpful math assi... | 648.5s | [Details](details/meta-llama/llama-4-scout/details_20250409_073324.md) | |
-| gpt-4o | 0.353 | 0 | You are a helpful math assi... | 515.1s | [Details](details/gpt-4o/details_20250408_115501.md) | |
-| gpt-4o-mini | 0.311 | 0 | You are a helpful math assi... | 682.2s | [Details](details/gpt-4o-mini/details_20250408_115501.md) | |
-| deepseek/deepseek-chat-v3-0324 | 0.100 | 0 | You are a helpful math assi... | 6813.4s | [Details](details/deepseek/deepseek-chat-v3-0324/details_20250409_075941.md) | |
+| Model | Combined Score | RussianMath Score | MathDemon Score | Tokens Used | System Prompt | Evaluation Time | Dataset | Details |
+|-------|---------------|------------------|----------------|-------------|---------------|----------------|---------|----------|
+| gpt-4o-mini | 0.321 | 0.321 | 0.173 | 251078 | Вы - полезный помощник по м... | 950.2s | RussianMath, MathDemon | [RussianMath](details/gpt-4o-mini/details_20250413_204220.md), [MathDemon](details/gpt-4o-mini/details_20250413_205901.md) |
+| └─ Approximation_by_Polynomials | - | - | 0.429 | 5191 | - | 27.2s | MathDemon/Approximation_by_Polynomials | [Details](details/gpt-4o-mini/details_20250413_205329.md) |
+| └─ Continuous_Functions | - | - | 0.143 | 5994 | - | 22.2s | MathDemon/Continuous_Functions | [Details](details/gpt-4o-mini/details_20250413_205356.md) |
+| └─ Convex_Functions | - | - | 0.182 | 8705 | - | 29.9s | MathDemon/Convex_Functions | [Details](details/gpt-4o-mini/details_20250413_205430.md) |
+| └─ Differentiation | - | - | 0.111 | 8561 | - | 31.3s | MathDemon/Differentiation | [Details](details/gpt-4o-mini/details_20250413_205505.md) |
+| └─ Improper_Integrals | - | - | 0.111 | 8269 | - | 43.7s | MathDemon/Improper_Integrals | [Details](details/gpt-4o-mini/details_20250413_205553.md) |
+| └─ Infinite_Series | - | - | 0.154 | 10342 | - | 56.0s | MathDemon/Infinite_Series | [Details](details/gpt-4o-mini/details_20250413_205652.md) |
+| └─ Integration | - | - | 0.091 | 9718 | - | 44.0s | MathDemon/Integration | [Details](details/gpt-4o-mini/details_20250413_205740.md) |
+| └─ Sequences_and_Limits | - | - | 0.000 | 7275 | - | 28.3s | MathDemon/Sequences_and_Limits | [Details](details/gpt-4o-mini/details_20250413_205824.md) |
+| └─ Series_of_Functions | - | - | 0.333 | 9428 | - | 32.7s | MathDemon/Series_of_Functions | [Details](details/gpt-4o-mini/details_20250413_205901.md) |
+| GigaChat-2-Max | 0.195 | 0.195 | 0.095 | 123361 | Вы - полезный помощник по м... | 588.0s | RussianMath, MathDemon | [RussianMath](details/GigaChat-2-Max/details_20250413_204220.md), [MathDemon](details/GigaChat-2-Max/details_20250413_205901.md) |
+| └─ Approximation_by_Polynomials | - | - | 0.143 | 3942 | - | 17.8s | MathDemon/Approximation_by_Polynomials | [Details](details/GigaChat-2-Max/details_20250413_205319.md) |
+| └─ Continuous_Functions | - | - | 0.143 | 4018 | - | 17.6s | MathDemon/Continuous_Functions | [Details](details/GigaChat-2-Max/details_20250413_205350.md) |
+| └─ Convex_Functions | - | - | 0.000 | 2682 | - | 21.3s | MathDemon/Convex_Functions | [Details](details/GigaChat-2-Max/details_20250413_205420.md) |
+| └─ Differentiation | - | - | 0.111 | 5177 | - | 20.3s | MathDemon/Differentiation | [Details](details/GigaChat-2-Max/details_20250413_205454.md) |
+| └─ Improper_Integrals | - | - | 0.111 | 2988 | - | 23.5s | MathDemon/Improper_Integrals | [Details](details/GigaChat-2-Max/details_20250413_205532.md) |
+| └─ Infinite_Series | - | - | 0.154 | 6052 | - | 27.1s | MathDemon/Infinite_Series | [Details](details/GigaChat-2-Max/details_20250413_205624.md) |
+| └─ Integration | - | - | 0.000 | 3960 | - | 25.8s | MathDemon/Integration | [Details](details/GigaChat-2-Max/details_20250413_205722.md) |
+| └─ Sequences_and_Limits | - | - | 0.111 | 4647 | - | 20.5s | MathDemon/Sequences_and_Limits | [Details](details/GigaChat-2-Max/details_20250413_205816.md) |
+| └─ Series_of_Functions | - | - | 0.083 | 5043 | - | 28.0s | MathDemon/Series_of_Functions | [Details](details/GigaChat-2-Max/details_20250413_205855.md) |
+| GigaChat-2-Pro | 0.179 | 0.179 | 0.099 | 133525 | Вы - полезный помощник по м... | 578.7s | RussianMath, MathDemon | [RussianMath](details/GigaChat-2-Pro/details_20250413_204220.md), [MathDemon](details/GigaChat-2-Pro/details_20250413_205901.md) |
+| └─ Approximation_by_Polynomials | - | - | 0.000 | 1242 | - | 14.8s | MathDemon/Approximation_by_Polynomials | [Details](details/GigaChat-2-Pro/details_20250413_205316.md) |
+| └─ Continuous_Functions | - | - | 0.143 | 3801 | - | 18.2s | MathDemon/Continuous_Functions | [Details](details/GigaChat-2-Pro/details_20250413_205351.md) |
+| └─ Convex_Functions | - | - | 0.091 | 6691 | - | 23.2s | MathDemon/Convex_Functions | [Details](details/GigaChat-2-Pro/details_20250413_205422.md) |
+| └─ Differentiation | - | - | 0.333 | 4747 | - | 25.5s | MathDemon/Differentiation | [Details](details/GigaChat-2-Pro/details_20250413_205459.md) |
+| └─ Improper_Integrals | - | - | 0.000 | 4302 | - | 18.8s | MathDemon/Improper_Integrals | [Details](details/GigaChat-2-Pro/details_20250413_205527.md) |
+| └─ Infinite_Series | - | - | 0.154 | 7120 | - | 26.9s | MathDemon/Infinite_Series | [Details](details/GigaChat-2-Pro/details_20250413_205623.md) |
+| └─ Integration | - | - | 0.000 | 7730 | - | 30.9s | MathDemon/Integration | [Details](details/GigaChat-2-Pro/details_20250413_205727.md) |
+| └─ Sequences_and_Limits | - | - | 0.000 | 3723 | - | 17.9s | MathDemon/Sequences_and_Limits | [Details](details/GigaChat-2-Pro/details_20250413_205814.md) |
+| └─ Series_of_Functions | - | - | 0.167 | 6887 | - | 27.1s | MathDemon/Series_of_Functions | [Details](details/GigaChat-2-Pro/details_20250413_205854.md) |
+| GigaChat-Max | 0.168 | 0.168 | 0.065 | 147924 | Вы - полезный помощник по м... | 784.1s | RussianMath, MathDemon | [RussianMath](details/GigaChat-Max/details_20250413_204220.md), [MathDemon](details/GigaChat-Max/details_20250413_205901.md) |
+| └─ Approximation_by_Polynomials | - | - | 0.000 | 5190 | - | 27.9s | MathDemon/Approximation_by_Polynomials | [Details](details/GigaChat-Max/details_20250413_205329.md) |
+| └─ Continuous_Functions | - | - | 0.143 | 2743 | - | 18.2s | MathDemon/Continuous_Functions | [Details](details/GigaChat-Max/details_20250413_205351.md) |
+| └─ Convex_Functions | - | - | 0.091 | 4103 | - | 28.1s | MathDemon/Convex_Functions | [Details](details/GigaChat-Max/details_20250413_205427.md) |
+| └─ Differentiation | - | - | 0.000 | 2079 | - | 21.0s | MathDemon/Differentiation | [Details](details/GigaChat-Max/details_20250413_205455.md) |
+| └─ Improper_Integrals | - | - | 0.111 | 6634 | - | 38.6s | MathDemon/Improper_Integrals | [Details](details/GigaChat-Max/details_20250413_205547.md) |
+| └─ Infinite_Series | - | - | 0.154 | 5528 | - | 32.2s | MathDemon/Infinite_Series | [Details](details/GigaChat-Max/details_20250413_205629.md) |
+| └─ Integration | - | - | 0.000 | 10020 | - | 56.3s | MathDemon/Integration | [Details](details/GigaChat-Max/details_20250413_205752.md) |
+| └─ Sequences_and_Limits | - | - | 0.000 | 4030 | - | 23.7s | MathDemon/Sequences_and_Limits | [Details](details/GigaChat-Max/details_20250413_205820.md) |
+| └─ Series_of_Functions | - | - | 0.083 | 4087 | - | 30.8s | MathDemon/Series_of_Functions | [Details](details/GigaChat-Max/details_20250413_205858.md) |
+| GigaChat-2 | 0.116 | 0.116 | 0.042 | 103214 | Вы - полезный помощник по м... | 337.6s | RussianMath, MathDemon | [RussianMath](details/GigaChat-2/details_20250413_204220.md), [MathDemon](details/GigaChat-2/details_20250413_205901.md) |
+| └─ Approximation_by_Polynomials | - | - | 0.000 | 2307 | - | 6.3s | MathDemon/Approximation_by_Polynomials | [Details](details/GigaChat-2/details_20250413_205308.md) |
+| └─ Continuous_Functions | - | - | 0.000 | 3390 | - | 7.9s | MathDemon/Continuous_Functions | [Details](details/GigaChat-2/details_20250413_205340.md) |
+| └─ Convex_Functions | - | - | 0.000 | 5135 | - | 13.2s | MathDemon/Convex_Functions | [Details](details/GigaChat-2/details_20250413_205412.md) |
+| └─ Differentiation | - | - | 0.111 | 4267 | - | 10.4s | MathDemon/Differentiation | [Details](details/GigaChat-2/details_20250413_205444.md) |
+| └─ Improper_Integrals | - | - | 0.111 | 4432 | - | 9.0s | MathDemon/Improper_Integrals | [Details](details/GigaChat-2/details_20250413_205517.md) |
+| └─ Infinite_Series | - | - | 0.154 | 4210 | - | 15.2s | MathDemon/Infinite_Series | [Details](details/GigaChat-2/details_20250413_205612.md) |
+| └─ Integration | - | - | 0.000 | 424 | - | 18.9s | MathDemon/Integration | [Details](details/GigaChat-2/details_20250413_205715.md) |
+| └─ Sequences_and_Limits | - | - | 0.000 | 4084 | - | 8.6s | MathDemon/Sequences_and_Limits | [Details](details/GigaChat-2/details_20250413_205804.md) |
+| └─ Series_of_Functions | - | - | 0.000 | 3575 | - | 15.7s | MathDemon/Series_of_Functions | [Details](details/GigaChat-2/details_20250413_205843.md) |
+=======