Commit ef0fab1
e2e eval works with gsm8k
[12/08/2025-06:05:20] [TRT-LLM] [I] lm-eval gsm8k results (scores normalized to range 0~100):
|Tasks|Version| Filter |n-shot| Metric | | Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|------:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |63.6088|± |1.3253|
| | |strict-match | 5|exact_match|↑ |63.6088|± |1.3253|
[12/08/2025-06:05:20] [TRT-LLM] [I] lm-eval gsm8k average accuracy: 63.61
[12/08/2025-06:05:20] [TRT-LLM] [I] Hypothesis testing report:
===========================================================
= ACCURACY HYPOTHESIS TESTING
===========================================================
Alpha (Type I: False Positive): 0.050
Beta (Type II: False Negative): 0.200
Sigma (Standard deviation): 50.000
Higher is better: True
Theta (Minimum detectable effect): 4.841
Reference accuracy: 64.740
Threshold: 61.537
===========================================================
Evaluated accuracy: 63.609
===========================================================
1 parent 0fbebf6 commit ef0fab1
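The Theta and Threshold values in the hypothesis testing report above can be reproduced with a short calculation. The sketch below assumes a one-sided two-sample z-test with equal sample counts, and assumes `num_samples = 1319` (the size of the GSM8K test split, which is also consistent with the reported stderr of 1.3253). Neither the formula nor the function names are confirmed by this commit; they are illustrative only.

```python
from statistics import NormalDist


def compute_theta(num_samples, sigma, alpha=0.05, beta=0.2):
    """Minimum detectable effect for a one-sided two-sample z-test (assumed form)."""
    # Standard deviation of the difference between two independent sample means.
    scale = (2 * sigma**2 / num_samples) ** 0.5
    z_alpha = NormalDist().inv_cdf(alpha)  # negative for alpha < 0.5
    z_beta = NormalDist().inv_cdf(beta)    # negative for beta < 0.5
    return -(z_alpha + z_beta) * scale


def compute_threshold(num_samples, ref_accuracy, sigma, alpha=0.05,
                      higher_is_better=True):
    """Accuracy bound beyond which the run is treated as a regression (assumed form)."""
    scale = (2 * sigma**2 / num_samples) ** 0.5
    z_alpha = NormalDist().inv_cdf(alpha)
    if higher_is_better:
        return ref_accuracy + z_alpha * scale
    return ref_accuracy - z_alpha * scale


# Values taken from the report; num_samples is an assumption (GSM8K test split).
num_samples = 1319
theta = compute_theta(num_samples, sigma=50.0, alpha=0.05, beta=0.2)
threshold = compute_threshold(num_samples, ref_accuracy=64.74, sigma=50.0, alpha=0.05)
evaluated = 63.609
passed = evaluated >= threshold

print(f"theta={theta:.3f} threshold={threshold:.3f} passed={passed}")
```

Under these assumptions the evaluated accuracy of 63.609 sits above the threshold, so the run is not flagged as a regression even though it is about 1.1 points below the reference accuracy of 64.740.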
File tree
6 files changed, +21 -19 lines changed
- tensorrt_llm/_torch
  - models
  - modules
  - pyexecutor
- tests/integration/defs/accuracy
Changed hunks per file (file names and diff text not captured):

| File | Original lines changed | Deletions | Additions |
|---|---|---:|---:|
| 1 | 1134 | 1 | 2 |
| 1 | 1627-1635 | 9 | 9 |
| 2 | 2194 | 1 | 2 |
| 3 | 687-688 | 2 | 2 |
| 4 | 2033 | 1 | 1 |
| 5 | 477, 479 | 2 | 2 |
| 6 | 895-897 | 3 | 3 |