Commit 400f1c1
e2e eval works with gsm8k
[12/08/2025-06:05:20] [TRT-LLM] [I] lm-eval gsm8k results (scores normalized to range 0~100):
|Tasks|Version| Filter |n-shot| Metric | | Value | |Stderr|
|-----|------:|----------------|-----:|-----------|---|------:|---|-----:|
|gsm8k| 3|flexible-extract| 5|exact_match|↑ |63.6088|± |1.3253|
| | |strict-match | 5|exact_match|↑ |63.6088|± |1.3253|
[12/08/2025-06:05:20] [TRT-LLM] [I] lm-eval gsm8k average accuracy: 63.61
[12/08/2025-06:05:20] [TRT-LLM] [I] Hypothesis testing report:
===========================================================
= ACCURACY HYPOTHESIS TESTING
===========================================================
Alpha (Type I: False Positive): 0.050
Beta (Type II: False Negative): 0.200
Sigma (Standard deviation): 50.000
Higher is better: True
Theta (Minimum detectable effect): 4.841
Reference accuracy: 64.740
Threshold: 61.537
===========================================================
Evaluated accuracy: 63.609
===========================================================
1 parent 4e1a7d7 commit 400f1c1
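The theta and threshold values in the hypothesis testing report above can be reproduced with a short calculation. The following is a minimal sketch, not necessarily the harness's actual code: it assumes a one-sided two-sample z-test with scale `sqrt(2 * sigma^2 / n)`, and it assumes `n = 1319` samples (the size of the GSM8K test split, which the log does not print). The function names are illustrative. Under those assumptions the numbers match the report.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: the GSM8K test split has 1319 problems; the log does not state n.
NUM_SAMPLES = 1319


def compute_theta(num_samples, sigma, alpha=0.05, beta=0.2):
    """Minimum detectable effect for a one-sided two-sample z-test (assumed form)."""
    scale = sqrt(2 * sigma**2 / num_samples)
    z_alpha = NormalDist().inv_cdf(alpha)  # negative for alpha < 0.5
    z_beta = NormalDist().inv_cdf(beta)    # negative for beta < 0.5
    return -(z_alpha + z_beta) * scale


def compute_threshold(num_samples, ref_accuracy, sigma, alpha=0.05,
                      higher_is_better=True):
    """Accuracy below which the run is treated as a regression (assumed form)."""
    scale = sqrt(2 * sigma**2 / num_samples)
    z_alpha = NormalDist().inv_cdf(alpha)
    if higher_is_better:
        return ref_accuracy + z_alpha * scale
    return ref_accuracy - z_alpha * scale


# Values taken from the report: sigma, reference accuracy, evaluated accuracy.
theta = compute_theta(NUM_SAMPLES, sigma=50.0)
threshold = compute_threshold(NUM_SAMPLES, ref_accuracy=64.740, sigma=50.0)
passed = 63.609 >= threshold

print(f"theta={theta:.3f} threshold={threshold:.3f} passed={passed}")
```

With these inputs the evaluated accuracy of 63.609 lies above the computed threshold, so the check passes even though the score is about 1.1 points below the reference.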
File tree
6 files changed, +21 -19 lines changed
- tensorrt_llm/_torch
  - models
  - modules
  - pyexecutor
- tests/integration/defs/accuracy
Lines changed: 3 additions & 3 deletions