Commit 53a159e
feat(interceptors): add reasoning ratio stats (#618)
- Introduced two new statistics, `reasoning_unfinished_count` and
`reasoning_finished_ratio`, to track responses where reasoning started
but did not complete, and the ratio of finished responses to all
reasoning responses.
- Updated the logic in `ResponseReasoningInterceptor` to increment this
count appropriately.
- Added unit tests to validate the correct tracking of reasoning states,
ensuring the mathematical invariant between started and finished counts
is maintained.
- Updated documentation to reflect the new statistic and its
significance in evaluating reasoning performance.
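
A minimal sketch of how the two statistics might relate to each other, not the actual `ResponseReasoningInterceptor` code. The `ReasoningStats` class, its `record` method, and the `reasoning_started_count` / `reasoning_finished_count` field names are hypothetical. Dividing by the started count and returning `0.0` when nothing started are also assumptions about the convention.

```python
from dataclasses import dataclass


@dataclass
class ReasoningStats:
    """Hypothetical counter container (not the interceptor's confirmed schema)."""

    reasoning_started_count: int = 0
    reasoning_finished_count: int = 0
    reasoning_unfinished_count: int = 0

    def record(self, started: bool, finished: bool) -> None:
        """Update the counters for a single response."""
        if not started:
            return  # responses without reasoning touch none of these counters
        self.reasoning_started_count += 1
        if finished:
            self.reasoning_finished_count += 1
        else:
            self.reasoning_unfinished_count += 1

    @property
    def reasoning_finished_ratio(self) -> float:
        """Finished / started; 0.0 when no reasoning started (assumed convention)."""
        if self.reasoning_started_count == 0:
            return 0.0
        return self.reasoning_finished_count / self.reasoning_started_count


stats = ReasoningStats()
for started, finished in [(True, True), (True, False), (False, False), (True, True)]:
    stats.record(started, finished)

# Invariant: every started response is either finished or unfinished.
assert stats.reasoning_started_count == (
    stats.reasoning_finished_count + stats.reasoning_unfinished_count
)
print(stats.reasoning_unfinished_count, round(stats.reasoning_finished_ratio, 3))
# prints: 1 0.667
```

Keeping the unfinished count as its own counter, rather than deriving it by subtraction, makes the invariant something a test can check directly.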
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added two reasoning metrics: reasoning_unfinished_count (counts
started-but-incomplete reasoning) and reasoning_finished_ratio (fraction
of completed reasoning).
* **Documentation**
* Updated evaluation, interceptor, and tutorial docs to include the new
metrics in examples, metric tables, and artifact descriptions.
* **Tests**
* Added parameterized tests covering finished, unfinished, not-started,
explicit-content, and edge-case reasoning scenarios to validate counts
and ratio.
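
The scenarios listed above could be exercised with a table-driven check along these lines. This is an illustrative sketch, not the tests from this commit. The `classify` helper, the `<think>` / `</think>` delimiters, and the treatment of an explicit reasoning field or a lone end tag as "finished" are all assumptions. A plain loop stands in for the project's parameterized test framework so the example stays self-contained.

```python
from typing import Optional, Tuple

START, END = "<think>", "</think>"  # assumed delimiters, for illustration only


def classify(content: str, reasoning_content: Optional[str] = None) -> Tuple[bool, bool]:
    """Return (started, finished) for one response (hypothetical helper)."""
    if reasoning_content is not None:
        # Assumption: an explicit reasoning field counts as started and finished.
        return True, True
    if END in content:
        # Assumption: an end tag implies reasoning finished, even without a start tag.
        return True, True
    if START in content:
        return True, False  # started but never closed
    return False, False


# (content, reasoning_content, expected (started, finished))
CASES = [
    ("<think>plan</think>answer", None, (True, True)),      # finished
    ("<think>plan that was cut off", None, (True, False)),  # unfinished
    ("answer only", None, (False, False)),                  # not started
    ("answer", "plan", (True, True)),                       # explicit content
    ("plan</think>answer", None, (True, True)),             # edge case: end tag only
    ("", None, (False, False)),                             # edge case: empty response
]

for content, reasoning_content, expected in CASES:
    assert classify(content, reasoning_content) == expected, content

started = sum(1 for c, r, _ in CASES if classify(c, r)[0])
finished = sum(1 for c, r, _ in CASES if classify(c, r)[1])
unfinished = started - finished
```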
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com>
Signed-off-by: Tomasz Grzegorzek <tgrzegorzek@nvidia.com>
Co-authored-by: Tomasz Grzegorzek <tgrzegorzek@nvidia.com>
1 parent 94f7b5c commit 53a159e
File tree
5 files changed: +183 -5 lines changed
- docs
  - evaluation/run-evals
  - libraries/nemo-evaluator/interceptors
  - tutorials/how-to
- packages/nemo-evaluator
  - src/nemo_evaluator/adapters/interceptors
  - tests/unit_tests/adapters/interceptors
Lines changed: 19 additions & 4 deletions
Lines changed: 157 additions & 0 deletions