Commit b4de398
feat: implement ES(t) macro/micro cross-validation and refactor analysis utilities
This commit implements the Error-aware Speedup Score (ES_t) metric from
Section 3.2.2 of the technical report (arXiv:2510.24035), along with the
mathematical proofs from Appendix B and C that establish the sample-level
validity of both S_t and ES_t metrics.
Key Features:
=============
1. Appendix B Implementation - Sample-level proof for S_t:
- Micro-level calculation: geometric mean of rectified speedups for all samples
- Macro-level calculation: S_t = α^λ · β^(ληp) · b^(1-λ)
- Cross-validation: both methods produce identical results, confirming numerically
that S_t is equivalent to the geometric mean of sample-level rectified speedups
2. Appendix C Implementation - Sample-level proof for ES_t:
- Micro-level calculation: geometric mean of error-aware rectified speedups
- Macro-level calculation: ES_t = α^λ · β^(ληp) · γ_t^(1-λ)
- Dynamic penalty factor: γ_t = b^(sum(π_c * indicator(t < c)))
- Cross-validation: validates that ES_t is the geometric mean of
error-aware rectified speedups, where failure samples use type-specific
dynamic penalties instead of fixed penalty b
3. Error-aware design (Section 3.2.2):
- Error type classification: c=1 (accuracy), c=2 (runtime crash), c=3 (compile failure)
- Tiered tolerance rules: t≥1 tolerates accuracy errors, t≥2 tolerates
runtime crashes, t≥3 tolerates all errors
- Dynamic penalty γ_t adapts based on error type distribution and tolerance level
4. Independent verification script:
- verify_macro_params.py: calculates and prints all macro parameters
(alpha, beta, gamma, lambda, eta, pi) independently
- Enables validation of plot_ESt results by computing each parameter separately
5. Mandatory validation mechanism:
- plot_ESt.py: enforces macro/micro result matching before adoption
- Rejects results if validation fails, ensuring calculation correctness
6. Code refactoring for maintainability:
- macro_statistics.py: dedicated module for macro parameter calculations
- Each parameter (alpha, beta, gamma, lambda, eta, pi) has its own function
- Reduced nesting levels in analysis_util.py by extracting helper functions
- Simplified scan_all_folders and added .txt file support
- Improved code organization following software engineering best practices
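
The macro/micro equivalence described in items 1 to 3 can be illustrated with a minimal sketch. It is not the code from this commit: the sample layout, the function names, and the values of p and b are hypothetical, and the parameter definitions are assumptions (alpha is the geometric mean of speedups over correct samples, beta the geometric mean over correct samples with speedup below 1, lambda the fraction of correct samples, eta the fraction of slowdowns among correct samples, pi_c the share of error type c among failures, and a slowdown s is rectified to s^(p+1)).

```python
import math

# Hypothetical sample records: (status, speedup, error_type).
# error_type c is 1 (accuracy), 2 (runtime crash), 3 (compile failure);
# it is ignored for correct samples, as is the speedup of failed samples.
SAMPLES = [
    ("ok", 2.0, 0),
    ("ok", 0.5, 0),
    ("ok", 1.25, 0),
    ("fail", 0.0, 1),
    ("fail", 0.0, 2),
    ("fail", 0.0, 3),
]


def geomean(values):
    """Geometric mean; 1.0 for an empty list so the factor is neutral."""
    values = list(values)
    if not values:
        return 1.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def es_t_micro(samples, t, p=1.0, b=0.1):
    """Geometric mean of per-sample error-aware rectified speedups."""
    rectified = []
    for status, s, c in samples:
        if status == "ok":
            # Assumed rule: a slowdown (s < 1) is penalised as s^(p+1).
            rectified.append(s if s >= 1.0 else s ** (p + 1.0))
        else:
            # A failure of type c is tolerated (factor 1) when t >= c,
            # otherwise it receives the fixed penalty b.
            rectified.append(1.0 if t >= c else b)
    return geomean(rectified)


def es_t_macro(samples, t, p=1.0, b=0.1):
    """alpha^lam * beta^(lam*eta*p) * gamma_t^(1-lam) from aggregates."""
    ok = [s for status, s, _ in samples if status == "ok"]
    slow = [s for s in ok if s < 1.0]
    fail_types = [c for status, _, c in samples if status != "ok"]
    lam = len(ok) / len(samples)
    eta = len(slow) / len(ok) if ok else 0.0
    alpha, beta = geomean(ok), geomean(slow)
    # sum over c of pi_c * indicator(t < c): share of untolerated failures.
    untolerated = sum(1 for c in fail_types if t < c)
    exponent = untolerated / len(fail_types) if fail_types else 0.0
    gamma_t = b ** exponent
    return alpha ** lam * beta ** (lam * eta * p) * gamma_t ** (1.0 - lam)


if __name__ == "__main__":
    for t in range(4):
        print(t, es_t_micro(SAMPLES, t), es_t_macro(SAMPLES, t))
```

Under these assumptions, t = 0 tolerates no error type, so gamma_t reduces to b and the macro expression coincides with the S_t formula in item 1.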
Technical Details:
==================
- Micro calculation: processes each sample individually, applies rectified
speedup rules, then computes geometric mean
- Macro calculation: uses aggregated statistics (correct count, speedup
distributions, error type proportions) to compute expected values
- Validation: compares micro and macro results and requires them to agree within a tolerance of 1e-6
- All calculations verified against real benchmark data (118 samples)
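
The validation step can be sketched as a small gate that refuses to adopt a result when the two calculations disagree. This is an illustration only: the function name is hypothetical, and treating 1e-6 as an absolute difference is an assumption, since the message does not say whether the threshold is absolute or relative.

```python
import math

TOLERANCE = 1e-6  # threshold quoted in the commit message


def adopt_if_consistent(micro, macro, tol=TOLERANCE):
    """Return the macro value only when it matches the micro value."""
    # Assumption: the tolerance is applied to the absolute difference.
    if not math.isclose(micro, macro, rel_tol=0.0, abs_tol=tol):
        raise ValueError(
            f"macro/micro mismatch: micro={micro:.9f}, macro={macro:.9f}"
        )
    return macro
```

Raising an exception rather than returning a flag keeps a mismatched value from being plotted by accident, which matches the "rejects results if validation fails" behaviour described for plot_ESt.py.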
Files Changed:
==============
- graph_net/analysis_util.py: refactored with helper functions, integrated
macro_statistics module, reduced nesting, simplified scan_all_folders
- graph_net/macro_statistics.py: new module for macro parameter calculations
- graph_net/plot_ESt.py: added mandatory macro/micro validation
- graph_net/verify_macro_params.py: new independent verification script
All code passes pre-commit checks, compiles successfully, and has been
validated with real benchmark data.

1 parent 9849633, commit b4de398
4 files changed in graph_net: +721 -86 lines changed