Skip to content

Conversation

@aseembits93
Copy link
Contributor

@aseembits93 aseembits93 commented Apr 25, 2025

User description

These PRs were created by an erroneous comparison between baseline behavior tests and candidate behavior tests.


PR Type

Enhancement, Tests


Description

  • Extend comparison to regression test types

  • Extend comparison to replay test types

  • Add tests for regression and replay tests

  • Add TODO comments for future improvements


Changes walkthrough 📝

Relevant files
Documentation
models.py
Note missing test intersection logic                                         

codeflash/models/models.py

  • Added TODO about missing test intersection logic
+1/-0     
Enhancement
equivalence.py
Extend equivalence check for new test types                           

codeflash/verification/equivalence.py

  • Included GENERATED_REGRESSION and REPLAY_TEST in compare condition
  • Added TODO to expand test type coverage
  • +2/-1     
    Tests
    test_comparator.py
    Add regression and replay test cases                                         

    tests/test_comparator.py

  • Added tests for GENERATED_REGRESSION comparisons
  • Added tests for REPLAY_TEST comparisons
  • +88/-0   

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @github-actions
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Intersection Logic

    The TODO in total_passed_runtime indicates it doesn’t account for the intersection of tests between baseline and candidate, which may cause incorrect runtime aggregation.

    #TODO this doesn't look at the intersection of tests of baseline and original
    return sum(
        [min(usable_runtime_data) for _, usable_runtime_data in self.usable_runtime_data_by_test_case().items()]
    )
    Indentation Inconsistency

    The TODO comment for extending test comparisons is not indented within the function scope, reducing readability and potentially confusing maintainers.

    # TODO: add more tests, regression and replay
            if original_test_result.test_type in {TestType.EXISTING_UNIT_TEST, TestType.CONCOLIC_COVERAGE_TEST, TestType.GENERATED_REGRESSION, TestType.REPLAY_TEST} and (
    Missing Positive Test Cases

    New tests for GENERATED_REGRESSION and REPLAY_TEST only cover mismatched pass/fail scenarios. Add cases where did_pass matches to verify correct equality logic.

    new_results_5_baseline = TestResults()
    new_results_5_baseline.add(
        FunctionTestInvocation(
            id=InvocationId(
                test_module_path="test_module_path",
                test_class_name="test_class_name",
                test_function_name="test_function_name",
                function_getting_tested="function_getting_tested",
                iteration_id="0",
            ),
            file_name=Path("file_name"),
            did_pass=True,
            runtime=5,
            test_framework="unittest",
            test_type=TestType.GENERATED_REGRESSION,
            return_value=5,
            timed_out=False,
            loop_index=1,
        )
    )
    
    new_results_5_opt = TestResults()
    new_results_5_opt.add(
        FunctionTestInvocation(
            id=InvocationId(
                test_module_path="test_module_path",
                test_class_name="test_class_name",
                test_function_name="test_function_name",
                function_getting_tested="function_getting_tested",
                iteration_id="0",
            ),
            file_name=Path("file_name"),
            did_pass=False,
            runtime=5,
            test_framework="unittest",
            test_type=TestType.GENERATED_REGRESSION,
            return_value=5,
            timed_out=False,
            loop_index=1,
        )
    )
    
    assert  not compare_test_results(new_results_5_baseline, new_results_5_opt)
    
    new_results_6_baseline = TestResults()
    new_results_6_baseline.add(
        FunctionTestInvocation(
            id=InvocationId(
                test_module_path="test_module_path",
                test_class_name="test_class_name",
                test_function_name="test_function_name",
                function_getting_tested="function_getting_tested",
                iteration_id="0",
            ),
            file_name=Path("file_name"),
            did_pass=True,
            runtime=5,
            test_framework="unittest",
            test_type=TestType.REPLAY_TEST,
            return_value=5,
            timed_out=False,
            loop_index=1,
        )
    )
    
    new_results_6_opt = TestResults()
    new_results_6_opt.add(
        FunctionTestInvocation(
            id=InvocationId(
                test_module_path="test_module_path",
                test_class_name="test_class_name",
                test_function_name="test_function_name",
                function_getting_tested="function_getting_tested",
                iteration_id="0",
            ),
            file_name=Path("file_name"),
            did_pass=False,
            runtime=5,
            test_framework="unittest",
            test_type=TestType.REPLAY_TEST,
            return_value=5,
            timed_out=False,
            loop_index=1,
        )
    )
    
    assert  not compare_test_results(new_results_6_baseline, new_results_6_opt)

    @github-actions
    Copy link

    PR Code Suggestions ✨

    No code suggestions found for the PR.

    @misrasaurabh1 misrasaurabh1 enabled auto-merge April 25, 2025 02:30
    @misrasaurabh1 misrasaurabh1 disabled auto-merge April 25, 2025 02:54
    @misrasaurabh1 misrasaurabh1 merged commit 22560c0 into main Apr 25, 2025
    15 checks passed
    @aseembits93 aseembits93 deleted the fix-high-speedups branch April 28, 2025 19:19
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants