Skip to content

Conversation

@KRRT7
Copy link
Contributor

@KRRT7 KRRT7 commented Jun 20, 2025

PR Type

Bug fix, Tests, Enhancement


Description

  • Refactor import discovery logic for nested and aliased functions

  • Increase expected traced count in end-to-end tests

  • Add AlexNet and SimpleModel classes with forward/predict

  • Introduce test_models function and call under main


Changes walkthrough 📝

Relevant files
Tests
workload.py
Add demo model classes and tests                                                 

code_to_optimize/code_directories/simple_tracer_e2e/workload.py

  • Added AlexNet model class with init, forward, extract, classify
  • Added SimpleModel class with predict and create_default
  • Introduced test_models function using new model classes
  • Invoked test_models under the __main__ guard
  • +39/-0   
    end_to_end_test_utilities.py
    Update expected traced functions count                                     

    tests/scripts/end_to_end_test_utilities.py

  • Updated expected traced functions count from 4 to 13
  • Adjusted error logging message accordingly
  • +2/-2     
    Bug fix
    discover_unit_tests.py
    Refactor import function discovery logic                                 

    codeflash/discovery/discover_unit_tests.py

  • Consolidated alias and qualified name checks into a set-based match
  • Added prefix matching for nested target functions
  • Simplified assignment of found_any_target_function and
    found_qualified_name
  • +19/-11 

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @github-actions
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Incomplete Implementation

    The _extract_features method in AlexNet contains a pass in the loop, so features will always be empty, leading forward to return an empty list. Ensure the feature extraction logic is implemented correctly.

    def _extract_features(self, x):
        result = []
        for i in range(len(x)):
            pass
    
        return result
    None Module Handling

    The construction of qualified_name uses node.module without guarding for None, which may occur for relative imports. Add a check to avoid building malformed qualified names when node.module is None.

    qualified_name = f"{node.module}.{alias.name}"
    potential_matches = {alias.name, qualified_name}
    
    if any(name in self.function_names_to_find for name in potential_matches):
        self.found_any_target_function = True
        self.found_qualified_name = next(
            name for name in potential_matches if name in self.function_names_to_find
    Hardcoded Count

    The expected traced function count is hardcoded to 13, making the test brittle. Consider parameterizing this value or deriving it from the test configuration to keep it in sync with actual behavior.

    if not functions_traced or int(functions_traced.group(1)) != 13:
        logging.error("Expected 13 traced functions")
        return False

    @github-actions
    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Populate features list in loop

    The _extract_features method currently does nothing and always returns an empty
    list, causing downstream classification to produce empty results. You should append
    each input element (or a transformed version) to result inside the loop.

    code_to_optimize/code_directories/simple_tracer_e2e/workload.py [37-38]

     for i in range(len(x)):
    -    pass
    +    result.append(x[i])
    Suggestion importance[1-10]: 8

    __

    Why: The _extract_features method currently returns an empty list, breaking forward output; appending x[i] fixes the core functionality.

    Medium
    Print model outputs in test

    test_models never outputs its results, so no calls will be traced or visible. Add
    print statements after computing result and prediction to ensure they are executed
    and captured by the tracer.

    code_to_optimize/code_directories/simple_tracer_e2e/workload.py [55-61]

     def test_models():
         model = AlexNet(num_classes=10)
         input_data = [1, 2, 3, 4, 5]
         result = model.forward(input_data)
    +    print(result)
     
         model2 = SimpleModel.create_default()
         prediction = model2.predict(input_data)
    +    print(prediction)
    Suggestion importance[1-10]: 5

    __

    Why: Adding print statements makes test_models outputs visible and ensures calls are actually executed and traced, improving test observability.

    Low
    Use ordered list for matching

    Using a set for potential_matches can lead to non-deterministic ordering when
    picking the first match. Switch to a list to preserve a consistent check order
    (alias.name before qualified_name).

    codeflash/discovery/discover_unit_tests.py [199]

    -potential_matches = {alias.name, qualified_name}
    +potential_matches = [alias.name, qualified_name]
    Suggestion importance[1-10]: 5

    __

    Why: Switching potential_matches from a set to a list preserves iteration order, yielding deterministic behavior for the next() lookup.

    Low

    … (`filter_test_files_by_imports_bug_fix`)
    
    Here is an optimized version of your program, focusing on the key bottlenecks identified in the profiler.
    
    **Major improvements:**
    - **Use set lookup and precomputed data:** To avoid repeated work in `any(...)` calls, we build sets/maps to batch-check function names needing exact and prefix matching.
    - **Flatten loop logic:** We reduce string concatenation and duplicate calculation.
    - **Short-circuit loop on match:** As soon as a match is found, break out of loops ASAP.
    - **Precompute most-used string to minimize per-iteration computation.**
    
    
    
    **Summary of changes:**
    - We pre-group full match and dotted-prefix match targets.
    - We remove two `any()` generator expressions over a set in favor of direct set lookups and for-loops over a prefiltered small candidate list.
    - All string concatenations and attribute accesses are done at most once per iteration.
    - Early returns are used to short-circuit unnecessary further work.
    
    This should be **significantly faster**, especially when the set of names is large and there are many aliases per import.
    @codeflash-ai
    Copy link
    Contributor

    codeflash-ai bot commented Jun 20, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 240% (2.40x) speedup for ImportAnalyzer.visit_ImportFrom in codeflash/discovery/discover_unit_tests.py

    ⏱️ Runtime : 9.87 milliseconds 2.90 milliseconds (best of 189 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch filter_test_files_by_imports_bug_fix).

    …25-06-20T22.22.32
    
    ⚡️ Speed up method `ImportAnalyzer.visit_ImportFrom` by 240% in PR #355 (`filter_test_files_by_imports_bug_fix`)
    @codeflash-ai
    Copy link
    Contributor

    codeflash-ai bot commented Jun 20, 2025

    @KRRT7 KRRT7 requested a review from a team June 20, 2025 22:57
    @misrasaurabh1 misrasaurabh1 merged commit a8a591b into main Jun 20, 2025
    16 of 17 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants