@codeflash-ai codeflash-ai bot commented Oct 31, 2025

⚡️ This pull request contains optimizations for PR #868

If you approve this dependent PR, these changes will be merged into the original PR branch import-analyser-fix.

This PR will be automatically closed if the original PR is merged.


📄 38% (1.38x) speedup for ImportAnalyzer.visit_Attribute in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 114 microseconds → 82.6 microseconds (best of 250 runs)
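The headline figures follow from runtime_before / runtime_after: 114 / 82.6 ≈ 1.38, so the optimized version runs about 1.38x as fast, or 38% faster. A minimal Python sketch of that arithmetic (the helper name speedup is hypothetical, not part of Codeflash):

```python
def speedup(before_us: float, after_us: float) -> tuple[float, float]:
    """Return (ratio, percent faster) for two runtimes measured in the same unit."""
    ratio = before_us / after_us
    return ratio, (ratio - 1.0) * 100.0


ratio, pct = speedup(114.0, 82.6)
print(f"{ratio:.2f}x, {pct:.0f}% faster")  # → 1.38x, 38% faster
```

Applying the same formula to the rounded per-test timings gives slightly different percentages: 23.4 μs → 2.15 μs works out to about 988% rather than the reported 985%, presumably because the report computes from unrounded measurements.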

📝 Explanation and details

The optimized code achieves a 38% speedup by eliminating expensive repeated string operations and set iterations within the hot path of visit_Attribute().

Key optimizations:

  1. Precomputed lookup structures: During initialization, the code now builds lookup structures that include:

    • _dot_methods: Maps method names to sets of possible class names (e.g., "my_method" → {"MyClass", "OtherClass"})
    • _class_method_to_target: Maps (class, method) tuples to full target names for O(1) reconstruction
    • These replace the expensive loop that called target_func.rsplit(".", 1) on every function name for every attribute node
  2. Eliminated expensive loops: The original code had nested loops iterating through all function_names_to_find for each attribute access. The optimized version uses fast hash table lookups (self._dot_methods.get(node_attr)) followed by set membership tests.

  3. Reduced attribute access overhead: Local variables node_value and node_attr cache node.value and node.attr, so each field is read from the AST node only once per call.
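A minimal sketch of the two lookup structures and the constant-time match they enable, assuming targets are "Class.method" strings split on the last dot, that alias_mapping maps an imported name to its original class name, and that instance_mapping maps a variable to the class it was instantiated from. The function names build_method_index and match_class_method are hypothetical; this is not the code from the PR.

```python
from collections import defaultdict


def build_method_index(function_names_to_find):
    """Precompute lookups for dotted targets such as "MyClass.my_method".

    Returns (dot_methods, class_method_to_target):
      dot_methods: method name -> set of class names that have a target method
      class_method_to_target: (class name, method name) -> original target string
    """
    dot_methods = defaultdict(set)
    class_method_to_target = {}
    for target in function_names_to_find:
        if "." in target:
            # Split on the last dot only: "Pkg.Class.method" -> ("Pkg.Class", "method")
            class_name, method_name = target.rsplit(".", 1)
            dot_methods[method_name].add(class_name)
            class_method_to_target[(class_name, method_name)] = target
    return dict(dot_methods), class_method_to_target


def match_class_method(value_name, attr, dot_methods, class_method_to_target,
                       imported_modules, alias_mapping, instance_mapping):
    """Return the matched target for `value_name.attr`, or None.

    One dict lookup on the attribute name replaces a scan over every target.
    """
    candidate_classes = dot_methods.get(attr)
    if not candidate_classes:
        return None
    if value_name in imported_modules:
        # Resolve an import alias (e.g. `import MyClass as mc`) to the original name
        class_name = alias_mapping.get(value_name, value_name)
        if class_name in candidate_classes:
            return class_method_to_target[(class_name, attr)]
    if value_name in instance_mapping:
        # Resolve an instance variable to the class it was created from
        class_name = instance_mapping[value_name]
        if class_name in candidate_classes:
            return class_method_to_target[(class_name, attr)]
    return None


dm, cmt = build_method_index({"MyClass.my_method", "OtherClass.my_method", "helper"})
print(match_class_method("mc", "my_method", dm, cmt, {"mc"}, {"mc": "MyClass"}, {}))
# → MyClass.my_method
```

Because the index is keyed on the method name, an attribute whose name matches no dotted target is rejected with a single dict lookup, regardless of how many targets or aliases exist.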

Performance impact by test case type:

  • Large alias mappings: Up to 985% faster (23.4μs → 2.15μs) - most dramatic improvement when many aliases need checking
  • Large instance mappings: 342% faster (9.35μs → 2.11μs) - significant gains with many instance variables
  • Class method access: 24-27% faster - consistent improvement for dotted name resolution
  • Basic cases: roughly 0-15% faster; a few no-match cases are within noise or slightly slower (-0.5% and -4.8% in the generated tests below)

The optimization is most effective when function_names_to_find contains many qualified names (e.g., "Class.method" patterns) and the analyzer must check large alias or instance mappings. For simple or non-matching attribute accesses the gain is small.
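The size dependence in the figures above can be reproduced with a deliberately simplified model: a linear scan that calls rsplit on every target versus a single dict lookup keyed on (class, method). The sketch below is an illustration under those assumptions, with hypothetical function names, not the PR's benchmark; absolute timings will vary by machine.

```python
import timeit


def linear_scan(attr, class_name, function_names_to_find):
    """Original-style check: rsplit every dotted target on every call."""
    for target in function_names_to_find:
        if "." in target:
            cls, method = target.rsplit(".", 1)
            if method == attr and cls == class_name:
                return target
    return None


def build_index(function_names_to_find):
    """Precompute (class, method) -> target once, outside the hot path."""
    index = {}
    for target in function_names_to_find:
        if "." in target:
            cls, method = target.rsplit(".", 1)
            index[(cls, method)] = target
    return index


def indexed_lookup(attr, class_name, index):
    """Optimized-style check: one hash lookup per attribute node."""
    return index.get((class_name, attr))


targets = {f"Class{i}.meth{i}" for i in range(500)}
index = build_index(targets)

# Both approaches must agree before their speed is worth comparing
assert linear_scan("meth321", "Class321", targets) == indexed_lookup("meth321", "Class321", index)

runs = 1_000
t_scan = timeit.timeit(lambda: linear_scan("meth321", "Class321", targets), number=runs)
t_index = timeit.timeit(lambda: indexed_lookup("meth321", "Class321", index), number=runs)
print(f"linear scan: {t_scan / runs * 1e6:.2f} us/call")
print(f"dict lookup: {t_index / runs * 1e6:.3f} us/call")
```

The scan's cost grows with the number of dotted targets while the lookup stays roughly constant, which is consistent with the largest reported gains appearing in the 500-entry alias and instance-mapping tests.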

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 65 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
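The report relies on the generated tests producing the same results under the original and the optimized code. That idea can be sketched as differential testing: run both implementations on identical inputs and fail on the first mismatch. This is an illustrative sketch with hypothetical helper names (check_equivalent, find_target_scan, find_target_indexed), not Codeflash's actual verification harness.

```python
import random
from typing import Any, Callable, Iterable


def check_equivalent(original: Callable[..., Any], optimized: Callable[..., Any],
                     cases: Iterable[tuple]) -> int:
    """Run both implementations on each argument tuple and compare results.

    Returns the number of cases checked; raises AssertionError on the first mismatch.
    """
    checked = 0
    for args in cases:
        expected = original(*args)
        actual = optimized(*args)
        assert expected == actual, f"mismatch for {args!r}: {expected!r} != {actual!r}"
        checked += 1
    return checked


def find_target_scan(attr, class_name, targets):
    # Reference behaviour: scan every "Class.method" target
    for t in sorted(targets):
        if "." in t and t.rsplit(".", 1) == [class_name, attr]:
            return t
    return None


def find_target_indexed(attr, class_name, targets):
    # Candidate behaviour: same answer via a dict keyed on (class, method).
    # The index is rebuilt per call here only to keep both signatures identical.
    index = {tuple(t.rsplit(".", 1)): t for t in targets if "." in t}
    return index.get((class_name, attr))


rng = random.Random(0)
targets = {f"Class{i}.meth{i}" for i in range(50)} | {"plain_func"}
cases = [(f"meth{rng.randrange(60)}", f"Class{rng.randrange(60)}", targets) for _ in range(200)]
print(check_equivalent(find_target_scan, find_target_indexed, cases))  # → 200
```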
🌀 Generated Regression Tests and Runtime
import ast

# imports
import pytest  # used for our unit tests
from codeflash.discovery.discover_unit_tests import ImportAnalyzer

# function under test: ImportAnalyzer.visit_Attribute, imported above from
# codeflash.discovery.discover_unit_tests

# Helper to run visit_Attribute on a given AST node and analyzer
def run_visit_Attribute(analyzer, node):
    # The function is a method of ImportAnalyzer, so call it directly
    analyzer.visit_Attribute(node)

# Helper to create an ast.Attribute node for testing
def make_attribute(value_name, attr_name):
    """Creates an ast.Attribute node with value=ast.Name(id=value_name) and attr=attr_name."""
    return ast.Attribute(
        value=ast.Name(id=value_name, ctx=ast.Load()),
        attr=attr_name,
        ctx=ast.Load()
    )

# Helper to create an ImportAnalyzer with some state
def make_analyzer(function_names_to_find, imported_modules=None, alias_mapping=None, instance_mapping=None, has_dynamic_imports=False):
    analyzer = ImportAnalyzer(function_names_to_find)
    if imported_modules:
        analyzer.imported_modules = set(imported_modules)
    if alias_mapping:
        analyzer.alias_mapping = dict(alias_mapping)
    if instance_mapping:
        analyzer.instance_mapping = dict(instance_mapping)
    analyzer.has_dynamic_imports = has_dynamic_imports
    return analyzer

# ================= BASIC TEST CASES =================

def test_basic_module_function_access():
    # Test: module.function_name, where module is imported and function_name is in function_names_to_find
    analyzer = make_analyzer({"foo"}, imported_modules=["mod"])
    node = make_attribute("mod", "foo")
    run_visit_Attribute(analyzer, node)

def test_basic_module_function_access_not_found():
    # Test: module.function_name, function_name not in function_names_to_find
    analyzer = make_analyzer({"bar"}, imported_modules=["mod"])
    node = make_attribute("mod", "foo")
    run_visit_Attribute(analyzer, node)

def test_basic_class_method_access_via_alias():
    # Test: module alias, function_names_to_find is "MyClass.my_method", module imported as alias
    analyzer = make_analyzer({"MyClass.my_method"}, imported_modules=["mc"], alias_mapping={"mc": "MyClass"})
    node = make_attribute("mc", "my_method")
    run_visit_Attribute(analyzer, node)

def test_basic_instance_method_access():
    # Test: instance variable, instance_mapping, function_names_to_find is "MyClass.my_method"
    analyzer = make_analyzer({"MyClass.my_method"}, instance_mapping={"inst": "MyClass"})
    node = make_attribute("inst", "my_method")
    run_visit_Attribute(analyzer, node)

def test_basic_dynamic_import_access():
    # Test: dynamic import, has_dynamic_imports True, attr in function_names_to_find
    analyzer = make_analyzer({"dynamic_func"}, has_dynamic_imports=True)
    node = make_attribute("anymod", "dynamic_func")
    run_visit_Attribute(analyzer, node)

def test_basic_no_match():
    # Test: attribute that does not match any condition
    analyzer = make_analyzer({"foo"}, imported_modules=["mod"])
    node = make_attribute("othermod", "foo")
    run_visit_Attribute(analyzer, node)

# ================= EDGE TEST CASES =================

def test_edge_found_any_target_function_short_circuit():
    # Test: found_any_target_function is already True, should short-circuit and not change result
    analyzer = make_analyzer({"foo"}, imported_modules=["mod"])
    analyzer.found_any_target_function = True
    analyzer.found_qualified_name = "foo"
    node = make_attribute("mod", "foo")
    run_visit_Attribute(analyzer, node)

def test_edge_alias_mapping_mismatch():
    # Test: alias mapping does not match class, should not set found_any_target_function
    analyzer = make_analyzer({"MyClass.my_method"}, imported_modules=["mc"], alias_mapping={"mc": "OtherClass"})
    node = make_attribute("mc", "my_method")
    run_visit_Attribute(analyzer, node)

def test_edge_instance_mapping_mismatch():
    # Test: instance_mapping does not match class, should not set found_any_target_function
    analyzer = make_analyzer({"MyClass.my_method"}, instance_mapping={"inst": "OtherClass"})
    node = make_attribute("inst", "my_method")
    run_visit_Attribute(analyzer, node)

def test_edge_function_names_to_find_with_dot_and_no_match():
    # Test: function_names_to_find has dot, but attribute does not match method name
    analyzer = make_analyzer({"MyClass.my_method"}, imported_modules=["mc"], alias_mapping={"mc": "MyClass"})
    node = make_attribute("mc", "other_method")
    run_visit_Attribute(analyzer, node)

def test_edge_has_dynamic_imports_false():
    # Test: has_dynamic_imports is False, should not match dynamic import case
    analyzer = make_analyzer({"dynamic_func"}, has_dynamic_imports=False)
    node = make_attribute("anymod", "dynamic_func")
    run_visit_Attribute(analyzer, node)

def test_edge_multiple_function_names_to_find():
    # Test: multiple function_names_to_find, ensure correct one is set
    analyzer = make_analyzer({"foo", "bar"}, imported_modules=["mod"])
    node = make_attribute("mod", "bar")
    run_visit_Attribute(analyzer, node)

def test_edge_attribute_value_not_name():
    # Test: attribute value is not ast.Name, should not match
    node = ast.Attribute(
        value=ast.Constant(value=123),
        attr="foo",
        ctx=ast.Load()
    )
    analyzer = make_analyzer({"foo"}, imported_modules=["mod"])
    run_visit_Attribute(analyzer, node)

def test_edge_function_names_to_find_with_multiple_dots():
    # Test: function_names_to_find has multiple dots, ensure correct parsing
    analyzer = make_analyzer({"Pkg.Class.method"}, imported_modules=["Class"], alias_mapping={"Class": "Pkg.Class"})
    node = make_attribute("Class", "method")
    run_visit_Attribute(analyzer, node)

def test_edge_function_names_to_find_empty():
    # Test: function_names_to_find is empty
    analyzer = make_analyzer(set(), imported_modules=["mod"])
    node = make_attribute("mod", "foo")
    run_visit_Attribute(analyzer, node)

def test_edge_imported_modules_empty():
    # Test: imported_modules is empty, should not match
    analyzer = make_analyzer({"foo"}, imported_modules=[])
    node = make_attribute("mod", "foo")
    run_visit_Attribute(analyzer, node)

def test_edge_instance_mapping_empty():
    # Test: instance_mapping is empty, should not match
    analyzer = make_analyzer({"MyClass.my_method"}, instance_mapping={})
    node = make_attribute("inst", "my_method")
    run_visit_Attribute(analyzer, node)

def test_edge_alias_mapping_empty():
    # Test: alias_mapping is empty, should not match class method
    analyzer = make_analyzer({"MyClass.my_method"}, imported_modules=["mc"], alias_mapping={})
    node = make_attribute("mc", "my_method")
    run_visit_Attribute(analyzer, node)

# ================= LARGE SCALE TEST CASES =================

def test_large_scale_many_function_names_to_find():
    # Test: Large set of function_names_to_find, ensure correct match
    funcs = {f"mod{n}" for n in range(500)}
    analyzer = make_analyzer(funcs, imported_modules=["mod250"])
    node = make_attribute("mod250", "mod250")
    run_visit_Attribute(analyzer, node)

def test_large_scale_many_imported_modules():
    # Test: Large set of imported_modules, ensure correct match
    mods = [f"mod{n}" for n in range(500)]
    analyzer = make_analyzer({"foo"}, imported_modules=mods)
    node = make_attribute("mod123", "foo")
    run_visit_Attribute(analyzer, node)

def test_large_scale_many_aliases():
    # Test: Large alias_mapping, ensure correct alias is used
    alias_mapping = {f"alias{n}": f"Class{n}" for n in range(500)}
    target_func = "Class123.method"
    analyzer = make_analyzer({target_func}, imported_modules=["alias123"], alias_mapping=alias_mapping)
    node = make_attribute("alias123", "method")
    run_visit_Attribute(analyzer, node)

def test_large_scale_many_instances():
    # Test: Large instance_mapping, ensure correct instance is used
    instance_mapping = {f"inst{n}": f"Class{n}" for n in range(500)}
    target_func = "Class321.method"
    analyzer = make_analyzer({target_func}, instance_mapping=instance_mapping)
    node = make_attribute("inst321", "method")
    run_visit_Attribute(analyzer, node)

def test_large_scale_no_match():
    # Test: Large data, no match should be found
    funcs = {f"func{n}" for n in range(500)}
    mods = [f"mod{n}" for n in range(500)]
    analyzer = make_analyzer(funcs, imported_modules=mods)
    node = make_attribute("othermod", "otherfunc")
    run_visit_Attribute(analyzer, node)

def test_large_scale_dynamic_imports():
    # Test: Large function_names_to_find, dynamic import case
    funcs = {f"dyn_func{n}" for n in range(500)}
    analyzer = make_analyzer(funcs, has_dynamic_imports=True)
    node = make_attribute("anymod", "dyn_func123")
    run_visit_Attribute(analyzer, node)

def test_large_scale_multiple_calls_short_circuit():
    # Test: Large scale, visit_Attribute called multiple times, short-circuit after first match
    analyzer = make_analyzer({"foo"}, imported_modules=["mod"])
    node = make_attribute("mod", "foo")
    # First call should set found_any_target_function
    run_visit_Attribute(analyzer, node)
    # Second call should not change anything
    run_visit_Attribute(analyzer, node)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import ast

# imports
import pytest
from codeflash.discovery.discover_unit_tests import ImportAnalyzer

# unit tests

# Helper to create an ast.Attribute node
def make_attribute_node(value_name, attr_name):
    return ast.Attribute(value=ast.Name(id=value_name, ctx=ast.Load()), attr=attr_name, ctx=ast.Load())

# ------------------ Basic Test Cases ------------------

def test_module_function_direct_match():
    # module.func is in imported_modules and function_names_to_find
    analyzer = ImportAnalyzer({"func"})
    analyzer.imported_modules = {"module"}
    node = make_attribute_node("module", "func")
    analyzer.visit_Attribute(node) # 952ns -> 951ns (0.105% faster)

def test_module_function_no_match():
    # module.other_func is not in function_names_to_find
    analyzer = ImportAnalyzer({"func"})
    analyzer.imported_modules = {"module"}
    node = make_attribute_node("module", "other_func")
    analyzer.visit_Attribute(node) # 8.58μs -> 8.49μs (1.06% faster)

def test_class_method_imported_module():
    # module.MyClass.my_method, with alias mapping
    analyzer = ImportAnalyzer({"MyClass.my_method"})
    analyzer.imported_modules = {"mod_alias"}
    analyzer.alias_mapping = {"mod_alias": "MyClass"}
    node = make_attribute_node("mod_alias", "my_method")
    analyzer.visit_Attribute(node) # 2.11μs -> 1.70μs (24.1% faster)

def test_class_method_imported_module_no_alias():
    # module.MyClass.my_method, no alias mapping, but imported_modules contains class name
    analyzer = ImportAnalyzer({"MyClass.my_method"})
    analyzer.imported_modules = {"MyClass"}
    node = make_attribute_node("MyClass", "my_method")
    analyzer.visit_Attribute(node) # 1.98μs -> 1.56μs (26.9% faster)

def test_instance_method_access():
    # instance.my_method, instance is mapped to MyClass, looking for MyClass.my_method
    analyzer = ImportAnalyzer({"MyClass.my_method"})
    analyzer.instance_mapping = {"inst": "MyClass"}
    node = make_attribute_node("inst", "my_method")
    analyzer.visit_Attribute(node) # 2.01μs -> 1.63μs (23.3% faster)

def test_no_match_instance_method():
    # instance.other_method, instance is mapped to MyClass, but target is MyClass.my_method
    analyzer = ImportAnalyzer({"MyClass.my_method"})
    analyzer.instance_mapping = {"inst": "MyClass"}
    node = make_attribute_node("inst", "other_method")
    analyzer.visit_Attribute(node) # 8.89μs -> 8.36μs (6.37% faster)

def test_dynamic_imports():
    # Dynamic import, has_dynamic_imports is True and attr matches
    analyzer = ImportAnalyzer({"func"})
    analyzer.has_dynamic_imports = True
    node = make_attribute_node("anyobj", "func")
    analyzer.visit_Attribute(node) # 1.32μs -> 1.15μs (14.8% faster)

def test_dynamic_imports_no_match():
    # Dynamic import, has_dynamic_imports is True, attr does not match
    analyzer = ImportAnalyzer({"func"})
    analyzer.has_dynamic_imports = True
    node = make_attribute_node("anyobj", "other_func")
    analyzer.visit_Attribute(node) # 7.76μs -> 7.80μs (0.512% slower)

# ------------------ Edge Test Cases ------------------

def test_already_found_target_function():
    # If found_any_target_function is already True, should not process further
    analyzer = ImportAnalyzer({"func"})
    analyzer.found_any_target_function = True
    node = make_attribute_node("module", "func")
    analyzer.visit_Attribute(node) # 421ns -> 391ns (7.67% faster)

def test_imported_module_with_wrong_alias():
    # Alias does not match class name
    analyzer = ImportAnalyzer({"MyClass.my_method"})
    analyzer.imported_modules = {"mod_alias"}
    analyzer.alias_mapping = {"mod_alias": "OtherClass"}
    node = make_attribute_node("mod_alias", "my_method")
    analyzer.visit_Attribute(node) # 8.86μs -> 8.33μs (6.38% faster)

def test_instance_mapping_wrong_class():
    # instance mapped to wrong class
    analyzer = ImportAnalyzer({"MyClass.my_method"})
    analyzer.instance_mapping = {"inst": "OtherClass"}
    node = make_attribute_node("inst", "my_method")
    analyzer.visit_Attribute(node) # 8.35μs -> 7.93μs (5.19% faster)

def test_imported_module_not_in_function_names():
    # imported_modules set, but attr not in function_names_to_find
    analyzer = ImportAnalyzer({"func"})
    analyzer.imported_modules = {"module"}
    node = make_attribute_node("module", "not_func")
    analyzer.visit_Attribute(node) # 7.97μs -> 7.80μs (2.18% faster)

def test_attribute_value_not_name():
    # value is not ast.Name, should not match any rule
    analyzer = ImportAnalyzer({"func"})
    node = ast.Attribute(value=ast.Constant(value=123), attr="func", ctx=ast.Load())
    analyzer.visit_Attribute(node) # 8.96μs -> 8.93μs (0.325% faster)

def test_multiple_targets_priority():
    # Should match the first matching rule and not continue
    analyzer = ImportAnalyzer({"func", "MyClass.my_method"})
    analyzer.imported_modules = {"module"}
    analyzer.instance_mapping = {"inst": "MyClass"}
    node = make_attribute_node("module", "func")
    analyzer.visit_Attribute(node) # 902ns -> 811ns (11.2% faster)

def test_generic_visit_called_when_no_match(monkeypatch):
    # If no match, generic_visit should be called (patched via monkeypatch to record the call)
    analyzer = ImportAnalyzer({"func"})
    called = []
    def fake_generic_visit(node):
        called.append(True)
    monkeypatch.setattr(analyzer, "generic_visit", fake_generic_visit)
    node = make_attribute_node("not_module", "not_func")
    analyzer.visit_Attribute(node) # 1.41μs -> 1.30μs (8.53% faster)

# ------------------ Large Scale Test Cases ------------------

def test_large_imported_modules_and_function_names():
    # Many imported modules, many function names, only one matches
    num_modules = 500
    num_funcs = 500
    modules = {f"mod{i}" for i in range(num_modules)}
    funcs = {f"func{i}" for i in range(num_funcs)}
    analyzer = ImportAnalyzer(funcs)
    analyzer.imported_modules = modules
    # Pick one that should match
    node = make_attribute_node("mod123", "func321")
    analyzer.visit_Attribute(node) # 952ns -> 952ns (0.000% faster)
    # Both "func321" and "mod123" are present, so this exercises the module.function match

def test_large_instance_mapping():
    # Many instance mappings, only one matches
    num_instances = 500
    instances = {f"inst{i}": f"Class{i}" for i in range(num_instances)}
    funcs = {f"Class{i}.method{i}" for i in range(num_instances)}
    analyzer = ImportAnalyzer(funcs)
    analyzer.instance_mapping = instances
    # Pick one that should match
    node = make_attribute_node("inst123", "method123")
    analyzer.visit_Attribute(node) # 9.35μs -> 2.11μs (342% faster)

def test_large_alias_mapping():
    # Many alias mappings, only one matches
    num_aliases = 500
    analyzer = ImportAnalyzer({f"Class{i}.meth{i}" for i in range(num_aliases)})
    analyzer.imported_modules = {f"alias{i}" for i in range(num_aliases)}
    analyzer.alias_mapping = {f"alias{i}": f"Class{i}" for i in range(num_aliases)}
    node = make_attribute_node("alias321", "meth321")
    analyzer.visit_Attribute(node) # 23.4μs -> 2.15μs (985% faster)

def test_large_dynamic_imports():
    # Many possible attrs, only one matches, dynamic import mode
    num_funcs = 500
    funcs = {f"func{i}" for i in range(num_funcs)}
    analyzer = ImportAnalyzer(funcs)
    analyzer.has_dynamic_imports = True
    node = make_attribute_node("anyobj", "func250")
    analyzer.visit_Attribute(node) # 1.47μs -> 1.28μs (14.8% faster)

def test_large_no_match():
    # Large sets, but no match
    num_modules = 500
    num_funcs = 500
    modules = {f"mod{i}" for i in range(num_modules)}
    funcs = {f"func{i}" for i in range(num_funcs)}
    analyzer = ImportAnalyzer(funcs)
    analyzer.imported_modules = modules
    node = make_attribute_node("mod999", "func999")  # not in sets
    analyzer.visit_Attribute(node) # 8.48μs -> 8.91μs (4.84% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr868-2025-10-31T21.36.53 and push.

@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 31, 2025
@codeflash-ai codeflash-ai bot mentioned this pull request Oct 31, 2025
@misrasaurabh1 misrasaurabh1 merged commit 753eef9 into import-analyser-fix Oct 31, 2025
21 of 23 checks passed
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr868-2025-10-31T21.36.53 branch October 31, 2025 22:11