@codeflash-ai codeflash-ai bot commented Jun 20, 2025

⚡️ This pull request contains optimizations for PR #355

If you approve this dependent PR, these changes will be merged into the original PR branch filter_test_files_by_imports_bug_fix.

This PR will be automatically closed if the original PR is merged.


📄 240% (2.40x) speedup for ImportAnalyzer.visit_ImportFrom in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 9.87 milliseconds → 2.90 milliseconds (best of 189 runs)
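
For readers checking the headline figure against the two runtimes, here is a small worked calculation. It assumes the speedup is reported as time saved relative to the optimized runtime; that convention is an inference from the numbers above, not something this PR states.

```python
# Runtimes reported above, in milliseconds.
original_ms = 9.87
optimized_ms = 2.90

# Ratio of the two runtimes: how many optimized runs fit in one original run.
time_ratio = original_ms / optimized_ms

# Assumed convention for the headline figure: time saved divided by the new runtime.
relative_speedup = (original_ms - optimized_ms) / optimized_ms

print(f"time ratio: {time_ratio:.2f}x")  # time ratio: 3.40x
print(f"relative speedup: {relative_speedup:.2f}x ({relative_speedup:.0%})")  # relative speedup: 2.40x (240%)
```

Under that reading, "240% (2.40x)" and "about 3.4 times as fast" describe the same measurement.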

📝 Explanation and details

Here is an optimized version of your program, focusing on the key bottlenecks identified in the profiler.

Major improvements:

  • Use set lookups and precomputed data: to avoid repeating work in any(...) calls, we build sets/maps that batch-check the function names needing exact and prefix matching.
  • Flatten loop logic: we reduce string concatenation and duplicate computation.
  • Short-circuit on match: as soon as a match is found, we break out of the loops.
  • Precompute the most-used string to minimize per-iteration computation.
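
The set-lookup idea in the list above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `matches_by_scan`, `build_index`, and `matches_by_index` are hypothetical names, and the matching rule (exact match, or the qualified name followed by a dot is a prefix of a target) is inferred from the test cases further down.

```python
from __future__ import annotations


def matches_by_scan(qualified: str, targets: set[str]) -> bool:
    """Baseline: walk every target on every call."""
    prefix = qualified + "."
    return any(t == qualified or t.startswith(prefix) for t in targets)


def build_index(targets: set[str]) -> tuple[frozenset[str], frozenset[str]]:
    """Precompute, once, the exact names and every dot-bounded proper prefix."""
    prefixes: set[str] = set()
    for target in targets:
        pos = target.find(".")
        while pos != -1:
            prefixes.add(target[:pos])  # e.g. "math", then "math.sqrt"
            pos = target.find(".", pos + 1)
    return frozenset(targets), frozenset(prefixes)


def matches_by_index(qualified: str, exact: frozenset[str], prefixes: frozenset[str]) -> bool:
    """Two hash lookups replace the scan over all targets."""
    return qualified in exact or qualified in prefixes


targets = {"math.sqrt.pow", "math.log10", "exp"}
exact, prefixes = build_index(targets)
for name in ["math.sqrt", "math.log", "math.log10", "exp", "math.sq"]:
    assert matches_by_scan(name, targets) == matches_by_index(name, exact, prefixes)
```

The trade-off is a one-time pass over the targets plus extra memory for the prefix set, in exchange for constant-time checks per imported name.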

Summary of changes:

  • We pre-group full match and dotted-prefix match targets.
  • We remove two any() generator expressions over a set in favor of direct set lookups and for-loops over a prefiltered small candidate list.
  • All string concatenations and attribute accesses are done at most once per iteration.
  • Early returns are used to short-circuit unnecessary further work.
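
The early-return point can be illustrated with a small visitor. `FirstMatchVisitor` and its attributes are hypothetical stand-ins written for this example; the real `ImportAnalyzer` may track different state and match on different names.

```python
import ast


class FirstMatchVisitor(ast.NodeVisitor):
    """Hypothetical visitor that stops inspecting imports after the first hit."""

    def __init__(self, targets):
        self.targets = set(targets)
        self.found = False
        self.aliases_checked = 0  # counts how much work was actually done

    def visit_ImportFrom(self, node):
        if self.found:
            return  # an earlier import already matched; skip this node entirely
        for alias in node.names:
            self.aliases_checked += 1
            bound_name = alias.asname or alias.name
            if bound_name in self.targets:
                self.found = True
                return  # leave the loop as soon as one alias matches


source = "from math import sqrt, log\nfrom os import path, sep\n"
visitor = FirstMatchVisitor({"sqrt"})
visitor.visit(ast.parse(source))
print(visitor.found, visitor.aliases_checked)  # True 1
```

Only one of the four aliases is examined here: the first alias matches, the inner loop returns, and the second import statement is skipped by the guard at the top of the method.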

The measured runtime fell from 9.87 ms to 2.90 ms (best of 189 runs); the gain should be largest when the set of names is large and there are many aliases per import.
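
To see how the cost depends on the size of the target set, a rough micro-benchmark along these lines can help. The workload (2000 dotted targets, 100 queried names) is made up for illustration, and the timings will vary by machine, so no expected numbers are shown.

```python
import timeit

# Hypothetical workload: many dotted targets and a batch of qualified import names.
targets = {f"pkg.mod.func{i}.helper" for i in range(2000)}
queries = [f"pkg.mod.func{i}" for i in range(0, 4000, 40)]  # 50 hits, 50 misses


def scan_all():
    # One generator scan over the whole target set per queried name.
    return [any(t == q or t.startswith(q + ".") for t in targets) for q in queries]


# Built once: exact names plus every dot-bounded proper prefix of each target.
exact = frozenset(targets)
prefixes = frozenset(t[:i] for t in targets for i, ch in enumerate(t) if ch == ".")


def lookup_all():
    # Two hash lookups per queried name.
    return [q in exact or q in prefixes for q in queries]


assert scan_all() == lookup_all()  # both strategies must agree before timing them

scan_s = min(timeit.repeat(scan_all, number=1, repeat=3))
lookup_s = min(timeit.repeat(lookup_all, number=1, repeat=3))
print(f"scan:   {scan_s * 1e3:.3f} ms")
print(f"lookup: {lookup_s * 1e3:.3f} ms")
```

Growing the range used to build `targets` makes the scan time grow roughly in proportion, while the lookup time should stay close to flat once the sets exist.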

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 4096 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import ast

# imports
import pytest  # used for our unit tests
from codeflash.discovery.discover_unit_tests import ImportAnalyzer

# unit tests

# Helper to create ast.ImportFrom nodes
def make_importfrom(module, names):
    """
    Helper to create an ast.ImportFrom node.
    names: list of (name, asname) tuples.
    """
    return ast.ImportFrom(
        module=module,
        names=[ast.alias(name=n, asname=a) for n, a in names],
        level=0
    )

# ----------------------
# 1. Basic Test Cases
# ----------------------

def test_simple_import_found_by_name():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_simple_import_found_by_qualified_name():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"math.sqrt"})
    analyzer.visit_ImportFrom(node)

def test_simple_import_not_found():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"log"})
    analyzer.visit_ImportFrom(node)

def test_import_with_asname():
    # from math import sqrt as square_root
    node = make_importfrom("math", [("sqrt", "square_root")])
    analyzer = ImportAnalyzer({"square_root"})
    analyzer.visit_ImportFrom(node)

def test_import_with_asname_and_qualified_name():
    # from math import sqrt as square_root
    node = make_importfrom("math", [("sqrt", "square_root")])
    analyzer = ImportAnalyzer({"math.sqrt"})
    analyzer.visit_ImportFrom(node)

def test_multiple_imports_one_match():
    # from math import sqrt, log
    node = make_importfrom("math", [("sqrt", None), ("log", None)])
    analyzer = ImportAnalyzer({"log"})
    analyzer.visit_ImportFrom(node)

def test_multiple_imports_no_match():
    # from math import sqrt, log
    node = make_importfrom("math", [("sqrt", None), ("log", None)])
    analyzer = ImportAnalyzer({"exp"})
    analyzer.visit_ImportFrom(node)

def test_import_dynamic_import_module():
    # from importlib import import_module
    node = make_importfrom("importlib", [("import_module", None)])
    analyzer = ImportAnalyzer({"import_module"})
    analyzer.visit_ImportFrom(node)

def test_import_star():
    # from math import *
    node = make_importfrom("math", [("*", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

# ----------------------
# 2. Edge Test Cases
# ----------------------

def test_importfrom_with_no_module():
    # from . import something (module=None)
    node = ast.ImportFrom(module=None, names=[ast.alias(name="something", asname=None)], level=1)
    analyzer = ImportAnalyzer({"something"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_with_empty_names():
    # from math import <nothing>; not valid source syntax, but the AST node can still be built by hand
    node = make_importfrom("math", [])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_with_duplicate_names():
    # from math import sqrt, sqrt as s
    node = make_importfrom("math", [("sqrt", None), ("sqrt", "s")])
    analyzer = ImportAnalyzer({"sqrt", "s"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_with_multiple_star_and_names():
    # from math import *, sqrt (a SyntaxError in real source; only constructible as a hand-built AST node)
    node = make_importfrom("math", [("*", None), ("sqrt", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_with_asname_and_nonmatching_target():
    # from math import sqrt as s
    node = make_importfrom("math", [("sqrt", "s")])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_with_submodule_prefix_match():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    # function_names_to_find contains "math.sqrt.pow"
    analyzer = ImportAnalyzer({"math.sqrt.pow"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_with_multiple_targets_and_prefix():
    # from math import sqrt, log
    node = make_importfrom("math", [("sqrt", None), ("log", None)])
    analyzer = ImportAnalyzer({"math.sqrt.pow", "math.log10"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_returns_early_on_found():
    # from math import sqrt, log
    node = make_importfrom("math", [("sqrt", None), ("log", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.found_any_target_function = True  # simulate already found
    analyzer.visit_ImportFrom(node)

# ----------------------
# 3. Large Scale Test Cases
# ----------------------

def test_importfrom_many_names_one_match():
    # from math import name0, name1, ..., name999
    names = [(f"name{i}", None) for i in range(1000)]
    node = make_importfrom("math", names)
    # Only one target matches
    analyzer = ImportAnalyzer({"name777"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_many_names_no_match():
    # from math import name0, name1, ..., name999
    names = [(f"name{i}", None) for i in range(1000)]
    node = make_importfrom("math", names)
    analyzer = ImportAnalyzer({"not_present"})
    analyzer.visit_ImportFrom(node)
    # Expected: all 1000 names end up in imported_modules (no assertion is made here)

def test_importfrom_many_targets_with_prefix_match():
    # from mod import func0, func1, ..., func999
    names = [(f"func{i}", None) for i in range(1000)]
    node = make_importfrom("mod", names)
    # function_names_to_find contains "mod.func777.extra"
    analyzer = ImportAnalyzer({"mod.func777.extra"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_many_star_imports():
    # from mod import *, *, *, ... (a SyntaxError in real source; only constructible as a hand-built AST node)
    names = [("*", None) for _ in range(1000)]
    node = make_importfrom("mod", names)
    analyzer = ImportAnalyzer({"something"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_many_asnames_and_targets():
    # from mod import func0 as f0, func1 as f1, ..., func999 as f999
    names = [(f"func{i}", f"f{i}") for i in range(1000)]
    node = make_importfrom("mod", names)
    analyzer = ImportAnalyzer({"f500"})
    analyzer.visit_ImportFrom(node)

def test_importfrom_large_target_set():
    # from mod import foo
    node = make_importfrom("mod", [("foo", None)])
    # function_names_to_find is very large, only one matches
    targets = {f"func{i}" for i in range(999)}
    targets.add("foo")
    analyzer = ImportAnalyzer(targets)
    analyzer.visit_ImportFrom(node)

def test_importfrom_large_target_set_with_prefixes():
    # from mod import foo
    node = make_importfrom("mod", [("foo", None)])
    # function_names_to_find contains many "mod.foo.something" entries
    targets = {f"mod.foo.something{i}" for i in range(999)}
    analyzer = ImportAnalyzer(targets)
    analyzer.visit_ImportFrom(node)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

from __future__ import annotations

import ast

# imports
import pytest  # used for our unit tests
from codeflash.discovery.discover_unit_tests import ImportAnalyzer

# unit tests

# Helper to create ImportFrom nodes
def make_importfrom(module, names):
    """Helper to create an ast.ImportFrom node."""
    return ast.ImportFrom(
        module=module,
        names=[ast.alias(name=n[0], asname=n[1]) for n in names],
        level=0
    )

# -----------------------------
# Basic Test Cases
# -----------------------------

def test_basic_import_single_match_by_name():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_basic_import_single_match_by_qualified_name():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"math.sqrt"})
    analyzer.visit_ImportFrom(node)

def test_basic_import_no_match():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"cos"})
    analyzer.visit_ImportFrom(node)

def test_basic_import_with_asname():
    # from math import sqrt as mysqrt
    node = make_importfrom("math", [("sqrt", "mysqrt")])
    analyzer = ImportAnalyzer({"mysqrt"})
    analyzer.visit_ImportFrom(node)

def test_basic_import_multiple_names_one_match():
    # from math import sqrt, cos
    node = make_importfrom("math", [("sqrt", None), ("cos", None)])
    analyzer = ImportAnalyzer({"cos"})
    analyzer.visit_ImportFrom(node)

def test_basic_import_multiple_names_no_match():
    # from math import sqrt, cos
    node = make_importfrom("math", [("sqrt", None), ("cos", None)])
    analyzer = ImportAnalyzer({"tan"})
    analyzer.visit_ImportFrom(node)

def test_basic_import_with_asname_and_qualified_match():
    # from math import sqrt as mysqrt
    node = make_importfrom("math", [("sqrt", "mysqrt")])
    analyzer = ImportAnalyzer({"math.sqrt"})
    analyzer.visit_ImportFrom(node)

# -----------------------------
# Edge Test Cases
# -----------------------------

def test_edge_import_star():
    # from math import *
    node = make_importfrom("math", [("*", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_from_with_no_module():
    # from . import foo (module=None)
    node = ast.ImportFrom(module=None, names=[ast.alias(name="foo", asname=None)], level=1)
    analyzer = ImportAnalyzer({"foo"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_dynamic_importlib():
    # from importlib import import_module
    node = make_importfrom("importlib", [("import_module", None)])
    analyzer = ImportAnalyzer({"import_module"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_dynamic_importlib_no_match():
    # from importlib import import_module
    node = make_importfrom("importlib", [("import_module", None)])
    analyzer = ImportAnalyzer({"other_func"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_with_prefix_match():
    # from foo.bar import baz
    node = make_importfrom("foo.bar", [("baz", None)])
    analyzer = ImportAnalyzer({"foo.bar.baz.qux"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_with_asname_and_prefix_match():
    # from foo.bar import baz as quux
    node = make_importfrom("foo.bar", [("baz", "quux")])
    analyzer = ImportAnalyzer({"foo.bar.baz.extra"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_multiple_names_prefix_and_direct_match():
    # from foo.bar import baz, spam
    node = make_importfrom("foo.bar", [("baz", None), ("spam", None)])
    analyzer = ImportAnalyzer({"foo.bar.baz.eggs", "spam"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_found_any_target_function_short_circuit():
    # from math import sqrt
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.found_any_target_function = True
    analyzer.visit_ImportFrom(node)

# -----------------------------
# Large Scale Test Cases
# -----------------------------

def test_large_scale_many_imports_one_match():
    # from mod import name0, name1, ..., name999
    names = [(f"name{i}", None) for i in range(1000)]
    node = make_importfrom("mod", names)
    analyzer = ImportAnalyzer({"name500"})
    analyzer.visit_ImportFrom(node)

def test_large_scale_many_imports_no_match():
    # from mod import name0, name1, ..., name999
    names = [(f"name{i}", None) for i in range(1000)]
    node = make_importfrom("mod", names)
    analyzer = ImportAnalyzer({"not_present"})
    analyzer.visit_ImportFrom(node)

def test_large_scale_many_targets_one_import():
    # from mod import foo
    node = make_importfrom("mod", [("foo", None)])
    targets = {f"mod.foo.{i}" for i in range(1000)}
    analyzer = ImportAnalyzer(targets)
    analyzer.visit_ImportFrom(node)

def test_large_scale_many_targets_and_imports():
    # from mod import name0, ..., name999
    names = [(f"name{i}", None) for i in range(1000)]
    node = make_importfrom("mod", names)
    targets = {f"name{i}" for i in range(1000)}
    analyzer = ImportAnalyzer(targets)
    analyzer.visit_ImportFrom(node)

def test_large_scale_wildcard_imports():
    # from mod import * (repeated 1000 times)
    for i in range(1000):
        node = make_importfrom(f"mod{i}", [("*", None)])
        analyzer = ImportAnalyzer({"foo"})
        analyzer.visit_ImportFrom(node)

def test_large_scale_dynamic_imports():
    # from importlib import import_module (repeated 1000 times)
    for i in range(1000):
        node = make_importfrom("importlib", [("import_module", None)])
        analyzer = ImportAnalyzer({"not_present"})
        analyzer.visit_ImportFrom(node)

# -----------------------------
# Additional Edge Cases
# -----------------------------

def test_edge_import_with_duplicate_names():
    # from math import sqrt, sqrt
    node = make_importfrom("math", [("sqrt", None), ("sqrt", None)])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_with_empty_names():
    # from math import <nothing>; not valid source syntax, but the AST node can still be built by hand
    node = make_importfrom("math", [])
    analyzer = ImportAnalyzer({"sqrt"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_with_level():
    # from .foo import bar (level > 0)
    node = ast.ImportFrom(module="foo", names=[ast.alias(name="bar", asname=None)], level=1)
    analyzer = ImportAnalyzer({"foo.bar"})
    analyzer.visit_ImportFrom(node)

def test_edge_import_with_nonstring_targets():
    # from math import sqrt
    # Note: b"sqrt".decode() evaluates to the str "sqrt", so the target below is still a string.
    node = make_importfrom("math", [("sqrt", None)])
    analyzer = ImportAnalyzer({b"sqrt".decode()})
    analyzer.visit_ImportFrom(node)

def test_edge_import_with_long_module_name():
    # from a.b.c.d.e.f.g.h.i.j import foo
    node = make_importfrom("a.b.c.d.e.f.g.h.i.j", [("foo", None)])
    analyzer = ImportAnalyzer({"a.b.c.d.e.f.g.h.i.j.foo"})
    analyzer.visit_ImportFrom(node)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
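
The tests above build `ast.ImportFrom` nodes by hand. As a cross-check, the sketch below shows what the standard-library parser produces for the same kinds of statements, and that some hand-built shapes have no valid source form. `first_import` and `is_valid_source` are helper names introduced only for this example.

```python
import ast


def first_import(source: str) -> ast.ImportFrom:
    """Parse a one-line module and return its ImportFrom node."""
    node = ast.parse(source).body[0]
    assert isinstance(node, ast.ImportFrom)
    return node


def is_valid_source(source: str) -> bool:
    """Report whether the parser accepts the given source text."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


aliased = first_import("from math import sqrt as s")
relative = first_import("from . import foo")
star = first_import("from math import *")

print(aliased.module, aliased.names[0].name, aliased.names[0].asname, aliased.level)  # math sqrt s 0
print(relative.module, relative.level)  # None 1
print(star.names[0].name)  # *
print(is_valid_source("from math import *, sqrt"))  # False
print(is_valid_source("from math import sqrt, sqrt as s"))  # True
```

Mixing `*` with named imports is rejected by the parser, so the tests that combine them exercise AST shapes that a parsed file would not contain.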

To edit these changes, `git checkout codeflash/optimize-pr355-2025-06-20T22.22.32` and push.

Codeflash

… (`filter_test_files_by_imports_bug_fix`)

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Jun 20, 2025
@KRRT7 KRRT7 merged commit c3e6bec into filter_test_files_by_imports_bug_fix Jun 20, 2025
16 of 17 checks passed
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr355-2025-06-20T22.22.32 branch June 20, 2025 22:26