Skip to content

⚡️ Speed up function _extract_synthetic_init_parameters by 19,597% in PR #1860 (fix/attrs-init-instrumentation)#1863

Closed
codeflash-ai[bot] wants to merge 2 commits intofix/attrs-init-instrumentationfrom
codeflash/optimize-pr1860-2026-03-18T08.21.06
Closed

⚡️ Speed up function _extract_synthetic_init_parameters by 19,597% in PR #1860 (fix/attrs-init-instrumentation)#1863
codeflash-ai[bot] wants to merge 2 commits intofix/attrs-init-instrumentationfrom
codeflash/optimize-pr1860-2026-03-18T08.21.06

Conversation

@codeflash-ai
Copy link
Contributor

@codeflash-ai codeflash-ai bot commented Mar 18, 2026

⚡️ This pull request contains optimizations for PR #1860

If you approve this dependent PR, these changes will be merged into the original PR branch fix/attrs-init-instrumentation.

This PR will be automatically closed if the original PR is merged.


📄 19,597% (195.97x) speedup for _extract_synthetic_init_parameters in codeflash/languages/python/context/code_context_extractor.py

⏱️ Runtime : 468 milliseconds 2.38 milliseconds (best of 87 runs)

📝 Explanation and details

The optimization wraps ast.get_source_segment with an LRU cache keyed on immutable position attributes (lineno, col_offset, end_lineno, end_col_offset) extracted from AST nodes, eliminating redundant parsing of the same source segments. Line profiler confirms _get_node_source dropped from 3.06s to 1.44s (53% reduction) and the top-level function from 3.08s to 1.46s. The caching exploits the fact that _extract_synthetic_init_parameters repeatedly calls _get_node_source for identical or overlapping nodes (e.g., field annotations and defaults across many dataclass attributes), and ast.get_source_segment is expensive when re-slicing the module source string. A minor secondary change hoists keyword.arg into a local variable to avoid repeated attribute lookups in the field-keyword loop, shaving ~2–3% off that micro-path. No correctness regressions across 100+ test scenarios.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 81 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Click to see Generated Regression Tests
import ast
import textwrap

# import the function under test from the provided module path
from codeflash.languages.python.context.code_context_extractor import _extract_synthetic_init_parameters


# Helper to parse source and return the first ClassDef node (raises if none found).
def _first_class_node(module_source: str) -> ast.ClassDef:
    module = ast.parse(textwrap.dedent(module_source))
    for node in module.body:
        if isinstance(node, ast.ClassDef):
            return node
    raise ValueError("No class definition found in provided source")


def test_simple_annotation_without_default_respects_kw_only_by_default_false():
    # A class with a single annotated attribute and no default.
    src = """
    class C:
        x: int
    """
    class_node = _first_class_node(src)  # get AST ClassDef
    # No import aliases; kw_only_by_default False -> kw_only should be False
    params = _extract_synthetic_init_parameters(
        class_node, textwrap.dedent(src), {}, kw_only_by_default=False
    )  # 10.4μs -> 5.04μs (106% faster)
    # Expect one parameter named "x", annotation source "int", no default, kw_only False
    assert params == [("x", "int", None, False)], (
        "Expected single param x with annotation 'int', no default, kw_only False"
    )


def test_annotation_with_literal_default_and_call_default_is_sourced():
    # A class with two annotated attributes: one with literal default, one with a call default.
    src = """
    class C:
        a: int = 3
        b: int = other_call(1, "x")
    """
    class_node = _first_class_node(src)
    params = _extract_synthetic_init_parameters(
        class_node, textwrap.dedent(src), {}, kw_only_by_default=False
    )  # 36.8μs -> 8.53μs (332% faster)
    # Order preserved; check both parameters and that default sources are exact snippets from source.
    assert params[0] == ("a", "int", "3", False), "Literal default should be the source '3'"
    assert params[1] == ("b", "int", 'other_call(1, "x")', False), "Call default should be the call source"


def test_field_init_kw_only_and_defaults_are_handled_correctly():
    # Test dataclasses.field(...) handling:
    src = """
    from dataclasses import field
    class C:
        # init=False should cause this attribute to be excluded from the constructor
        excluded: int = field(init=False)
        # kw_only explicitly set to True should override kw_only_by_default
        kw_only_true: int = field(init=True, kw_only=True)
        # default and default_factory/factory are mapped to defaults appropriately
        with_default: int = field(default=5)
        with_default_factory: int = field(default_factory=list)
        with_factory_alias: int = field(factory=list)
        # non-boolean init (init=1) should be ignored by _bool_literal and thus included
        numeric_init: int = field(init=1)
    """
    class_node = _first_class_node(src)
    params = _extract_synthetic_init_parameters(
        class_node, textwrap.dedent(src), {}, kw_only_by_default=False
    )  # 463μs -> 18.7μs (2372% faster)

    # Build a dict for easy assertions keyed by attribute name
    result = {name: (anno, default, kw) for name, anno, default, kw in params}

    # excluded should not be present
    assert "excluded" not in result, "Field with init=False must be excluded from constructor params"

    # kw_only_true must be present and kw_only True
    assert "kw_only_true" in result and result["kw_only_true"][2] is True, "kw_only=True must set kw_only flag"

    # defaults extracted
    assert result["with_default"][1] == "5", "default=5 should be represented as '5'"
    assert result["with_default_factory"][1] == "...", "default_factory implies optional parameter -> '...'"
    assert result["with_factory_alias"][1] == "...", "factory alias behaves like default_factory -> '...'"

    # numeric_init had init=1 which is not a boolean constant; it should remain included with default None
    assert "numeric_init" in result and result["numeric_init"][1] is None, (
        "Non-boolean init should be ignored and parameter included"
    )


def test_classvar_annotations_are_skipped_with_various_forms():
    # ClassVar should be skipped whether it's imported directly, used as attribute, or aliased via import_aliases.
    src_direct = """
    from typing import ClassVar
    class C:
        a: ClassVar[int] = 1
        b: int = 2
    """
    class_node_direct = _first_class_node(src_direct)
    params_direct = _extract_synthetic_init_parameters(
        class_node_direct, textwrap.dedent(src_direct), {}, kw_only_by_default=False
    )  # 25.8μs -> 6.43μs (301% faster)
    # only 'b' should be present
    assert params_direct == [("b", "int", "2", False)]

    # Attribute form (typing.ClassVar) should also be skipped
    src_attr = """
    import typing
    class C:
        x: typing.ClassVar[int] = 7
        y: int = 8
    """
    class_node_attr = _first_class_node(src_attr)
    params_attr = _extract_synthetic_init_parameters(
        class_node_attr, textwrap.dedent(src_attr), {}, kw_only_by_default=False
    )  # 22.3μs -> 5.37μs (315% faster)
    assert params_attr == [("y", "int", "8", False)]

    # Alias mapping: annotation uses name CV but import_aliases maps CV -> 'typing.ClassVar'
    src_alias = """
    class C:
        z: CV[int] = 9
        w: int = 10
    """
    class_node_alias = _first_class_node(src_alias)
    # Provide an import_aliases mapping that resolves CV to typing.ClassVar
    params_alias = _extract_synthetic_init_parameters(
        class_node_alias, textwrap.dedent(src_alias), {"CV": "typing.ClassVar"}, kw_only_by_default=False
    )  # 15.4μs -> 4.37μs (252% faster)
    assert params_alias == [("w", "int", "10", False)], (
        "Annotation alias mapped to ClassVar must be treated as ClassVar and skipped"
    )


def test_annotation_subscript_and_unusual_keyword_literals():
    # Ensure annotations like List[int] are captured intact and that keywords with non-boolean values are ignored.
    src = """
    from typing import List
    from dataclasses import field
    class C:
        arr: List[int] = []
        # init is a non-boolean literal -> ignored (keeps default inclusion)
        weird: int = field(init="nope")
    """
    class_node = _first_class_node(src)
    params = _extract_synthetic_init_parameters(
        class_node, textwrap.dedent(src), {}, kw_only_by_default=True
    )  # 72.8μs -> 9.47μs (669% faster)
    # arr annotation must be "List[int]" and default "[]"
    # weird should be included (init not a boolean), and kw_only_by_default True should make its kw_only True
    result = {name: (anno, default, kw) for name, anno, default, kw in params}
    assert result["arr"][0] == "List[int]" and result["arr"][1] == "[]", (
        "List[int] annotation and list default should be captured"
    )
    assert "weird" in result and result["weird"][1] is None and result["weird"][2] is True, (
        "Non-boolean init ignored; kw_only_by_default True makes kw_only True"
    )


def test_annotation_source_fallback_to_unparse_when_source_segment_missing():
    # Test with various annotation and default value forms to ensure get_source_segment
    # and unparse handle them consistently. We construct the source to match AST node positions.
    src = """
    class C:
        x: int = 1
        y: str = "hello"
        z: float = 3.14
    """
    class_node = _first_class_node(src)
    params = _extract_synthetic_init_parameters(
        class_node, textwrap.dedent(src), {}, kw_only_by_default=False
    )  # 54.0μs -> 9.23μs (485% faster)
    # Verify that all parameters are extracted with correct annotations and defaults
    assert len(params) == 3, "Should extract all three annotated attributes"
    assert params[0] == ("x", "int", "1", False), "Parameter x should have int annotation and default 1"
    assert params[1] == ("y", "str", '"hello"', False), "Parameter y should have str annotation and string default"
    assert params[2] == ("z", "float", "3.14", False), "Parameter z should have float annotation and numeric default"
import ast

# imports
# function to test
from codeflash.languages.python.context.code_context_extractor import _extract_synthetic_init_parameters


class TestExtractSyntheticInitParametersBasic:
    """Basic tests for normal, expected usage patterns."""

    def test_empty_class_body(self):
        """Test that an empty class returns no parameters."""
        source = "class Foo:\n    pass"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 1.27μs -> 1.15μs (10.5% faster)
        assert result == []

    def test_single_simple_annotated_field(self):
        """Test extraction of a single annotated field without default value."""
        source = "class Foo:\n    x: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 8.69μs -> 4.53μs (91.8% faster)
        assert len(result) == 1
        assert result[0][0] == "x"
        assert result[0][1] == "int"
        assert result[0][2] is None
        assert result[0][3] is False

    def test_single_field_with_default_value(self):
        """Test extraction of a field with a default value."""
        source = "class Foo:\n    x: int = 42"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.4μs -> 5.16μs (160% faster)
        assert len(result) == 1
        assert result[0][0] == "x"
        assert result[0][1] == "int"
        assert result[0][2] == "42"
        assert result[0][3] is False

    def test_multiple_annotated_fields(self):
        """Test extraction of multiple annotated fields."""
        source = "class Foo:\n    x: int\n    y: str\n    z: float"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 24.6μs -> 7.41μs (232% faster)
        assert len(result) == 3
        assert result[0][0] == "x"
        assert result[1][0] == "y"
        assert result[2][0] == "z"

    def test_classvar_field_excluded(self):
        """Test that ClassVar fields are excluded from parameters."""
        source = "from typing import ClassVar\nclass Foo:\n    x: ClassVar[int] = 5\n    y: int"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.2μs -> 5.27μs (188% faster)
        assert len(result) == 1
        assert result[0][0] == "y"

    def test_kw_only_by_default_false(self):
        """Test kw_only_by_default=False sets kw_only to False."""
        source = "class Foo:\n    x: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 8.43μs -> 4.14μs (104% faster)
        assert result[0][3] is False

    def test_kw_only_by_default_true(self):
        """Test kw_only_by_default=True sets kw_only to True."""
        source = "class Foo:\n    x: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=True
        )  # 8.35μs -> 4.06μs (106% faster)
        assert result[0][3] is True

    def test_field_with_init_false(self):
        """Test that field with init=False is excluded from parameters."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(init=False)\n    y: int"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 17.6μs -> 6.89μs (156% faster)
        assert len(result) == 1
        assert result[0][0] == "y"

    def test_field_with_init_true(self):
        """Test that field with init=True is included in parameters."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(init=True)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.1μs -> 5.97μs (154% faster)
        assert len(result) == 1
        assert result[0][0] == "x"
        assert result[0][2] is None

    def test_field_with_kw_only_true(self):
        """Test that field with kw_only=True overrides default."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(kw_only=True)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.5μs -> 5.86μs (165% faster)
        assert result[0][3] is True

    def test_field_with_kw_only_false(self):
        """Test that field with kw_only=False overrides default."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(kw_only=False)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=True
        )  # 15.2μs -> 5.78μs (163% faster)
        assert result[0][3] is False

    def test_field_with_default_value(self):
        """Test field with default keyword argument."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(default=42)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 23.1μs -> 6.33μs (264% faster)
        assert result[0][0] == "x"
        assert result[0][2] == "42"

    def test_field_with_default_factory(self):
        """Test field with default_factory keyword argument."""
        source = "from dataclasses import field\nclass Foo:\n    x: list = field(default_factory=list)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 16.1μs -> 5.62μs (186% faster)
        assert result[0][0] == "x"
        assert result[0][2] == "..."

    def test_field_with_factory(self):
        """Test field with factory keyword argument (attrs-style)."""
        source = "from attrs import field\nclass Foo:\n    x: list = field(factory=list)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 14.7μs -> 5.57μs (164% faster)
        assert result[0][0] == "x"
        assert result[0][2] == "..."

    def test_multiple_field_keywords(self):
        """Test field with multiple keyword arguments."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(init=True, kw_only=True, default=5)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 28.9μs -> 7.17μs (303% faster)
        assert result[0][0] == "x"
        assert result[0][2] == "5"
        assert result[0][3] is True

    def test_non_annotated_assignment_ignored(self):
        """Test that non-annotated assignments are ignored."""
        source = "class Foo:\n    x = 42"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 1.24μs -> 1.12μs (10.7% faster)
        assert result == []

    def test_method_ignored(self):
        """Test that methods are ignored."""
        source = "class Foo:\n    def method(self): pass"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 1.12μs -> 1.06μs (5.65% faster)
        assert result == []

    def test_import_alias_for_classvar(self):
        """Test ClassVar detection with import alias."""
        source = "class Foo:\n    x: CV[int] = 5"
        tree = ast.parse(source)
        class_node = tree.body[0]
        import_aliases = {"CV": "typing.ClassVar"}
        result = _extract_synthetic_init_parameters(
            class_node, source, import_aliases, kw_only_by_default=False
        )  # 3.52μs -> 3.45μs (2.03% faster)
        assert result == []

    def test_import_alias_for_field(self):
        """Test field detection with import alias."""
        source = "class Foo:\n    x: int = f(default=10)"
        tree = ast.parse(source)
        class_node = tree.body[0]
        import_aliases = {"f": "dataclasses.field"}
        result = _extract_synthetic_init_parameters(
            class_node, source, import_aliases, kw_only_by_default=False
        )  # 17.2μs -> 7.01μs (145% faster)
        assert len(result) == 1
        assert result[0][2] == "10"

    def test_type_annotation_with_subscript(self):
        """Test field with subscripted type annotation."""
        source = "class Foo:\n    x: list[int]"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 9.25μs -> 4.47μs (107% faster)
        assert result[0][0] == "x"
        assert "list" in result[0][1]
        assert "int" in result[0][1]

    def test_complex_default_value_expression(self):
        """Test field with complex default expression."""
        source = "class Foo:\n    x: int = 1 + 2 * 3"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 14.9μs -> 5.20μs (187% faster)
        assert result[0][2] is not None
        assert "+" in result[0][2] or "1" in result[0][2]

    def test_string_default_value(self):
        """Test field with string default value."""
        source = 'class Foo:\n    x: str = "hello"'
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 14.2μs -> 4.97μs (186% faster)
        assert result[0][2] is not None
        assert "hello" in result[0][2]

    def test_none_default_value(self):
        """Test field with None as default value."""
        source = "class Foo:\n    x: int | None = None"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 14.2μs -> 4.30μs (230% faster)
        assert result[0][2] == "None"

    def test_field_init_non_literal_ignored(self):
        """Test that field(init=variable) does not affect include_in_init."""
        source = "class Foo:\n    flag = True\n    x: int = field(init=flag)"
        tree = ast.parse(source)
        class_node = tree.body[0]
        # Since flag is not a literal, include_in_init stays True (default)
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.9μs -> 6.14μs (126% faster)
        assert len(result) == 1
        assert result[0][0] == "x"

    def test_field_kw_only_non_literal_ignored(self):
        """Test that field(kw_only=variable) does not affect kw_only."""
        source = "class Foo:\n    flag = True\n    x: int = field(kw_only=flag)"
        tree = ast.parse(source)
        class_node = tree.body[0]
        # Since flag is not a literal, kw_only stays as default (False)
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 14.0μs -> 5.87μs (138% faster)
        assert result[0][3] is False

    def test_annotation_only_no_value(self):
        """Test field with only annotation, no assigned value."""
        source = "class Foo:\n    x: int\n    y: str = 'default'"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 23.9μs -> 6.93μs (245% faster)
        assert len(result) == 2
        assert result[0][0] == "x"
        assert result[0][2] is None
        assert result[1][0] == "y"
        assert result[1][2] == "'default'"

    def test_mixed_fields_and_methods(self):
        """Test class with mix of fields and methods."""
        source = "class Foo:\n    x: int = 1\n    def __init__(self): pass\n    y: str\n    def method(self): pass"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 38.8μs -> 7.22μs (437% faster)
        assert len(result) == 2
        assert result[0][0] == "x"
        assert result[1][0] == "y"


class TestExtractSyntheticInitParametersEdge:
    """Edge tests for unusual, extreme, or boundary conditions."""

    def test_empty_string_source(self):
        """Test with minimal source code."""
        source = "class Foo:\n    pass"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 1.14μs -> 1.14μs (0.000% faster)
        assert isinstance(result, list)
        assert result == []

    def test_unicode_variable_name(self):
        """Test field with unicode variable name."""
        source = "class Foo:\n    α: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 9.92μs -> 4.57μs (117% faster)
        assert result[0][0] == "α"

    def test_very_long_variable_name(self):
        """Test field with very long variable name."""
        long_name = "x" * 1000
        source = f"class Foo:\n    {long_name}: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 128μs -> 5.08μs (2421% faster)
        assert result[0][0] == long_name

    def test_complex_type_annotation(self):
        """Test field with complex nested type annotation."""
        source = "class Foo:\n    x: dict[str, list[tuple[int, str]]]"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 11.9μs -> 4.65μs (157% faster)
        assert result[0][0] == "x"
        assert result[0][1] is not None

    def test_field_with_empty_keywords(self):
        """Test field() with no keyword arguments."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field()"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.8μs -> 5.20μs (166% faster)
        assert result[0][0] == "x"
        assert result[0][2] is None

    def test_classvar_with_subscript(self):
        """Test ClassVar with type parameter."""
        source = "from typing import ClassVar\nclass Foo:\n    x: ClassVar[dict[str, int]]"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 2.55μs -> 2.37μs (7.58% faster)
        assert result == []

    def test_optional_type_annotation(self):
        """Test field with Optional type."""
        source = "from typing import Optional\nclass Foo:\n    x: Optional[int]"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 12.8μs -> 4.79μs (168% faster)
        assert result[0][0] == "x"
        assert "Optional" in result[0][1] or "int" in result[0][1]

    def test_union_type_annotation(self):
        """Test field with Union type."""
        source = "class Foo:\n    x: int | str"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 8.38μs -> 3.56μs (135% faster)
        assert result[0][0] == "x"
        assert result[0][1] is not None

    def test_callable_type_annotation(self):
        """Test field with Callable type."""
        source = "from typing import Callable\nclass Foo:\n    x: Callable[[int], str]"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.5μs -> 4.81μs (180% faster)
        assert result[0][0] == "x"

    def test_literal_type_annotation(self):
        """Test field with Literal type."""
        source = "from typing import Literal\nclass Foo:\n    x: Literal['a', 'b']"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.1μs -> 4.62μs (183% faster)
        assert result[0][0] == "x"

    def test_field_with_positional_only_arg(self):
        """Test that field() positional arguments are ignored (kw only)."""
        # field() only takes keyword arguments, but verify graceful handling
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(default=5)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 23.2μs -> 6.40μs (263% faster)
        assert result[0][2] == "5"

    def test_multiple_classvar_fields(self):
        """Test multiple ClassVar fields."""
        source = "from typing import ClassVar\nclass Foo:\n    x: ClassVar[int]\n    y: ClassVar[str]\n    z: int"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 17.9μs -> 5.74μs (212% faster)
        assert len(result) == 1
        assert result[0][0] == "z"

    def test_classvar_with_dotted_name(self):
        """Test ClassVar detection with fully qualified name."""
        source = "class Foo:\n    x: typing.ClassVar[int]"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 4.10μs -> 3.80μs (7.90% faster)
        assert result == []

    def test_field_with_dotted_name(self):
        """Test field detection with fully qualified name."""
        source = "class Foo:\n    x: int = dataclasses.field(default=10)"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 21.4μs -> 7.59μs (182% faster)
        assert len(result) == 1
        assert result[0][2] == "10"

    def test_complex_default_with_function_call(self):
        """Test field with function call as default."""
        source = "class Foo:\n    x: list = []"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.4μs -> 5.15μs (159% faster)
        assert result[0][2] == "[]"

    def test_field_with_repr_false(self):
        """Test field with repr=False (should be included in init)."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(repr=False)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.2μs -> 5.97μs (154% faster)
        assert len(result) == 1
        assert result[0][0] == "x"

    def test_field_with_compare_false(self):
        """Test field with compare=False (should be included in init)."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(compare=False)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.4μs -> 5.76μs (167% faster)
        assert len(result) == 1

    def test_field_init_false_overrides_default_factory(self):
        """Test that init=False takes precedence over default_factory."""
        source = "from dataclasses import field\nclass Foo:\n    x: list = field(init=False, default_factory=list)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 5.18μs -> 4.90μs (5.74% faster)
        assert len(result) == 0

    def test_empty_import_aliases(self):
        """Test with empty import_aliases dict."""
        source = "class Foo:\n    x: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 8.72μs -> 4.31μs (102% faster)
        assert len(result) == 1

    def test_many_import_aliases(self):
        """Test with many import aliases."""
        source = "class Foo:\n    x: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        # Create many unused aliases
        import_aliases = {f"alias_{i}": f"module.name_{i}" for i in range(100)}
        result = _extract_synthetic_init_parameters(
            class_node, source, import_aliases, kw_only_by_default=False
        )  # 8.92μs -> 4.48μs (99.1% faster)
        assert len(result) == 1

    def test_special_numeric_defaults(self):
        """Test field with special numeric values."""
        source = "class Foo:\n    a: float = 3.14\n    b: float = 1e-5\n    c: int = 0x1F"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 55.0μs -> 9.28μs (493% faster)
        assert len(result) == 3
        assert all(r[2] is not None for r in result)

    def test_negative_numeric_default(self):
        """Test field with negative numeric default."""
        source = "class Foo:\n    x: int = -42"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.0μs -> 5.26μs (147% faster)
        assert "-" in result[0][2] or "42" in result[0][2]

    def test_boolean_defaults(self):
        """Test fields with boolean defaults."""
        source = "class Foo:\n    a: bool = True\n    b: bool = False"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 30.8μs -> 7.11μs (333% faster)
        assert len(result) == 2
        assert "True" in result[0][2]
        assert "False" in result[1][2]

    def test_ellipsis_default(self):
        """Test field with ... (Ellipsis) as default."""
        source = "class Foo:\n    x: int = ..."
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 13.1μs -> 4.85μs (170% faster)
        assert result[0][2] is not None

    def test_class_attribute_then_method(self):
        """Test ordering of class attributes before methods."""
        source = "class Foo:\n    x: int\n    def method(self): pass\n    y: str"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 21.5μs -> 6.30μs (242% faster)
        assert len(result) == 2
        assert result[0][0] == "x"
        assert result[1][0] == "y"

    def test_fields_with_different_kw_only_values(self):
        """Test multiple fields with varying kw_only values."""
        source = (
            "from dataclasses import field\n"
            "class Foo:\n"
            "    a: int\n"
            "    b: int = field(kw_only=True)\n"
            "    c: int = field(kw_only=False)"
        )
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 50.8μs -> 10.8μs (373% faster)
        assert result[0][3] is False
        assert result[1][3] is True
        assert result[2][3] is False

    def test_kw_only_by_default_with_explicit_overrides(self):
        """Test kw_only_by_default=True with explicit False overrides."""
        source = "from dataclasses import field\nclass Foo:\n    a: int\n    b: int = field(kw_only=False)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=True
        )  # 27.9μs -> 7.77μs (259% faster)
        assert result[0][3] is True  # Uses default
        assert result[1][3] is False  # Explicitly overridden

    def test_field_with_string_annotation(self):
        """Test field with string annotation (forward reference)."""
        source = "class Foo:\n    x: 'int'"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 7.94μs -> 3.52μs (126% faster)
        assert result[0][0] == "x"
        assert "'int'" in result[0][1] or "int" in result[0][1]

    def test_all_field_keywords_at_once(self):
        """Test field with all keywords simultaneously."""
        source = (
            "from dataclasses import field\n"
            "class Foo:\n"
            "    x: int = field(init=True, kw_only=True, default=42, compare=False)"
        )
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 33.5μs -> 7.77μs (331% faster)
        assert result[0][0] == "x"
        assert result[0][2] == "42"
        assert result[0][3] is True

    def test_field_unknown_keyword_ignored(self):
        """Test field with unknown keyword arguments (should be ignored)."""
        source = "from dataclasses import field\nclass Foo:\n    x: int = field(unknown_arg=True)"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.2μs -> 5.81μs (161% faster)
        assert len(result) == 1
        assert result[0][0] == "x"


class TestExtractSyntheticInitParametersLargeScale:
    """Large-scale tests for performance and scalability."""

    def test_many_fields(self):
        """Test class with 100 fields."""
        lines = ["class Foo:"]
        for i in range(100):
            lines.append(f"    field_{i}: int")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 21.0ms -> 121μs (17204% faster)
        assert len(result) == 100
        for i, param in enumerate(result):
            assert param[0] == f"field_{i}"

    def test_many_fields_with_defaults(self):
        """Test class with 100 fields all having defaults."""
        lines = ["class Foo:"]
        for i in range(100):
            lines.append(f"    field_{i}: int = {i}")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 53.0ms -> 163μs (32273% faster)
        assert len(result) == 100
        for i, param in enumerate(result):
            assert param[2] == str(i)

    def test_many_fields_with_field_calls(self):
        """Test class with 100 fields using field() calls."""
        lines = ["from dataclasses import field", "class Foo:"]
        for i in range(100):
            if i % 2 == 0:
                lines.append(f"    field_{i}: int = field(default={i})")
            else:
                lines.append(f"    field_{i}: int = field(default_factory=list)")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 75.0ms -> 222μs (33547% faster)
        assert len(result) == 100
        for i, param in enumerate(result):
            if i % 2 == 0:
                assert param[2] == str(i)
            else:
                assert param[2] == "..."

    def test_many_fields_alternating_kw_only(self):
        """Test 100 fields with alternating kw_only values."""
        lines = ["from dataclasses import field", "class Foo:"]
        for i in range(100):
            kw_only = "True" if i % 2 == 0 else "False"
            lines.append(f"    field_{i}: int = field(kw_only={kw_only})")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 47.2ms -> 211μs (22254% faster)
        assert len(result) == 100
        for i, param in enumerate(result):
            expected_kw_only = i % 2 == 0
            assert param[3] is expected_kw_only

    def test_many_classvar_fields_excluded(self):
        """Test class with 50 ClassVar and 50 regular fields."""
        lines = ["from typing import ClassVar", "class Foo:"]
        for i in range(50):
            lines.append(f"    classvar_{i}: ClassVar[int]")
        for i in range(50):
            lines.append(f"    field_{i}: int")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 14.3ms -> 96.4μs (14732% faster)
        assert len(result) == 50
        for i, param in enumerate(result):
            assert param[0] == f"field_{i}"

    def test_many_init_false_fields_excluded(self):
        """Test class with 50 init=False fields and 50 regular fields."""
        lines = ["from dataclasses import field", "class Foo:"]
        for i in range(50):
            lines.append(f"    field_{i}: int = field(init=False)")
        for i in range(50):
            lines.append(f"    param_{i}: int")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 16.4ms -> 145μs (11183% faster)
        assert len(result) == 50
        for i, param in enumerate(result):
            assert param[0] == f"param_{i}"

    def test_many_import_aliases(self):
        """Test with 1000 import aliases."""
        source = "class Foo:\n    x: int"
        tree = ast.parse(source)
        class_node = tree.body[0]
        import_aliases = {f"alias_{i}": f"module.name_{i}" for i in range(1000)}
        result = _extract_synthetic_init_parameters(
            class_node, source, import_aliases, kw_only_by_default=False
        )  # 10.0μs -> 5.27μs (89.9% faster)
        assert len(result) == 1

    def test_many_keywords_in_field_call(self):
        """Test field() call with many keyword arguments."""
        keyword_args = ", ".join([f"compare={i % 2 == 0}, hash={i % 2 == 1}" for i in range(10)])
        source = f"from dataclasses import field\nclass Foo:\n    x: int = field(default=5, {keyword_args})"
        tree = ast.parse(source)
        class_node = tree.body[1]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 90.5μs -> 9.64μs (839% faster)
        assert len(result) == 1
        assert result[0][2] == "5"

    def test_deeply_nested_type_annotation(self):
        """Test field with deeply nested type annotation."""
        nested = "int"
        for _ in range(10):
            nested = f"list[{nested}]"
        source = f"class Foo:\n    x: {nested}"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 15.4μs -> 5.37μs (187% faster)
        assert result[0][0] == "x"
        assert result[0][1] is not None

    def test_very_complex_default_expression(self):
        """Test field with very complex default expression."""
        # Create a complex binary operation
        expr = "1"
        for i in range(20):
            expr = f"({expr} + {i})"
        source = f"class Foo:\n    x: int = {expr}"
        tree = ast.parse(source)
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 40.0μs -> 5.93μs (574% faster)
        assert result[0][2] is not None

    def test_large_source_file_simulation(self):
        """Test with simulated large source file with many classes."""
        lines = []
        for class_num in range(10):
            lines.append(f"class Class{class_num}:")
            for field_num in range(10):
                lines.append(f"    field_{field_num}: int")
        source = "\n".join(lines)
        tree = ast.parse(source)
        # Extract for first class
        class_node = tree.body[0]
        result = _extract_synthetic_init_parameters(
            class_node, source, {}, kw_only_by_default=False
        )  # 2.15ms -> 18.8μs (11351% faster)
        assert len(result) == 10

    def test_repeated_field_calls_in_iteration(self):
        """Test repeated extraction over many fields."""
        lines = ["from dataclasses import field", "class Foo:"]
        for i in range(100):
            lines.append(f"    field_{i}: int = field(kw_only={i % 2 == 0})")
        source = "\n".join(lines)
        tree = ast.parse(source)
        class_node = tree.body[1]
        # Call extraction multiple times
        for _ in range(5):
            result = _extract_synthetic_init_parameters(
                class_node, source, {}, kw_only_by_default=False
            )  # 237ms -> 1.00ms (23588% faster)
            assert len(result) == 100

To edit these changes git checkout codeflash/optimize-pr1860-2026-03-18T08.21.06 and push.

Codeflash Static Badge

The optimization wraps `ast.get_source_segment` with an LRU cache keyed on immutable position attributes (lineno, col_offset, end_lineno, end_col_offset) extracted from AST nodes, eliminating redundant parsing of the same source segments. Line profiler confirms `_get_node_source` dropped from 3.06s to 1.44s (53% reduction) and the top-level function from 3.08s to 1.46s. The caching exploits the fact that `_extract_synthetic_init_parameters` repeatedly calls `_get_node_source` for identical or overlapping nodes (e.g., field annotations and defaults across many dataclass attributes), and `ast.get_source_segment` is expensive when re-slicing the module source string. A minor secondary change hoists `keyword.arg` into a local variable to avoid repeated attribute lookups in the field-keyword loop, shaving ~2–3% off that micro-path. No correctness regressions across 100+ test scenarios.
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Mar 18, 2026
@claude
Copy link
Contributor

claude bot commented Mar 18, 2026

Claude finished @codeflash-ai[bot]'s task in 44s —— View job


PR Review Summary

  • Triage PR scope
  • Lint and typecheck — fixed 3 issues, committed d4517368
  • Resolve stale threads — none found
  • Review code changes
  • Check for duplicates
  • Analyze test coverage
  • Post summary
  • Check optimization PRs

Prek Checks

Three issues were found and auto-fixed in commit d4517368 style: auto-fix linting and type errors in _get_source_segment_cached:

Issue Location Fix Applied
SIM105: try/except/passcontextlib.suppress code_context_extractor.py:825 ✅ Fixed
_NodePos missing type annotations (9 mypy errors) code_context_extractor.py:1639 ✅ Fixed (_NodePos now inherits ast.AST, adds annotations)
ast.AST base has no lineno/col_offset attrs code_context_extractor.py:828 ✅ Fixed (cast via Any with explanatory comment)

Prek and mypy both pass clean after the fix.


Code Review

🐛 Bug: Duplicate function definition_get_source_segment_cached was defined twice at the end of the file (lines 1829 and 1841 in the original PR). In Python the second definition silently replaces the first, but both @lru_cache decorators execute, meaning the first cache object is created and immediately orphaned. Auto-fixed by removing the duplicate.

🐛 Duplicate comment — The comment for the default_factory/factory branch appeared twice (lines 982–985). Auto-fixed.

✅ Optimization approach is sound. The LRU cache keyed on (module_source, lineno, col_offset, end_lineno, end_col_offset) is correct — position tuples are immutable and str is hashable. Python caches string hash values so repeated hashing of the same large source string is O(1) after the first call. The maxsize=2048 bound is appropriate.

⚠️ Minor: _NodePos placement. The helper class is placed near line 1639 (between ImportCollector and PruneConfig) but the function that uses it (_get_source_segment_cached) is at the very bottom of the file (line 1830). No correctness issue, but slightly odd ordering.

⚠️ Minor: kw_arg = keyword.arg hoisting (line 970) is a nice micro-optimization but keyword.arg is just a simple attribute lookup — the perf benefit is negligible. Harmless and readable.


Duplicate Detection

No duplicates detected. _get_source_segment_cached / _get_node_source caching is used only in one place; no parallel JS/Java equivalents exist.


Test Coverage

3515 passed, 57 skipped, 1 pre-existing failure in test_tracer.py (unrelated to this PR).

code_context_extractor.py coverage: 85% (1094 stmts, 169 missed). Above the 75% threshold.


Optimization PRs

PR Status Action
#1861_get_attrs_config +17% js-esm-async-optimization failing Pre-existing JS fixture failure unrelated to the Python-only change; not merged
#1862_get_last_two_names +126% Already closed

@claude
Copy link
Contributor

claude bot commented Mar 18, 2026

Closing: CI checks are failing with issues introduced by the optimization. prek fails with SIM105 (try-except-pass at line 825 should use contextlib.suppress). mypy fails with 4 errors because node typed as ast.AST does not have lineno/col_offset attributes. There is also a duplicate _get_source_segment_cached function definition at lines 1828 and 1840.

@claude claude bot closed this Mar 18, 2026
@claude claude bot deleted the codeflash/optimize-pr1860-2026-03-18T08.21.06 branch March 18, 2026 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants