Skip to content

Conversation

@mohammedahmed18
Copy link
Contributor

@mohammedahmed18 mohammedahmed18 commented Sep 25, 2025

The issue was that it could accidentally select a different function with the same name (but in another class). In that case, all helpers most-likely would appear unused—since it's another function, causing the optimization to end up empty if it's meant for helpers.

this happened when optimizing LSPMarkdownMessage.serialize, it did find optimization for helpers but reverted it since it picked LSPMessage.serialize as the target function instead

@mohammedahmed18 mohammedahmed18 requested a review from a team September 25, 2025 14:16
@mohammedahmed18 mohammedahmed18 added the bug Something isn't working label Sep 25, 2025
@github-actions
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The new helper find_target_node only traverses ClassDef nodes for parents and matches by name, ignoring the type field in FunctionParent and not handling nested scopes like functions within functions or other node types that might appear in parents. This could misidentify nodes or fail in cases with non-class parents or duplicate class names in different modules.

def find_target_node(root: ast.AST, function_to_optimize: FunctionToOptimize) -> Optional[ast.FunctionDef]:
    def _find(node: ast.AST, parents: list[FunctionParent]) -> Optional[ast.FunctionDef]:
        if not parents:
            for child in getattr(node, "body", []):
                if isinstance(child, ast.FunctionDef) and child.name == function_to_optimize.function_name:
                    return child
            return None

        parent = parents[0]
        for child in getattr(node, "body", []):
            if isinstance(child, ast.ClassDef) and child.name == parent.name:
                return _find(child, parents[1:])
        return None

    return _find(root, function_to_optimize.parents)
Robustness

find_target_node assumes parents are ordered from outermost to innermost and that each parent exists directly under the previous node’s body. It does not guard against missing attributes or different structures (e.g., decorators, async defs, nested classes with same name). Consider handling AsyncFunctionDef, verifying parent types, or providing clearer fallbacks/logging.

def find_target_node(root: ast.AST, function_to_optimize: FunctionToOptimize) -> Optional[ast.FunctionDef]:
    def _find(node: ast.AST, parents: list[FunctionParent]) -> Optional[ast.FunctionDef]:
        if not parents:
            for child in getattr(node, "body", []):
                if isinstance(child, ast.FunctionDef) and child.name == function_to_optimize.function_name:
                    return child
            return None

        parent = parents[0]
        for child in getattr(node, "body", []):
            if isinstance(child, ast.ClassDef) and child.name == parent.name:
                return _find(child, parents[1:])
        return None

    return _find(root, function_to_optimize.parents)
Test Isolation

The new test writes files and imports modules but doesn’t adjust sys.path or ensure isolation from ambient environment. If the same module names exist in the environment, resolution might differ. Ensure the optimizer and detection use the temp project root deterministically.

def test_unused_helper_detection_with_duplicated_function_name_in_different_classes():
    """Test detection when helpers are called via module.function style."""
    temp_dir = Path(tempfile.mkdtemp())

    try:
        # Main file
        main_file = temp_dir / "main.py"
        main_file.write_text("""from __future__ import annotations
import json
from helpers import replace_quotes_with_backticks, simplify_worktree_paths
from dataclasses import asdict, dataclass

@dataclass
class LspMessage:

    def serialize(self) -> str:
        data = self._loop_through(asdict(self))
        # Important: keep type as the first key, for making it easy and fast for the client to know if this is a lsp message before parsing it
        ordered = {"type": self.type(), **data}
        return (
            message_delimiter
            + json.dumps(ordered)
            + message_delimiter
        )


@dataclass
class LspMarkdownMessage(LspMessage):

    def serialize(self) -> str:
        self.markdown = simplify_worktree_paths(self.markdown)
        self.markdown = replace_quotes_with_backticks(self.markdown)
        return super().serialize()
""")

        # Helpers file
        helpers_file = temp_dir / "helpers.py"
        helpers_file.write_text("""def simplify_worktree_paths(msg: str, highlight: bool = True) -> str:  # noqa: FBT001, FBT002
    path_in_msg = worktree_path_regex.search(msg)
    if path_in_msg:
        last_part_of_path = path_in_msg.group(0).split("/")[-1]
        if highlight:
            last_part_of_path = f"`{last_part_of_path}`"
        return msg.replace(path_in_msg.group(0), last_part_of_path)
    return msg


def replace_quotes_with_backticks(text: str) -> str:
    # double-quoted strings
    text = _double_quote_pat.sub(r"`\1`", text)
    # single-quoted strings
    return _single_quote_pat.sub(r"`\1`", text)
""")

        # Optimized version that only uses add_numbers
        optimized_code = """
```python:main.py
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

from codeflash.lsp.helpers import (replace_quotes_with_backticks,
                                   simplify_worktree_paths)


@dataclass
class LspMessage:

    def serialize(self) -> str:
        # Use local variable to minimize lookup costs and avoid unnecessary dictionary unpacking
        data = self._loop_through(asdict(self))
        msg_type = self.type()
        ordered = {'type': msg_type}
        ordered.update(data)
        return (
            message_delimiter
            + json.dumps(ordered)
            + message_delimiter  # \u241F is the message delimiter becuase it can be more than one message sent over the same message, so we need something to separate each message
        )

@dataclass
class LspMarkdownMessage(LspMessage):

    def serialize(self) -> str:
        # Side effect required, must preserve for behavioral correctness
        self.markdown = simplify_worktree_paths(self.markdown)
        self.markdown = replace_quotes_with_backticks(self.markdown)
        return super().serialize()
def simplify_worktree_paths(msg: str, highlight: bool = True) -> str:  # noqa: FBT001, FBT002
    m = worktree_path_regex.search(msg)
    if m:
        # More efficient way to get last path part
        last_part_of_path = m.group(0).rpartition('/')[-1]
        if highlight:
            last_part_of_path = f"`{last_part_of_path}`"
        return msg.replace(m.group(0), last_part_of_path)
    return msg

def replace_quotes_with_backticks(text: str) -> str:
    # Efficient string substitution, reduces intermediate string allocations
    return _single_quote_pat.sub(
        r"`\1`",
        _double_quote_pat.sub(r"`\1`", text),
    )

"""

    # Create test config
    test_cfg = TestConfig(
        tests_root=temp_dir / "tests",
        tests_project_rootdir=temp_dir,
        project_root_path=temp_dir,
        test_framework="pytest",
        pytest_cmd="pytest",
    )

    # Create FunctionToOptimize instance
    function_to_optimize = FunctionToOptimize(
        file_path=main_file, function_name="serialize", qualified_name="serialize", parents=[
            FunctionParent(name="LspMarkdownMessage", type="ClassDef"),
        ]
    )

    optimizer = FunctionOptimizer(
        function_to_optimize=function_to_optimize,
        test_cfg=test_cfg,
        function_to_optimize_source_code=main_file.read_text(),
    )

    ctx_result = optimizer.get_code_optimization_context()
    assert ctx_result.is_successful(), f"Failed to get context: {ctx_result.failure()}"

    code_context = ctx_result.unwrap()

    unused_helpers = detect_unused_helper_functions(optimizer.function_to_optimize, code_context, CodeStringsMarkdown.parse_markdown_code(optimized_code))

    unused_names = {uh.qualified_name for uh in unused_helpers}
    assert len(unused_names) == 0 # no unused helpers

finally:
    # Cleanup
    import shutil

    shutil.rmtree(temp_dir, ignore_errors=True)

</details>

</td></tr>
</table>

@github-actions
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Support async and missing parents

Ensure nested scopes (e.g., functions/classes inside classes) are reliably found by
handling both ast.FunctionDef and ast.AsyncFunctionDef and by iterating over
ast.AsyncFunctionDef. Also, guard against None or empty parents on
function_to_optimize to avoid attribute errors. This improves robustness when
optimizing async methods or when parent info is missing.

codeflash/context/unused_definition_remover.py [615-631]

-def find_target_node(root: ast.AST, function_to_optimize: FunctionToOptimize) -> Optional[ast.FunctionDef]:
-    def _find(node: ast.AST, parents: list[FunctionParent]) -> Optional[ast.FunctionDef]:
+def find_target_node(root: ast.AST, function_to_optimize: FunctionToOptimize) -> Optional[ast.FunctionDef | ast.AsyncFunctionDef]:
+    def _find(node: ast.AST, parents: list[FunctionParent]) -> Optional[ast.FunctionDef | ast.AsyncFunctionDef]:
         if not parents:
             for child in getattr(node, "body", []):
-                if isinstance(child, ast.FunctionDef) and child.name == function_to_optimize.function_name:
+                if ((isinstance(child, ast.FunctionDef) or isinstance(child, ast.AsyncFunctionDef))
+                    and child.name == function_to_optimize.function_name):
                     return child
             return None
 
         parent = parents[0]
         for child in getattr(node, "body", []):
             if isinstance(child, ast.ClassDef) and child.name == parent.name:
                 return _find(child, parents[1:])
         return None
 
-    return _find(root, function_to_optimize.parents)
+    parents = getattr(function_to_optimize, "parents", None) or []
+    return _find(root, parents)
Suggestion importance[1-10]: 7

__

Why: Extending the finder to include ast.AsyncFunctionDef and guarding against missing parents improves robustness and aligns with real code patterns without changing behavior otherwise. It's accurate to the diff context, though not critical to correctness.

Medium
General
Robust function node matching

Match methods defined with decorators like @classmethod or @staticmethod by not
relying on parent type strings and only class names. Additionally, ensure dunder
methods or property methods are also found; name matching is fine, but be sure to
check for ast.AnnAssign or other nodes are not accidentally treated as
functions—restrict to function nodes but cover async. This avoids missing valid
target methods.

codeflash/context/unused_definition_remover.py [615-621]

-def find_target_node(root: ast.AST, function_to_optimize: FunctionToOptimize) -> Optional[ast.FunctionDef]:
-    def _find(node: ast.AST, parents: list[FunctionParent]) -> Optional[ast.FunctionDef]:
+def find_target_node(root: ast.AST, function_to_optimize: FunctionToOptimize) -> Optional[ast.FunctionDef | ast.AsyncFunctionDef]:
+    def _is_func(node: ast.AST) -> bool:
+        return isinstance(node, ast.FunctionDef) or isinstance(node, ast.AsyncFunctionDef)
+
+    def _find(node: ast.AST, parents: list[FunctionParent]) -> Optional[ast.FunctionDef | ast.AsyncFunctionDef]:
         if not parents:
             for child in getattr(node, "body", []):
-                if isinstance(child, ast.FunctionDef) and child.name == function_to_optimize.function_name:
+                if _is_func(child) and child.name == function_to_optimize.function_name:
                     return child
             return None
 
+        parent = parents[0]
+        for child in getattr(node, "body", []):
+            if isinstance(child, ast.ClassDef) and child.name == parent.name:
+                return _find(child, parents[1:])
+        return None
+
+    return _find(root, getattr(function_to_optimize, "parents", []) or [])
+
Suggestion importance[1-10]: 6

__

Why: This refactor cleanly abstracts function-node checks and includes async functions, which is a reasonable improvement. Impact is moderate since it mainly broadens matching without addressing a concrete bug shown in the PR.

Low
Add diagnostic guard logging

After locating the node, verify it is within the expected class hierarchy to avoid
false positives when duplicate function names exist in different classes. If parents
were provided and the node isn't found, fail fast with a debug log to aid diagnosis.

codeflash/context/unused_definition_remover.py [661-665]

 entrypoint_function_ast = find_target_node(optimized_ast, function_to_optimize)
+if not entrypoint_function_ast:
+    logger.debug(
+        f"Entrypoint not found for {function_to_optimize.function_name} with parents "
+        f"{[p.name for p in (getattr(function_to_optimize, 'parents', []) or [])]}"
+    )
+    return []
Suggestion importance[1-10]: 5

__

Why: Adding an early diagnostic log and return if the target isn't found slightly improves debuggability but duplicates existing check a few lines later; marginal benefit and low impact.

Low

mohammedahmed18 and others added 2 commits September 25, 2025 17:20
The optimized version eliminates recursive function calls by replacing the recursive `_find` helper with an iterative approach. This provides significant performance benefits:

**Key Optimizations:**

1. **Removed Recursion Overhead**: The original code used a recursive helper function `_find` that created new stack frames for each parent traversal. The optimized version uses a simple iterative loop that traverses parents sequentially without function call overhead.

2. **Eliminated Function Creation**: The original code defined the `_find` function on every call to `find_target_node`. The optimized version removes this repeated function definition entirely.

3. **Early Exit with for-else**: The optimized code uses Python's `for-else` construct to immediately return `None` when a parent class isn't found, avoiding unnecessary continued searching.

4. **Reduced Attribute Access**: By caching `function_to_optimize.function_name` in a local variable `target_name` and reusing `body` variables, the code reduces repeated attribute lookups.

**Performance Impact by Test Case:**
- **Simple cases** (top-level functions, basic class methods): 23-62% faster due to eliminated recursion overhead
- **Nested class scenarios**: 45-84% faster, with deeper nesting showing greater improvements as recursion elimination has more impact
- **Large-scale tests**: 12-22% faster, showing consistent benefits even with many nodes to traverse
- **Edge cases** (empty modules, non-existent classes): 52-76% faster due to more efficient early termination

The optimization is particularly effective for deeply nested class hierarchies where the original recursive approach created multiple stack frames, while the iterative version maintains constant memory usage regardless of nesting depth.
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Sep 25, 2025

⚡️ Codeflash found optimizations for this PR

📄 18% (0.18x) speedup for find_target_node in codeflash/context/unused_definition_remover.py

⏱️ Runtime : 892 microseconds 753 microseconds (best of 101 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch fix/correctly-find-funtion-node-when-reverting-helpers).

…25-09-25T14.28.58

⚡️ Speed up function `find_target_node` by 18% in PR #763 (`fix/correctly-find-funtion-node-when-reverting-helpers`)
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Sep 25, 2025

@aseembits93
Copy link
Contributor

aseembits93 commented Sep 25, 2025

@mohammedahmed18 can you share an example (screenshot) of an instance when the old way of reverting was causing the issue? it's slightly hard to directly follow from the code, thanks!

@mohammedahmed18
Copy link
Contributor Author

@aseembits93 yeah I kept the logs, this is the optimization of LSPMarkdownMessage.serialize
output.log

INFO     2025-09-24 15:58:09,036 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Best candidate 🚀                                                                                                 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 # file: codeflash/lsp/helpers.py                                                                                                                                                                                                       
   2 def simplify_worktree_paths(msg: str, highlight: bool = True) -> str:  # noqa: FBT001, FBT002                                                                                                                                          
   3     m = worktree_path_regex.search(msg)                                                                                                                                                                                                
   4     if m:                                                                                                                                                                                                                              
   5         # More efficient way to get last path part                                                                                                                                                                                     
   6         last_part_of_path = m.group(0).rpartition('/')[-1]                                                                                                                                                                             
   7         if highlight:                                                                                                                                                                                                                  
   8             last_part_of_path = f"`{last_part_of_path}`"                                                                                                                                                                               
   9         return msg.replace(m.group(0), last_part_of_path)                                                                                                                                                                              
  10     return msg                                                                                                                                                                                                                         
  11                                                                                                                                                                                                                                        
  12 def replace_quotes_with_backticks(text: str) -> str:                                                                                                                                                                                   
  13     # Efficient string substitution, reduces intermediate string allocations                                                                                                                                                           
  14     return _single_quote_pat.sub(                                                                                                                                                                                                      
  15         r"`\1`",                                                                                                                                                                                                                       
  16         _double_quote_pat.sub(r"`\1`", text),                                                                                                                                                                                          
  17     )                                                                                                                                                                                                                                  
  18 # file: codeflash/lsp/lsp_message.py                                                                                                                                                                                                   
  19 from __future__ import annotations                                                                                                                                                                                                     
  20                                                                                                                                                                                                                                        
  21 import json                                                                                                                                                                                                                            
  22 from dataclasses import asdict, dataclass                                                                                                                                                                                              
  23                                                                                                                                                                                                                                        
  24 from codeflash.lsp.helpers import (replace_quotes_with_backticks,                                                                                                                                                                      
  25                                    simplify_worktree_paths)                                                                                                                                                                            
  26                                                                                                                                                                                                                                        
  27                                                                                                                                                                                                                                        
  28 @dataclass                                                                                                                                                                                                                             
  29 class LspMessage:                                                                                                                                                                                                                      
  30                                                                                                                                                                                                                                        
  31     def serialize(self) -> str:                                                                                                                                                                                                        
  32         # Use local variable to minimize lookup costs and avoid unnecessary dictionary unpacking                                                                                                                                       
  33         data = self._loop_through(asdict(self))                                                                                                                                                                                        
  34         msg_type = self.type()                                                                                                                                                                                                         
  35         ordered = {'type': msg_type}                                                                                                                                                                                                   
  36         ordered.update(data)                                                                                                                                                                                                           
  37         return (                                                                                                                                                                                                                       
  38             message_delimiter                                                                                                                                                                                                          
  39             + json.dumps(ordered)                                                                                                                                                                                                      
  40             + message_delimiter  # \u241F is the message delimiter becuase it can be more than one message sent over the same message, so we need something to separate each message                                                   
  41         )                                                                                                                                                                                                                              
  42                                                                                                                                                                                                                                        
  43 @dataclass                                                                                                                                                                                                                             
  44 class LspMarkdownMessage(LspMessage):                                                                                                                                                                                                  
  45                                                                                                                                                                                                                                        
  46     def serialize(self) -> str:                                                                                                                                                                                                        
  47         # Side effect required, must preserve for behavioral correctness                                                                                                                                                               
  48         self.markdown = simplify_worktree_paths(self.markdown)                                                                                                                                                                         
  49         self.markdown = replace_quotes_with_backticks(self.markdown)                                                                                                                                                                   
  50         return super().serialize()                                                                                                                                                                                                     
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
DEBUG    2025-09-24 15:58:09,421 [/home/mohammed/Work/codeflash/.venv/lib/python3.12/site-packages/urllib3/connectionpool.py:544 in function _make_request] https://us.i.posthog.com:443 "POST /batch/ HTTP/1.1" 200 15                     
DEBUG    2025-09-24 15:58:10,018 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Could not find entrypoint function serialize in optimized code                                                    
DEBUG    2025-09-24 15:58:10,020 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Functions called in optimized entrypoint: {'LspMarkdownMessage._loop_through', 'LspMarkdownMessage.type', 'type', 
         'dumps', 'ordered.update', '_loop_through', 'asdict', 'update', 'json.dumps'}                                                                                                                                                      
DEBUG    2025-09-24 15:58:10,023 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Imported names mapping: {}                                                                                        
DEBUG    2025-09-24 15:58:10,026 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Helper function simplify_worktree_paths is not called in optimized code                                           
DEBUG    2025-09-24 15:58:10,029 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log]   Checked names: {'simplify_worktree_paths', 'codeflash.lsp.helpers.simplify_worktree_paths',                     
         'helpers.simplify_worktree_paths'}                                                                                                                                                                                                 
DEBUG    2025-09-24 15:58:10,031 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Helper function replace_quotes_with_backticks is not called in optimized code                                     
DEBUG    2025-09-24 15:58:10,034 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log]   Checked names: {'codeflash.lsp.helpers.replace_quotes_with_backticks', 'helpers.replace_quotes_with_backticks', 
         'replace_quotes_with_backticks'}                                                                                                                                                                                                   
DEBUG    2025-09-24 15:58:10,036 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Helper function LspMessage.serialize is not called in optimized code                                              
DEBUG    2025-09-24 15:58:10,040 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log]   Checked names: {'LspMessage.serialize', 'serialize', 'codeflash.lsp.lsp_message.LspMessage.serialize'}          
INFO     2025-09-24 15:58:10,044 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Reverting 3 unused helper function(s) to original definitions                                                     
DEBUG    2025-09-24 15:58:10,555 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Reverted unused helpers in /home/mohammed/Work/codeflash/codeflash/lsp/helpers.py: simplify_worktree_paths,       
         replace_quotes_with_backticks                                                                                                                                                                                                      
DEBUG    2025-09-24 15:58:11,211 [/home/mohammed/Work/codeflash/codeflash/lsp/lsp_logger.py:100 in function enhanced_log] Reverted unused helpers in /home/mohammed/Work/codeflash/codeflash/lsp/lsp_message.py: LspMessage.serialize       

@mohammedahmed18 mohammedahmed18 merged commit 2d886e8 into main Sep 25, 2025
17 of 20 checks passed
@KRRT7 KRRT7 deleted the fix/correctly-find-funtion-node-when-reverting-helpers branch September 25, 2025 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Review effort 2/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants