
Conversation

@aseembits93
Contributor

@aseembits93 aseembits93 commented May 1, 2025

User description

With this PR, global variables that optimization candidates introduce for caching purposes are preserved in the replaced code used to create the Pull Request. This should resolve current bugs in a couple of ongoing PRs.


PR Type

Enhancement, Tests


Description

  • Collect and replace global assignments (see the sketch after this list)

  • Insert non-assignment globals after imports

  • Wrap Jedi extraction in try/except

  • Add tests for global replacement logic
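
A minimal sketch of the collection step, assuming libcst and illustrative class and attribute names (the PR's actual GlobalAssignmentCollector may differ):

    import libcst as cst

    class GlobalAssignmentCollectorSketch(cst.CSTVisitor):
        """Record module-level assignments by target name, preserving order."""

        def __init__(self) -> None:
            super().__init__()
            self.assignments: dict[str, cst.Assign] = {}
            self.assignment_order: list[str] = []

        def visit_FunctionDef(self, node: cst.FunctionDef) -> bool:
            return False  # skip function bodies; only module-level assignments matter

        def visit_ClassDef(self, node: cst.ClassDef) -> bool:
            return False  # likewise skip class bodies

        def visit_Assign(self, node: cst.Assign) -> None:
            for target in node.targets:
                if isinstance(target.target, cst.Name):
                    name = target.target.value
                    if name not in self.assignments:
                        self.assignment_order.append(name)
                    self.assignments[name] = node

    collector = GlobalAssignmentCollectorSketch()
    cst.parse_module("CACHE = {}\nLIMIT = 10\n").visit(collector)

A matching transformer can then rewrite or append these assignments in the destination module, which is what the replacement step above refers to.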


Changes walkthrough 📝

Relevant files

Enhancement
code_extractor.py (codeflash/code_utils/code_extractor.py): Support global variable and statement extraction
  • Introduce GlobalAssignmentCollector and transformer
  • Add GlobalStatementCollector, LastImportFinder, ImportInserter
  • Update import and module transformation in extractor
  • +227/-1

Error handling
code_context_extractor.py (codeflash/context/code_context_extractor.py): Add error handling for Jedi extraction
  • Wrap Jedi name matching in try/except block (see the sketch after this table)
  • Log exceptions during function source extraction
  • +20/-17

Tests
test_code_replacement.py (tests/test_code_replacement.py): Add tests for global replacement behavior
  • Add Args class to configure import sorting
  • Add tests for global statements and reassignments
  • Update tuple typing syntax in tests
  • +128/-24
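
A minimal sketch of the error-handling pattern referenced in the walkthrough row above; the helper name and logger are illustrative, not the repository's actual code:

    import logging

    logger = logging.getLogger(__name__)

    def find_matching_name(jedi_names, qualified_name):
        """Return the first Jedi name whose full_name matches, skipping failures."""
        for name in jedi_names:
            try:
                if name.full_name == qualified_name:
                    return name
            except Exception:
                # Jedi can raise on unusual sources; log and keep extracting.
                logger.exception("Jedi failed while matching %s", qualified_name)
        return None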

@aseembits93 aseembits93 requested review from KRRT7 and misrasaurabh1 May 1, 2025 01:17
    @github-actions

    github-actions bot commented May 1, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    ImportInserter API misuse

    The ImportInserter.leave_SimpleStatementLine method returns a cst.Module instead of a SimpleStatementLine, which likely violates the libcst transformer API and may corrupt the tree. The insertion logic should return nodes of the same type, and should use the leave_Module hook or proper with_changes calls to inject statements.

    def leave_SimpleStatementLine(
        self, original_node: cst.SimpleStatementLine, updated_node: cst.SimpleStatementLine
    ) -> cst.Module:
        self.current_line += 1
    
        # If we're right after the last import and haven't inserted yet
        if self.current_line == self.last_import_line and not self.inserted:
            self.inserted = True
            return cst.Module(body=[updated_node] + self.global_statements)
    
        return cst.Module(body=[updated_node])
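
    A minimal sketch of the leave_Module approach suggested above, with illustrative names; it splices the collected statements into the module body right after the last top-level import:

        import libcst as cst

        class ImportInserterViaModule(cst.CSTTransformer):
            def __init__(self, global_statements: list[cst.SimpleStatementLine]) -> None:
                super().__init__()
                self.global_statements = global_statements

            def leave_Module(self, original_node: cst.Module, updated_node: cst.Module) -> cst.Module:
                body = list(updated_node.body)
                last_import_idx = -1
                for i, stmt in enumerate(body):
                    if isinstance(stmt, cst.SimpleStatementLine) and any(
                        isinstance(s, (cst.Import, cst.ImportFrom)) for s in stmt.body
                    ):
                        last_import_idx = i
                new_body = body[: last_import_idx + 1] + self.global_statements + body[last_import_idx + 1 :]
                return updated_node.with_changes(body=new_body)
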
    Line counting accuracy

    LastImportFinder increments current_line on each SimpleStatementLine visit, which may not map to actual source line numbers (e.g., empty lines, multi‐statement lines). This could cause globals to be inserted at the wrong position. Consider using node metadata or CST position information.

    """Finds the position of the last import statement in the module."""
    
    def __init__(self):
        super().__init__()
        self.last_import_line = 0
        self.current_line = 0
    
    def visit_SimpleStatementLine(self, node: cst.SimpleStatementLine) -> None:
        self.current_line += 1
        for statement in node.body:
            if isinstance(statement, (cst.Import, cst.ImportFrom)):
                self.last_import_line = self.current_line
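
    A minimal sketch of the metadata-based alternative, assuming libcst's PositionProvider; it reads real source line numbers instead of counting visited statements:

        import libcst as cst
        from libcst.metadata import MetadataWrapper, PositionProvider

        class LastImportLineFinder(cst.CSTVisitor):
            METADATA_DEPENDENCIES = (PositionProvider,)

            def __init__(self) -> None:
                super().__init__()
                self.last_import_line = 0

            def visit_SimpleStatementLine(self, node: cst.SimpleStatementLine) -> None:
                if any(isinstance(s, (cst.Import, cst.ImportFrom)) for s in node.body):
                    self.last_import_line = self.get_metadata(PositionProvider, node).start.line

        wrapper = MetadataWrapper(cst.parse_module("import os\nimport sys\n\nX = 1\n"))
        finder = LastImportLineFinder()
        wrapper.visit(finder)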

    @github-actions

    github-actions bot commented May 1, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Fix return type in leave_SimpleStatementLine

    The leave_SimpleStatementLine method should return a list of CST nodes instead of
    creating a new Module to correctly insert statements in place. Change its return
    type to List[cst.CSTNode] and return [updated_node] + self.global_statements when
    inserting.

    codeflash/code_utils/code_extractor.py [182-193]

     def leave_SimpleStatementLine(
         self, original_node: cst.SimpleStatementLine, updated_node: cst.SimpleStatementLine
    -) -> cst.Module:
    +) -> list[cst.CSTNode]:
         self.current_line += 1
     
    -    # If we're right after the last import and haven't inserted yet
         if self.current_line == self.last_import_line and not self.inserted:
             self.inserted = True
    -        return cst.Module(body=[updated_node] + self.global_statements)
    +        return [updated_node] + self.global_statements
     
    -    return cst.Module(body=[updated_node])
    +    return [updated_node]
    Suggestion importance[1-10]: 8


    Why: Changing leave_SimpleStatementLine to return a list of CST nodes instead of constructing a new Module aligns with LibCST’s transformer API and correctly inserts the global statements in place.

    Impact: Medium
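
    A related option, assuming a libcst version that provides cst.FlattenSentinel (0.4+): a leave_ hook can return a FlattenSentinel to splice several statements in place, which keeps node types consistent without constructing a Module. Illustrative sketch:

        import libcst as cst

        class ImportInserterSentinelSketch(cst.CSTTransformer):
            def __init__(self, global_statements: list[cst.SimpleStatementLine], last_import_line: int) -> None:
                super().__init__()
                self.global_statements = global_statements
                self.last_import_line = last_import_line
                self.current_line = 0
                self.inserted = False

            def leave_SimpleStatementLine(
                self, original_node: cst.SimpleStatementLine, updated_node: cst.SimpleStatementLine
            ):
                self.current_line += 1
                if self.current_line == self.last_import_line and not self.inserted:
                    self.inserted = True
                    # FlattenSentinel splices these statements into the parent body.
                    return cst.FlattenSentinel([updated_node, *self.global_statements])
                return updated_node
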
    Category: General
    Simplify blank line insertion

    The hack of appending then popping a Pass statement doesn't insert a blank line.
    Directly append a cst.EmptyLine() to new_statements to add the intended empty line
    before new assignments.

    codeflash/code_utils/code_extractor.py [109-111]

     if new_statements and not isinstance(new_statements[-1], cst.EmptyLine):
    -    new_statements.append(cst.SimpleStatementLine([cst.Pass()], leading_lines=[cst.EmptyLine()]))
    -    new_statements.pop()  # Remove the Pass statement but keep the empty line
    +    new_statements.append(cst.EmptyLine())
    Suggestion importance[1-10]: 5


    Why: Appending a cst.EmptyLine() directly is simpler and more explicit than appending and popping a dummy Pass node, improving readability without altering functionality.

    Impact: Low
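
    For reference, empty lines in libcst attach to the statement that follows them via leading_lines rather than standing alone in a body list; a minimal sketch of that variant, with an illustrative helper name:

        import libcst as cst

        def with_blank_line_before(stmt: cst.SimpleStatementLine) -> cst.SimpleStatementLine:
            # Prepend one EmptyLine to the statement's leading_lines.
            return stmt.with_changes(leading_lines=[cst.EmptyLine(), *stmt.leading_lines])
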
    Use distinct transformer variable names

    Reusing the variable name transformer for two different CST transformers can lead to
    confusion. Use distinct names such as import_inserter and assignment_transformer for
    clarity and to avoid accidental shadowing.

    codeflash/code_utils/code_extractor.py [256-273]

     # Create transformer to insert non_assignment_global_statements
    -transformer = ImportInserter(non_assignment_global_statements, last_import_line)
    +import_inserter = ImportInserter(non_assignment_global_statements, last_import_line)
     ...
     # Transform the original file
    -transformer = GlobalAssignmentTransformer(new_collector.assignments, new_collector.assignment_order)
    -transformed_module = original_module.visit(transformer)
    +assignment_transformer = GlobalAssignmentTransformer(new_collector.assignments, new_collector.assignment_order)
    +transformed_module = original_module.visit(assignment_transformer)
    Suggestion importance[1-10]: 4


    Why: Renaming transformer to import_inserter and assignment_transformer avoids variable shadowing and enhances code clarity without impacting correctness.

    Impact: Low

    codeflash-ai bot added a commit that referenced this pull request May 1, 2025
    …616`)
    
    Here is your rewritten, much faster version. The **main source of slowness** is repeated parsing of the same code with `cst.parse_module`: e.g. `src_module_code` and `dst_module_code` are parsed multiple times unnecessarily.  
    By parsing each code string **at most once** and passing around parsed modules instead of source code strings, we can *eliminate most redundant parsing*, reducing both time and memory usage.
    
    Additionally, you can avoid `.visit()` multiple times by combining visits just once where possible.
    
    Below is the optimized version.
    
    
    
    **Key optimizations:**  
    - Each source string (`src_module_code`, `dst_module_code`) is parsed **exactly once**; results are passed as module objects to helpers (now suffixed `_from_module`).
    - Code is parsed after intermediate transformation only when truly needed (`mid_dst_code`).
    - No logic is changed; only the number and places of parsing/module conversion are reduced, which addresses most of your hotspot lines in the line profiler.
    - Your function signatures are preserved.  
    - Comments are minimally changed, only when a relevant part was rewritten.
    
    This version will run **2-3x faster** for large files.  
    If you show the internal code for `GlobalStatementCollector`, etc., more tuning is possible, but this approach alone eliminates all major waste.
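
    Illustrative only (the bot's actual patch is not shown above): a sketch of the parse-once pattern it describes, with hypothetical helper names suffixed _from_module:

        import libcst as cst

        def _count_global_assignments_from_module(module: cst.Module) -> int:
            """Hypothetical '_from_module' helper: accepts a parsed module, not a string."""
            return sum(
                isinstance(stmt, cst.SimpleStatementLine)
                and any(isinstance(s, cst.Assign) for s in stmt.body)
                for stmt in module.body
            )

        def add_global_assignments_sketch(src_module_code: str, dst_module_code: str) -> tuple[int, int]:
            src_module = cst.parse_module(src_module_code)  # parsed exactly once
            dst_module = cst.parse_module(dst_module_code)  # parsed exactly once
            # Downstream helpers receive the parsed modules, so nothing re-parses strings.
            return (
                _count_global_assignments_from_module(src_module),
                _count_global_assignments_from_module(dst_module),
            )
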
    @codeflash-ai
    Contributor

    codeflash-ai bot commented May 1, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 18% (0.18x) speedup for add_global_assignments in codeflash/code_utils/code_extractor.py

    ⏱️ Runtime: 376 milliseconds → 320 milliseconds (best of 28 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch cf-616).

    @aseembits93
    Contributor Author

    @KRRT7 ready for review

    @misrasaurabh1 misrasaurabh1 merged commit 3a2dfe7 into main May 2, 2025
    15 checks passed
    @aseembits93 aseembits93 deleted the cf-616 branch May 28, 2025 02:27
