Contributor

@mohammedahmed18 mohammedahmed18 commented Aug 23, 2025

We revert helpers by writing the cached original file code back to the file, so any global assignments already present in the cached code get re-applied, as seen here: https://github.com/mohammedahmed18/unstructured/pull/1/files.

The fix is to skip adding global assignments during the revert process.

I don't think this is the best way to revert helpers, so this is a temporary fix until we have a better approach.
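
A minimal sketch of the failure mode and the flag, using the standard-library `ast` module rather than the project's actual code. `global_assignments` and `apply_code` are hypothetical stand-ins, and the toy version only handles global assignments, not function replacement:

```python
import ast


def global_assignments(code: str) -> list[str]:
    """Return the source text of each top-level assignment in `code`."""
    tree = ast.parse(code)
    return [
        ast.unparse(node)
        for node in tree.body
        if isinstance(node, (ast.Assign, ast.AnnAssign))
    ]


def apply_code(
    source_code: str,
    code_to_apply: str,
    *,
    global_assignments_added_before: bool = False,
) -> str:
    """Toy stand-in for the apply step (assumption: it naively prepends globals)."""
    if global_assignments_added_before:
        # Revert path: the file already contains these assignments, so skip them.
        return source_code
    return "\n".join([*global_assignments(code_to_apply), source_code])


cached_original = "CACHE = {}\n\ndef helper():\n    return CACHE\n"

# Reverting by re-applying the cached code duplicates the assignment...
naive_revert = apply_code(cached_original, cached_original)
# ...unless the caller signals that the assignments are already in place.
fixed_revert = apply_code(
    cached_original, cached_original, global_assignments_added_before=True
)
```

Here `naive_revert` contains `CACHE = {}` twice, while `fixed_revert` contains it once.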

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Typo in comment

The inline comment spelling "becuase" should be corrected to "because" for clarity.

# adding the global assignments before replacing the code, not after
# becuase of an "edge case" where the optimized code intoduced a new import and a global assignment using that import
# and that import wasn't used before, so it was ignored when calling AddImportsVisitor.add_needed_import inside replace_functions_and_add_imports (because the global assignment wasn't added yet)
# this was added at https://github.com/codeflash-ai/codeflash/pull/448
add_global_assignments(code_to_apply, source_code) if not global_assignments_added_before else source_code,
Flag propagation

Verify that setting global_assignments_added_before=True here always prevents duplicate assignments when reverting helpers.

global_assignments_added_before=True,  # since we revert helpers functions after applying the optimization, we know that the file already has global assignments added, otherwise they would be added twice.
Signature change

Confirm that all call sites are updated to pass a dict[Path, str] for original_helper_code instead of a single string.

    self,
    code_context: CodeOptimizationContext,
    optimized_code: CodeStringsMarkdown,
    original_helper_code: dict[Path, str],
) -> bool:
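
The `dict[Path, str]` type suggests one cached source string per helper file. A minimal sketch of a revert over such a mapping, assuming the revert simply writes each cached string back to its file; `revert_helpers` is a hypothetical name, not a function from this PR:

```python
from pathlib import Path


def revert_helpers(original_helper_code: dict[Path, str]) -> None:
    """Write each helper file's cached original source back to disk."""
    for path, original_code in original_helper_code.items():
        path.write_text(original_code, encoding="utf-8")
```

A call site that previously passed a single string would instead build the mapping, for example `{helper_path: helper_path.read_text(encoding="utf-8")}`, before the optimization is applied.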

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Preserve optimized code when skipping assignments

When global_assignments_added_before is true, this branch reverts to source_code and
drops the optimized changes. Use code_to_apply instead to preserve the optimized
code path when skipping global assignments.

codeflash/code_utils/code_replacer.py [425]

-add_global_assignments(code_to_apply, source_code) if not global_assignments_added_before else source_code,
+add_global_assignments(code_to_apply, source_code) if not global_assignments_added_before else code_to_apply,
Suggestion importance[1-10]: 9

Why: This fixes a bug where skipping global assignments reverts to source_code, losing the optimized changes in code_to_apply.

High

mohammedahmed18 and others added 4 commits August 23, 2025 20:17
…licate-global-assignments-when-reverting-helpers
…/duplicate-global-assignments-when-reverting-helpers`)

The optimized code achieves a **17% speedup** by eliminating redundant CST parsing operations, which are the most expensive parts of the function according to the line profiler.

**Key optimizations:**

1. **Eliminate duplicate parsing**: The original code parsed `src_module_code` and `dst_module_code` multiple times. The optimized version introduces `_extract_global_statements_once()` that parses each module only once and reuses the parsed CST objects throughout the function.

2. **Reuse parsed modules**: Instead of re-parsing `dst_module_code` after modifications, the optimized version conditionally reuses the already-parsed `dst_module` when no global statements need insertion, avoiding unnecessary `cst.parse_module()` calls.

3. **Early termination**: Added an early return when `new_collector.assignments` is empty, avoiding the expensive `GlobalAssignmentTransformer` creation and visitation when there's nothing to transform.

4. **Minor optimization in uniqueness check**: Added a fast-path identity check (`stmt is existing_stmt`) before the expensive `deep_equals()` comparison, though this has minimal impact.

**Performance impact by test case type:**
- **Empty/minimal cases**: Show the highest gains (59-88% faster) due to early termination optimizations
- **Standard cases**: Achieve consistent 20-30% improvements from reduced parsing
- **Large-scale tests**: Benefit significantly (18-23% faster) as parsing overhead scales with code size

The optimization is most effective for workloads with moderate to large code files where CST parsing dominates the runtime, as evidenced by the original profiler showing 70%+ of time spent in `cst.parse_module()` and `module.visit()` operations.
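
The parse-once and early-return ideas above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses the standard-library `ast` module in place of libcst, only considers simple `NAME = value` assignments, and appends new assignments at the end of the destination code. `add_new_global_assignments` and `top_level_assignments` are hypothetical names.

```python
import ast


def top_level_assignments(module: ast.Module) -> dict[str, ast.Assign]:
    """Map each simple top-level `NAME = value` target to its assignment node."""
    found: dict[str, ast.Assign] = {}
    for node in module.body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            found[node.targets[0].id] = node
    return found


def add_new_global_assignments(src_code: str, dst_code: str) -> str:
    """Append assignments from `src_code` whose names `dst_code` does not assign."""
    src_module = ast.parse(src_code)  # each module is parsed exactly once
    src_assignments = top_level_assignments(src_module)
    if not src_assignments:
        return dst_code  # early return: nothing to transform, skip parsing dst

    dst_module = ast.parse(dst_code)
    existing = top_level_assignments(dst_module)
    missing = [
        ast.unparse(node)
        for name, node in src_assignments.items()
        if name not in existing
    ]
    if not missing:
        return dst_code  # reuse the input unchanged instead of regenerating it
    return dst_code.rstrip("\n") + "\n" + "\n".join(missing) + "\n"
```

Both early returns hand back the original string untouched, which is where the empty and minimal cases would save the most work.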
Contributor

codeflash-ai bot commented Aug 25, 2025

⚡️ Codeflash found optimizations for this PR

📄 18% (0.18x) speedup for add_global_assignments in codeflash/code_utils/code_extractor.py

⏱️ Runtime: 1.23 seconds → 1.05 seconds (best of 9 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch fix/duplicate-global-assignments-when-reverting-helpers).

mohammedahmed18 and others added 3 commits August 26, 2025 03:46
…25-08-25T18.50.33

⚡️ Speed up function `add_global_assignments` by 18% in PR #683 (`fix/duplicate-global-assignments-when-reverting-helpers`)
Contributor

codeflash-ai bot commented Aug 30, 2025

@@ -1783,7 +1782,6 @@ def new_function2(value):
"""
expected_code = """import numpy as np

print("Hello world")
Contributor Author

We only add unique global statements. Since the exact same print statement appears in both the original and the optimized code, the final code should contain it once, not twice.
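
A minimal sketch of that uniqueness rule, assuming structural equality is the criterion. It uses the standard-library `ast` module (comparing `ast.dump` output) as a stand-in for the project's CST comparison, and `unique_global_statements` is a hypothetical name:

```python
import ast


def unique_global_statements(original_code: str, optimized_code: str) -> list[str]:
    """Return top-level expression statements in `optimized_code` that are
    not structurally identical to one already present in `original_code`."""
    seen = {
        ast.dump(node)
        for node in ast.parse(original_code).body
        if isinstance(node, ast.Expr)
    }
    unique: list[str] = []
    for stmt in ast.parse(optimized_code).body:
        if not isinstance(stmt, ast.Expr):
            continue
        key = ast.dump(stmt)  # ignores line numbers, so position does not matter
        if key not in seen:
            seen.add(key)
            unique.append(ast.unparse(stmt))
    return unique
```

With `print("Hello world")` present in both inputs, the function returns nothing for it, so the merged file keeps a single copy.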

@@ -3453,3 +3447,157 @@ def hydrate_input_text_actions_with_field_names(
main_file.unlink(missing_ok=True)

assert new_code == expected

def test_duplicate_global_assignments_when_reverting_helpers():
Contributor Author

This test reproduces the unstructured bug: mohammedahmed18/unstructured#1.
