Conversation

@mohammedahmed18 (Contributor) commented Jul 16, 2025

User description

depends on #1705


Changes diagram

flowchart LR
  A["get_code_optimization_context()"] --> B["extract_code_markdown_context_from_files()"]
  B --> C["CodeStringsMarkdown (with splitter)"]
  C --> D["encoded_tokens_len(__str__)"]
  C --> E["find_preexisting_objects(__str__)"]

Changes walkthrough 📝

Relevant files

Enhancement

code_context_extractor.py: Switch to markdown context extraction
codeflash/context/code_context_extractor.py
  • Replaced extract_code_string_context_from_files call
  • Now uses extract_code_markdown_context_from_files
  • Token length uses .__str__ of markdown context
  • Preexisting objects extracted via .__str__
  • +6/-5

models.py: Add markdown code strings model
codeflash/models/models.py
  • Added get_code_block_splitter helper
  • Enhanced CodeStringsMarkdown with cache and __str__
  • Changed read_writable_code type to CodeStringsMarkdown
  • Removed unused Field import
  • +16/-2

Tests

test_code_context_extractor.py: Update tests for markdown splitter
tests/test_code_context_extractor.py
  • Imported get_code_block_splitter
  • Updated expected context with splitter f-string
  • Assert on read_writable_code.__str__
  • +4/-3


PR Type

Enhancement, Tests

Description

• Introduce markdown-based code context splitting
• Switch extractor to extract_code_markdown_context_from_files
• Update optimizer to use CodeStringsMarkdown.flat
• Add splitter markers and parse for multi-file replacement

Diagram Walkthrough

flowchart LR
  A["get_code_optimization_context()"]
  A -- "use markdown extractor" --> B["extract_code_markdown_context_from_files()"]
  B --> C["CodeStringsMarkdown"]
  C -- "flatten with .flat" --> D["encoded_tokens_len / find_preexisting_objects"]

@github-actions bot commented Jul 16, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 654a6ec)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Mutable Default

    The default value for code_strings is a mutable list, which can be shared across instances. Consider using a default_factory to avoid unexpected shared state in the Pydantic model.

    class CodeStringsMarkdown(BaseModel):
        code_strings: list[CodeString] = []
        cached_code: Optional[str] = None
    
        @property
        def flat(self) -> str:
            if self.cached_code is not None:
                return self.cached_code
            self.cached_code = "\n".join(
                get_code_block_splitter(block.file_path) + "\n" + block.code for block in self.code_strings
            )
            return self.cached_code
    
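
A minimal sketch of the suggested fix, using Field(default_factory=...) so each instance gets its own list (the CodeString shape here is a simplified stand-in, not the real model):

    from typing import Optional
    from pydantic import BaseModel, Field

    class CodeString(BaseModel):  # simplified stand-in for illustration
        file_path: Optional[str] = None
        code: str = ""

    class CodeStringsMarkdown(BaseModel):
        # default_factory builds a fresh list per instance instead of sharing one default
        code_strings: list[CodeString] = Field(default_factory=list)
        cached_code: Optional[str] = None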
    Debug Print

    The code_print call in optimize_function appears to be leftover debugging code and should be removed or replaced with a logger to avoid unwanted console output.

    code_print(code_context.read_writable_code.flat)
    Regex Parsing

    The parse_splitter_markers method relies on a regex that may not robustly handle all file path formats, potentially leading to missing or mis-attributed code blocks. Validate against edge cases.

    def parse_splitter_markers(code_with_markers: str) -> dict[str, str]:
        pattern = rf"{LINE_SPLITTER_MARKER_PREFIX}([^\n]+)\n"
        matches = list(re.finditer(pattern, code_with_markers))
    
        results = {}
        for i, match in enumerate(matches):
            start = match.end()
            end = matches[i + 1].start() if i + 1 < len(matches) else len(code_with_markers)
            file_path = match.group(1).strip()
            code = code_with_markers[start:end].lstrip("\n")
            results[file_path] = code
        return results
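
For illustration, a minimal round trip of the marker format, assuming the prefix is the string emitted by get_code_block_splitter in this PR:

    import re

    LINE_SPLITTER_MARKER_PREFIX = "# codeflash-splitter__"  # assumed, per get_code_block_splitter

    def parse_splitter_markers(code_with_markers: str) -> dict[str, str]:
        pattern = rf"{LINE_SPLITTER_MARKER_PREFIX}([^\n]+)\n"
        matches = list(re.finditer(pattern, code_with_markers))
        results = {}
        for i, match in enumerate(matches):
            start = match.end()
            end = matches[i + 1].start() if i + 1 < len(matches) else len(code_with_markers)
            results[match.group(1).strip()] = code_with_markers[start:end].lstrip("\n")
        return results

    markers = (
        "# codeflash-splitter__src/a.py\n"
        "def f():\n    return 1\n"
        "# codeflash-splitter__src/b.py\n"
        "def g():\n    return 2\n"
    )
    print(parse_splitter_markers(markers))
    # {'src/a.py': 'def f():\n    return 1\n', 'src/b.py': 'def g():\n    return 2\n'}

Note that a marker-like line inside a string literal would also match; that is the kind of edge case flagged above.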

@github-actions bot commented Jul 16, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 654a6ec
    Explore these optional code suggestions:

Category: General
    Skip markers for None file paths

    Guard against None file_path to avoid inserting literal "None" markers. Skip or
    handle blocks without a file_path explicitly before building the joined string.

    codeflash/models/models.py [154-160]

     def flat(self) -> str:
         if self.cached_code is not None:
             return self.cached_code
    -    self.cached_code = "\n".join(
    -        get_code_block_splitter(block.file_path) + "\n" + block.code for block in self.code_strings
    -    )
    +    parts = []
    +    for block in self.code_strings:
    +        if block.file_path is not None:
    +            parts.append(f"{get_code_block_splitter(block.file_path)}\n{block.code}")
    +        else:
    +            parts.append(block.code)
    +    self.cached_code = "\n".join(parts)
         return self.cached_code
Suggestion importance[1-10]: 7

Why: Guarding against None in block.file_path prevents literal "None" markers and avoids confusion in the generated context, improving robustness.

Impact: Medium
    Anchor and escape splitter regex

    Anchor the splitter marker at the start of a line and escape special characters to
    avoid partial matches, and enable multiline mode for accurate parsing.

    codeflash/models/models.py [172-175]

     @staticmethod
     def parse_splitter_markers(code_with_markers: str) -> dict[str, str]:
    -    pattern = rf"{LINE_SPLITTER_MARKER_PREFIX}([^\n]+)\n"
    -    matches = list(re.finditer(pattern, code_with_markers))
    +    pattern = rf"^{re.escape(LINE_SPLITTER_MARKER_PREFIX)}(.+)$"
    +    matches = list(re.finditer(pattern, code_with_markers, re.MULTILINE))
    Suggestion importance[1-10]: 5

Why: Anchoring the pattern with ^ and using re.escape makes the splitter regex more precise and prevents accidental partial matches, enhancing parsing reliability.

Impact: Low

Category: Possible issue
    Fix f-string interpolation in warning

    The second string literal isn’t an f-string, so the braces won’t interpolate. Prefix
    it with f or merge into the preceding f-string to correctly display the existing
    keys.

    codeflash/optimization/function_optimizer.py [637-640]

     logger.warning(
    -    f"Optimized code not found for {relative_module_path} In the context\n-------\n{optimized_code}\n-------\n"
    -    "Existing files in the context are: {list(file_to_code_context.keys())}, re-check your 'split markers'"
    +    f"Optimized code not found for {relative_module_path} In the context\n"
    +    f"-------\n{optimized_code}\n-------\n"
    +    f"Existing files in the context are: {list(file_to_code_context.keys())}, re-check your 'split markers'"
     )
    Suggestion importance[1-10]: 6

Why: Prefixing the second literal with f ensures that {list(file_to_code_context.keys())} is interpolated, making the warning accurate and useful for debugging.

Impact: Low

    Previous suggestions

    Suggestions up to commit 216eb7e
Category: General
    Normalize splitter path

    Use file_path.as_posix() in the f-string to ensure the splitter uses a consistent
    POSIX path, avoiding backslashes on Windows and matching test expectations.

    codeflash/models/models.py [142-143]

     def get_code_block_splitter(file_path: Path) -> str:
    -    return f"# codeflash-splitter__{file_path}"
    +    return f"# codeflash-splitter__{file_path.as_posix()}"
    Suggestion importance[1-10]: 7

Why: Using file_path.as_posix() in the splitter guarantees forward-slash paths on all OSes, avoiding backslashes on Windows and preventing cross-platform test failures.

Impact: Medium
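
The difference is visible even on a non-Windows host by using a pure path (illustrative only):

    from pathlib import PureWindowsPath

    p = PureWindowsPath("codeflash/models/models.py")
    print(f"# codeflash-splitter__{p}")             # codeflash\models\models.py
    print(f"# codeflash-splitter__{p.as_posix()}")  # codeflash/models/models.py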
    Properly implement __str__

Replace the @property decorator on __str__ with a standard special method override so that calling str(obj) invokes it correctly and doesn't shadow built-in behavior.

    codeflash/models/models.py [150-157]

    -@property
     def __str__(self) -> str:
         if self.cached_code is not None:
             return self.cached_code
         self.cached_code = "\n\n".join(
             get_code_block_splitter(block.file_path) + "\n" + block.code for block in self.code_strings
         )
         return self.cached_code
    Suggestion importance[1-10]: 6

Why: Defining __str__ with @property prevents str(obj) from invoking this logic; converting it to a normal special method override ensures the custom formatter is used when calling str(), improving usability.

Impact: Low

@mohammedahmed18 changed the title from "start using markdown representation for read writable context" to "start using markdown representation for read writable context (CF-687)" on Jul 16, 2025
    codeflash-ai bot added a commit that referenced this pull request Jul 25, 2025
    …n PR #553 (`feat/markdown-read-writable-context`)
    
Here is an optimized version of your program with improved runtime and memory usage. Your original `path_to_code_string` function uses a dictionary comprehension, which is already efficient, but we can go further by minimizing attribute lookups and avoiding unnecessary string conversions. Annotating the expected attribute types also helps; it does not change runtime behavior, but it improves readability and eases future optimization.
    
    Here's the improved version.
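
A minimal sketch of the rewrite the notes below describe (the class scaffolding here is a simplified stand-in for the real models):

    class CodeString:  # simplified stand-in
        def __init__(self, file_path, code):
            self.file_path = file_path
            self.code = code

    class CodeStringsMarkdown:  # simplified stand-in
        def __init__(self, code_strings):
            self.code_strings = code_strings

        def path_to_code_string(self) -> dict:
            code_strings = self.code_strings  # local alias avoids repeated attribute lookups
            result = {}
            for block in code_strings:        # explicit loop instead of a dict comprehension
                result[block.file_path] = block.code
            return result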
    
    
    
    **Notes about the optimization:**
    
    - The for-loop avoids repeated attribute lookups and is slightly faster and less memory-intensive than a dictionary comprehension in some cases (especially for larger datasets).
    - Converted `self.code_strings` to a local variable for faster access inside the loop.
    - No unnecessary temporary objects or function calls were introduced.
    - This also makes it easier to add future optimizations, such as slotting or generator-based approaches for extreme scale, if needed.
    
    **Performance justification:**  
    This makes the method marginally faster for large `code_strings` collections because it reduces temporary object allocations and attribute lookups, and dictionary insertion in a loop is roughly the same speed as a comprehension but is more explicit for optimization.
    
    Let me know if you need even lower-level optimization or have information about the structure of `file_path` or `code` that could allow further improvements!
@codeflash-ai bot commented Jul 25, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 66% (0.66x) speedup for CodeStringsMarkdown.path_to_code_string in codeflash/models/models.py

⏱️ Runtime: 4.52 microseconds → 2.72 microseconds (best of 33 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch feat/markdown-read-writable-context).

@mohammedahmed18 (Contributor Author) commented

    The bubble sort test is failing because it can't generate the .sort() optimization with the current prompt.


    Using the prompt here would fix the issue.

(screenshot: 2025-07-26-032457_hyprshot)

@mohammedahmed18 marked this pull request as ready for review on July 26, 2025 05:50
@github-actions bot

    Persistent review updated to latest commit 654a6ec

    codeflash-ai bot added a commit that referenced this pull request Aug 5, 2025
    … (`feat/markdown-read-writable-context`)
    
    The optimization achieves a **52% speedup** by eliminating repeated attribute lookups through a simple but effective change: storing `self._cache` in a local variable `cache` at the beginning of the method.
    
    **Key optimization:**
    - **Reduced attribute access overhead**: Instead of accessing `self._cache` multiple times (3-4 times in the original), the optimized version accesses it once and stores it in a local variable. In Python, local variable access is significantly faster than attribute access since it avoids the overhead of attribute resolution through the object's `__dict__`.
    
    **Performance impact by operation:**
    - The `cache.get("file_to_path")` call becomes ~3x faster (from 14,423ns to 1,079ns per hit)
    - Dictionary assignments and returns also benefit from faster local variable access
    - Total runtime drops from 22.7μs to 14.8μs
    
    **Best suited for:**
    Based on the test results, this optimization is particularly effective for scenarios with frequent cache lookups, showing **48-58% improvements** in basic usage patterns. The optimization scales well regardless of the `code_strings` content size since the bottleneck was in the cache access pattern, not the dictionary comprehension itself.
    
    This is a classic Python micro-optimization that leverages the performance difference between local variables (stored in a fast array) versus instance attributes (requiring dictionary lookups).
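
A minimal sketch of the pattern (the cache key and property name are taken from this PR; the mapping body and class scaffolding are assumed):

    from pathlib import Path

    class CodeString:  # simplified stand-in
        def __init__(self, file_path: Path, code: str):
            self.file_path = file_path
            self.code = code

    class CodeStringsMarkdown:  # simplified stand-in
        def __init__(self, code_strings):
            self.code_strings = code_strings
            self._cache: dict = {}

        @property
        def file_to_path(self) -> dict:
            cache = self._cache                  # bind once; local access beats attribute access
            cached = cache.get("file_to_path")
            if cached is not None:
                return cached
            result = {cs.file_path.name: cs.file_path for cs in self.code_strings}  # mapping assumed
            cache["file_to_path"] = result
            return result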
@codeflash-ai bot commented Aug 5, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 53% (0.53x) speedup for CodeStringsMarkdown.file_to_path in codeflash/models/models.py

⏱️ Runtime: 22.7 microseconds → 14.8 microseconds (best of 33 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch feat/markdown-read-writable-context).

    _cache: dict = PrivateAttr(default_factory=dict)

    @property
    def flat(self) -> str:

Contributor: can you add docstring to define what flat means?
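
A docstring along these lines would answer the question (wording assumed, not taken from the PR):

    @property
    def flat(self) -> str:
        """Return every code block joined into one string, each block preceded
        by its file-path splitter marker, so the combined context can later be
        split back into per-file code."""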


scoped_optimized_code = file_to_code_context.get(relative_module_path)
if scoped_optimized_code is None:
    logger.warning(

Contributor: I wonder if this code can be moved into the definition of replace_function_definitions_in_module; that would make this function simpler and make calling replace_function_definitions_in_module safer.

misrasaurabh1 previously approved these changes Aug 5, 2025
    json_payload = json.dumps(payload, indent=None, default=pydantic_encoder)
    logger.debug(f"========JSON PAYLOAD FOR {url}==============")
    logger.debug(json_payload)
    logger.debug("======================")

Contributor: not needed imo

Contributor Author: yeah, committed that by accident, I removed it

    should_run_experiment, code_context, original_helper_code = initialization_result.unwrap()

    code_print(code_context.read_writable_code)
    code_print(code_context.read_writable_code.flat) # Should we print the markdown or the flattened code?

Contributor: can probably stick with flattened code

    codeflash-ai bot added a commit that referenced this pull request Aug 6, 2025
    … (`feat/markdown-read-writable-context`)
    
    The optimized code achieves a 10% speedup through several targeted performance improvements:
    
    **Key Optimizations:**
    
    1. **Reduced attribute lookups in hot loops**: Pre-cached frequently accessed attributes like `helper.jedi_definition`, `helper.file_path.stem`, and method references (`helpers_by_file.__getitem__`) outside loops to avoid repeated attribute resolution.
    
2. **Faster AST node type checking**: Replaced `isinstance(node, ast.ImportFrom)` with `type(node) is ast.ImportFrom` and cached AST classes (`ImportFrom = ast.ImportFrom`) to eliminate repeated class lookups during AST traversal (see the sketch after this list).
    
    3. **Optimized entrypoint function discovery**: Used `ast.iter_child_nodes()` first to check top-level nodes before falling back to full `ast.walk()`, since entrypoint functions are typically at module level.
    
    4. **Eliminated expensive set operations**: Replaced `set.intersection()` calls with simple membership testing using a direct loop (`for n in possible_call_names: if n in called_fn_names`), which short-circuits on first match and avoids creating intermediate sets.
    
    5. **Streamlined data structure operations**: Used `setdefault()` and direct list operations instead of conditional checks, and stored local references to avoid repeated dictionary lookups.
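
An illustrative sketch of optimizations 2-3 above (the module source here is assumed):

    import ast

    tree = ast.parse("from os import path\nimport sys\n")
    ImportFrom = ast.ImportFrom              # cache the class lookup once
    for node in ast.iter_child_nodes(tree):  # scan top-level nodes first
        if type(node) is ImportFrom:         # exact-type check instead of isinstance
            print(node.module)               # -> os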
    
    **Performance Impact by Test Case:**
    - Small-scale tests (basic usage): 3-12% improvement
    - Large-scale tests with many helpers: 10-15% improvement  
    - Import-heavy scenarios: 4-9% improvement
    
    The optimizations are particularly effective for codebases with many helper functions and complex import structures, where the reduced overhead in hot loops compounds significantly.
@codeflash-ai bot commented Aug 6, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 10% (0.10x) speedup for detect_unused_helper_functions in codeflash/context/unused_definition_remover.py

⏱️ Runtime: 20.7 milliseconds → 18.8 milliseconds (best of 5 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch feat/markdown-read-writable-context).

    ):
    generated_results = self.generate_tests_and_optimizations(
    testgen_context_code=code_context.testgen_context_code,
testgen_context_code=code_context.testgen_context_code, # TODO: should we send the markdown context for the testgen instead.

Contributor: we'll probably have to test the testgen responses with the new markdown format, are you modifying it on aiservice?

Contributor Author: no didn't modify it

    codeflash-ai bot added a commit that referenced this pull request Aug 7, 2025
    …553 (`feat/markdown-read-writable-context`)
    
    The optimization moves the `type_mapping` dictionary from being recreated inside the `show_message_log` method on every call to being a class-level attribute `_type_mapping` that is created once when the class is defined.
    
    **Key optimization:**
    - **Dictionary creation elimination**: The original code recreates the same 5-element dictionary mapping message type strings to `MessageType` enums on every call to `show_message_log`. The optimized version creates this mapping once as a class attribute and reuses it.
    
    **Why this provides a speedup:**
    - Dictionary creation in Python involves memory allocation and hash table initialization overhead
    - The line profiler shows the original `show_message_log` method spending significant time (99.3% of its execution) on dictionary creation and operations
    - By eliminating repeated dictionary creation, the optimized version reduces per-call overhead from ~46ms total time to ~33μs (1000x+ improvement for this method)
    
    **Test case performance:**
    The optimization particularly benefits scenarios with frequent logging calls. Test cases like `test_successful_optimization_speedup_calculation` and `test_successful_optimization_with_different_function_name` that make multiple `show_message_log` calls see the most benefit, as they avoid the repeated dictionary allocation overhead on each logging operation.
    
    This is a classic Python optimization pattern - moving constant data structures outside frequently-called methods to avoid repeated allocation costs.
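
A minimal sketch of the pattern (enum members and method body assumed for illustration):

    from enum import Enum

    class MessageType(Enum):  # members assumed
        INFO = "info"
        ERROR = "error"

    class LspServer:
        # built once at class-definition time instead of on every call
        _type_mapping = {"info": MessageType.INFO, "error": MessageType.ERROR}

        def show_message_log(self, message: str, message_type: str) -> None:
            mtype = self._type_mapping.get(message_type, MessageType.INFO)
            print(f"[{mtype.name}] {message}")  # stand-in for the real log sink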
@codeflash-ai bot commented Aug 7, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 1,543% (15.43x) speedup for perform_function_optimization in codeflash/lsp/beta.py

⏱️ Runtime: 17.9 milliseconds → 1.09 milliseconds (best of 33 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch feat/markdown-read-writable-context).

@mohammedahmed18 merged commit 3171bb8 into main on Aug 7, 2025
    19 checks passed
