[FEAT] Multi-file context (CF-687) (CF-387) (CF-640) #553
Conversation
PR Reviewer Guide 🔍 (review updated until commit 654a6ec)

Here are some key observations to aid the review process:
PR Code Suggestions ✨ (latest suggestions up to 654a6ec)

Previous suggestions (up to commit 216eb7e)
…n PR #553 (`feat/markdown-read-writable-context`)

Here is an optimized version of your program with improved runtime and memory usage. Your original `path_to_code_string` function uses a dictionary comprehension, which is already efficient, but we can optimize further by minimizing attribute lookups and potential object-to-string conversions. Also, since the base class already stores attributes, we can annotate the expected attribute types for better speed in static analysis and C extensions (not at runtime, but it helps readability and future optimization). Here's the improved version.

**Notes about the optimization:**
- The for-loop avoids repeated attribute lookups and is slightly faster and less memory-intensive than a dictionary comprehension in some cases (especially for larger datasets).
- `self.code_strings` is bound to a local variable for faster access inside the loop.
- No unnecessary temporary objects or function calls were introduced.
- This also makes it easier to add future optimizations, such as slotting or generator-based approaches at extreme scale, if needed.

**Performance justification:** This makes the method marginally faster for large `code_strings` collections because it reduces temporary object allocations and attribute lookups; dictionary insertion in a loop is roughly the same speed as a comprehension but is more explicit for optimization.

Let me know if you need even lower-level optimization, or if you have information about the structure of `file_path` or `code` that could allow further improvements!
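To make the described pattern concrete, here is a minimal self-contained sketch. The `CodeString` shape and the class names are assumptions inferred from the discussion, not the PR's actual code:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class CodeString:  # hypothetical shape, inferred from the discussion
    file_path: Path
    code: str

class CodeStrings:
    def __init__(self, code_strings: list[CodeString]) -> None:
        self.code_strings = code_strings

    def path_to_code_string(self) -> dict[Path, str]:
        # Bind the attribute to a local once, then fill the dict in an
        # explicit loop, as the suggestion above proposes.
        result: dict[Path, str] = {}
        code_strings = self.code_strings  # local variable: avoids repeated attribute resolution
        for cs in code_strings:
            result[cs.file_path] = cs.code
        return result
```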
⚡️ Codeflash found optimizations for this PR 📄 66% (0.66x) speedup for …
The bubble sort test is failing because it can't generate the … Using the prompt here would fix the issue.
Persistent review updated to latest commit 654a6ec
… (`feat/markdown-read-writable-context`)
The optimization achieves a **52% speedup** by eliminating repeated attribute lookups through a simple but effective change: storing `self._cache` in a local variable `cache` at the beginning of the method.
**Key optimization:**
- **Reduced attribute access overhead**: Instead of accessing `self._cache` multiple times (3-4 times in the original), the optimized version accesses it once and stores it in a local variable. In Python, local variable access is significantly faster than attribute access since it avoids the overhead of attribute resolution through the object's `__dict__`.
**Performance impact by operation:**
- The `cache.get("file_to_path")` call becomes roughly 13x faster per hit (from 14,423ns to 1,079ns)
- Dictionary assignments and returns also benefit from faster local variable access
- Total runtime drops from 22.7μs to 14.8μs
**Best suited for:**
Based on the test results, this optimization is particularly effective for scenarios with frequent cache lookups, showing **48-58% improvements** in basic usage patterns. The optimization scales well regardless of the `code_strings` content size since the bottleneck was in the cache access pattern, not the dictionary comprehension itself.
This is a classic Python micro-optimization that leverages the performance difference between local variables (stored in a fast array) versus instance attributes (requiring dictionary lookups).
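For illustration, a minimal sketch of the access pattern being described. The cache key and the rest of the method body are assumptions; only `self._cache` and the local-variable trick come from the comment:

```python
class CodeStringsMarkdown:
    def __init__(self) -> None:
        self._cache: dict = {}
        self.code_strings: list = []

    def path_to_code_string(self) -> dict:
        cache = self._cache  # one attribute lookup instead of three or four
        cached = cache.get("file_to_path")
        if cached is not None:
            return cached
        result = {cs.file_path: cs.code for cs in self.code_strings}
        cache["file_to_path"] = result
        return result
```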
⚡️ Codeflash found optimizations for this PR 📄 53% (0.53x) speedup for …
```python
_cache: dict = PrivateAttr(default_factory=dict)

@property
def flat(self) -> str:
```
can you add a docstring to define what `flat` means?
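For example, something along these lines; the wording is a guess at the intended semantics, not the author's actual docstring:

```python
@property
def flat(self) -> str:
    """Return every file's code joined into one flat source string,
    without the per-file markdown fencing that __str__ produces."""
    ...
```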
```python
scoped_optimized_code = file_to_code_context.get(relative_module_path)
if scoped_optimized_code is None:
    logger.warning(
```
I wonder if this code can be moved into the definition of `replace_function_definitions_in_module`. That would make this function simpler and make calling `replace_function_definitions_in_module` safer.
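A hedged sketch of what that refactor might look like; the signature and the boolean return convention are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

def replace_function_definitions_in_module(
    relative_module_path: str,
    file_to_code_context: dict[str, str],
) -> bool:
    # Guard moved inside the function, so every caller gets it for free.
    scoped_optimized_code = file_to_code_context.get(relative_module_path)
    if scoped_optimized_code is None:
        logger.warning("No optimized code found for %s; skipping.", relative_module_path)
        return False
    # ... perform the actual replacement using scoped_optimized_code ...
    return True
```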
codeflash/api/aiservice.py (Outdated)
```python
json_payload = json.dumps(payload, indent=None, default=pydantic_encoder)
logger.debug(f"========JSON PAYLOAD FOR {url}==============")
logger.debug(json_payload)
logger.debug("======================")
```
not needed imo
yeah, committed that by accident, I removed it
```diff
 should_run_experiment, code_context, original_helper_code = initialization_result.unwrap()

-code_print(code_context.read_writable_code)
+code_print(code_context.read_writable_code.flat)  # Should we print the markdown or the flattened code?
```
can probably stick with flattened code
… (`feat/markdown-read-writable-context`)

The optimized code achieves a **10% speedup** through several targeted performance improvements.

**Key optimizations** (a sketch of two of these follows this list):
1. **Reduced attribute lookups in hot loops**: Pre-cached frequently accessed attributes such as `helper.jedi_definition`, `helper.file_path.stem`, and method references (`helpers_by_file.__getitem__`) outside loops to avoid repeated attribute resolution.
2. **Faster AST node type checking**: Replaced `isinstance(node, ast.ImportFrom)` with `type(node) is ast.ImportFrom` and cached AST classes (`ImportFrom = ast.ImportFrom`) to eliminate repeated class lookups during AST traversal.
3. **Optimized entrypoint function discovery**: Used `ast.iter_child_nodes()` first to check top-level nodes before falling back to a full `ast.walk()`, since entrypoint functions are typically at module level.
4. **Eliminated expensive set operations**: Replaced `set.intersection()` calls with simple membership testing in a direct loop (`for n in possible_call_names: if n in called_fn_names`), which short-circuits on the first match and avoids creating intermediate sets.
5. **Streamlined data structure operations**: Used `setdefault()` and direct list operations instead of conditional checks, and stored local references to avoid repeated dictionary lookups.

**Performance impact by test case:**
- Small-scale tests (basic usage): 3-12% improvement
- Large-scale tests with many helpers: 10-15% improvement
- Import-heavy scenarios: 4-9% improvement

The optimizations are particularly effective for codebases with many helper functions and complex import structures, where the reduced overhead in hot loops compounds significantly.
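A self-contained sketch of two of the patterns named above (type-identity checks with a cached AST class, and scanning top-level nodes before a full walk); the function names here are illustrative, not the PR's:

```python
import ast
from typing import Optional

ImportFrom = ast.ImportFrom  # cached class: one attribute lookup, reused for every node

def collect_import_froms(tree: ast.AST) -> list[ast.ImportFrom]:
    # type(...) is X skips isinstance's subclass machinery, as described above.
    return [node for node in ast.walk(tree) if type(node) is ImportFrom]

def find_entrypoint(tree: ast.Module, name: str) -> Optional[ast.FunctionDef]:
    # Entrypoint functions are usually top-level, so try the cheap scan first...
    for node in ast.iter_child_nodes(tree):
        if type(node) is ast.FunctionDef and node.name == name:
            return node
    # ...and only fall back to walking the whole tree if that misses.
    for node in ast.walk(tree):
        if type(node) is ast.FunctionDef and node.name == name:
            return node
    return None
```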
⚡️ Codeflash found optimizations for this PR 📄 10% (0.10x) speedup for …
```diff
 ):
     generated_results = self.generate_tests_and_optimizations(
-        testgen_context_code=code_context.testgen_context_code,
+        testgen_context_code=code_context.testgen_context_code,  # TODO: should we send the markdown context for the testgen instead?
```
we'll probably have to test the testgen responses with the new markdown format. Are you modifying it on aiservice?
no didn't modify it
…553 (`feat/markdown-read-writable-context`)

The optimization moves the `type_mapping` dictionary from being recreated inside the `show_message_log` method on every call to a class-level attribute `_type_mapping` that is created once when the class is defined.

**Key optimization:**
- **Dictionary creation elimination**: The original code recreates the same 5-element dictionary mapping message type strings to `MessageType` enums on every call to `show_message_log`. The optimized version creates this mapping once as a class attribute and reuses it.

**Why this provides a speedup:**
- Dictionary creation in Python involves memory allocation and hash table initialization overhead.
- The line profiler shows the original `show_message_log` method spending significant time (99.3% of its execution) on dictionary creation and operations.
- By eliminating repeated dictionary creation, the optimized version reduces per-call overhead from ~46ms total time to ~33μs (a 1000x+ improvement for this method).

**Test case performance:** The optimization particularly benefits scenarios with frequent logging calls. Test cases like `test_successful_optimization_speedup_calculation` and `test_successful_optimization_with_different_function_name` that make multiple `show_message_log` calls see the most benefit, as they avoid the repeated dictionary allocation overhead on each logging operation.

This is a classic Python optimization pattern: moving constant data structures outside frequently-called methods to avoid repeated allocation costs.
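A minimal sketch of the pattern; the enum members and the mapping entries below are assumptions (per the comment, the real mapping has five entries):

```python
from enum import Enum

class MessageType(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"

class Server:
    # Built once at class-definition time, not on every show_message_log call.
    _type_mapping = {
        "info": MessageType.INFO,
        "warning": MessageType.WARNING,
        "error": MessageType.ERROR,
    }

    def show_message_log(self, message: str, message_type: str) -> None:
        resolved = self._type_mapping.get(message_type, MessageType.INFO)
        print(f"[{resolved.name}] {message}")  # stand-in for the real log sink
```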
⚡️ Codeflash found optimizations for this PR 📄 1,543% (15.43x) speedup for …
User description
depends on #1705
Changes diagram
Changes walkthrough 📝
| File | Change summary |
|------|----------------|
| `codeflash/context/code_context_extractor.py` | Switch to markdown context extraction: call `extract_code_markdown_context_from_files` instead of `extract_code_string_context_from_files`; use the `__str__` of the markdown context |
| `codeflash/models/models.py` | Add markdown code strings model: `get_code_block_splitter` helper; `CodeStringsMarkdown` with cache and `__str__`; change the `read_writable_code` type to `CodeStringsMarkdown`; add the `Field` import |
| `tests/test_code_context_extractor.py` | Update tests for the markdown splitter: `get_code_block_splitter`, `read_writable_code.__str__` |

PR Type
Enhancement, Tests
Description
- Introduce markdown-based code context splitting
- Switch extractor to `extract_code_markdown_context_from_files`
- Update optimizer to use `CodeStringsMarkdown.flat`
- Add splitter markers and parse them for multi-file replacement (a rough sketch follows below)
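As a rough illustration of the splitter idea, a self-contained sketch; the `# file:` marker format here is an assumption (the real format lives in `get_code_block_splitter`):

```python
import re

# Hypothetical splitter: each file's code is preceded by a "# file: <path>" line.
_SPLITTER_RE = re.compile(r"^# file: (?P<path>.+)$", re.MULTILINE)

def split_context(flat: str) -> dict[str, str]:
    """Parse a multi-file context string back into {file_path: code}."""
    parts = _SPLITTER_RE.split(flat)
    # re.split with one capturing group yields [preamble, path1, code1, path2, code2, ...]
    it = iter(parts[1:])
    return {path.strip(): code.strip() for path, code in zip(it, it)}

flat = "# file: src/a.py\ndef f():\n    return 1\n# file: src/b.py\ndef g():\n    return 2\n"
assert split_context(flat) == {
    "src/a.py": "def f():\n    return 1",
    "src/b.py": "def g():\n    return 2",
}
```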
Diagram Walkthrough
File Walkthrough