@mohammedahmed18
Contributor

@mohammedahmed18 mohammedahmed18 commented Aug 11, 2025

User description

--worktree in non-LSP mode is experimental.

$ codeflash --file app/main.py --function sorter -v --no-pr --worktree
patch1 patch2

PR Type

Enhancement, Tests


Description

  • Add --worktree flag to CLI for detached worktree

  • Implement detached git worktree creation and cleanup

  • Generate patches from worktree for optimizations

  • Change coverage candidate generator to use unique set


Diagram Walkthrough

flowchart LR
  A["CLI parse_args with --worktree"] --> B["Optimizer.run"]
  B --> C["Optimizer.worktree_mode"]
  C -- "git worktree add" --> D["Detached worktree dir"]
  B --> E["Function optimization"]
  E -- "create_diff_from_worktree" --> F["Patch files"]
  B --> G["cleanup_temporary_paths"]
  G -- "git worktree remove" --> D

File Walkthrough

Relevant files

Configuration changes (2 files)
  • cli.py: add --worktree CLI option (+1/-0)
  • sentry.py: remove hardcoded sentry DSN (+1/-1)

Enhancement (6 files)
  • coverage_utils.py: switch generate_candidates to return set (+8/-4)
  • git_utils.py: add worktree snapshot and diff helpers (+79/-2)
  • beta.py: integrate worktree diff in LSP optimize (+13/-0)
  • server.py: annotate optimizer and set worktree arg (+4/-1)
  • function_optimizer.py: adjust PR logic for worktree mode (+24/-6)
  • optimizer.py: implement worktree_mode and patch handling (+78/-6)

Tests (1 file)
  • test_code_utils.py: update test_generate_candidates to expect set (+3/-2)


Description

  • Add --worktree flag for detached worktree

  • Implement detached worktree creation & cleanup

  • Generate patches from worktree for optimizations

  • Use unique set in coverage candidate generator


File Walkthrough

Relevant files

Enhancement (14 files)
  • aiservice.py: Import `is_LSP_enabled` from `lsp.helpers` (+2/-1)
  • cfapi.py: Use `is_LSP_enabled()` for version checks (+2/-1)
  • coverage_utils.py: Make coverage candidates a unique set (+8/-4)
  • env_utils.py: Refactor LSP detection in `env_utils` (+8/-11)
  • formatter.py: Use `is_LSP_enabled()` in `format_code` (+2/-2)
  • git_utils.py: Implement detached worktree utility functions (+84/-1)
  • beta.py: Integrate worktree into LSP optimization flow (+142/-132)
  • helpers.py: Add `is_LSP_enabled()` helper (+7/-0)
  • server.py: Enable worktree mode in LSP server init (+6/-2)
  • server_entry.py: Add delimiter in LSP log formatter (+2/-1)
  • models.py: Allow updating `OptimizedCandidate` explanation (+3/-0)
  • function_optimizer.py: Preserve candidate explanation after review (+8/-0)
  • optimizer.py: Add `worktree_mode` to `Optimizer` class (+86/-6)
  • concolic_testing.py: Import `is_LSP_enabled` from LSP helpers (+1/-1)

Configuration changes (2 files)
  • cli.py: Add `--worktree` CLI flag (+1/-0)
  • console.py: Silence console in LSP mode (+5/-0)

Refactoring (1 file)
  • __init__.py: Remove console silencing in LSP init (+0/-4)

Bug fix (1 file)
  • main.py: Restore console output in `main` entry (+2/-1)

Tests (1 file)
  • test_code_utils.py: Update test for `generate_candidates` set output (+3/-2)

@github-actions

github-actions bot commented Aug 11, 2025

PR Reviewer Guide 🔍

(Review updated until commit 22b5065)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing Import

The function uses sys.exit(1) but sys is not imported, which will cause a NameError.

if is_LSP_enabled():
    logger.debug(msg)
    return f"Error: {msg}"
sys.exit(1)
Env var check

The existence check if os.getenv("CODEFLASH_LSP") treats any non-empty value (including "false") as true, unintentionally enabling quiet mode.

if os.getenv("CODEFLASH_LSP"):
    console.quiet = True
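
A minimal sketch of a stricter check, assuming the flag is meant to be boolean. `env_flag` and the set of accepted spellings are hypothetical and not names from this PR; only the `CODEFLASH_LSP` variable comes from the snippet above.

```python
import os

# Assumption: these spellings count as "enabled"; anything else is treated as off.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return True only when the variable holds an explicitly truthy string."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES
```

With a helper like this, `CODEFLASH_LSP=false` would leave quiet mode off, while `CODEFLASH_LSP=true` would turn it on.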
Return Type Mismatch

create_diff_from_worktree is declared to return a Path but returns None when there are no changes, leading to unexpected None results.

def create_diff_from_worktree(worktree_dir: Path, files: list[str], fto_name: str) -> Path:
    repository = git.Repo(worktree_dir, search_parent_directories=True)
    uni_diff_text = repository.git.diff(None, "HEAD", *files, ignore_blank_lines=True, ignore_space_at_eol=True)

    if not uni_diff_text:
        logger.warning("No changes found in worktree.")
        return None

@github-actions

github-actions bot commented Aug 11, 2025

PR Code Suggestions ✨

Latest suggestions up to 22b5065
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Map functions by actual file paths

The current loop uses the same file_path key for every entry and overwrites previous
mappings. Use the actual file paths from optimizable_funcs to build the dictionary
so each file is represented.

codeflash/lsp/beta.py [57-59]

-path_to_qualified_names = {}
-for functions in optimizable_funcs.values():
-    path_to_qualified_names[file_path] = [func.qualified_name for func in functions]
+path_to_qualified_names = {
+    path.as_posix(): [func.qualified_name for func in funcs]
+    for path, funcs in optimizable_funcs.items()
+}
Suggestion importance[1-10]: 10

__

Why: The code overwrites all entries under the same file_path key instead of using each key from optimizable_funcs, causing incorrect mapping of functions to files.

High
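
A small reproduction of the overwrite, as a sketch only. `FunctionInfo`, `map_overwriting`, `map_by_path`, and the sample data are hypothetical stand-ins; the two loop bodies follow the before and after lines of the suggestion above.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FunctionInfo:
    """Hypothetical stand-in for the objects stored in optimizable_funcs."""

    qualified_name: str


def map_overwriting(file_path: str, optimizable_funcs: dict[Path, list[FunctionInfo]]) -> dict[str, list[str]]:
    # Mirrors the flagged loop: every iteration writes to the same key.
    mapping: dict[str, list[str]] = {}
    for functions in optimizable_funcs.values():
        mapping[file_path] = [func.qualified_name for func in functions]
    return mapping


def map_by_path(optimizable_funcs: dict[Path, list[FunctionInfo]]) -> dict[str, list[str]]:
    # Mirrors the suggested fix: one key per file.
    return {
        path.as_posix(): [func.qualified_name for func in funcs]
        for path, funcs in optimizable_funcs.items()
    }


funcs = {
    Path("app/main.py"): [FunctionInfo("sorter")],
    Path("app/util.py"): [FunctionInfo("Helper.run")],
}
overwritten = map_overwriting("app/main.py", funcs)
by_path = map_by_path(funcs)
```

If `optimizable_funcs` only ever holds a single file at this call site, the original loop produces the same result as the fix; the difference shows up only once there is more than one key.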
Handle worktree creation failures

Because check=True raises CalledProcessError on a non-zero exit, the returncode check is
never reached. Either keep check=True and catch CalledProcessError, or use check=False
and test returncode directly, so that failures are logged and handled gracefully.

codeflash/code_utils/git_utils.py [216-225]

-result = subprocess.run(
-    ["git", "worktree", "add", "-d", str(worktree_dir)],
-    cwd=git_root,
-    check=True,
-    stdout=subprocess.DEVNULL if is_LSP_enabled() else None,
-    stderr=subprocess.DEVNULL if is_LSP_enabled() else None,
-)
-if result.returncode != 0:
-    logger.error(f"Failed to create worktree: {result.stderr}")
+try:
+    result = subprocess.run(
+        ["git", "worktree", "add", "-d", str(worktree_dir)],
+        cwd=git_root,
+        check=False,
+        stdout=subprocess.DEVNULL if is_LSP_enabled() else None,
+        stderr=subprocess.DEVNULL if is_LSP_enabled() else None,
+    )
+    if result.returncode != 0:
+        raise subprocess.CalledProcessError(result.returncode, result.args, stderr=result.stderr)
+except subprocess.CalledProcessError as e:
+    logger.error(f"Failed to create worktree: {e.stderr}")
     return None
Suggestion importance[1-10]: 8

__

Why: With check=True, a non-zero exit raises before the return code check, so failures are never logged; switching to check=False or catching CalledProcessError ensures graceful error handling.

Medium
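
A sketch of the `check=True` plus `try/except` variant. `run_command` is a hypothetical helper, not code from this PR. One difference from the snippet above is an assumption of mine: stderr is captured with `subprocess.PIPE`, because with `DEVNULL` or `None` the `stderr` attribute is `None` and the log message carries no detail.

```python
from __future__ import annotations

import subprocess
import sys


def run_command(cmd: list[str], cwd: str | None = None) -> bool:
    """Run cmd quietly; return False and report stderr if it exits non-zero."""
    try:
        subprocess.run(
            cmd,
            cwd=cwd,
            check=True,  # raises CalledProcessError on a non-zero exit
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,  # captured so the failure can be reported
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        print(f"command failed with exit code {exc.returncode}: {exc.stderr.strip()}", file=sys.stderr)
        return False
    return True
```

For the worktree case this could be called as `run_command(["git", "worktree", "add", "-d", str(worktree_dir)], cwd=git_root)`. A missing `git` executable raises `FileNotFoundError` rather than `CalledProcessError`, which this sketch does not handle.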
Remove undefined initializer call

Calling an undefined function _initialize_optimizer_if_valid will raise a NameError.
Replace it with a clear guard that raises an explicit error or invoke a known
initialization method to ensure server.optimizer is set.

codeflash/lsp/beta.py [73-74]

 if server.optimizer is None:
-    _initialize_optimizer_if_valid(server)
+    raise RuntimeError("Optimizer is not initialized. Ensure `prepare_optimizer_arguments()` is called before use.")
Suggestion importance[1-10]: 8

__

Why: The helper _initialize_optimizer_if_valid is not defined in this context, leading to a NameError and preventing proper optimizer initialization.

Medium

Previous suggestions

Suggestions up to commit b955ad8
Category | Suggestion | Impact
Possible issue
Catch worktree creation failures

The check=True flag causes subprocess.run to raise an exception on non-zero exit, so
the returncode check is never reached. Wrap the call in a try/except to handle
failures cleanly and log errors.

codeflash/code_utils/git_utils.py [215-224]

-result = subprocess.run(
-    ["git", "worktree", "add", "-d", str(worktree_dir)],
-    cwd=git_root,
-    check=True,
-    stdout=subprocess.DEVNULL if console.quiet else None,
-    stderr=subprocess.DEVNULL if console.quiet else None,
-)
-if result.returncode != 0:
-    logger.error(f"Failed to create worktree: {result.stderr}")
+try:
+    subprocess.run(
+        ["git", "worktree", "add", "-d", str(worktree_dir)],
+        cwd=git_root,
+        check=True,
+        stdout=subprocess.DEVNULL if console.quiet else None,
+        stderr=subprocess.DEVNULL if console.quiet else None,
+    )
+except subprocess.CalledProcessError as e:
+    logger.error(f"Failed to create worktree: {e}")
     return None
Suggestion importance[1-10]: 8

__

Why: Wrapping the subprocess.run call in a try/except correctly handles errors thrown by check=True, otherwise the returncode check is dead code and failures go uncaught.

Medium
Conditionally include patch_path

If there are no changes, patch_path may be None, causing an unexpected null in the
response. Only include patch_path when it is not None, or convert it to a string
when present.

codeflash/lsp/beta.py [354-361]

-return {
+result = {
     "functionName": params.functionName,
     "status": "success",
     "message": "Optimization completed successfully",
     "extra": f"Speedup: {speedup:.2f}x faster",
     "optimization": optimized_source,
-    "patch_path": patch_path,
 }
+if patch_path:
+    result["patch_path"] = str(patch_path)
+return result
Suggestion importance[1-10]: 6

__

Why: Guarding against a None patch_path prevents null values in the API response and converting to a string ensures consistency in the returned JSON.

Low
General
Allow None diff return

The function is declared to return a Path but returns None when there are no
changes, leading to type mismatches. Update the signature to Optional[Path] to
reflect that None may be returned.

codeflash/code_utils/git_utils.py [259-265]

-def create_diff_from_worktree(worktree_dir: Path, files: list[str], fto_name: str) -> Path:
+def create_diff_from_worktree(worktree_dir: Path, files: list[str], fto_name: str) -> Optional[Path]:
     ...
     if not uni_diff_text:
         logger.warning("No changes found in worktree.")
         return None
Suggestion importance[1-10]: 5

__

Why: The signature -> Path is inaccurate when None is returned on no diff; changing it to Optional[Path] aligns the type hint with actual behavior.

Low

codeflash-ai bot added a commit that referenced this pull request Aug 11, 2025
…etached-worktrees`)

The optimization replaces expensive Path object construction with simple string formatting. The key change is on line 11:

**Original**: `candidate_path = str(Path(current_path.name) / last_added)`
**Optimized**: `candidate_path = f"{current_path.name}/{last_added}"`

This eliminates two per-iteration costs:
1. **Path object creation** - Creating a new Path object for each iteration is expensive
2. **Path division operator** - The `/` operator on Path objects parses and normalizes its operands and builds another Path

The f-string is much faster because it is plain string formatting, while the Path operations repeat that parsing and object construction on every iteration even though only the resulting string is kept.

The line profiler shows the critical loop line dropped from 56% of total runtime (42.1ms) to just 5.2% (1.8ms) - a **23x improvement** on the bottleneck operation.

This optimization is particularly effective for:
- **Deeply nested paths** (100+ levels): 394% speedup
- **Wide directory structures**: 151% speedup  
- **Absolute paths**: 108-122% speedup

The performance gains scale with path depth since the optimization is applied once per directory level in the traversal loop. Simple paths see modest gains (1-4%), but complex nested structures see dramatic improvements due to eliminating repeated expensive Path operations.
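
One thing that may be worth checking before relying on the f-string form: the two expressions agree on POSIX-style paths, but `pathlib` joins with a backslash on Windows. The sketch below uses the pure path classes so it behaves the same on any platform; the sample names are made up. Whether the separator matters depends on how the coverage data stores paths, which this thread does not say.

```python
from pathlib import PurePosixPath, PureWindowsPath

name, last_added = "app", "main.py"

# What str(Path(name) / last_added) yields on POSIX and on Windows respectively.
posix_joined = str(PurePosixPath(name) / last_added)
windows_joined = str(PureWindowsPath(name) / last_added)

# The f-string always uses a forward slash, regardless of platform.
fstring_joined = f"{name}/{last_added}"
```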
@mohammedahmed18 mohammedahmed18 marked this pull request as draft August 11, 2025 22:42
codeflash-ai bot added a commit that referenced this pull request Aug 12, 2025
…/detached-worktrees`)

The optimized code achieves a **35% speedup** through several key performance improvements:

**Primary optimizations:**

1. **Eliminated redundant function calls**: The original code called `is_LSP_enabled()` multiple times - once in the conditional expression and again in the fork check. The optimized version caches this result in a variable, reducing function call overhead.

2. **Changed shell config parsing**: Replaced `matches[-1] if matches else None` with `next(reversed(matches), None)`. Both return the last match or `None`; because the original indexes the list rather than slicing it, any gain from this change is likely small.

3. **Streamlined environment variable access**: Changed `os.getenv("CODEFLASH_LSP", default="false")` to `os.environ.get("CODEFLASH_LSP", "false")`, eliminating the keyword argument overhead of `default=`.

4. **Removed an intermediate variable**: Dropped the `shell_contents` assignment and passed the file content directly to the regex operation. This saves a local variable assignment rather than any file I/O.

5. **Optimized control flow**: Restructured the conditional logic to avoid redundant `is_LSP_enabled()` calls and simplified the fork detection path.

**Performance characteristics by test type:**
- **Basic environment variable tests**: 4-6% improvement due to reduced function call overhead
- **Missing API key scenarios**: Up to 503% faster due to streamlined error path logic
- **Shell config file operations**: Modest improvements (1-8%) from optimized file processing
- **LSP mode operations**: 5-9% faster due to cached LSP state checks

The optimizations are particularly effective for error cases and LSP mode scenarios where multiple conditional checks were previously duplicated.
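
For reference, the two "last match" forms mentioned in item 2 return the same value. The sketch below shows this on a made-up shell config; the regex, the key format, and the function names are assumptions for illustration and are not taken from `env_utils.py`.

```python
from __future__ import annotations

import re

# Hypothetical pattern: the real one in env_utils.py may differ.
EXPORT_PATTERN = re.compile(r'^export CODEFLASH_API_KEY="?(cf-[^\s"]+)"?', re.MULTILINE)


def last_match_indexing(text: str) -> str | None:
    matches = EXPORT_PATTERN.findall(text)
    return matches[-1] if matches else None


def last_match_reversed(text: str) -> str | None:
    return next(reversed(EXPORT_PATTERN.findall(text)), None)


shell_config = (
    'export CODEFLASH_API_KEY="cf-old"\n'
    "export PATH=/usr/bin\n"
    'export CODEFLASH_API_KEY="cf-new"\n'
)
```

Both helpers pick the last export in the file and fall back to `None` when there is no match, so the choice between them is about style or micro-performance rather than behavior.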
@codeflash-ai
Contributor

codeflash-ai bot commented Aug 12, 2025

⚡️ Codeflash found optimizations for this PR

📄 36% (0.36x) speedup for get_codeflash_api_key in codeflash/code_utils/env_utils.py

⏱️ Runtime : 665 microseconds → 490 microseconds (best of 1 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch feat/detached-worktrees).

@mohammedahmed18 mohammedahmed18 force-pushed the feat/detached-worktrees branch from 15809fc to e33ff21 Compare August 14, 2025 12:20
@mohammedahmed18 mohammedahmed18 marked this pull request as ready for review August 14, 2025 12:22
@github-actions

Persistent review updated to latest commit 22b5065

@mohammedahmed18 mohammedahmed18 requested review from KRRT7 and Saga4 August 14, 2025 12:25
codeflash-ai bot added a commit that referenced this pull request Aug 14, 2025
…t/detached-worktrees`)

The optimized code achieves an **81x speedup** by eliminating expensive Path object operations in the main loop. The key optimization replaces the original approach that repeatedly calls `current_path.parent` and constructs `Path(current_path.name) / last_added` objects with a more efficient strategy using `source_code_path.parts`.

**Key Changes:**
1. **Pre-compute path parts**: Instead of traversing up the directory tree with `.parent` calls, the code extracts all path components once using `source_code_path.parts`
2. **Replace Path construction with string formatting**: The expensive `str(Path(current_path.name) / last_added)` operation (96% of original runtime) is replaced with simple string formatting `f"{parts[i]}/{last_added}"`
3. **Eliminate parent traversal loop**: The `while current_path != current_path.parent` loop is replaced with a `for` loop that iterates through pre-computed parts

**Why This Is Faster:**
- Path object construction and filesystem-style operations are expensive in Python
- String concatenation with f-strings is much faster than Path operations
- Array indexing (`parts[i]`) is faster than repeated method calls (`.parent`)
- The optimization eliminates the bottleneck line that consumed 96% of the original runtime

**Performance Characteristics:**
The optimization shows excellent scaling - deeper directory structures see greater speedups (up to 104x faster for 1000-level nesting). All test cases benefit significantly, with the smallest improvement being 30% for simple cases and massive gains of 8000%+ for deeply nested paths.
Comment on lines 46 to 53
 current_path = source_code_path.parent

+last_added = source_code_path.name
 while current_path != current_path.parent:
-    candidate_path = str(Path(current_path.name) / candidates[-1])
-    candidates.append(candidate_path)
+    candidate_path = str(Path(current_path.name) / last_added)
+    candidates.add(candidate_path)
+    last_added = candidate_path
     current_path = current_path.parent
Contributor

⚡️Codeflash found 8,048% (80.48x) speedup for generate_candidates in codeflash/code_utils/coverage_utils.py

⏱️ Runtime : 162 milliseconds → 1.99 milliseconds (best of 365 runs)

📝 Explanation and details

The optimized code achieves an 81x speedup by eliminating expensive Path object operations in the main loop. The key optimization replaces the original approach that repeatedly calls current_path.parent and constructs Path(current_path.name) / last_added objects with a more efficient strategy using source_code_path.parts.

Key Changes:

  1. Pre-compute path parts: Instead of traversing up the directory tree with .parent calls, the code extracts all path components once using source_code_path.parts
  2. Replace Path construction with string formatting: The expensive str(Path(current_path.name) / last_added) operation (96% of original runtime) is replaced with simple string formatting f"{parts[i]}/{last_added}"
  3. Eliminate parent traversal loop: The while current_path != current_path.parent loop is replaced with a for loop that iterates through pre-computed parts

Why This Is Faster:

  • Path object construction and filesystem-style operations are expensive in Python
  • String concatenation with f-strings is much faster than Path operations
  • Array indexing (parts[i]) is faster than repeated method calls (.parent)
  • The optimization eliminates the bottleneck line that consumed 96% of the original runtime

Performance Characteristics:
The optimization shows excellent scaling - deeper directory structures see greater speedups (up to 104x faster for 1000-level nesting). All test cases benefit significantly, with the smallest improvement being 30% for simple cases and massive gains of 8000%+ for deeply nested paths.

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 40 Passed
🌀 Generated Regression Tests | 38 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 1 Passed
📊 Tests Coverage | 100.0%

⚙️ Existing Unit Tests and Runtime
Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup
test_code_utils.py::test_generate_candidates | 69.6μs | 7.84μs | 788% ✅

🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.coverage_utils import generate_candidates

# unit tests

# ------------- Basic Test Cases -------------

def test_basic_single_file():
    # Basic case: file in root directory
    path = Path("foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.91μs -> 5.19μs (52.3% faster)

def test_basic_nested_file():
    # File in a nested directory
    path = Path("src/app/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 25.5μs -> 5.87μs (334% faster)

def test_basic_deep_nested_file():
    # File in a deeper nested directory
    path = Path("a/b/c/d/e/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 46.4μs -> 6.39μs (626% faster)
    # Should include all intermediate candidates
    expected = {
        "foo.py",
        "e/foo.py",
        "d/e/foo.py",
        "c/d/e/foo.py",
        "b/c/d/e/foo.py",
        "a/b/c/d/e/foo.py"
    }

# ------------- Edge Test Cases -------------

def test_edge_empty_path():
    # Edge: empty path
    path = Path("")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.03μs -> 4.93μs (42.7% faster)

def test_edge_root_path():
    # Edge: root path ("/foo.py" on Unix)
    path = Path("/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.00μs -> 5.35μs (30.9% faster)

def test_edge_single_directory():
    # Edge: file directly in a directory
    path = Path("dir/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 17.2μs -> 5.06μs (241% faster)

def test_edge_dot_in_path():
    # Edge: directories or files with dots in their names
    path = Path("a.b/c.d/foo.e.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 25.5μs -> 5.80μs (340% faster)

def test_edge_trailing_slash():
    # Edge: path with trailing slash
    path = Path("src/app/foo.py/")
    codeflash_output = generate_candidates(path); result = codeflash_output # 24.9μs -> 5.58μs (346% faster)

def test_edge_windows_path():
    # Edge: Windows-style path
    path = Path("src\\app\\foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.51μs -> 4.90μs (53.4% faster)

def test_edge_non_ascii_characters():
    # Edge: non-ASCII characters in path
    path = Path("src/üñîçødë/文件.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 26.1μs -> 6.15μs (324% faster)

def test_edge_path_with_spaces():
    # Edge: path with spaces
    path = Path("my project/code file.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.8μs -> 5.13μs (227% faster)

def test_edge_path_with_dot_and_slash():
    # Edge: path with leading "./"
    path = Path("./foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.34μs -> 4.83μs (52.1% faster)

def test_edge_path_with_parent_references():
    # Edge: path with parent directory references
    path = Path("a/b/../c/foo.py").resolve()
    codeflash_output = generate_candidates(path); result = codeflash_output # 61.5μs -> 5.59μs (1001% faster)

# ------------- Large Scale Test Cases -------------

def test_large_deeply_nested_path():
    # Large: deeply nested path (depth 20)
    parts = [f"dir{i}" for i in range(20)]
    path = Path(*parts, "foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 168μs -> 9.58μs (1662% faster)

    # Check that each candidate is present
    last = "foo.py"
    for i in range(1, 21):
        last = f"dir{20-i}/{last}"

def test_large_many_candidates():
    # Large: wide directory tree (simulate many similar files)
    # This test only checks one file but ensures no performance issue
    path = Path("/".join(f"dir{i}" for i in range(50)) + "/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 494μs -> 15.0μs (3197% faster)

def test_large_unique_candidates():
    # Large: ensure all candidates are unique
    path = Path("a/b/c/d/e/f/g/h/i/j/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 80.9μs -> 7.11μs (1037% faster)

def test_large_path_performance():
    # Large: path with many directories, performance check
    path = Path("/".join(f"level{i}" for i in range(100)) + "/file.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 1.32ms -> 30.7μs (4198% faster)

def test_large_various_path_types():
    # Large: mix of absolute and relative, dots, and unicode
    path = Path("/abs/üñîçødë/./rel/../file.py").resolve()
    codeflash_output = generate_candidates(path); result = codeflash_output # 23.8μs -> 4.38μs (445% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.code_utils.coverage_utils import generate_candidates

# unit tests

# ---------------------- BASIC TEST CASES ----------------------

def test_single_file_in_root():
    # File in the root directory
    path = Path("foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.68μs -> 5.17μs (48.6% faster)

def test_file_in_one_subdirectory():
    # File in a single subdirectory
    path = Path("src/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.9μs -> 5.14μs (228% faster)

def test_file_in_two_subdirectories():
    # File in two nested subdirectories
    path = Path("src/bar/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 24.5μs -> 5.68μs (331% faster)

def test_file_with_dot_in_name():
    # File with dots in the name
    path = Path("src/foo.bar.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.8μs -> 5.00μs (237% faster)

def test_file_with_multiple_extensions():
    # File with multiple extensions
    path = Path("src/foo.test.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.5μs -> 4.99μs (230% faster)

# ---------------------- EDGE TEST CASES ----------------------

def test_file_at_filesystem_root():
    # File at the filesystem root (Unix style)
    path = Path("/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 6.79μs -> 5.13μs (32.4% faster)

def test_file_in_deeply_nested_structure():
    # Deeply nested structure
    path = Path("a/b/c/d/e/f/g/h/i/j/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 82.7μs -> 7.27μs (1037% faster)
    expected = {
        "foo.py",
        "j/foo.py",
        "i/j/foo.py",
        "h/i/j/foo.py",
        "g/h/i/j/foo.py",
        "f/g/h/i/j/foo.py",
        "e/f/g/h/i/j/foo.py",
        "d/e/f/g/h/i/j/foo.py",
        "c/d/e/f/g/h/i/j/foo.py",
        "b/c/d/e/f/g/h/i/j/foo.py",
        "a/b/c/d/e/f/g/h/i/j/foo.py",
    }

def test_file_with_spaces():
    # File with spaces in the name and directories
    path = Path("my folder/another folder/my file.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 25.1μs -> 5.71μs (340% faster)

def test_file_with_unicode_characters():
    # File with unicode characters
    path = Path("src/测试.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 17.2μs -> 5.10μs (237% faster)

def test_file_with_leading_dot():
    # Hidden file (leading dot)
    path = Path(".hidden/foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.2μs -> 4.99μs (224% faster)

def test_file_with_empty_path():
    # Empty path: Path("") normalizes to "." and the call completes without raising
    path = Path("")
    codeflash_output = generate_candidates(path); result = codeflash_output # 7.08μs -> 4.72μs (50.1% faster)

def test_file_with_trailing_slash():
    # Path with trailing slash (pathlib drops the trailing separator, so this equals "src/foo.py")
    path = Path("src/foo.py/")
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.7μs -> 5.05μs (231% faster)

def test_file_with_multiple_separators():
    # Path with redundant separators
    path = Path("src//bar///foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 24.7μs -> 5.74μs (330% faster)

def test_file_with_parent_directory_reference():
    # Path with parent directory references
    path = Path("src/bar/../foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 32.7μs -> 5.78μs (466% faster)
    # Pathlib resolves "bar/../foo.py" to "src/foo.py"
    resolved_path = path.resolve().relative_to(Path.cwd())
    codeflash_output = generate_candidates(resolved_path); result = codeflash_output # 13.9μs -> 4.07μs (243% faster)

# ---------------------- LARGE SCALE TEST CASES ----------------------

def test_large_number_of_nested_directories():
    # Test with 1000 nested directories (limited to 1000 as per instruction)
    dirs = [f"dir{i}" for i in range(1000)]
    path = Path("/".join(dirs)) / "foo.py"
    codeflash_output = generate_candidates(path); result = codeflash_output # 80.6ms -> 991μs (8027% faster)
    # Should contain 1001 candidates: "foo.py", "dir999/foo.py", ..., "dir0/dir1/.../dir999/foo.py"
    expected = set()
    last = "foo.py"
    for i in reversed(range(1000)):
        last = f"dir{i}/{last}"
        expected.add(last)
    expected.add("foo.py")
    expected.add(str(path))

def test_performance_with_wide_directory_names():
    # Test with 1000-character directory names
    dirname = "a" * 1000
    path = Path(dirname) / "foo.py"
    codeflash_output = generate_candidates(path); result = codeflash_output # 18.2μs -> 6.26μs (191% faster)

def test_performance_with_long_file_name():
    # Test with a very long file name (255 chars, common max for filesystems)
    filename = "f" * 255 + ".py"
    path = Path("src") / filename
    codeflash_output = generate_candidates(path); result = codeflash_output # 16.6μs -> 5.29μs (213% faster)

def test_performance_with_many_candidates():
    # Test with 1000 directories, ensure set size is correct
    dirs = [f"d{i}" for i in range(1000)]
    path = Path(*dirs, "foo.py")
    codeflash_output = generate_candidates(path); result = codeflash_output # 78.5ms -> 748μs (10395% faster)

def test_path_with_non_ascii_and_long_names():
    # Test with a mix of unicode and long names
    dirname = "测试" * 200  # 400 characters
    filename = "文件" * 50 + ".py"  # 103 characters
    path = Path(dirname) / filename
    codeflash_output = generate_candidates(path); result = codeflash_output # 17.9μs -> 6.23μs (188% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from codeflash.code_utils.coverage_utils import generate_candidates
from pathlib import Path

def test_generate_candidates():
    generate_candidates(Path())
🔎 Concolic Coverage Tests and Runtime
Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup
codeflash_concolic_9d591alw/tmpkrdmrlwv/test_concolic_coverage.py::test_generate_candidates | 7.26μs | 4.90μs | 48.3% ✅

To test or edit this optimization locally, run `git merge codeflash/optimize-pr649-2025-08-14T13.45.27`

Suggested change
-current_path = source_code_path.parent
-last_added = source_code_path.name
-while current_path != current_path.parent:
-    candidate_path = str(Path(current_path.name) / last_added)
-    candidates.add(candidate_path)
-    last_added = candidate_path
-    current_path = current_path.parent
+last_added = source_code_path.name
+parts = source_code_path.parts
+for i in range(len(parts) - 2, 0, -1):
+    candidate_path = f"{parts[i]}/{last_added}"
+    candidates.add(candidate_path)
+    last_added = candidate_path
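
To compare the two loops side by side, here is a self-contained sketch of both variants. The loop bodies are taken from the suggestion; the surrounding lines (seeding the set with the file name and adding the full path at the end) are my assumption about the rest of `generate_candidates`, inferred from the expected sets in the generated tests. `PurePosixPath` is used instead of `Path` so the comparison runs identically on any platform, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def candidates_with_path_ops(source_code_path: PurePosixPath) -> set[str]:
    candidates = {source_code_path.name}  # assumed initial seed
    current_path = source_code_path.parent
    last_added = source_code_path.name
    # Walk upward until the parent stops changing ("/" or ".").
    while current_path != current_path.parent:
        candidate_path = str(PurePosixPath(current_path.name) / last_added)
        candidates.add(candidate_path)
        last_added = candidate_path
        current_path = current_path.parent
    candidates.add(str(source_code_path))  # assumed final step
    return candidates


def candidates_with_parts(source_code_path: PurePosixPath) -> set[str]:
    candidates = {source_code_path.name}  # assumed initial seed
    last_added = source_code_path.name
    parts = source_code_path.parts
    # Index 0 is skipped: it is the root for absolute paths, and the
    # full path is added separately below.
    for i in range(len(parts) - 2, 0, -1):
        candidate_path = f"{parts[i]}/{last_added}"
        candidates.add(candidate_path)
        last_added = candidate_path
    candidates.add(str(source_code_path))  # assumed final step
    return candidates
```

Under these assumptions the two functions return the same set for relative, absolute, single-component, and empty paths. Without the final `add` of the full path, the `parts` version would omit the longest candidate for relative paths, so that line matters for equivalence.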

Comment on lines 216 to 225
result = subprocess.run(
    ["git", "worktree", "add", "-d", str(worktree_dir)],
    cwd=git_root,
    check=True,
    stdout=subprocess.DEVNULL if is_LSP_enabled() else None,
    stderr=subprocess.DEVNULL if is_LSP_enabled() else None,
)
if result.returncode != 0:
    logger.error(f"Failed to create worktree: {result.stderr}")
    return None
Contributor

can't we do this via the git module we're already using?

Comment on lines 237 to 241
with tempfile.NamedTemporaryFile(mode="w+", suffix=".codeflash.patch", delete=False) as tmp_patch_file:
    tmp_patch_file.write(uni_diff_text + "\n")  # the new line here is a must otherwise the last hunk won't be valid
    tmp_patch_file.flush()

patch_path = Path(tmp_patch_file.name).resolve()
Contributor

since we're using a NamedTemporaryFile, we should use `w` mode directly; the file is newly created, so there's nothing to truncate
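
A sketch of what the patch writer could look like with write-only mode and a return type that admits the "no changes" case flagged earlier in the review. `write_patch_file` is a hypothetical name; the suffix and the trailing newline follow the snippet above, and the explicit `encoding` is my addition.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Optional


def write_patch_file(diff_text: str) -> Optional[Path]:
    """Write diff_text to a temporary .codeflash.patch file; None if empty."""
    if not diff_text:
        return None
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".codeflash.patch", delete=False, encoding="utf-8"
    ) as tmp_patch_file:
        # Trailing newline kept, as in the original, so the last hunk stays valid.
        tmp_patch_file.write(diff_text + "\n")
    # The file is closed (and therefore flushed) once the with block exits.
    return Path(tmp_patch_file.name).resolve()
```

Callers would need to handle the `None` result, for example by leaving `patch_path` out of the response when no patch was written.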

Comment on lines 352 to 354
def set_explanation(self, new_explanation: str) -> None:
    object.__setattr__(self, "explanation", new_explanation)

Contributor

prefer setattr

Contributor Author

I actually removed the frozen attribute from OptimizedCandidate, so there is no need for this hacky way of mutating the explanation
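
For context on the trade-off discussed here, a sketch using standard-library dataclasses. `FrozenCandidate` and `MutableCandidate` are hypothetical stand-ins; the real `OptimizedCandidate` may be defined differently (for example with another dataclass implementation), so treat this as an illustration of the general behavior only.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenCandidate:
    source_code: str
    explanation: str


@dataclass
class MutableCandidate:
    source_code: str
    explanation: str


frozen = FrozenCandidate("def f(): ...", "initial")

# Plain assignment on a frozen dataclass is rejected.
try:
    frozen.explanation = "edited"
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

# The workaround from the diff: bypass the frozen check entirely.
object.__setattr__(frozen, "explanation", "forced")

# An alternative that keeps immutability: build a modified copy.
updated = replace(frozen, explanation="via replace")

# With frozen removed, ordinary assignment works.
mutable = MutableCandidate("def f(): ...", "initial")
mutable.explanation = "direct"
```

Dropping `frozen` is the simplest route when in-place mutation is intended; `dataclasses.replace` is an option if the class should stay immutable.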

@mohammedahmed18 mohammedahmed18 requested a review from KRRT7 August 19, 2025 00:25
KRRT7 previously approved these changes Aug 19, 2025
Contributor

@KRRT7 KRRT7 left a comment

approving, we should aim to have more test coverage for the LSP once we have it a bit more polished

@mohammedahmed18
Contributor Author

@KRRT7 okay let me just fix this mypy issue

@mohammedahmed18 mohammedahmed18 merged commit a716c9b into main Aug 19, 2025
18 of 19 checks passed