@codeflash-ai codeflash-ai bot commented Sep 2, 2025

⚡️ This pull request contains optimizations for PR #690

If you approve this dependent PR, these changes will be merged into the original PR branch worktree/persist-optimization-patches.

This PR will be automatically closed if the original PR is merged.


📄 112% speedup (2.12x as fast) for get_patches_dir_for_project in codeflash/code_utils/git_worktree_utils.py

⏱️ Runtime: 137 microseconds → 64.5 microseconds (best of 5 runs)
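To check the arithmetic: the percentage is the time saved relative to the new runtime, (old - new) / new. The plain ratio, old / new, says how many times faster the optimized function runs. The per-test annotations in the generated tests follow the same formula. For example, 13.5μs -> 6.00μs gives (13.5 - 6.0) / 6.0 = 125%. Below is a minimal sketch that uses the two reported best-of-5 runtimes. The helper names are illustrative and are not part of Codeflash.

```python
def speedup_percent(old_us: float, new_us: float) -> float:
    """Time saved relative to the new runtime, as a percentage."""
    return (old_us - new_us) / new_us * 100


def speedup_ratio(old_us: float, new_us: float) -> float:
    """How many times faster the new version runs (old / new)."""
    return old_us / new_us


# Reported best-of-5 runtimes for get_patches_dir_for_project, in microseconds
old, new = 137.0, 64.5
print(f"{speedup_percent(old, new):.0f}% faster")  # 112% faster
print(f"{speedup_ratio(old, new):.2f}x")           # 2.12x
```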

📝 Explanation and details

The optimization achieves a 112% speedup through two key changes:

  1. Replace `list(repo.iter_commits(...))` with `next(repo.iter_commits(...))`: The original code materializes all root commits into a list just to access the first one. The optimized version uses `next()` to take only the first commit from the iterator. This avoids an unnecessary list allocation and iteration over every root commit. The change matters most for repositories with multiple root commits, which are rare but can occur when unrelated histories have been merged.

  2. Remove the redundant `Path()` wrapper: The original code wraps `patches_dir / project_id` in `Path()`. Because `patches_dir` is already a `Path` object and the `/` operator returns a `Path`, the wrapper only adds the cost of constructing a second `Path`.
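Both changes can be illustrated without a Git repository. In the sketch below, a plain generator called `fake_root_commits` stands in for the iterator returned by `repo.iter_commits(...)`. It is hypothetical and only counts how many items it yields, which makes the difference between `list(...)[0]` and `next(...)` visible. There is also a behavioral difference on an empty iterator: `list(...)[0]` raises `IndexError`, while `next(...)` raises `StopIteration` unless a default is passed.

```python
from pathlib import Path


def fake_root_commits(shas, counter):
    """Hypothetical stand-in for repo.iter_commits(...): yields SHAs lazily
    and records how many were produced in counter["yielded"]."""
    for sha in shas:
        counter["yielded"] += 1
        yield sha


shas = ["aaa111", "bbb222", "ccc333"]  # e.g. a repo with three root commits

# Original pattern: materialize every item, then index the first one.
eager = {"yielded": 0}
first_eager = list(fake_root_commits(shas, eager))[0]

# Optimized pattern: pull only the first item from the iterator.
lazy = {"yielded": 0}
first_lazy = next(fake_root_commits(shas, lazy))

print(first_eager == first_lazy)          # True
print(eager["yielded"], lazy["yielded"])  # 3 1

# Empty iterator: next() needs a default to avoid StopIteration.
print(next(fake_root_commits([], {"yielded": 0}), None))  # None

# The Path() wrapper is redundant: "/" on a Path already returns a Path.
patches = Path("cache") / "patches"  # illustrative directory, not Codeflash's
wrapped = Path(patches / "aaa111")
bare = patches / "aaa111"
print(wrapped == bare, isinstance(bare, Path))  # True True
```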

The generated tests show a speedup in every scenario, ranging from 93% to 159% faster. The gain holds for a repository with 500 commits (18.0μs → 9.09μs, 98.5% faster) and for one with unusual branch names (16.2μs → 8.28μs, 95.3% faster), although both cases sit near the lower end of that range. The `next()` change provides the largest gain because it avoids building an intermediate list and stops iterating as soon as it finds the first commit.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
from __future__ import annotations

import os
import shutil
import sys
import tempfile
from functools import lru_cache
from pathlib import Path

import git
# imports
import pytest  # used for our unit tests
from codeflash.code_utils.git_worktree_utils import get_patches_dir_for_project

# Simulate codeflash_cache_dir for testability
codeflash_cache_dir = Path(tempfile.gettempdir()) / "codeflash_test_cache"

patches_dir = codeflash_cache_dir / "patches"


@pytest.fixture
def temp_git_repo(tmp_path):
    """
    Create a temporary git repository for testing.
    Returns the repo path and the first commit sha.
    """
    repo_path = tmp_path / "repo"
    repo_path.mkdir()
    # Initialize git repo
    repo = git.Repo.init(str(repo_path))
    # Create a file and commit
    file_path = repo_path / "file.txt"
    file_path.write_text("initial")
    repo.index.add([str(file_path)])
    commit = repo.index.commit("Initial commit")
    # Return repo path and first commit sha
    return repo_path, commit.hexsha

# 1. Basic Test Cases


#------------------------------------------------
from __future__ import annotations

import os
import shutil
import sys
import tempfile
from functools import lru_cache
from pathlib import Path

import git
# imports
import pytest  # used for our unit tests
from codeflash.code_utils.git_worktree_utils import get_patches_dir_for_project

# Simulate codeflash_cache_dir as a global cache directory for testing
codeflash_cache_dir = Path(tempfile.gettempdir()) / "codeflash_cache_test"

patches_dir = codeflash_cache_dir / "patches"


def create_git_repo_with_commits(tmp_path, num_commits=1):
    """
    Helper to create a git repo with a specified number of commits.
    Returns (repo, first_commit_sha)
    """
    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()
    repo = git.Repo.init(repo_dir)
    # Create and commit files
    first_commit_sha = None
    for i in range(num_commits):
        file = repo_dir / f"file{i}.txt"
        file.write_text(f"commit {i}")
        repo.index.add([str(file)])
        commit = repo.index.commit(f"commit {i}")
        if i == 0:
            first_commit_sha = commit.hexsha
    return repo, first_commit_sha, repo_dir


# ---------------- BASIC TEST CASES ----------------

def test_basic_single_commit_repo(tmp_path):
    """Test: Repo with a single commit returns correct patch dir."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 1)
    os.chdir(repo_dir)
    # The patches dir should be patches_dir/<first_commit_sha>
    expected = patches_dir / first_commit_sha
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 13.5μs -> 6.00μs (125% faster)

def test_basic_multiple_commits_repo(tmp_path):
    """Test: Repo with multiple commits returns patch dir for first commit."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 5)
    os.chdir(repo_dir)
    expected = patches_dir / first_commit_sha
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 13.4μs -> 5.16μs (159% faster)

def test_basic_dir_is_subdir_of_patches(tmp_path):
    """Test: The returned path is a subdirectory of patches_dir."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 2)
    os.chdir(repo_dir)
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 16.9μs -> 8.15μs (107% faster)

# ---------------- EDGE TEST CASES ----------------

def test_edge_repo_with_non_ascii_commit(tmp_path):
    """Test: Repo with a commit message containing non-ASCII characters."""
    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()
    repo = git.Repo.init(repo_dir)
    file = repo_dir / "file.txt"
    file.write_text("initial")
    repo.index.add([str(file)])
    commit = repo.index.commit("初期コミット")  # Japanese for "initial commit"
    os.chdir(repo_dir)
    expected = patches_dir / commit.hexsha
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 12.7μs -> 5.19μs (145% faster)

def test_edge_repo_with_long_commit_sha(tmp_path):
    """Test: Repo with a normal SHA, but ensure full SHA is used."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 1)
    os.chdir(repo_dir)
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 15.6μs -> 7.83μs (98.7% faster)


def test_edge_symlinked_repo(tmp_path):
    """Test: Symlinked repo directory should still yield correct patch dir."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 1)
    symlink_dir = tmp_path / "symlinked"
    symlink_dir.symlink_to(repo_dir, target_is_directory=True)
    os.chdir(symlink_dir)
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 14.3μs -> 6.40μs (124% faster)


def test_edge_repo_with_unusual_branch_names(tmp_path):
    """Test: Repo with unusual branch names does not affect patch dir."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 1)
    os.chdir(repo_dir)
    repo.git.checkout("-b", "feature/🔥-strange_branch")
    file = repo_dir / "file2.txt"
    file.write_text("branch file")
    repo.index.add([str(file)])
    repo.index.commit("on strange branch")
    # Should still use first commit sha
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 16.2μs -> 8.28μs (95.3% faster)

# ---------------- LARGE SCALE TEST CASES ----------------

def test_large_scale_many_commits(tmp_path):
    """Test: Repo with 500 commits returns patch dir for first commit."""
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(tmp_path, 500)
    os.chdir(repo_dir)
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 18.0μs -> 9.09μs (98.5% faster)


def test_large_scale_long_path(tmp_path):
    """Test: Repo in a deeply nested directory returns correct patch dir."""
    deep_dir = tmp_path
    # Create a path 30 directories deep
    for i in range(30):
        deep_dir = deep_dir / f"level_{i}"
        deep_dir.mkdir()
    repo, first_commit_sha, repo_dir = create_git_repo_with_commits(deep_dir, 1)
    os.chdir(repo_dir)
    codeflash_output = get_patches_dir_for_project(); result = codeflash_output # 16.2μs -> 8.35μs (93.9% faster)
```

To edit these changes, run `git checkout codeflash/optimize-pr690-2025-09-02T21.53.21` and push.

Codeflash

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Sep 2, 2025
@misrasaurabh1 misrasaurabh1 merged commit dccb3fe into worktree/persist-optimization-patches Sep 2, 2025
18 of 20 checks passed
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr690-2025-09-02T21.53.21 branch September 2, 2025 21:55