-
Couldn't load subscription status.
- Fork 22
[Worktree] Persist optimization patches metadata #690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
a664330
db96393
cd7e1e1
4b7bf76
503fa94
b6f6661
40b91f0
0de7ebd
c35705e
e83478e
9acb430
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -11,7 +11,11 @@ | |||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| from codeflash.api.cfapi import get_codeflash_api_key, get_user_id | ||||||||||||||||||||||||||
| from codeflash.cli_cmds.cli import process_pyproject_config | ||||||||||||||||||||||||||
| from codeflash.code_utils.git_utils import create_diff_patch_from_worktree | ||||||||||||||||||||||||||
| from codeflash.code_utils.git_utils import ( | ||||||||||||||||||||||||||
| create_diff_patch_from_worktree, | ||||||||||||||||||||||||||
| get_patches_metadata, | ||||||||||||||||||||||||||
| overwrite_patch_metadata, | ||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||
| from codeflash.code_utils.shell_utils import save_api_key_to_rc | ||||||||||||||||||||||||||
| from codeflash.discovery.functions_to_optimize import filter_functions, get_functions_within_git_diff | ||||||||||||||||||||||||||
| from codeflash.either import is_successful | ||||||||||||||||||||||||||
|
|
@@ -39,6 +43,11 @@ class ProvideApiKeyParams: | |||||||||||||||||||||||||
| api_key: str | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| @dataclass | ||||||||||||||||||||||||||
| class OnPatchAppliedParams: | ||||||||||||||||||||||||||
| patch_id: str | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| server = CodeflashLanguageServer("codeflash-language-server", "v1.0", protocol_cls=CodeflashLanguageServerProtocol) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
@@ -216,6 +225,34 @@ def provide_api_key(server: CodeflashLanguageServer, params: ProvideApiKeyParams | |||||||||||||||||||||||||
| return {"status": "error", "message": "something went wrong while saving the api key"} | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| @server.feature("retrieveSuccessfulOptimizations") | ||||||||||||||||||||||||||
| def retrieve_successful_optimizations(_server: CodeflashLanguageServer, _params: any) -> dict[str, str]: | ||||||||||||||||||||||||||
| metadata = get_patches_metadata() | ||||||||||||||||||||||||||
| return {"status": "success", "patches": metadata["patches"]} | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| @server.feature("onPatchApplied") | ||||||||||||||||||||||||||
| def on_patch_applied(_server: CodeflashLanguageServer, params: OnPatchAppliedParams) -> dict[str, str]: | ||||||||||||||||||||||||||
| # first remove the patch from the metadata | ||||||||||||||||||||||||||
| metadata = get_patches_metadata() | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| deleted_patch_file = None | ||||||||||||||||||||||||||
| new_patches = [] | ||||||||||||||||||||||||||
| for patch in metadata["patches"]: | ||||||||||||||||||||||||||
| if patch["id"] == params.patch_id: | ||||||||||||||||||||||||||
| deleted_patch_file = patch["patch_path"] | ||||||||||||||||||||||||||
| continue | ||||||||||||||||||||||||||
| new_patches.append(patch) | ||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||
| overwrite_patch_metadata(new_patches) | ||||||||||||||||||||||||||
| # then remove the patch file | ||||||||||||||||||||||||||
| if deleted_patch_file: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 28 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 64.3% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations
import json
import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path
from types import SimpleNamespace
from typing import Any
import git
# imports
import pytest # used for our unit tests
from codeflash.code_utils.compat import codeflash_cache_dir
from codeflash.code_utils.git_utils import (get_patches_metadata,
overwrite_patch_metadata)
from codeflash.lsp.beta import on_patch_applied
from codeflash.lsp.server import (CodeflashLanguageServer,
CodeflashLanguageServerProtocol)
from filelock import FileLock
from git import Repo
server = CodeflashLanguageServer("codeflash-language-server", "v1.0", protocol_cls=CodeflashLanguageServerProtocol)
from codeflash.lsp.beta import on_patch_applied
# Helper to create a patch metadata file and patch file
def setup_patch(tmp_path, patch_id="patch1", patch_content="patch content"):
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patch_file = patches_root / f"{patch_id}.patch"
patch_file.write_text(patch_content)
metadata = {
"id": "test_project_id",
"patches": [
{"id": patch_id, "patch_path": str(patch_file)}
]
}
(patches_root / "metadata.json").write_text(json.dumps(metadata, indent=2))
return patch_file, patches_root / "metadata.json"
# Helper to create arbitrary metadata
def setup_metadata(tmp_path, patches):
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
metadata = {
"id": "test_project_id",
"patches": patches
}
(patches_root / "metadata.json").write_text(json.dumps(metadata, indent=2))
return patches_root / "metadata.json"
# Helper params object
def make_params(patch_id):
return SimpleNamespace(patch_id=patch_id)
# -------------------------
# Basic Test Cases
# -------------------------
def test_remove_existing_patch(tmp_path):
"""Test removing a patch that exists in metadata and on disk."""
patch_file, meta_file = setup_patch(tmp_path, patch_id="patch1")
params = make_params("patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 454μs -> 79.3μs (473% faster)
# Metadata should have no patches
data = json.loads(meta_file.read_text())
def test_remove_patch_updates_metadata_only(tmp_path):
"""Test removing a patch updates only the target patch in metadata, not others."""
# Setup two patches
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patch_file1 = patches_root / "patch1.patch"
patch_file2 = patches_root / "patch2.patch"
patch_file1.write_text("patch1 content")
patch_file2.write_text("patch2 content")
metadata = {
"id": "test_project_id",
"patches": [
{"id": "patch1", "patch_path": str(patch_file1)},
{"id": "patch2", "patch_path": str(patch_file2)},
]
}
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps(metadata, indent=2))
# Remove patch1
params = make_params("patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 418μs -> 70.2μs (497% faster)
# Only patch2 left in metadata
data = json.loads(meta_file.read_text())
def test_remove_patch_file_missing(tmp_path):
"""Test removing a patch where the patch file is already missing (should still succeed)."""
patch_file, meta_file = setup_patch(tmp_path, patch_id="patch1")
patch_file.unlink() # Remove the patch file before running
params = make_params("patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 416μs -> 70.7μs (489% faster)
# Metadata should have no patches
data = json.loads(meta_file.read_text())
# -------------------------
# Edge Test Cases
# -------------------------
def test_remove_nonexistent_patch(tmp_path):
"""Test removing a patch that does not exist in metadata."""
setup_patch(tmp_path, patch_id="patch1")
params = make_params("not_a_patch")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 414μs -> 67.1μs (517% faster)
def test_remove_patch_empty_metadata(tmp_path):
"""Test removing a patch when metadata is empty."""
setup_metadata(tmp_path, [])
params = make_params("patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 414μs -> 68.3μs (507% faster)
def test_remove_patch_metadata_file_missing(tmp_path):
"""Test removing a patch when metadata.json does not exist at all."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
# No metadata.json created
params = make_params("patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 418μs -> 71.9μs (482% faster)
def test_remove_patch_with_similar_id(tmp_path):
"""Test removing a patch where another patch has a similar but not identical id."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patch_file1 = patches_root / "patch1.patch"
patch_file2 = patches_root / "patch1a.patch"
patch_file1.write_text("patch1 content")
patch_file2.write_text("patch1a content")
metadata = {
"id": "test_project_id",
"patches": [
{"id": "patch1", "patch_path": str(patch_file1)},
{"id": "patch1a", "patch_path": str(patch_file2)},
]
}
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps(metadata, indent=2))
# Remove patch1
params = make_params("patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 416μs -> 69.5μs (499% faster)
data = json.loads(meta_file.read_text())
def test_remove_patch_with_non_string_id(tmp_path):
"""Test removing a patch with a numeric id (should work as long as ids match)."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patch_file = patches_root / "123.patch"
patch_file.write_text("patch content")
metadata = {
"id": "test_project_id",
"patches": [
{"id": 123, "patch_path": str(patch_file)}
]
}
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps(metadata, indent=2))
params = make_params(123)
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 417μs -> 70.3μs (494% faster)
data = json.loads(meta_file.read_text())
def test_remove_patch_when_patch_path_is_missing_in_metadata(tmp_path):
"""Test removing a patch where patch_path is missing in patch metadata."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
metadata = {
"id": "test_project_id",
"patches": [
{"id": "patch1"} # missing patch_path
]
}
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps(metadata, indent=2))
params = make_params("patch1")
# Should not raise, but should return success (patch_path is None, so no file is deleted)
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 425μs -> 71.7μs (494% faster)
data = json.loads(meta_file.read_text())
# -------------------------
# Large Scale Test Cases
# -------------------------
def test_remove_patch_from_large_metadata(tmp_path):
"""Test removing a patch from a large metadata file (1000 patches)."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patches = []
patch_files = []
for i in range(1000):
patch_file = patches_root / f"patch{i}.patch"
patch_file.write_text(f"patch {i} content")
patches.append({"id": f"patch{i}", "patch_path": str(patch_file)})
patch_files.append(patch_file)
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
# Remove patch500
params = make_params("patch500")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 444μs -> 77.9μs (471% faster)
for i, pf in enumerate(patch_files):
if i == 500:
continue
# patch500 removed from metadata
data = json.loads(meta_file.read_text())
ids = [p["id"] for p in data["patches"]]
def test_remove_all_patches_one_by_one(tmp_path):
"""Test removing all patches one by one from a large metadata file."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patches = []
patch_files = []
for i in range(10):
patch_file = patches_root / f"patch{i}.patch"
patch_file.write_text(f"patch {i} content")
patches.append({"id": f"patch{i}", "patch_path": str(patch_file)})
patch_files.append(patch_file)
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
# Remove all patches one by one
for i in range(10):
params = make_params(f"patch{i}")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 4.07ms -> 521μs (681% faster)
data = json.loads(meta_file.read_text())
ids = [p["id"] for p in data["patches"]]
# After all, metadata should be empty
data = json.loads(meta_file.read_text())
def test_remove_patch_from_large_metadata_file_missing(tmp_path):
"""Test removing a patch from a large metadata file where the patch file is missing."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patches = []
for i in range(100):
patch_file = patches_root / f"patch{i}.patch"
if i == 50:
# Don't create patch50 file
patch_path = patch_file
else:
patch_file.write_text(f"patch {i} content")
patch_path = patch_file
patches.append({"id": f"patch{i}", "patch_path": str(patch_path)})
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
# Remove patch50 (file missing)
params = make_params("patch50")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 419μs -> 70.1μs (498% faster)
# patch50 not in metadata
data = json.loads(meta_file.read_text())
ids = [p["id"] for p in data["patches"]]
def test_remove_patch_from_large_metadata_not_found(tmp_path):
"""Test removing a patch not present in a large metadata file."""
patches_root = tmp_path / "cache" / "patches" / "test_project_id"
patches_root.mkdir(parents=True, exist_ok=True)
patches = []
for i in range(100):
patch_file = patches_root / f"patch{i}.patch"
patch_file.write_text(f"patch {i} content")
patches.append({"id": f"patch{i}", "patch_path": str(patch_file)})
meta_file = patches_root / "metadata.json"
meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
# Remove patch100 (not present)
params = make_params("patch100")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 419μs -> 70.0μs (500% faster)
# All files remain
for i in range(100):
pass
# Metadata unchanged
data = json.loads(meta_file.read_text())
ids = [p["id"] for p in data["patches"]]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations
import json
import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path
from types import SimpleNamespace
from typing import Any
import git
# imports
import pytest # used for our unit tests
from codeflash.code_utils.compat import codeflash_cache_dir
from codeflash.code_utils.git_utils import (get_patches_metadata,
overwrite_patch_metadata)
from codeflash.lsp.beta import on_patch_applied
from codeflash.lsp.server import (CodeflashLanguageServer,
CodeflashLanguageServerProtocol)
from filelock import FileLock
from git import Repo
patches_dir = codeflash_cache_dir / "patches"
server = CodeflashLanguageServer("codeflash-language-server", "v1.0", protocol_cls=CodeflashLanguageServerProtocol)
from codeflash.lsp.beta import on_patch_applied
def write_metadata(tmp_path, patches):
"""Helper to write metadata.json with given patches."""
meta = {
"id": "test_project_id",
"patches": patches
}
patches_dir = tmp_path / "patches" / "test_project_id"
patches_dir.mkdir(parents=True, exist_ok=True)
(patches_dir / "metadata.json").write_text(json.dumps(meta, indent=2))
def read_metadata(tmp_path):
"""Helper to read metadata.json."""
patches_dir = tmp_path / "patches" / "test_project_id"
return json.loads((patches_dir / "metadata.json").read_text())
def make_patch_file(tmp_path, name="patch.diff"):
"""Helper to create a dummy patch file."""
patches_dir = tmp_path / "patches" / "test_project_id"
patch_path = patches_dir / name
patch_path.write_text("patch content")
return str(patch_path)
# ------------------------
# 1. Basic Test Cases
# ------------------------
def test_patch_not_found_returns_error(tmp_path, monkeypatch):
"""
Test that on_patch_applied returns error if patch_id not present.
"""
patches = [
{"id": "patch1", "patch_path": "patch1.diff"}
]
write_metadata(tmp_path, patches)
params = SimpleNamespace(patch_id="patch999")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 430μs -> 72.6μs (493% faster)
# Metadata should be unchanged
meta = read_metadata(tmp_path)
def test_empty_metadata(tmp_path, monkeypatch):
"""
Test when metadata.json exists but has no patches.
"""
write_metadata(tmp_path, [])
params = SimpleNamespace(patch_id="patchX")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 429μs -> 72.1μs (496% faster)
meta = read_metadata(tmp_path)
def test_patch_file_missing_on_disk(tmp_path, monkeypatch):
"""
Test that on_patch_applied succeeds even if the patch file is already deleted.
"""
patch_path = tmp_path / "patches" / "test_project_id" / "patch1.diff"
# Do not create the file
patches = [
{"id": "patch1", "patch_path": str(patch_path)}
]
write_metadata(tmp_path, patches)
params = SimpleNamespace(patch_id="patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 426μs -> 72.5μs (489% faster)
meta = read_metadata(tmp_path)
def test_patch_path_is_empty_string(tmp_path, monkeypatch):
"""
Test that a patch with empty patch_path is removed from metadata and does not error.
"""
patches = [
{"id": "patch1", "patch_path": ""}
]
write_metadata(tmp_path, patches)
params = SimpleNamespace(patch_id="patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 439μs -> 74.0μs (493% faster)
meta = read_metadata(tmp_path)
def test_patch_path_is_none(tmp_path, monkeypatch):
"""
Test that a patch with None patch_path is removed from metadata and does not error.
"""
patches = [
{"id": "patch1", "patch_path": None}
]
write_metadata(tmp_path, patches)
params = SimpleNamespace(patch_id="patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 420μs -> 70.8μs (494% faster)
meta = read_metadata(tmp_path)
def test_patch_path_is_nonexistent_file(tmp_path, monkeypatch):
"""
Test that a patch_path pointing to a file that doesn't exist is handled gracefully.
"""
patch_path = tmp_path / "patches" / "test_project_id" / "nonexistent.diff"
patches = [
{"id": "patch1", "patch_path": str(patch_path)}
]
write_metadata(tmp_path, patches)
params = SimpleNamespace(patch_id="patch1")
codeflash_output = on_patch_applied(None, params); result = codeflash_output # 419μs -> 70.7μs (494% faster)
meta = read_metadata(tmp_path)
# ------------------------
# 3. Large Scale Test Cases
# ------------------------To test or edit this optimization locally git merge codeflash/optimize-pr690-2025-08-27T16.33.55
| overwrite_patch_metadata(new_patches) | |
| # then remove the patch file | |
| if deleted_patch_file: | |
| # then remove the patch file | |
| if deleted_patch_file: | |
| overwrite_patch_metadata(new_patches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️Codeflash found 99% (0.99x) speedup for
get_patches_dir_for_projectincodeflash/code_utils/git_utils.py⏱️ Runtime :
88.3 microseconds→44.4 microseconds(best of5runs)📝 Explanation and details
The optimization achieves a 98% speedup by eliminating two unnecessary operations in the
get_patches_dir_for_project()function:Key Changes:
Removed redundant
Path()constructor: The original code unnecessarily wrappedpatches_dir / project_idwithPath(), even though the/operator on aPathobject already returns aPath. This eliminated an extra object creation.Removed unnecessary
or ""fallback: Sinceget_git_project_id()is guaranteed to return a valid string (or raise an exception if the repo doesn't exist), theor ""fallback was redundant and added extra evaluation overhead.Performance Impact:
The line profiler shows the return statement improved from 593,800ns to 232,729ns (61% faster), while the overall function time decreased from 88.3μs to 44.4μs. The optimizations are particularly effective because:
Path()constructor calls have overhead for validation and normalizationor ""operation requires evaluating truthiness of the string resultTest Case Performance:
All test cases show consistent 86-133% speedups, with the optimization being especially beneficial for:
The optimization maintains identical behavior while eliminating unnecessary computational overhead in the path construction logic.
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To test or edit this optimization locally
git merge codeflash/optimize-pr690-2025-08-27T15.52.13There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it