Commit 03ee7a4

2 parents 5b4d4fb + 53ecd83 commit 03ee7a4

17 files changed, +1150 -345 lines changed
WARP.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
+# WARP.md
+
+This file provides guidance to WARP (warp.dev) when working with code in this repository.
+
+## Project Overview
+
+Codeflash is a general-purpose optimizer for Python that helps improve code performance while maintaining correctness. It uses advanced LLMs to generate optimization ideas, tests them for correctness, and benchmarks them for performance, then creates merge-ready pull requests.
+
+## Development Environment Setup
+
+### Prerequisites
+- Python 3.9+ (project uses uv for dependency management)
+- Git (for version control and PR creation)
+- Codeflash API key (for AI services)
+
+### Initial Setup
+```bash
+# Install dependencies using uv (preferred over pip)
+uv sync
+
+# Initialize codeflash configuration
+uv run codeflash init
+```
+
+## Core Development Commands
+
+### Code Quality & Linting
+```bash
+# Format code with ruff (includes check and format)
+uv run ruff check --fix codeflash/
+uv run ruff format codeflash/
+
+# Type checking with mypy
+uv run mypy codeflash/
+
+# Pre-commit hooks (ruff check + format)
+uv run pre-commit run --all-files
+```
+
+### Testing
+```bash
+# Run all tests
+uv run pytest
+
+# Run specific test file
+uv run pytest tests/test_specific_file.py
+
+# Run tests matching pattern
+uv run pytest -k "pattern"
+
+```
+
+### Running Codeflash
+```bash
+# Optimize entire codebase
+uv run codeflash --all
+
+# Optimize specific file
+uv run codeflash --file path/to/file.py
+
+# Optimize specific function
+uv run codeflash --function "module.function"
+
+# Optimize a script end-to-end
+uv run codeflash optimize script.py
+
+# Run with benchmarking
+uv run codeflash --benchmark
+
+# Verify setup
+uv run codeflash --verify-setup
+```
+
+## Architecture Overview
+
+### Main Components
+
+**Core Modules:**
+- `codeflash/main.py` - CLI entry point and command coordination
+- `codeflash/cli_cmds/` - Command-line interface implementations
+- `codeflash/optimization/` - Core optimization engine and algorithms
+- `codeflash/verification/` - Code correctness verification
+- `codeflash/benchmarking/` - Performance measurement and comparison
+- `codeflash/discovery/` - Code analysis and function discovery
+- `codeflash/tracing/` - Runtime tracing and profiling
+- `codeflash/context/` - Code context extraction and analysis
+- `codeflash/result/` - Result processing, PR creation, and explanations
+
+**Supporting Systems:**
+- `codeflash/api/` - Backend API communication
+- `codeflash/github/` - GitHub integration for PR creation
+- `codeflash/models/` - Data models and schemas
+- `codeflash/telemetry/` - Analytics and error reporting
+- `codeflash/code_utils/` - Code parsing, formatting, and manipulation utilities
+
+### Key Workflows
+
+1. **Code Discovery**: Analyzes codebase to identify optimization candidates
+2. **Context Extraction**: Extracts relevant code context and dependencies
+3. **Optimization Generation**: Uses LLMs to generate optimization candidates
+4. **Verification**: Tests optimizations for correctness using existing tests
+5. **Benchmarking**: Measures performance improvements
+6. **Result Processing**: Creates explanations and pull requests
+
+### Configuration
+
+Configuration is stored in `pyproject.toml` under `[tool.codeflash]`:
+- `module-root` - Source code location (default: "codeflash")
+- `tests-root` - Test location (default: "tests")
+- `benchmarks-root` - Benchmark location (default: "tests/benchmarks")
+- `test-framework` - Testing framework ("pytest" or "unittest")
+- `formatter-cmds` - Commands for code formatting
+
+## Project Structure
+
+```
+codeflash/
+├── api/ # Backend API communication
+├── benchmarking/ # Performance measurement
+├── cli_cmds/ # CLI command implementations
+├── code_utils/ # Code analysis and manipulation
+├── context/ # Code context extraction
+├── discovery/ # Function and test discovery
+├── github/ # GitHub API integration
+├── lsp/ # Language server protocol support
+├── models/ # Data models and schemas
+├── optimization/ # Core optimization engine
+├── result/ # Result processing and PR creation
+├── telemetry/ # Analytics and monitoring
+├── tracing/ # Runtime tracing and profiling
+├── verification/ # Correctness verification
+└── main.py # CLI entry point
+
+tests/ # Test suite
+├── benchmarks/ # Performance benchmarks
+└── scripts/ # Test utilities
+
+docs/ # Documentation
+code_to_optimize/ # Example code for optimization
+codeflash-benchmark/ # Benchmark workspace member
+```
+
+## Development Notes
+
+### Code Style
+- Uses ruff for linting and formatting (configured in pyproject.toml)
+- Strict mypy type checking enabled
+- Pre-commit hooks enforce code quality
+
+### Testing
+- pytest-based test suite with extensive coverage
+- Parameterized tests for multiple scenarios
+- Benchmarking tests for performance validation
+- Test discovery supports both pytest and unittest frameworks
+
+### Workspace Structure
+- Uses uv workspace with `codeflash-benchmark` as a member
+- Dependencies managed through uv.lock
+- Dynamic versioning from git tags using uv-dynamic-versioning
+
+### Build & Distribution
+- Uses hatchling as build backend
+- BSL-1.1 license
+- Excludes development files from distribution packages
+
+### CI/CD Integration
+- GitHub Actions workflow for automatic optimization of PR code
+- Pre-commit hooks for code quality enforcement
+- Automated testing and benchmarking
+
+## Important Patterns
+
+### Error Handling
+- Uses `either.py` for functional error handling patterns
+- Comprehensive error tracking through Sentry integration
+- Graceful degradation when AI services are unavailable
+
+### Instrumentation
+- Extensive tracing capabilities for performance analysis
+- Line profiler integration for detailed performance metrics
+- Custom tracer implementation for code execution analysis
+
+### AI Integration
+- Structured prompts and response handling for LLM interactions
+- Critic module for evaluating optimization quality
+- Context-aware code generation and explanation
+
+### Git Integration
+- GitPython for repository operations
+- Automated PR creation with detailed explanations
+- Branch management for optimization experiments

codeflash-benchmark/codeflash_benchmark/plugin.py

Lines changed: 38 additions & 5 deletions
@@ -8,21 +8,54 @@

 PYTEST_BENCHMARK_INSTALLED = importlib.util.find_spec("pytest_benchmark") is not None

+benchmark_options = [
+    ("--benchmark-columns", "store", None, "Benchmark columns"),
+    ("--benchmark-group-by", "store", None, "Benchmark group by"),
+    ("--benchmark-name", "store", None, "Benchmark name pattern"),
+    ("--benchmark-sort", "store", None, "Benchmark sort column"),
+    ("--benchmark-json", "store", None, "Benchmark JSON output file"),
+    ("--benchmark-save", "store", None, "Benchmark save name"),
+    ("--benchmark-warmup", "store", None, "Benchmark warmup"),
+    ("--benchmark-warmup-iterations", "store", None, "Benchmark warmup iterations"),
+    ("--benchmark-min-time", "store", None, "Benchmark minimum time"),
+    ("--benchmark-max-time", "store", None, "Benchmark maximum time"),
+    ("--benchmark-min-rounds", "store", None, "Benchmark minimum rounds"),
+    ("--benchmark-timer", "store", None, "Benchmark timer"),
+    ("--benchmark-calibration-precision", "store", None, "Benchmark calibration precision"),
+    ("--benchmark-disable", "store_true", False, "Disable benchmarks"),
+    ("--benchmark-skip", "store_true", False, "Skip benchmarks"),
+    ("--benchmark-only", "store_true", False, "Only run benchmarks"),
+    ("--benchmark-verbose", "store_true", False, "Verbose benchmark output"),
+    ("--benchmark-histogram", "store", None, "Benchmark histogram"),
+    ("--benchmark-compare", "store", None, "Benchmark compare"),
+    ("--benchmark-compare-fail", "store", None, "Benchmark compare fail threshold"),
+]
+

 def pytest_configure(config: pytest.Config) -> None:
     """Register the benchmark marker and disable conflicting plugins."""
     config.addinivalue_line("markers", "benchmark: mark test as a benchmark that should be run with codeflash tracing")

-    if config.getoption("--codeflash-trace") and PYTEST_BENCHMARK_INSTALLED:
-        config.option.benchmark_disable = True
-        config.pluginmanager.set_blocked("pytest_benchmark")
-        config.pluginmanager.set_blocked("pytest-benchmark")
+    if config.getoption("--codeflash-trace"):
+        # When --codeflash-trace is used, ignore all benchmark options by resetting them to defaults
+        for option, _, default, _ in benchmark_options:
+            option_name = option.replace("--", "").replace("-", "_")
+            if hasattr(config.option, option_name):
+                setattr(config.option, option_name, default)
+
+        if PYTEST_BENCHMARK_INSTALLED:
+            config.pluginmanager.set_blocked("pytest_benchmark")
+            config.pluginmanager.set_blocked("pytest-benchmark")


 def pytest_addoption(parser: pytest.Parser) -> None:
     parser.addoption(
         "--codeflash-trace", action="store_true", default=False, help="Enable CodeFlash tracing for benchmarks"
     )
+    # These options are ignored when --codeflash-trace is used
+    for option, action, default, help_text in benchmark_options:
+        help_suffix = " (ignored when --codeflash-trace is used)"
+        parser.addoption(option, action=action, default=default, help=help_text + help_suffix)


 @pytest.fixture
@@ -37,7 +70,7 @@ def benchmark(request: pytest.FixtureRequest) -> object:
     # If pytest-benchmark is installed and --codeflash-trace is not enabled,
     # return the normal pytest-benchmark fixture
     if PYTEST_BENCHMARK_INSTALLED:
-        from pytest_benchmark.fixture import BenchmarkFixture as BSF  # noqa: N814
+        from pytest_benchmark.fixture import BenchmarkFixture as BSF  # pyright: ignore[reportMissingImports] # noqa: I001, N814

         bs = getattr(config, "_benchmarksession", None)
         if bs and bs.skip:
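
Note on the `pytest_configure` change above: the reset loop depends on turning each CLI flag into the attribute name under which the parsed value is stored. The sketch below isolates that step so it can be run without pytest. It uses a `SimpleNamespace` as a stand-in for `config.option` and a shortened copy of `benchmark_options`. The helper names `option_attr_name` and `reset_benchmark_options` are hypothetical and do not appear in the commit.

```python
from types import SimpleNamespace

# Shortened copy of the (flag, action, default, help) tuples from the diff.
benchmark_options = [
    ("--benchmark-json", "store", None, "Benchmark JSON output file"),
    ("--benchmark-min-rounds", "store", None, "Benchmark minimum rounds"),
    ("--benchmark-only", "store_true", False, "Only run benchmarks"),
]


def option_attr_name(flag: str) -> str:
    """Mirror the diff's flag-to-attribute conversion."""
    return flag.replace("--", "").replace("-", "_")


def reset_benchmark_options(options: SimpleNamespace) -> None:
    """Reset each known benchmark option that exists on `options` to its default."""
    for flag, _action, default, _help in benchmark_options:
        name = option_attr_name(flag)
        if hasattr(options, name):
            setattr(options, name, default)


# Stand-in for `config.option`; real pytest stores parsed options on a similar namespace.
options = SimpleNamespace(benchmark_json="out.json", benchmark_only=True, codeflash_trace=True)
reset_benchmark_options(options)
print(options)
```

The `hasattr` guard means options that were never registered are left alone instead of being created, and unrelated attributes such as `codeflash_trace` are untouched.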

codeflash-benchmark/pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "codeflash-benchmark"
-version = "0.1.0"
+version = "0.2.0"
 description = "Pytest benchmarking plugin for codeflash.ai - automatic code performance optimization"
 authors = [{ name = "CodeFlash Inc.", email = "[email protected]" }]
 requires-python = ">=3.9"
@@ -25,8 +25,8 @@ Repository = "https://github.com/codeflash-ai/codeflash-benchmark"
 codeflash-benchmark = "codeflash_benchmark.plugin"

 [build-system]
-requires = ["setuptools>=45", "wheel", "setuptools_scm"]
+requires = ["setuptools>=45", "wheel"]
 build-backend = "setuptools.build_meta"

 [tool.setuptools]
-packages = ["codeflash_benchmark"]
+packages = ["codeflash_benchmark"]

codeflash/LICENSE

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ Business Source License 1.1
 Parameters

 Licensor: CodeFlash Inc.
-Licensed Work: Codeflash Client version 0.15.x
+Licensed Work: Codeflash Client version 0.16.x
 The Licensed Work is (c) 2024 CodeFlash Inc.

 Additional Use Grant: None. Production use of the Licensed Work is only permitted
@@ -13,7 +13,7 @@ Additional Use Grant: None. Production use of the Licensed Work is only permitte
 Platform. Please visit codeflash.ai for further
 information.

-Change Date: 2029-07-03
+Change Date: 2029-08-14

 Change License: MIT

codeflash/api/aiservice.py

Lines changed: 6 additions & 4 deletions
@@ -202,7 +202,7 @@ def optimize_python_code_line_profiler( # noqa: D417

         if response.status_code == 200:
             optimizations_json = response.json()["optimizations"]
-            logger.info(f"Generated {len(optimizations_json)} candidate optimizations.")
+            logger.info(f"Generated {len(optimizations_json)} candidate optimizations using line profiler information.")
             console.rule()
             return [
                 OptimizedCandidate(
@@ -248,7 +248,7 @@ def optimize_python_code_refinement(self, request: list[AIServiceRefinerRequest]
             }
             for opt in request
         ]
-        logger.info(f"Refining {len(request)} optimizations…")
+        logger.debug(f"Refining {len(request)} optimizations…")
         console.rule()
         try:
             response = self.make_ai_service_request("/refinement", payload=payload, timeout=600)
@@ -259,7 +259,7 @@ def optimize_python_code_refinement(self, request: list[AIServiceRefinerRequest]

         if response.status_code == 200:
             refined_optimizations = response.json()["refinements"]
-            logger.info(f"Generated {len(refined_optimizations)} candidate refinements.")
+            logger.debug(f"Generated {len(refined_optimizations)} candidate refinements.")
             console.rule()
             return [
                 OptimizedCandidate(
@@ -339,7 +339,6 @@ def get_new_explanation( # noqa: D417

         if response.status_code == 200:
             explanation: str = response.json()["explanation"]
-            logger.debug(f"New Explanation: {explanation}")
             console.rule()
             return explanation
         try:
@@ -360,6 +359,7 @@ def log_results( # noqa: D417
         is_correct: dict[str, bool] | None,
         optimized_line_profiler_results: dict[str, str] | None,
         metadata: dict[str, Any] | None,
+        optimizations_post: dict[str, str] | None = None,
     ) -> None:
         """Log features to the database.

@@ -372,6 +372,7 @@ def log_results( # noqa: D417
         - is_correct (Optional[Dict[str, bool]]): Whether the optimized code is correct.
         - optimized_line_profiler_results: line_profiler results for every candidate mapped to their optimization_id
         - metadata: contains the best optimization id
+        - optimizations_post - dict mapping opt id to code str after postprocessing

         """
         payload = {
@@ -383,6 +384,7 @@ def log_results( # noqa: D417
             "codeflash_version": codeflash_version,
             "optimized_line_profiler_results": optimized_line_profiler_results,
             "metadata": metadata,
+            "optimizations_post": optimizations_post,
         }
         try:
             self.make_ai_service_request("/log_features", payload=payload, timeout=5)
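
Note on the `log_results` change above: `optimizations_post` is added as a trailing parameter with a `None` default, so callers that do not pass it should keep working and the payload simply carries `None` for the new key. The sketch below illustrates that pattern in isolation. `build_log_payload` is a hypothetical, trimmed-down stand-in for the payload construction, and the `best_optimization_id` key and sample values are made up for the example.

```python
from typing import Any, Optional


def build_log_payload(
    metadata: Optional[dict[str, Any]],
    optimizations_post: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Hypothetical stand-in for the payload assembled inside log_results."""
    return {
        "metadata": metadata,
        # New field: optimization id mapped to the code string after postprocessing.
        "optimizations_post": optimizations_post,
    }


# A call written before the change: the new key is present but None.
old_style = build_log_payload({"best_optimization_id": "abc"})

# A call that supplies the postprocessed code.
new_style = build_log_payload(
    {"best_optimization_id": "abc"},
    optimizations_post={"abc": "def f():\n    return 1\n"},
)
print(old_style, new_style)
```

Both calls produce payloads with the same set of keys, which keeps the shape of the `/log_features` request stable whether or not postprocessed code is available.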
