codeflash-ai
diff --git a/WARP.md b/WARP.md
Lines changed: 191 additions & 0 deletions
diff --git a/codeflash-benchmark/codeflash_benchmark/plugin.py b/codeflash-benchmark/codeflash_benchmark/plugin.py
Lines changed: 38 additions & 5 deletions
diff --git a/codeflash-benchmark/pyproject.toml b/codeflash-benchmark/pyproject.toml
Lines changed: 3 additions & 3 deletions
diff --git a/codeflash/LICENSE b/codeflash/LICENSE
Lines changed: 2 additions & 2 deletions
diff --git a/codeflash/api/aiservice.py b/codeflash/api/aiservice.py
Lines changed: 6 additions & 4 deletions
@@ -0,0 +1,191 @@
+# WARP.md
+
+This file provides guidance to WARP (warp.dev) when working with code in this repository.
+
+## Project Overview
+
+Codeflash is a general-purpose optimizer for Python that helps improve code performance while maintaining correctness. It uses advanced LLMs to generate optimization ideas, tests them for correctness, and benchmarks them for performance, then creates merge-ready pull requests.
+
+## Development Environment Setup
+
+### Prerequisites
+- Python 3.9+ (project uses uv for dependency management)
+- Git (for version control and PR creation)
+- Codeflash API key (for AI services)
+
+### Initial Setup
+```bash
+# Install dependencies using uv (preferred over pip)
+uv sync
+
+# Initialize codeflash configuration
+uv run codeflash init
+```
+
+## Core Development Commands
+
+### Code Quality & Linting
+```bash
+# Format code with ruff (includes check and format)
+uv run ruff check --fix codeflash/
+uv run ruff format codeflash/
+
+# Type checking with mypy
+uv run mypy codeflash/
+
+# Pre-commit hooks (ruff check + format)
+uv run pre-commit run --all-files
+```
+
+### Testing
+```bash
+# Run all tests
+uv run pytest
+
+# Run specific test file
+uv run pytest tests/test_specific_file.py
+
+# Run tests matching pattern
+uv run pytest -k "pattern"
+
+```
+
+### Running Codeflash
+```bash
+# Optimize entire codebase
+uv run codeflash --all
+
+# Optimize specific file
+uv run codeflash --file path/to/file.py
+
+# Optimize specific function
+uv run codeflash --function "module.function"
+
+# Optimize a script end-to-end
+uv run codeflash optimize script.py
+
+# Run with benchmarking
+uv run codeflash --benchmark
+
+# Verify setup
+uv run codeflash --verify-setup
+```
+
+## Architecture Overview
+
+### Main Components
+
+**Core Modules:**
+- `codeflash/main.py` - CLI entry point and command coordination
+- `codeflash/cli_cmds/` - Command-line interface implementations
+- `codeflash/optimization/` - Core optimization engine and algorithms
+- `codeflash/verification/` - Code correctness verification
+- `codeflash/benchmarking/` - Performance measurement and comparison
+- `codeflash/discovery/` - Code analysis and function discovery
+- `codeflash/tracing/` - Runtime tracing and profiling
+- `codeflash/context/` - Code context extraction and analysis
+- `codeflash/result/` - Result processing, PR creation, and explanations
+
+**Supporting Systems:**
+- `codeflash/api/` - Backend API communication
+- `codeflash/github/` - GitHub integration for PR creation
+- `codeflash/models/` - Data models and schemas
+- `codeflash/telemetry/` - Analytics and error reporting
+- `codeflash/code_utils/` - Code parsing, formatting, and manipulation utilities
+
+### Key Workflows
+
+1. **Code Discovery**: Analyzes codebase to identify optimization candidates
+2. **Context Extraction**: Extracts relevant code context and dependencies
+3. **Optimization Generation**: Uses LLMs to generate optimization candidates
+4. **Verification**: Tests optimizations for correctness using existing tests
+5. **Benchmarking**: Measures performance improvements
+6. **Result Processing**: Creates explanations and pull requests
+
+### Configuration
+
+Configuration is stored in `pyproject.toml` under `[tool.codeflash]`:
+- `module-root` - Source code location (default: "codeflash")
+- `tests-root` - Test location (default: "tests") 
+- `benchmarks-root` - Benchmark location (default: "tests/benchmarks")
+- `test-framework` - Testing framework ("pytest" or "unittest")
+- `formatter-cmds` - Commands for code formatting
+
+## Project Structure
+
+```
+codeflash/
+├── api/                 # Backend API communication
+├── benchmarking/        # Performance measurement
+├── cli_cmds/           # CLI command implementations
+├── code_utils/         # Code analysis and manipulation
+├── context/            # Code context extraction
+├── discovery/          # Function and test discovery  
+├── github/             # GitHub API integration
+├── lsp/                # Language server protocol support
+├── models/             # Data models and schemas
+├── optimization/       # Core optimization engine
+├── result/             # Result processing and PR creation
+├── telemetry/          # Analytics and monitoring
+├── tracing/            # Runtime tracing and profiling
+├── verification/       # Correctness verification
+└── main.py            # CLI entry point
+
+tests/                  # Test suite
+├── benchmarks/         # Performance benchmarks
+└── scripts/           # Test utilities
+
+docs/                   # Documentation
+code_to_optimize/       # Example code for optimization
+codeflash-benchmark/    # Benchmark workspace member
+```
+
+## Development Notes
+
+### Code Style
+- Uses ruff for linting and formatting (configured in pyproject.toml)
+- Strict mypy type checking enabled
+- Pre-commit hooks enforce code quality
+
+### Testing
+- pytest-based test suite with extensive coverage
+- Parameterized tests for multiple scenarios
+- Benchmarking tests for performance validation
+- Test discovery supports both pytest and unittest frameworks
+
+### Workspace Structure
+- Uses uv workspace with `codeflash-benchmark` as a member
+- Dependencies managed through uv.lock
+- Dynamic versioning from git tags using uv-dynamic-versioning
+
+### Build & Distribution
+- Uses hatchling as build backend
+- BSL-1.1 license
+- Excludes development files from distribution packages
+
+### CI/CD Integration
+- GitHub Actions workflow for automatic optimization of PR code
+- Pre-commit hooks for code quality enforcement
+- Automated testing and benchmarking
+
+## Important Patterns
+
+### Error Handling
+- Uses `either.py` for functional error handling patterns
+- Comprehensive error tracking through Sentry integration
+- Graceful degradation when AI services are unavailable
+
+### Instrumentation
+- Extensive tracing capabilities for performance analysis
+- Line profiler integration for detailed performance metrics
+- Custom tracer implementation for code execution analysis
+
+### AI Integration
+- Structured prompts and response handling for LLM interactions
+- Critic module for evaluating optimization quality
+- Context-aware code generation and explanation
+
+### Git Integration 
+- GitPython for repository operations
+- Automated PR creation with detailed explanations
+- Branch management for optimization experiments
@@ -8,21 +8,54 @@
 
 PYTEST_BENCHMARK_INSTALLED = importlib.util.find_spec("pytest_benchmark") is not None
 
+benchmark_options = [
+    ("--benchmark-columns", "store", None, "Benchmark columns"),
+    ("--benchmark-group-by", "store", None, "Benchmark group by"),
+    ("--benchmark-name", "store", None, "Benchmark name pattern"),
+    ("--benchmark-sort", "store", None, "Benchmark sort column"),
+    ("--benchmark-json", "store", None, "Benchmark JSON output file"),
+    ("--benchmark-save", "store", None, "Benchmark save name"),
+    ("--benchmark-warmup", "store", None, "Benchmark warmup"),
+    ("--benchmark-warmup-iterations", "store", None, "Benchmark warmup iterations"),
+    ("--benchmark-min-time", "store", None, "Benchmark minimum time"),
+    ("--benchmark-max-time", "store", None, "Benchmark maximum time"),
+    ("--benchmark-min-rounds", "store", None, "Benchmark minimum rounds"),
+    ("--benchmark-timer", "store", None, "Benchmark timer"),
+    ("--benchmark-calibration-precision", "store", None, "Benchmark calibration precision"),
+    ("--benchmark-disable", "store_true", False, "Disable benchmarks"),
+    ("--benchmark-skip", "store_true", False, "Skip benchmarks"),
+    ("--benchmark-only", "store_true", False, "Only run benchmarks"),
+    ("--benchmark-verbose", "store_true", False, "Verbose benchmark output"),
+    ("--benchmark-histogram", "store", None, "Benchmark histogram"),
+    ("--benchmark-compare", "store", None, "Benchmark compare"),
+    ("--benchmark-compare-fail", "store", None, "Benchmark compare fail threshold"),
+]
+
 
 def pytest_configure(config: pytest.Config) -> None:
     """Register the benchmark marker and disable conflicting plugins."""
     config.addinivalue_line("markers", "benchmark: mark test as a benchmark that should be run with codeflash tracing")
 
-    if config.getoption("--codeflash-trace") and PYTEST_BENCHMARK_INSTALLED:
-        config.option.benchmark_disable = True
-        config.pluginmanager.set_blocked("pytest_benchmark")
-        config.pluginmanager.set_blocked("pytest-benchmark")
+    if config.getoption("--codeflash-trace"):
+        # When --codeflash-trace is used, ignore all benchmark options by resetting them to defaults
+        for option, _, default, _ in benchmark_options:
+            option_name = option.replace("--", "").replace("-", "_")
+            if hasattr(config.option, option_name):
+                setattr(config.option, option_name, default)
+
+        if PYTEST_BENCHMARK_INSTALLED:
+            config.pluginmanager.set_blocked("pytest_benchmark")
+            config.pluginmanager.set_blocked("pytest-benchmark")
 
 
 def pytest_addoption(parser: pytest.Parser) -> None:
     parser.addoption(
         "--codeflash-trace", action="store_true", default=False, help="Enable CodeFlash tracing for benchmarks"
     )
+    # These options are ignored when --codeflash-trace is used
+    for option, action, default, help_text in benchmark_options:
+        help_suffix = " (ignored when --codeflash-trace is used)"
+        parser.addoption(option, action=action, default=default, help=help_text + help_suffix)
 
 
 @pytest.fixture
@@ -37,7 +70,7 @@ def benchmark(request: pytest.FixtureRequest) -> object:
     # If pytest-benchmark is installed and --codeflash-trace is not enabled,
     # return the normal pytest-benchmark fixture
     if PYTEST_BENCHMARK_INSTALLED:
-        from pytest_benchmark.fixture import BenchmarkFixture as BSF  # noqa: N814
+        from pytest_benchmark.fixture import BenchmarkFixture as BSF  # pyright: ignore[reportMissingImports]  # noqa: I001, N814
 
         bs = getattr(config, "_benchmarksession", None)
         if bs and bs.skip:
 
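
Reviewer note: the reset loop in `pytest_configure` relies on turning each flag into the attribute name that argparse would assign to it. The sketch below isolates that step so it can be tried without pytest. An `argparse.Namespace` stands in for `config.option`, the option list is a three-entry subset of the one in the hunk, and the helper name `reset_benchmark_options` is hypothetical.

```python
from argparse import Namespace

# A subset of the (flag, action, default, help) tuples from the hunk above.
benchmark_options = [
    ("--benchmark-json", "store", None, "Benchmark JSON output file"),
    ("--benchmark-min-rounds", "store", None, "Benchmark minimum rounds"),
    ("--benchmark-only", "store_true", False, "Only run benchmarks"),
]


def reset_benchmark_options(options: Namespace) -> None:
    """Reset every known benchmark option on `options` to its declared default."""
    for flag, _action, default, _help in benchmark_options:
        # "--benchmark-min-rounds" becomes "benchmark_min_rounds"
        attr = flag.replace("--", "").replace("-", "_")
        # Options that were never registered are skipped, not created.
        if hasattr(options, attr):
            setattr(options, attr, default)


# Stand-in for pytest's config.option after command-line parsing.
opts = Namespace(benchmark_json="out.json", benchmark_only=True, codeflash_trace=True)
reset_benchmark_options(opts)
print(opts)
```

Because of the `hasattr` guard, unrelated options such as `codeflash_trace` are left untouched and missing attributes are not added.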
@@ -1,6 +1,6 @@
 [project]
 name = "codeflash-benchmark"
-version = "0.1.0"
+version = "0.2.0"
 description = "Pytest benchmarking plugin for codeflash.ai - automatic code performance optimization"
 authors = [{ name = "CodeFlash Inc.", email = "[email protected]" }]
 requires-python = ">=3.9"
@@ -25,8 +25,8 @@ Repository = "https://github.com/codeflash-ai/codeflash-benchmark"
 codeflash-benchmark = "codeflash_benchmark.plugin"
 
 [build-system]
-requires = ["setuptools>=45", "wheel", "setuptools_scm"]
+requires = ["setuptools>=45", "wheel"]
 build-backend = "setuptools.build_meta"
 
 [tool.setuptools]
-packages = ["codeflash_benchmark"]
+packages = ["codeflash_benchmark"]
@@ -3,7 +3,7 @@ Business Source License 1.1
 Parameters
 
 Licensor:             CodeFlash Inc.
-Licensed Work:        Codeflash Client version 0.15.x
+Licensed Work:        Codeflash Client version 0.16.x
                       The Licensed Work is (c) 2024 CodeFlash Inc.
 
 Additional Use Grant: None. Production use of the Licensed Work is only permitted
@@ -13,7 +13,7 @@ Additional Use Grant: None. Production use of the Licensed Work is only permitte
                       Platform. Please visit codeflash.ai for further
                       information.
 
-Change Date:          2029-07-03
+Change Date:          2029-08-14
 
 Change License:       MIT
 
 
@@ -202,7 +202,7 @@ def optimize_python_code_line_profiler(  # noqa: D417
 
         if response.status_code == 200:
             optimizations_json = response.json()["optimizations"]
-            logger.info(f"Generated {len(optimizations_json)} candidate optimizations.")
+            logger.info(f"Generated {len(optimizations_json)} candidate optimizations using line profiler information.")
             console.rule()
             return [
                 OptimizedCandidate(
@@ -248,7 +248,7 @@ def optimize_python_code_refinement(self, request: list[AIServiceRefinerRequest]
             }
             for opt in request
         ]
-        logger.info(f"Refining {len(request)} optimizations…")
+        logger.debug(f"Refining {len(request)} optimizations…")
         console.rule()
         try:
             response = self.make_ai_service_request("/refinement", payload=payload, timeout=600)
@@ -259,7 +259,7 @@ def optimize_python_code_refinement(self, request: list[AIServiceRefinerRequest]
 
         if response.status_code == 200:
             refined_optimizations = response.json()["refinements"]
-            logger.info(f"Generated {len(refined_optimizations)} candidate refinements.")
+            logger.debug(f"Generated {len(refined_optimizations)} candidate refinements.")
             console.rule()
             return [
                 OptimizedCandidate(
@@ -339,7 +339,6 @@ def get_new_explanation(  # noqa: D417
 
         if response.status_code == 200:
             explanation: str = response.json()["explanation"]
-            logger.debug(f"New Explanation: {explanation}")
             console.rule()
             return explanation
         try:
@@ -360,6 +359,7 @@ def log_results(  # noqa: D417
         is_correct: dict[str, bool] | None,
         optimized_line_profiler_results: dict[str, str] | None,
         metadata: dict[str, Any] | None,
+        optimizations_post: dict[str, str] | None = None,
     ) -> None:
         """Log features to the database.
 
@@ -372,6 +372,7 @@ def log_results(  # noqa: D417
         - is_correct (Optional[Dict[str, bool]]): Whether the optimized code is correct.
         - optimized_line_profiler_results: line_profiler results for every candidate mapped to their optimization_id
         - metadata: contains the best optimization id
+        - optimizations_post - dict mapping opt id to code str after postprocessing
 
         """
         payload = {
@@ -383,6 +384,7 @@ def log_results(  # noqa: D417
             "codeflash_version": codeflash_version,
             "optimized_line_profiler_results": optimized_line_profiler_results,
             "metadata": metadata,
+            "optimizations_post": optimizations_post,
         }
         try:
             self.make_ai_service_request("/log_features", payload=payload, timeout=5)
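
Reviewer note: `optimizations_post` is added to `log_results` as a trailing parameter with a default of `None`, so existing callers should not need to change. The sketch below illustrates that pattern on a reduced payload. `build_log_payload` and the `best_optimization_id` key are hypothetical names used only for this example, and the real payload carries more fields than shown here.

```python
import json
from typing import Any, Dict, Optional


def build_log_payload(
    metadata: Optional[Dict[str, Any]],
    optimizations_post: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble a reduced, hypothetical version of the /log_features payload."""
    return {
        "metadata": metadata,
        "optimizations_post": optimizations_post,
    }


# Call sites that do not pass the new argument keep working.
old_style = build_log_payload({"best_optimization_id": "abc"})

# New call sites can attach post-processed code keyed by optimization id.
new_style = build_log_payload(
    {"best_optimization_id": "abc"},
    {"abc": "def f():\n    return 1\n"},
)

print(json.dumps(old_style))  # the omitted value is serialized as JSON null
```

In this sketch the key is always present in the payload, which is also what the hunk does. Whether the backend treats `null` and a missing key the same way is not something the diff shows.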