@codeflash-ai bot commented Jul 3, 2025

⚡️ This pull request contains optimizations for PR #487

If you approve this dependent PR, these changes will be merged into the original PR branch better-UX.

This PR will be automatically closed if the original PR is merged.


📄 146% (2.46x) speedup for should_modify_pyproject_toml in codeflash/cli_cmds/cmd_init.py

⏱️ Runtime: 266 milliseconds → 108 milliseconds (best of 11 runs)
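As a sanity check on the headline numbers: the per-test figures below appear to express "% faster" as the time saved divided by the optimized runtime (for example, 1.29 ms → 509 μs is reported as 153% faster). Applying the same formula to the overall runtimes gives 146%, which corresponds to a speedup factor of about 2.46x. The variable names here are only for illustration.

```python
# Overall runtimes reported above, in milliseconds (best of 11 runs).
original_ms = 266
optimized_ms = 108

# "% faster": time saved, relative to the optimized runtime.
percent_faster = (original_ms - optimized_ms) / optimized_ms * 100
# Speedup factor: how many times faster the optimized version runs.
speedup = original_ms / optimized_ms

print(f"{percent_faster:.0f}% faster, {speedup:.2f}x speedup")
# prints "146% faster, 2.46x speedup"
```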

📝 Explanation and details

Here are targeted, safe optimizations, focused primarily on the parse_config_file function, which accounts for about 98% of the runtime of should_modify_pyproject_toml.
According to the profile, the main bottlenecks are TOML parsing (done by an external library, so there is little to optimize from user code) and the many slow in-place key conversions (config[key.replace("-", "_")] = config[key]; del config[key]).
Most of the repeated "key in config" checks and other redundant work can be eliminated by processing the keys in fewer passes.
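A minimal sketch of the difference between the in-place conversion pattern named above and a single-pass rewrite, assuming a plain dict rather than the tomlkit table the real code receives. Both function names are hypothetical. Unlike the PR's version, which per the description also keeps the original hyphenated keys, this sketch keeps only the normalized keys.

```python
from typing import Any


def normalize_in_place(config: dict[str, Any]) -> dict[str, Any]:
    """Pattern named in the profile: rewrite each hyphenated key in place."""
    for key in list(config):  # iterate over a copy of the keys, since the dict is mutated
        if "-" in key:
            config[key.replace("-", "_")] = config[key]
            del config[key]
    return config


def normalize_single_pass(config: dict[str, Any]) -> dict[str, Any]:
    """Build a new dict with normalized keys in one scan; the input is not modified."""
    return {key.replace("-", "_"): value for key, value in config.items()}


raw = {"module-root": "src", "tests-root": "tests", "formatter-cmds": ["black $file"]}
print(normalize_single_pass(raw))
# prints {'module_root': 'src', 'tests_root': 'tests', 'formatter_cmds': ['black $file']}

# Both approaches yield the same mapping (dict equality ignores key order).
assert normalize_single_pass(raw) == normalize_in_place(dict(raw))
```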

Key Optimizations:

  1. Single Pass Normalization:
    Instead of repeatedly scanning the dictionary to convert hyphens to underscores, normalize the keys in a single pass: build a new dict in which both the normalized and the original key point to the same value, then replace config with it.
    This is faster and does not alter any values.

  2. Batch Default Handling:
    Instead of modifying the config separately for each key and default type, fill in the defaults for all missing keys in one loop using .setdefault.

  3. Avoid Excessive Path Conversion/Resolving:
    Convert and resolve each path only once, and only if its key is present, instead of building new Path objects repeatedly.

  4. Minimize Repeated Path(...).parent Calculations:
    Compute the parent directory once and reuse it.

  5. Optimize [str(cmd) for cmd in config[key]]:
    Do the path computations and list conversions earlier, and skip unnecessary transformations.

  6. Reuse objects and local variables instead of repeating lookups.

  7. Pre-filter config keys for path work.
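Items 2 through 7 can be sketched together. This is a minimal sketch assuming a plain dict whose keys are already normalized; DEFAULTS, PATH_KEYS, and apply_defaults_and_paths are hypothetical names, and resolving paths relative to the directory containing pyproject.toml is an assumption about what parse_config_file does, not a copy of its code.

```python
from pathlib import Path
from typing import Any

# Hypothetical defaults for illustration; the real keys and values live in parse_config_file.
DEFAULTS: dict[str, Any] = {"formatter_cmds": ["black $file"], "ignore_paths": []}
# Hypothetical keys whose values are paths relative to pyproject.toml.
PATH_KEYS = ("module_root", "tests_root")


def apply_defaults_and_paths(config: dict[str, Any], config_file: Path) -> dict[str, Any]:
    """Fill missing defaults, then resolve each path-valued key exactly once."""
    # One setdefault per default: existing keys are kept, missing ones are added.
    for key, default in DEFAULTS.items():
        # Copy list defaults so separate configs never share one mutable list.
        config.setdefault(key, list(default) if isinstance(default, list) else default)

    # Compute the parent directory once rather than once per path key.
    base_dir = config_file.parent
    # Pre-filter to the path keys that are actually present, and resolve each once.
    for key in (k for k in PATH_KEYS if k in config):
        config[key] = str((base_dir / config[key]).resolve())

    # Cast formatter commands to plain strings in a single pass.
    config["formatter_cmds"] = [str(cmd) for cmd in config["formatter_cmds"]]
    return config


config = apply_defaults_and_paths({"module_root": "src"}, Path("pyproject.toml"))
print(config["formatter_cmds"])  # prints ['black $file']
```

Using setdefault keeps any value the user already configured. The isinstance check copies list defaults, so mutating one config's list cannot leak into another.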

No changes to behavior or function signatures.
All existing comments are kept where relevant.

The optimized code is a drop-in replacement; see the diff in this PR.

Summary of changes:

  • Dramatically reduced config dict key normalization cost (single scan, not per key).
  • Minimized resolve/path operations, and batch-applied defaults.
  • The rest of the logic and all comments are unchanged.
  • No change to function names or signatures.

This version significantly reduces the overhead in parse_config_file through more efficient key normalization and default merging.
For even more speed, consider switching from tomlkit to tomllib for TOML parsing if you do not need to preserve comments or formatting.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 27 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import os
from pathlib import Path

# imports
import pytest  # used for our unit tests
import tomlkit
from codeflash.cli_cmds.cmd_init import should_modify_pyproject_toml


# ---- Test helpers ----
@pytest.fixture
def temp_project(tmp_path):
    """Create a temporary project directory and chdir into it."""
    orig_cwd = os.getcwd()
    os.chdir(tmp_path)
    yield tmp_path
    os.chdir(orig_cwd)

def make_pyproject(
    dir_path: Path,
    codeflash_block: dict | None = None,
    extra: dict | None = None,
    tool_section: bool = True
) -> Path:
    """Create a pyproject.toml at dir_path with a [tool.codeflash] block."""
    doc = tomlkit.document()
    if tool_section:
        tool = tomlkit.table()
        if codeflash_block is not None:
            codeflash = tomlkit.table()
            for k, v in codeflash_block.items():
                codeflash[k] = v
            tool["codeflash"] = codeflash
        if extra:
            for k, v in extra.items():
                tool[k] = v
        doc["tool"] = tool
    pyproject_path = dir_path / "pyproject.toml"
    pyproject_path.write_text(tomlkit.dumps(doc))
    return pyproject_path

def make_dirs(dir_path: Path, *dirs):
    """Create directories relative to dir_path."""
    for d in dirs:
        (dir_path / d).mkdir(parents=True, exist_ok=True)

# ---- Basic Test Cases ----

def test_no_pyproject_toml_returns_true(temp_project):
    # No pyproject.toml exists, should return True
    codeflash_output = should_modify_pyproject_toml() # 28.7μs -> 28.6μs (0.627% faster)

def test_pyproject_toml_missing_codeflash_block_returns_true(temp_project):
    # pyproject.toml exists but missing [tool.codeflash], should return True
    make_pyproject(temp_project, codeflash_block=None)
    codeflash_output = should_modify_pyproject_toml() # 117μs -> 119μs (1.77% slower)

def test_pyproject_toml_invalid_toml_returns_true(temp_project):
    # pyproject.toml exists but is invalid TOML, should return True
    pyproject_path = temp_project / "pyproject.toml"
    pyproject_path.write_text("not a valid toml")
    codeflash_output = should_modify_pyproject_toml() # 106μs -> 107μs (0.942% slower)

def test_pyproject_missing_module_root_returns_true(temp_project):
    # [tool.codeflash] exists but missing module-root, should return True
    codeflash = {
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"],
    }
    make_pyproject(temp_project, codeflash_block=codeflash)
    make_dirs(temp_project, "tests")
    codeflash_output = should_modify_pyproject_toml() # 1.29ms -> 509μs (153% faster)

def test_pyproject_missing_tests_root_returns_true(temp_project):
    # [tool.codeflash] exists but missing tests-root, should return True
    codeflash = {
        "module-root": "src",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"],
    }
    make_pyproject(temp_project, codeflash_block=codeflash)
    make_dirs(temp_project, "src")
    codeflash_output = should_modify_pyproject_toml() # 1.31ms -> 510μs (157% faster)

def test_pyproject_module_root_not_a_dir_returns_true(temp_project):
    # module-root points to a file, not a directory
    codeflash = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"],
    }
    make_pyproject(temp_project, codeflash_block=codeflash)
    (temp_project / "src").write_text("not a dir")
    make_dirs(temp_project, "tests")
    codeflash_output = should_modify_pyproject_toml() # 1.50ms -> 623μs (140% faster)

def test_pyproject_tests_root_not_a_dir_returns_true(temp_project):
    # tests-root points to a file, not a directory
    codeflash = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"],
    }
    make_pyproject(temp_project, codeflash_block=codeflash)
    make_dirs(temp_project, "src")
    (temp_project / "tests").write_text("not a dir")
    codeflash_output = should_modify_pyproject_toml() # 1.53ms -> 624μs (144% faster)







def test_pyproject_test_framework_invalid_returns_true(temp_project):
    # test-framework is neither pytest nor unittest, so parse_config_file should raise
    codeflash = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "nose",
        "formatter-cmds": ["black $file"],
    }
    make_pyproject(temp_project, codeflash_block=codeflash)
    make_dirs(temp_project, "src", "tests")
    codeflash_output = should_modify_pyproject_toml() # 1.07ms -> 619μs (72.4% faster)

def test_pyproject_formatter_cmds_bad_default_returns_true(temp_project):
    # formatter-cmds is still the unfilled placeholder "your-formatter $file", which is not valid
    codeflash = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["your-formatter $file"],
    }
    make_pyproject(temp_project, codeflash_block=codeflash)
    make_dirs(temp_project, "src", "tests")
    codeflash_output = should_modify_pyproject_toml() # 1.10ms -> 649μs (68.7% faster)







from __future__ import annotations

import os
from pathlib import Path

# imports
import pytest
import tomlkit
from codeflash.cli_cmds.cmd_init import should_modify_pyproject_toml

# ========== UNIT TESTS ==========

@pytest.fixture
def temp_project(tmp_path):
    """Create a temporary directory for each test and cd into it."""
    orig_cwd = os.getcwd()
    os.chdir(tmp_path)
    yield tmp_path
    os.chdir(orig_cwd)

def write_pyproject_toml(path: Path, codeflash_block: dict):
    """Helper to write a pyproject.toml with a [tool.codeflash] block."""
    doc = tomlkit.document()
    tool = tomlkit.table()
    cf = tomlkit.table()
    for k, v in codeflash_block.items():
        cf[k] = v
    tool["codeflash"] = cf
    doc["tool"] = tool
    with open(path, "w", encoding="utf-8") as f:
        f.write(tomlkit.dumps(doc))

def make_valid_dirs(tmp_path, module_root="src", tests_root="tests"):
    """Create valid module_root and tests_root directories."""
    (tmp_path / module_root).mkdir(parents=True, exist_ok=True)
    (tmp_path / tests_root).mkdir(parents=True, exist_ok=True)

# ---- Basic Test Cases ----

def test_no_pyproject_toml_returns_true(temp_project):
    # No pyproject.toml in cwd: should return True
    codeflash_output = should_modify_pyproject_toml() # 30.1μs -> 29.9μs (0.505% faster)

def test_pyproject_toml_missing_codeflash_returns_true(temp_project):
    # pyproject.toml exists but no [tool.codeflash]: should return True
    doc = tomlkit.document()
    doc["tool"] = tomlkit.table()
    with open("pyproject.toml", "w", encoding="utf-8") as f:
        f.write(tomlkit.dumps(doc))
    codeflash_output = should_modify_pyproject_toml()

def test_pyproject_toml_invalid_toml_returns_true(temp_project):
    # pyproject.toml exists but is invalid TOML: should return True
    with open("pyproject.toml", "w", encoding="utf-8") as f:
        f.write("not = valid = toml")
    codeflash_output = should_modify_pyproject_toml()

def test_valid_pyproject_toml_confirm_yes(monkeypatch, temp_project):
    # Valid config, user says "yes" to reconfigure: should return True
    make_valid_dirs(temp_project)
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    monkeypatch.setattr("rich.prompt.Confirm.ask", lambda *a, **k: True)
    codeflash_output = should_modify_pyproject_toml() # 1.84ms -> 926μs (98.1% faster)

def test_valid_pyproject_toml_confirm_no(monkeypatch, temp_project):
    # Valid config, user says "no" to reconfigure: should return False
    make_valid_dirs(temp_project)
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    monkeypatch.setattr("rich.prompt.Confirm.ask", lambda *a, **k: False)
    codeflash_output = should_modify_pyproject_toml() # 1.78ms -> 854μs (109% faster)

# ---- Edge Test Cases ----

def test_module_root_missing_returns_true(temp_project):
    # [tool.codeflash] exists but no module-root: should return True
    (temp_project / "tests").mkdir()
    codeflash_block = {
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    codeflash_output = should_modify_pyproject_toml() # 1.30ms -> 511μs (153% faster)

def test_tests_root_missing_returns_true(temp_project):
    # [tool.codeflash] exists but no tests-root: should return True
    (temp_project / "src").mkdir()
    codeflash_block = {
        "module-root": "src",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    codeflash_output = should_modify_pyproject_toml() # 1.31ms -> 513μs (155% faster)

def test_module_root_not_a_dir_returns_true(temp_project):
    # module-root points to a file, not a directory: should return True
    (temp_project / "src.py").write_text("# not a dir")
    (temp_project / "tests").mkdir()
    codeflash_block = {
        "module-root": "src.py",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    codeflash_output = should_modify_pyproject_toml() # 1.51ms -> 629μs (140% faster)

def test_tests_root_not_a_dir_returns_true(temp_project):
    # tests-root points to a file, not a directory: should return True
    (temp_project / "src").mkdir()
    (temp_project / "tests.txt").write_text("not a dir")
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests.txt",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    codeflash_output = should_modify_pyproject_toml() # 1.53ms -> 636μs (141% faster)



def test_codeflash_block_wrong_type_returns_true(temp_project):
    # [tool.codeflash] is not a table/dict: should return True
    doc = tomlkit.document()
    tool = tomlkit.table()
    tool["codeflash"] = "not a dict"
    doc["tool"] = tool
    with open("pyproject.toml", "w", encoding="utf-8") as f:
        f.write(tomlkit.dumps(doc))
    codeflash_output = should_modify_pyproject_toml()

def test_test_framework_invalid_returns_true(temp_project):
    # test-framework is not 'pytest' or 'unittest': should return True
    make_valid_dirs(temp_project)
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "nose",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    codeflash_output = should_modify_pyproject_toml() # 1.06ms -> 618μs (71.2% faster)

def test_formatter_cmds_placeholder_returns_true(temp_project):
    # formatter-cmds contains placeholder: should return True
    make_valid_dirs(temp_project)
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["your-formatter $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    codeflash_output = should_modify_pyproject_toml() # 1.09ms -> 650μs (67.9% faster)

# ---- Large Scale Test Cases ----


def test_large_pyproject_toml_with_extra_keys(monkeypatch, temp_project):
    # pyproject.toml with many unrelated keys: should still work
    make_valid_dirs(temp_project)
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    # Add 500 unrelated keys to [tool.codeflash]
    for i in range(500):
        codeflash_block[f"extra-key-{i}"] = f"value-{i}"
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    monkeypatch.setattr("rich.prompt.Confirm.ask", lambda *a, **k: True)
    codeflash_output = should_modify_pyproject_toml()

def test_many_files_in_module_and_tests(monkeypatch, temp_project):
    # Many files in module_root and tests_root: should not affect logic
    make_valid_dirs(temp_project)
    for i in range(500):
        (temp_project / "src" / f"file_{i}.py").write_text("print('hi')")
        (temp_project / "tests" / f"test_file_{i}.py").write_text("def test(): pass")
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    monkeypatch.setattr("rich.prompt.Confirm.ask", lambda *a, **k: False)
    codeflash_output = should_modify_pyproject_toml()

def test_large_pyproject_toml_with_large_strings(monkeypatch, temp_project):
    # Large string values in config: should not affect logic
    make_valid_dirs(temp_project)
    long_string = "a" * 1000
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"],
        "long-string": long_string
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    monkeypatch.setattr("rich.prompt.Confirm.ask", lambda *a, **k: False)
    codeflash_output = should_modify_pyproject_toml() # 5.16ms -> 4.27ms (20.8% faster)

# ---- Determinism and Miscellaneous ----

def test_deterministic_behavior_on_repeated_calls(monkeypatch, temp_project):
    # Should always return the same result for the same input
    make_valid_dirs(temp_project)
    codeflash_block = {
        "module-root": "src",
        "tests-root": "tests",
        "test-framework": "pytest",
        "formatter-cmds": ["black $file"]
    }
    write_pyproject_toml(temp_project / "pyproject.toml", codeflash_block)
    monkeypatch.setattr("rich.prompt.Confirm.ask", lambda *a, **k: True)
    codeflash_output = should_modify_pyproject_toml(); result1 = codeflash_output # 1.75ms -> 867μs (102% faster)
    codeflash_output = should_modify_pyproject_toml(); result2 = codeflash_output # 1.72ms -> 843μs (104% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr487-2025-07-03T00.46.51 and push.

@codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Jul 3, 2025
@codeflash-ai bot deleted the codeflash/optimize-pr487-2025-07-03T00.46.51 branch on July 3, 2025 at 00:49