
Conversation

@KRRT7
Contributor

@KRRT7 KRRT7 commented Sep 23, 2025

PR Type

Enhancement, Bug fix, Tests


Description

  • Add SQLite-backed tests cache with memory layer

  • Return replay tests count in discovery APIs

  • Improve robustness of pytest import error handling

  • Broaden path typing and hashing utilities


Diagram Walkthrough

flowchart LR
  A["discover_tests_* callers"] -- "call" --> B["discover_tests_pytest/unittest"]
  B -- "process" --> C["process_test_files"]
  C -- "check" --> D["TestsCache.get_tests_for_file"]
  D -- "hit" --> E["Rebuild function_to_test_map from cache"]
  D -- "miss" --> F["Jedi analysis + insert cache rows"]
  B -- "return" --> G["(map, total, replay_total)"]

File Walkthrough

Relevant files
Enhancement
code_utils.py
Broaden tmp file helper to accept strings                               

codeflash/code_utils/code_utils.py

  • Accept str input in get_run_tmp_file
  • Normalize to Path internally
+3/-1     
discover_unit_tests.py
Add persistent caching and extend discovery outputs           

codeflash/discovery/discover_unit_tests.py

  • Introduce TestsCache memory/db cache and API changes
  • Change discovery functions to return replay count
  • Cache lookup and insert during test processing
  • Safer pytest import error extraction and type tweaks
+71/-28 

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

AttributeError Bug

The in-memory cache attribute was renamed to memory_cache, but lookups still reference _memory_cache, which will raise AttributeError and disable caching. Align all references to the same attribute.

def get_tests_for_file(self, file_path: str, file_hash: str) -> list[FunctionCalledInTest] | None:
    cache_key = (file_path, file_hash)
    if cache_key in self._memory_cache:
        return self.memory_cache[cache_key]

    self.cur.execute("SELECT * FROM discovered_tests WHERE file_path = ? AND file_hash = ?", (file_path, file_hash))
    rows = self.cur.fetchall()
    if not rows:
        return None

    result = [
        FunctionCalledInTest(
            tests_in_file=TestsInFile(
                test_file=Path(row[0]), test_class=row[4], test_function=row[5], test_type=TestType(int(row[6]))
            ),
            position=CodePosition(line_no=row[7], col_no=row[8]),
        )
        for row in rows
    ]
    self.memory_cache[cache_key] = result
    return result
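The fix is to use one attribute name everywhere. A minimal sketch (the SQLite fallback is replaced by a stand-in method here):

```python
# Minimal sketch of the fix: every read and write goes through the same
# attribute name, memory_cache. _load_from_db stands in for the SQLite query.
class TestsCache:
    def __init__(self):
        self.memory_cache = {}  # the single in-memory layer

    def get_tests_for_file(self, file_path, file_hash):
        cache_key = (file_path, file_hash)
        if cache_key in self.memory_cache:  # was self._memory_cache -> AttributeError
            return self.memory_cache[cache_key]
        result = self._load_from_db(cache_key)  # stand-in for the DB lookup
        if result is not None:
            self.memory_cache[cache_key] = result
        return result

    def _load_from_db(self, cache_key):
        return None  # hypothetical DB miss
```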
Cache Rebuild Inefficiency

After a cache hit, the code re-queries the database instead of using cached_tests. This doubles work and risks divergence. Reuse cached_tests to rebuild function_to_test_map.

file_hash = TestsCache.compute_file_hash(test_file)

cached_tests = tests_cache.get_tests_for_file(str(test_file), file_hash)

if cached_tests:
    # Rebuild function_to_test_map from cached data
    tests_cache.cur.execute(
        "SELECT * FROM discovered_tests WHERE file_path = ? AND file_hash = ?", (str(test_file), file_hash)
    )
    for row in tests_cache.cur.fetchall():
        qualified_name_with_modules_from_root = row[2]
        test_type = TestType(int(row[6]))

        function_called_in_test = FunctionCalledInTest(
            tests_in_file=TestsInFile(
                test_file=test_file, test_class=row[4], test_function=row[5], test_type=test_type
            ),
            position=CodePosition(line_no=row[7], col_no=row[8]),
        )

        function_to_test_map[qualified_name_with_modules_from_root].add(function_called_in_test)
        if test_type == TestType.REPLAY_TEST:
            num_discovered_replay_tests += 1
        num_discovered_tests += 1

    progress.advance(task_id)
    continue
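One possible shape of the fix, sketched below. Note the cached FunctionCalledInTest objects alone do not carry the qualified name (row[2]), so this assumes the cache lookup is extended to return (qualified_name, test) pairs; the names and test-type values here are illustrative:

```python
from collections import defaultdict

# Illustrative cached rows: (qualified_name, (test_function, test_type)) pairs,
# assumed to be returned by an extended get_tests_for_file.
REPLAY_TEST = "replay"
cached_tests = [
    ("pkg.mod.func_a", ("test_func_a", REPLAY_TEST)),
    ("pkg.mod.func_b", ("test_func_b", "existing")),
]

function_to_test_map = defaultdict(set)
num_discovered_tests = 0
num_discovered_replay_tests = 0

# Rebuild the map from the cached list -- no second SQLite round-trip.
for qualified_name, test in cached_tests:
    function_to_test_map[qualified_name].add(test)
    num_discovered_tests += 1
    if test[1] == REPLAY_TEST:
        num_discovered_replay_tests += 1
```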
Path Join Bug

get_run_tmp_file concatenates the temporary directory with a full file_path, which can yield incorrect paths if file_path is absolute. Use Path(file_path).name or .relative_to to ensure safe temp path creation.

def get_run_tmp_file(file_path: Path | str) -> Path:
    if isinstance(file_path, str):
        file_path = Path(file_path)
    if not hasattr(get_run_tmp_file, "tmpdir"):
        get_run_tmp_file.tmpdir = TemporaryDirectory(prefix="codeflash_")
    return Path(get_run_tmp_file.tmpdir.name) / file_path

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Close DB in finally

Ensure the database connection is closed even if an exception occurs while
processing files. Use a try/finally to guarantee tests_cache.close() executes and
prevent DB locks or resource leaks.

codeflash/discovery/discover_unit_tests.py [546-736]

 def process_test_files(
     file_to_test_map: dict[Path, list[TestsInFile]],
     cfg: TestConfig,
     functions_to_optimize: list[FunctionToOptimize] | None = None,
 ) -> tuple[dict[str, set[FunctionCalledInTest]], int, int]:
     import jedi
 
     ...
 
     tests_cache = TestsCache()
-
-    with test_files_progress_bar(total=len(file_to_test_map), description="Processing test files") as (
-        progress,
-        task_id,
-    ):
-        ...
-    tests_cache.close()
+    try:
+        with test_files_progress_bar(total=len(file_to_test_map), description="Processing test files") as (
+            progress,
+            task_id,
+        ):
+            ...
+    finally:
+        tests_cache.close()
 
     return dict(function_to_test_map), num_discovered_tests, num_discovered_replay_tests
Suggestion importance[1-10]: 7


Why: Ensuring tests_cache.close() in a finally guards against resource leaks on exceptions during processing. It's accurate and improves robustness, though not a critical bug fix.

Medium
Possible issue
Fix variable shadowing and types

Avoid shadowing the input by reusing its name, and ensure the fallback return type
matches the annotated tuple. Keep the original function_name intact and return None
for the middle element on failure, preserving the expected types.

codeflash/discovery/discover_unit_tests.py [519-524]

 def discover_parameters_unittest(function_name: str) -> tuple[bool, str, str | None]:
-    function_parts = function_name.split("_")
-    if len(function_parts) > 1 and function_parts[-1].isdigit():
-        return True, "_".join(function_parts[:-1]), function_parts[-1]
+    parts = function_name.split("_")
+    if len(parts) > 1 and parts[-1].isdigit():
+        return True, "_".join(parts[:-1]), parts[-1]
 
     return False, function_name, None
Suggestion importance[1-10]: 3


Why: Renaming function_parts to parts is a minor readability tweak; behavior and types remain the same. It's correct but low-impact.

Low

@codeflash-ai
Contributor

codeflash-ai bot commented Sep 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 252% (2.52x) speedup for discover_parameters_unittest in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 350 microseconds → 99.4 microseconds (best of 264 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch test_cache_revival).

@KRRT7 KRRT7 force-pushed the test_cache_revival branch from ce72cfd to bbc630f Compare September 23, 2025 14:06
@misrasaurabh1
Contributor

@KRRT7 updates on this?

from codeflash.discovery.discover_unit_tests import discover_unit_tests

console.rule()
with progress_bar("Discovering existing function tests..."):
Contributor


the progress_bar will also show the loading text animation in the extension, I think we should keep it.

Contributor Author


[Screenshot: 2025-10-09 at 3:41 PM] There is still a progress bar; I just moved it elsewhere since I was actually seeing two progress bar artifacts.

the new progress bar is implemented in codeflash/discovery/discover_unit_tests.py::process_test_files:577

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Screen recording: 2025-10-09 at 3:43 PM]

this is the artifact

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Screen recording: 2025-10-09 at 3:45 PM]

no artifact

@github-actions github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Oct 9, 2025
@KRRT7 KRRT7 requested a review from mohammedahmed18 October 9, 2025 23:15
self.cur.execute(
"""
CREATE TABLE IF NOT EXISTS discovered_tests(
project_root_path TEXT,
Contributor

@mohammedahmed18 mohammedahmed18 Oct 10, 2025


this won't create the project_root_path column if the table was already created before; you should have a separate SQL statement for adding the new column

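A minimal sketch of such an additive migration, assuming SQLite: add the column only when it is missing. The `ensure_column` helper is illustrative, not part of the codebase:

```python
import sqlite3

def ensure_column(conn, table, column, col_type="TEXT"):
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
    # rows; row[1] is the column name.
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")

# Simulate a pre-existing table that lacks the new column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discovered_tests(file_path TEXT)")
ensure_column(conn, "discovered_tests", "project_root_path")
```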

Contributor


we actually need some kind of migration engine later for these types of changes

Contributor Author


huh, that database should be empty; if not, I think only our team members would have it, but yes, let me come up with a fix

KRRT7 and others added 2 commits October 10, 2025 18:22
The optimized code achieves a 52% speedup by replacing the traditional file reading approach with a more efficient buffered I/O pattern using `readinto()` and `memoryview`. 

**Key optimizations:**

1. **Pre-allocated buffer with `readinto()`**: Instead of `f.read(8192)` which allocates a new bytes object on each iteration, the code uses a single `bytearray(8192)` buffer and reads data directly into it with `f.readinto(mv)`. This eliminates repeated memory allocations.

2. **Memory view for zero-copy slicing**: The `memoryview(buf)` allows efficient slicing (`mv[:n]`) without copying data, reducing memory overhead when updating the hash with partial buffers.

3. **Direct `open()` with unbuffered I/O**: Using `open(path, "rb", buffering=0)` instead of `Path(path).open("rb")` avoids the Path object overhead and disables Python's internal buffering to prevent double-buffering since we're managing our own buffer.

**Performance impact**: The line profiler shows the critical file opening operation dropped from 83.4% to 62.2% of total time, while the new buffer operations (`readinto`, `memoryview`) are very efficient. This optimization is particularly effective for medium to large files where the reduced memory allocation overhead compounds across multiple read operations.

**Best use cases**: This optimization excels when computing hashes for files larger than the 8KB buffer size, where the memory allocation savings become significant, and when called frequently in batch operations.
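The pattern described above can be sketched as follows; this assumes SHA-256, and the real `compute_file_hash` may use a different digest or buffer size:

```python
import hashlib

def compute_file_hash(path, bufsize=8192):
    # One pre-allocated buffer; readinto() fills it in place, and the
    # memoryview slice feeds the hash without copying the partial buffer.
    h = hashlib.sha256()
    buf = bytearray(bufsize)
    mv = memoryview(buf)
    # buffering=0 returns a raw FileIO object, avoiding double-buffering
    # since we manage our own buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(mv)
            if not n:
                break
            h.update(mv[:n])
    return h.hexdigest()
```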
@codeflash-ai
Contributor

codeflash-ai bot commented Oct 10, 2025

⚡️ Codeflash found optimizations for this PR

📄 53% (0.53x) speedup for TestsCache.compute_file_hash in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 301 microseconds → 197 microseconds (best of 36 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch test_cache_revival).

KRRT7 and others added 2 commits October 10, 2025 13:23
…25-10-10T18.29.30

⚡️ Speed up method `TestsCache.compute_file_hash` by 53% in PR #753 (`test_cache_revival`)
@codeflash-ai
Contributor

codeflash-ai bot commented Oct 10, 2025

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

@KRRT7 KRRT7 requested a review from mohammedahmed18 October 14, 2025 03:12
@KRRT7 KRRT7 enabled auto-merge October 14, 2025 05:40
@KRRT7
Contributor Author

KRRT7 commented Oct 14, 2025

@mohammedahmed18 this is ready

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Codeflash Bot does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@KRRT7 KRRT7 merged commit 86e2570 into main Oct 15, 2025
19 of 22 checks passed

Labels

Review effort 3/5 · workflow-modified (This PR modifies GitHub Actions workflows)
