
Conversation

@KRRT7
Contributor

@KRRT7 KRRT7 commented Sep 23, 2025

PR Type

Enhancement, Bug fix, Tests


Description

  • Add SQLite-backed tests cache with memory layer

  • Return replay tests count in discovery APIs

  • Improve robustness of pytest import error handling

  • Broaden path typing and hashing utilities


Diagram Walkthrough

flowchart LR
  A["discover_tests_* callers"] -- "call" --> B["discover_tests_pytest/unittest"]
  B -- "process" --> C["process_test_files"]
  C -- "check" --> D["TestsCache.get_tests_for_file"]
  D -- "hit" --> E["Rebuild function_to_test_map from cache"]
  D -- "miss" --> F["Jedi analysis + insert cache rows"]
  B -- "return" --> G["(map, total, replay_total)"]

File Walkthrough

Relevant files
Enhancement
code_utils.py
Broaden tmp file helper to accept strings                               

codeflash/code_utils/code_utils.py

  • Accept str input in get_run_tmp_file
  • Normalize to Path internally
+3/-1     
discover_unit_tests.py
Add persistent caching and extend discovery outputs           

codeflash/discovery/discover_unit_tests.py

  • Introduce TestsCache memory/db cache and API changes
  • Change discovery functions to return replay count
  • Cache lookup and insert during test processing
  • Safer pytest import error extraction and type tweaks
+71/-28 

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

AttributeError Bug

The in-memory cache attribute was renamed to memory_cache, but lookups still reference _memory_cache, which will raise AttributeError and disable caching. Align all references to the same attribute.

def get_tests_for_file(self, file_path: str, file_hash: str) -> list[FunctionCalledInTest] | None:
    cache_key = (file_path, file_hash)
    if cache_key in self._memory_cache:
        return self.memory_cache[cache_key]

    self.cur.execute("SELECT * FROM discovered_tests WHERE file_path = ? AND file_hash = ?", (file_path, file_hash))
    rows = self.cur.fetchall()
    if not rows:
        return None

    result = [
        FunctionCalledInTest(
            tests_in_file=TestsInFile(
                test_file=Path(row[0]), test_class=row[4], test_function=row[5], test_type=TestType(int(row[6]))
            ),
            position=CodePosition(line_no=row[7], col_no=row[8]),
        )
        for row in rows
    ]
    self.memory_cache[cache_key] = result
    return result
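The fix is to use one attribute name everywhere. A minimal sketch (the SQLite fallback is replaced by a stand-in method here):

```python
# Minimal sketch of the fix: every read and write goes through the same
# attribute name, memory_cache. _load_from_db stands in for the SQLite query.
class TestsCache:
    def __init__(self):
        self.memory_cache = {}  # the single in-memory layer

    def get_tests_for_file(self, file_path, file_hash):
        cache_key = (file_path, file_hash)
        if cache_key in self.memory_cache:  # was self._memory_cache -> AttributeError
            return self.memory_cache[cache_key]
        result = self._load_from_db(cache_key)  # stand-in for the DB lookup
        if result is not None:
            self.memory_cache[cache_key] = result
        return result

    def _load_from_db(self, cache_key):
        return None  # hypothetical DB miss
```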
Cache Rebuild Inefficiency

After a cache hit, the code re-queries the database instead of using cached_tests. This doubles work and risks divergence. Reuse cached_tests to rebuild function_to_test_map.

file_hash = TestsCache.compute_file_hash(test_file)

cached_tests = tests_cache.get_tests_for_file(str(test_file), file_hash)

if cached_tests:
    # Rebuild function_to_test_map from cached data
    tests_cache.cur.execute(
        "SELECT * FROM discovered_tests WHERE file_path = ? AND file_hash = ?", (str(test_file), file_hash)
    )
    for row in tests_cache.cur.fetchall():
        qualified_name_with_modules_from_root = row[2]
        test_type = TestType(int(row[6]))

        function_called_in_test = FunctionCalledInTest(
            tests_in_file=TestsInFile(
                test_file=test_file, test_class=row[4], test_function=row[5], test_type=test_type
            ),
            position=CodePosition(line_no=row[7], col_no=row[8]),
        )

        function_to_test_map[qualified_name_with_modules_from_root].add(function_called_in_test)
        if test_type == TestType.REPLAY_TEST:
            num_discovered_replay_tests += 1
        num_discovered_tests += 1

    progress.advance(task_id)
    continue
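One possible shape of the fix, sketched below. Note the cached FunctionCalledInTest objects alone do not carry the qualified name (row[2]), so this assumes the cache lookup is extended to return (qualified_name, test) pairs; the names and test-type values here are illustrative:

```python
from collections import defaultdict

# Illustrative cached rows: (qualified_name, (test_function, test_type)) pairs,
# assumed to be returned by an extended get_tests_for_file.
REPLAY_TEST = "replay"
cached_tests = [
    ("pkg.mod.func_a", ("test_func_a", REPLAY_TEST)),
    ("pkg.mod.func_b", ("test_func_b", "existing")),
]

function_to_test_map = defaultdict(set)
num_discovered_tests = 0
num_discovered_replay_tests = 0

# Rebuild the map from the cached list -- no second SQLite round-trip.
for qualified_name, test in cached_tests:
    function_to_test_map[qualified_name].add(test)
    num_discovered_tests += 1
    if test[1] == REPLAY_TEST:
        num_discovered_replay_tests += 1
```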
Path Join Bug

get_run_tmp_file concatenates the temporary directory with a full file_path, which can yield incorrect paths if file_path is absolute. Use Path(file_path).name or .relative_to to ensure safe temp path creation.

def get_run_tmp_file(file_path: Path | str) -> Path:
    if isinstance(file_path, str):
        file_path = Path(file_path)
    if not hasattr(get_run_tmp_file, "tmpdir"):
        get_run_tmp_file.tmpdir = TemporaryDirectory(prefix="codeflash_")
    return Path(get_run_tmp_file.tmpdir.name) / file_path

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Close DB in finally

Ensure the database connection is closed even if an exception occurs while
processing files. Use a try/finally to guarantee tests_cache.close() executes and
prevent DB locks or resource leaks.

codeflash/discovery/discover_unit_tests.py [546-736]

 def process_test_files(
     file_to_test_map: dict[Path, list[TestsInFile]],
     cfg: TestConfig,
     functions_to_optimize: list[FunctionToOptimize] | None = None,
 ) -> tuple[dict[str, set[FunctionCalledInTest]], int, int]:
     import jedi
 
     ...
 
     tests_cache = TestsCache()
-
-    with test_files_progress_bar(total=len(file_to_test_map), description="Processing test files") as (
-        progress,
-        task_id,
-    ):
-        ...
-    tests_cache.close()
+    try:
+        with test_files_progress_bar(total=len(file_to_test_map), description="Processing test files") as (
+            progress,
+            task_id,
+        ):
+            ...
+    finally:
+        tests_cache.close()
 
     return dict(function_to_test_map), num_discovered_tests, num_discovered_replay_tests
Suggestion importance[1-10]: 7


Why: Ensuring tests_cache.close() in a finally guards against resource leaks on exceptions during processing. It's accurate and improves robustness, though not a critical bug fix.

Medium
Possible issue
Fix variable shadowing and types

Avoid shadowing the input by reusing its name, and ensure the fallback return type
matches the annotated tuple. Keep the original function_name intact and return None
for the middle element on failure, preserving the expected types.

codeflash/discovery/discover_unit_tests.py [519-524]

 def discover_parameters_unittest(function_name: str) -> tuple[bool, str, str | None]:
-    function_parts = function_name.split("_")
-    if len(function_parts) > 1 and function_parts[-1].isdigit():
-        return True, "_".join(function_parts[:-1]), function_parts[-1]
+    parts = function_name.split("_")
+    if len(parts) > 1 and parts[-1].isdigit():
+        return True, "_".join(parts[:-1]), parts[-1]
 
     return False, function_name, None
Suggestion importance[1-10]: 3


Why: Renaming function_parts to parts is a minor readability tweak; behavior and types remain the same. It's correct but low-impact.

Low

@codeflash-ai
Contributor

codeflash-ai bot commented Sep 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 252% (2.52x) speedup for discover_parameters_unittest in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 350 microseconds → 99.4 microseconds (best of 264 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch test_cache_revival).

@KRRT7 KRRT7 force-pushed the test_cache_revival branch from ce72cfd to bbc630f Compare September 23, 2025 14:06
@misrasaurabh1
Contributor

@KRRT7 updates on this?

from codeflash.discovery.discover_unit_tests import discover_unit_tests

console.rule()
with progress_bar("Discovering existing function tests..."):
Contributor


the progress_bar will also show the loading text animation in the extension, I think we should keep it.

Contributor Author


[Screenshot: 2025-10-09 at 3:41 PM] There is still a progress bar; I just moved it elsewhere since I was actually seeing two progress bar artifacts.

the new progress bar is implemented in codeflash/discovery/discover_unit_tests.py::process_test_files:577

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Screen recording: 2025-10-09 at 3:43 PM]

this is the artifact

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Screen recording: 2025-10-09 at 3:45 PM]

no artifact

@github-actions github-actions bot added the workflow-modified This PR modifies GitHub Actions workflows label Oct 9, 2025
@KRRT7 KRRT7 requested a review from mohammedahmed18 October 9, 2025 23:15
self.cur.execute(
"""
CREATE TABLE IF NOT EXISTS discovered_tests(
project_root_path TEXT,
Contributor

@mohammedahmed18 mohammedahmed18 Oct 10, 2025


this won't create the project_root_path column if the table was already created before; you should have a separate SQL statement for adding the new column

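A minimal sketch of such an additive migration, assuming SQLite: add the column only when it is missing. The `ensure_column` helper is illustrative, not part of the codebase:

```python
import sqlite3

def ensure_column(conn, table, column, col_type="TEXT"):
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
    # rows; row[1] is the column name.
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")

# Simulate a pre-existing table that lacks the new column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discovered_tests(file_path TEXT)")
ensure_column(conn, "discovered_tests", "project_root_path")
```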

Contributor


we actually need some kind of migration engine later for these types of changes

Contributor Author


huh, that database should be empty; if not, I think only our team members would have it, but yes, let me come up with a fix

KRRT7 and others added 2 commits October 10, 2025 18:22
The optimized code achieves a 52% speedup by replacing the traditional file reading approach with a more efficient buffered I/O pattern using `readinto()` and `memoryview`. 

**Key optimizations:**

1. **Pre-allocated buffer with `readinto()`**: Instead of `f.read(8192)` which allocates a new bytes object on each iteration, the code uses a single `bytearray(8192)` buffer and reads data directly into it with `f.readinto(mv)`. This eliminates repeated memory allocations.

2. **Memory view for zero-copy slicing**: The `memoryview(buf)` allows efficient slicing (`mv[:n]`) without copying data, reducing memory overhead when updating the hash with partial buffers.

3. **Direct `open()` with unbuffered I/O**: Using `open(path, "rb", buffering=0)` instead of `Path(path).open("rb")` avoids the Path object overhead and disables Python's internal buffering to prevent double-buffering since we're managing our own buffer.

**Performance impact**: The line profiler shows the critical file opening operation dropped from 83.4% to 62.2% of total time, while the new buffer operations (`readinto`, `memoryview`) are very efficient. This optimization is particularly effective for medium to large files where the reduced memory allocation overhead compounds across multiple read operations.

**Best use cases**: This optimization excels when computing hashes for files larger than the 8KB buffer size, where the memory allocation savings become significant, and when called frequently in batch operations.
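The pattern described above can be sketched as follows; this assumes SHA-256, and the real `compute_file_hash` may use a different digest or buffer size:

```python
import hashlib

def compute_file_hash(path, bufsize=8192):
    # One pre-allocated buffer; readinto() fills it in place, and the
    # memoryview slice feeds the hash without copying the partial buffer.
    h = hashlib.sha256()
    buf = bytearray(bufsize)
    mv = memoryview(buf)
    # buffering=0 returns a raw FileIO object, avoiding double-buffering
    # since we manage our own buffer.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(mv)
            if not n:
                break
            h.update(mv[:n])
    return h.hexdigest()
```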
@codeflash-ai
Contributor

codeflash-ai bot commented Oct 10, 2025

⚡️ Codeflash found optimizations for this PR

📄 53% (0.53x) speedup for TestsCache.compute_file_hash in codeflash/discovery/discover_unit_tests.py

⏱️ Runtime: 301 microseconds → 197 microseconds (best of 36 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch test_cache_revival).

KRRT7 and others added 2 commits October 10, 2025 13:23
…25-10-10T18.29.30

⚡️ Speed up method `TestsCache.compute_file_hash` by 53% in PR #753 (`test_cache_revival`)
@codeflash-ai
Contributor

codeflash-ai bot commented Oct 10, 2025

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

@KRRT7 KRRT7 requested a review from mohammedahmed18 October 14, 2025 03:12
@KRRT7 KRRT7 enabled auto-merge October 14, 2025 05:40
@KRRT7
Contributor Author

KRRT7 commented Oct 14, 2025

@mohammedahmed18 this is ready

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Codeflash Bot does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@KRRT7 KRRT7 merged commit 86e2570 into main Oct 15, 2025
19 of 22 checks passed

Labels

Review effort 3/5 · workflow-modified (This PR modifies GitHub Actions workflows)
