Conversation

@KRRT7 (Contributor) commented Jun 6, 2025

PR Type

Enhancement, Tests


Description

  • Parallelize unit test discovery using processes (see the sketch after this list)

  • Introduce caching to skip processed files

  • Return total discovered count from discovery

  • Update call sites and tests for new API
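
To make the approach above concrete, here is a minimal, self-contained sketch of per-file discovery fanned out across a ProcessPoolExecutor with a hash-based cache skip. The helper names (_file_hash, _discover_in_file), the in-memory _CACHE dict, and the naive scan for def test_ functions are illustrative placeholders rather than codeflash's actual implementation, which caches in SQLite and performs real test collection:

    import hashlib
    import multiprocessing
    from concurrent.futures import ProcessPoolExecutor, as_completed
    from pathlib import Path

    # Illustrative in-memory cache keyed by (path, content hash); the real cache uses SQLite.
    _CACHE: dict[tuple[str, str], list[str]] = {}


    def _file_hash(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()


    def _discover_in_file(path_str: str) -> tuple[str, list[str]]:
        """Stand-in per-file worker: naively collect 'def test_*' names from one file."""
        names = []
        for line in Path(path_str).read_text().splitlines():
            stripped = line.strip()
            if stripped.startswith("def test_"):
                names.append(stripped.split("(")[0][len("def "):])
        return path_str, names


    def discover_unit_tests(test_files: list[Path]) -> tuple[dict[str, list[str]], int]:
        """Return (mapping of file -> discovered tests, total discovered count)."""
        results: dict[str, list[str]] = {}
        uncached: list[Path] = []
        for f in test_files:
            hit = _CACHE.get((str(f), _file_hash(f)))
            if hit is not None:
                results[str(f)] = hit  # cache hit: skip reprocessing this file
            else:
                uncached.append(f)
        max_workers = min(len(uncached) or 1, multiprocessing.cpu_count())
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            futures = [executor.submit(_discover_in_file, str(f)) for f in uncached]
            for future in as_completed(futures):
                path_str, names = future.result()
                results[path_str] = names
                _CACHE[(path_str, _file_hash(Path(path_str)))] = names
        return results, sum(len(v) for v in results.values())

Note that the sketch returns a (results, total count) tuple, matching the new public shape that the call sites listed in the walkthrough below now unpack.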


Changes walkthrough 📝

Relevant files

Enhancement

  discover_unit_tests.py (codeflash/discovery/discover_unit_tests.py): Add parallel test discovery and caching (+226/-173)
  • Imported multiprocessing and ProcessPoolExecutor
  • Added a _process_single_test_file helper function
  • Parallelized processing with a futures executor
  • Implemented cache-skip logic and result counting

Bug fix

  functions_to_optimize.py (codeflash/discovery/functions_to_optimize.py): Unpack the discover_unit_tests return tuple (+1/-1)
  • Unpacked the discover_unit_tests tuple return (a call-site sketch follows this walkthrough)
  • Adjusted the function_tests assignment

  optimizer.py (codeflash/optimization/optimizer.py): Unpack discovery results in the optimizer (+1/-2)
  • Unpacked the discover_unit_tests tuple return
  • Updated the num_discovered_tests assignment

  concolic_testing.py (codeflash/verification/concolic_testing.py): Support the new discovery return in concolic testing (+1/-2)
  • Unpacked the discover_unit_tests return tuple
  • Updated logging of the discovered test count

Tests

  test_unit_test_discovery.py (tests/test_unit_test_discovery.py): Update tests for the new discover_unit_tests signature (+17/-17)
  • Updated tests to unpack the tuple return
  • Ensured tests consume the new signature
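
Since the return type changed from a bare mapping to a (mapping, count) tuple, each caller now unpacks two values. A minimal sketch of the new call-site shape, using a stub with the new signature (the stub body and names such as function_to_tests are illustrative, not the actual optimizer code):

    def discover_unit_tests(test_root: str) -> tuple[dict[str, list[str]], int]:
        """Stub with the new signature: (function -> tests mapping, total discovered count)."""
        mapping = {"my_module.my_func": ["tests/test_my_module.py::test_my_func"]}
        return mapping, sum(len(tests) for tests in mapping.values())


    # Callers unpack both values instead of taking the mapping alone.
    function_to_tests, num_discovered_tests = discover_unit_tests("tests/")
    print(f"Discovered {num_discovered_tests} existing unit tests")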

Need help?

  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

    @github-actions bot commented Jun 6, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Commit

    The SQLite connection does not call commit() after inserting cache entries, so cached test data may never be persisted to disk.

    for test_file, functions in uncached_files.items():
        _, results, count, cache_entries = _process_single_test_file(
            test_file, functions, project_root_path, test_framework
        )
        total_count += count
    
        file_hash = TestsCache.compute_file_hash(str(test_file))
        for cache_entry in cache_entries:
            tests_cache.insert_test(file_path=str(test_file), file_hash=file_hash, **cache_entry)
    
        for qualified_name, function_called in results:
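
    A related option to the explicit commit() suggested further down is to use the sqlite3 connection as a context manager, which opens a transaction and commits it when the block exits without an exception. A minimal sketch, with an illustrative database path and a deliberately reduced column list (the real discovered_tests table has more columns):

        import sqlite3

        connection = sqlite3.connect("codeflash_tests_cache.db")  # illustrative path
        connection.execute(
            "CREATE TABLE IF NOT EXISTS discovered_tests (file_path TEXT, file_hash TEXT, test_function TEXT)"
        )

        # `with connection:` commits on clean exit and rolls back if the block raises.
        with connection:
            connection.execute(
                "INSERT INTO discovered_tests VALUES (?, ?, ?)",
                ("tests/test_example.py", "deadbeef", "test_example"),
            )
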
    Cache Staleness

    The insert_test method no longer deletes old entries for a file, which can lead to duplicate or stale records when tests are re-discovered. Consider clearing or replacing entries by file_path and file_hash.

    def insert_test(
        self,
        file_path: str,
        file_hash: str,
        qualified_name_with_modules_from_root: str,
        function_name: str,
        test_class: str,
        test_function: str,
        test_type: TestType,
        line_number: int,
        col_number: int,
    ) -> None:
        test_type_value = test_type.value if hasattr(test_type, "value") else test_type
        self.cur.execute(
            "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (

    @github-actions bot commented Jun 6, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category | Suggestion | Impact
    Possible issue
    Persist cache updates

    Add a DELETE statement before inserting to remove stale cache entries for the same
    file and hash, and call self.connection.commit() after the insert so the database
    updates actually persist.

    codeflash/discovery/discover_unit_tests.py [80-96]

     def insert_test(
         self,
         file_path: str,
         file_hash: str,
         qualified_name_with_modules_from_root: str,
         function_name: str,
         test_class: str,
         test_function: str,
         test_type: TestType,
         line_number: int,
         col_number: int,
     ) -> None:
    +    # purge old entries for this file and hash
    +    self.cur.execute(
    +        "DELETE FROM discovered_tests WHERE file_path = ? AND file_hash = ?",
    +        (file_path, file_hash),
    +    )
         test_type_value = test_type.value if hasattr(test_type, "value") else test_type
         self.cur.execute(
             "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
             (
                 file_path,
                 file_hash,
                 qualified_name_with_modules_from_root,
                 function_name,
                 test_class,
                 test_function,
                 test_type_value,
                 line_number,
                 col_number,
             ),
         )
    +    self.connection.commit()
    Suggestion importance[1-10]: 7


    Why: Adding a DELETE before the insert prevents stale rows accumulating, and calling self.connection.commit() ensures the new entries are persisted immediately.

    Medium
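
    If the cache table can gain a uniqueness constraint, another way to get the same effect is to let SQLite replace stale rows directly with INSERT OR REPLACE instead of a separate DELETE. A sketch under that assumption (the UNIQUE column set and the in-memory database are illustrative; the column order mirrors the insert above):

        import sqlite3

        connection = sqlite3.connect(":memory:")  # illustrative; the real cache lives on disk
        connection.execute(
            """
            CREATE TABLE IF NOT EXISTS discovered_tests (
                file_path TEXT, file_hash TEXT, qualified_name_with_modules_from_root TEXT,
                function_name TEXT, test_class TEXT, test_function TEXT,
                test_type INTEGER, line_number INTEGER, col_number INTEGER,
                UNIQUE (file_path, qualified_name_with_modules_from_root, test_class, test_function)
            )
            """
        )

        with connection:  # commits on clean exit
            connection.execute(
                "INSERT OR REPLACE INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                ("tests/test_example.py", "deadbeef", "pkg.mod.func", "func",
                 "TestExample", "test_func_calls_func", 1, 10, 4),
            )

    Whether this is preferable to the explicit DELETE depends on whether a schema change to the cache table is acceptable.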
    General
    Limit worker pool correctly

    Bound the worker pool size to the number of uncached files rather than the total
    file map so you don’t spawn idle processes when most files are already cached.

    codeflash/discovery/discover_unit_tests.py [458]

    -max_workers = min(len(file_to_test_map) or 1, multiprocessing.cpu_count())
    +# after segregating uncached_files
    +max_workers = min(len(uncached_files) or 1, multiprocessing.cpu_count())
    Suggestion importance[1-10]: 6


    Why: Bounding max_workers to len(uncached_files) avoids spawning idle processes when many files are already cached, improving resource usage.

    Low
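
    Taken a step further, when the cache already covers every file the pool can be skipped entirely. A small, self-contained sketch of that shape (process_files and _discover are illustrative stand-ins for the discovery code paths):

        import multiprocessing
        from concurrent.futures import ProcessPoolExecutor


        def _discover(path: str) -> str:
            """Stand-in for the real per-file discovery worker."""
            return path.upper()


        def process_files(uncached_files: list[str]) -> list[str]:
            # No pool at all when the cache covered everything; otherwise never size
            # the pool beyond the number of files that actually need work.
            if not uncached_files:
                return []
            max_workers = min(len(uncached_files), multiprocessing.cpu_count())
            with ProcessPoolExecutor(max_workers=max_workers) as executor:
                return list(executor.map(_discover, uncached_files))


        if __name__ == "__main__":
            print(process_files([]))                # [] -- no pool created
            print(process_files(["a.py", "b.py"]))  # ['A.PY', 'B.PY']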
    Use serializable worker results

    Serializing custom FunctionCalledInTest objects across processes can fail; convert
    each result into plain dictionaries or tuples before returning and rebuild the
    objects in the main process.

    codeflash/discovery/discover_unit_tests.py [445]

    -return str(test_file), results, len(results), cache_entries
    +# at end of _process_single_test_file
    +serial_results = [
    +    (
    +        qualified_name,
    +        {
    +            "test_file": str(fc.tests_in_file.test_file),
    +            "test_class": fc.tests_in_file.test_class,
    +            "test_function": fc.tests_in_file.test_function,
    +            "test_type": fc.tests_in_file.test_type.value if hasattr(fc.tests_in_file.test_type, "value") else fc.tests_in_file.test_type,
    +            "line": fc.position.line_no,
    +            "col": fc.position.col_no,
    +        },
    +    )
    +    for qualified_name, fc in results
    +]
    +return str(test_file), serial_results, len(results), cache_entries
    Suggestion importance[1-10]: 5


    Why: Converting FunctionCalledInTest objects to primitive types avoids pickling errors in ProcessPoolExecutor, enhancing robustness when deserializing results.

    Low
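
    The pattern the suggestion describes (plain tuples out of the worker, richer objects rebuilt in the parent) looks roughly like the sketch below. DiscoveredTest is a hypothetical stand-in, not codeflash's FunctionCalledInTest, and simple dataclasses usually pickle fine on their own; the conversion mainly pays off when the nested objects resist pickling:

        from concurrent.futures import ProcessPoolExecutor
        from dataclasses import dataclass


        @dataclass(frozen=True)
        class DiscoveredTest:
            """Hypothetical stand-in for the richer result object."""
            test_file: str
            test_function: str
            line: int


        def _worker(test_file: str) -> list[tuple[str, str, int]]:
            # Return plain tuples so results pickle cheaply and reliably across processes.
            return [(test_file, "test_example", 12)]


        if __name__ == "__main__":
            with ProcessPoolExecutor(max_workers=1) as executor:
                rows = executor.submit(_worker, "tests/test_example.py").result()
            # Rebuild the richer objects in the parent process.
            discovered = [DiscoveredTest(*row) for row in rows]
            print(discovered)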

    KRRT7 closed this Jun 6, 2025
    KRRT7 force-pushed the parallelize-test-discovery branch from 75acf5e to 9be93b6 on June 6, 2025 at 19:51
    KRRT7 deleted the parallelize-test-discovery branch on June 6, 2025 at 19:51