paralelize test discovery #264

KRRT7 · 2025-06-02T19:16:04Z

User description

note: max_workers is os.cpu_count() or 1, for GitHub Actions that means the min is 2 cores.

PR Type

Enhancement, Tests

Description

Parallelize test file discovery with multiple processes
Refactor into process_single_test_file helper
Implement caching to skip unchanged test files
Strengthen type assertion and simplify insertion logic

Changes walkthrough 📝

Relevant files

Enhancement

discover_unit_tests.py `Parallelize and optimize test discovery` codeflash/discovery/discover_unit_tests.py Add `ProcessPoolExecutor` for concurrent test processing Extract and optimize single-file test logic Early return on cache hit for unchanged files Assert `test_type` and streamline DB insertion	+173/-169

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

github-actions · 2025-06-02T19:17:14Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Picklable Objects Passing a `jedi.Project` instance to worker processes may raise pickling errors since it's likely not serializable. Consider initializing `jedi.Project` inside each process or using a serializable representation instead. futures = { executor.submit(process_single_test_file, test_file, functions, cfg, jedi_project): test_file for test_file, functions in file_to_test_map.items() } for future in futures: result = future.result() for k, v in result.items(): SQLite Concurrency Multiple processes writing to the same SQLite database could lead to database locks or write failures. Verify that the caching layer handles concurrent writes safely or use a process-safe mechanism. def process_test_files( file_to_test_map: dict[Path, list[TestsInFile]], cfg: TestConfig ) -> dict[str, list[FunctionCalledInTest]]: project_root_path = cfg.project_root_path function_to_test_map = defaultdict(set) jedi_project = jedi.Project(path=project_root_path) with ( test_files_progress_bar(total=len(file_to_test_map), description="Processing test files") as ( progress, task_id, ), ProcessPoolExecutor() as executor, ): futures = { executor.submit(process_single_test_file, test_file, functions, cfg, jedi_project): test_file for test_file, functions in file_to_test_map.items() } for future in futures: result = future.result() for k, v in result.items(): function_to_test_map[k].update(v) progress.update(task_id) return {function: list(tests) for function, tests in function_to_test_map.items()} Duplicate Inserts The `insert_test` method no longer deletes existing records before insert, which may lead to duplicate entries for unchanged files over multiple runs. Ensure old entries are cleaned or use `INSERT OR REPLACE`. assert isinstance(test_type, TestType), "test_type must be an instance of TestType" self.cur.execute( "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ( file_path, file_hash, qualified_name_with_modules_from_root, function_name, test_class, test_function, test_type.value, line_number,

github-actions · 2025-06-02T19:18:42Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Initialize jedi in child processes Avoid passing a non-picklable `jedi.Project` instance into worker processes. Instead, construct a new `jedi.Project` inside `process_single_test_file` using the config’s `project_root_path`. codeflash/discovery/discover_unit_tests.py [454-463] -jedi_project = jedi.Project(path=project_root_path) with ( test_files_progress_bar(total=len(file_to_test_map), description="Processing test files") as ( progress, task_id, ), ProcessPoolExecutor() as executor, ): futures = { - executor.submit(process_single_test_file, test_file, functions, cfg, jedi_project): test_file + executor.submit(process_single_test_file, test_file, functions, cfg): test_file for test_file, functions in file_to_test_map.items() } Suggestion importance[1-10]: 9 __ Why: Passing a `jedi.Project` into worker processes will fail to pickle; constructing it inside each child avoids serialization errors and ensures correct setup.	High
	Delete existing records before insert Ensure stale test records are removed before inserting new ones to avoid duplicate or conflicting entries. Re-add a DELETE statement or use an UPSERT/REPLACE strategy. codeflash/discovery/discover_unit_tests.py [84-94] +self.cur.execute("DELETE FROM discovered_tests WHERE file_path = ?", (file_path,)) assert isinstance(test_type, TestType), "test_type must be an instance of TestType" self.cur.execute( "INSERT INTO discovered_tests VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ( file_path, file_hash, qualified_name_with_modules_from_root, function_name, test_class, test_function, test_type.value, line_number, col_number, ), ) Suggestion importance[1-10]: 8 __ Why: Re-adding the DELETE ensures stale rows for `file_path` are cleaned up before inserting new test records, preventing duplicate/conflicting entries in `discovered_tests`.	Medium
	Use correct test_type per function The code always uses `functions[0].test_type`, which may not correspond to the current `function`. Retrieve the `TestType` from the matching `TestsInFile` (e.g. by mapping `function` back to its original object). codeflash/discovery/discover_unit_tests.py [351-358] if is_parameterized and new_function == def_name.name: + original = next(f for f in functions if f.test_function == function) test_functions.add( TestFunction( function_name=def_name.name, test_class=matched_name, parameters=parameters, - test_type=functions[0].test_type, + test_type=original.test_type, ) ) Suggestion importance[1-10]: 7 __ Why: Always using `functions[0].test_type` can misassign types when multiple tests exist; mapping back to the specific `TestsInFile` preserves each function’s actual `test_type`.	Medium
General	Handle task exceptions gracefully If a worker task raises an exception, calling `future.result()` will halt the entire discovery. Wrap it in try/except to log errors and continue processing remaining files. codeflash/discovery/discover_unit_tests.py [466-470] for future in futures: - result = future.result() + try: + result = future.result() + except Exception as e: + logger.debug(f"Error processing test file: {e}") + continue for k, v in result.items(): function_to_test_map[k].update(v) progress.update(task_id) Suggestion importance[1-10]: 6 __ Why: Wrapping `future.result()` in try/except prevents one failing test file from halting the entire process, improving robustness and logging errors for debugging.	Low

openhands-ai · 2025-06-06T06:10:19Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- CodeFlash
- end-to-end-test
- end-to-end-test
- end-to-end-test
- end-to-end-test
- CodeFlash
- end-to-end-test
- end-to-end-test

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #264

Feel free to include any additional details that might help me get this PR into a better state.

_{^{You can manage your notification settings}}

KRRT7 marked this pull request as draft June 2, 2025 19:17

github-actions bot added the Review effort 4/5 label Jun 2, 2025

KRRT7 force-pushed the parallelize-test-discovery branch 2 times, most recently from 8f1d85a to 77e7053 Compare June 6, 2025 06:00

KRRT7 closed this Jun 6, 2025

KRRT7 force-pushed the parallelize-test-discovery branch from cecf857 to 1eb54df Compare June 6, 2025 17:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

paralelize test discovery #264

paralelize test discovery #264

Uh oh!

KRRT7 commented Jun 2, 2025 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Jun 2, 2025

Uh oh!

github-actions bot commented Jun 2, 2025

Uh oh!

openhands-ai bot commented Jun 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

paralelize test discovery #264

paralelize test discovery #264

Uh oh!

Conversation

KRRT7 commented Jun 2, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Changes walkthrough 📝

Uh oh!

github-actions bot commented Jun 2, 2025

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Jun 2, 2025

PR Code Suggestions ✨

Uh oh!

openhands-ai bot commented Jun 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

KRRT7 commented Jun 2, 2025 •

edited by github-actions bot

Loading