Parallelize discovery #58
Conversation
Archan Das seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

PR Code Suggestions ✨
No code suggestions found for the PR.
I believe making jedi parallel can involve some potential hiccups. See if they are a real problem or if it's fine: https://jedi.readthedocs.io/en/stable/docs/development.html#module-jedi.cache
```python
process_inputs.append((str(test_file), serializable_functions, config_dict))

# Determine optimal number of processes
max_processes = min(cpu_count() * 2, len(process_inputs), 32)
```
Why `cpu_count() * 2`? You don't want to create more processes than the number of CPUs.
I was under the impression this is largely a disk-I/O-bound task, so multiple processes per core would help.
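The trade-off being debated can be sketched as a small sizing helper. `pool_size` is a hypothetical name; the oversubscription factor, the input-count bound, and the hard cap of 32 mirror the quoted `min(cpu_count() * 2, len(process_inputs), 32)` line:

```python
from multiprocessing import cpu_count

def pool_size(num_inputs: int, io_bound: bool = True, cap: int = 32) -> int:
    """Pick a worker count: oversubscribe CPUs when workers mostly wait on
    disk, but never exceed the number of inputs or a hard cap."""
    per_core = 2 if io_bound else 1  # 2x oversubscription only for I/O-bound work
    return min(cpu_count() * per_core, num_inputs, cap)
```

For CPU-bound work (`io_bound=False`) this collapses back to at most one worker per core, which is the reviewer's point.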
```python
# These version placeholders will be replaced by poetry-dynamic-versioning during `poetry build`.
__version__ = "0.10.3"
__version_tuple__ = (0, 10, 3)
__version__ = "0.10.3.post7.dev0+86aa1cd6"
```
Don't change this version.
```python
processed_files += 1

# Log progress
if processed_files % 100 == 0 or processed_files == len(process_inputs):
```
Maybe we can do something with a progress bar.
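A library like tqdm is the usual answer, and the plumbing it would wrap is already available here: `Pool.imap_unordered` yields results as they complete, so progress can be reported incrementally instead of every 100 files. A minimal sketch, with `_square` and `run_with_progress` as hypothetical stand-ins for the real per-file worker and driver:

```python
import sys
from multiprocessing import Pool

def _square(n: int) -> int:
    # Stand-in for the real per-file worker; must be module-level to be picklable.
    return n * n

def run_with_progress(inputs: list) -> list:
    """Process inputs in a pool, writing a progress line as results arrive."""
    results = []
    with Pool(processes=2) as pool:
        for done, result in enumerate(pool.imap_unordered(_square, inputs), start=1):
            results.append(result)
            sys.stderr.write(f"\rprocessed {done}/{len(inputs)} files")
    sys.stderr.write("\n")
    return results
```

With tqdm installed, the loop body collapses to wrapping the iterator: `tqdm(pool.imap_unordered(worker, inputs), total=len(inputs))`.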
```python
position = CodePosition(line_no=entry['line_no'], col_no=entry['col_no'])

function_to_test_map[qualified_name].append(
```
In the initial code, `function_to_test_map` is a dictionary of lists, but I don't see why we can't make it a dictionary of sets so they're immediately deduped.
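A minimal sketch of the suggestion, assuming hashable test entries; the `record` helper and the tuple shape of the test entry are hypothetical:

```python
from collections import defaultdict

# function -> set of tests that exercise it; a set dedupes repeated
# (function, test) pairs at insertion time, so no later dedup pass is needed.
function_to_test_map = defaultdict(set)

def record(qualified_name: str, test_entry: tuple) -> None:
    function_to_test_map[qualified_name].add(test_entry)  # .add, not .append

record("pkg.mod.f", ("tests/test_mod.py", "test_f", 10))
record("pkg.mod.f", ("tests/test_mod.py", "test_f", 10))  # duplicate, ignored
```

The only requirement this adds is that the stored entries be hashable.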
```python
for test in tests:
    # Create a hashable representation of the test
    test_hash = (
```
Unnecessary; these should be hashable already, since the pydantic models are frozen.
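The point is that frozen models define `__hash__`, so instances can go straight into a set with no hand-built tuple. A sketch using a frozen dataclass as a stand-in (pydantic models with `model_config = ConfigDict(frozen=True)` behave the same way for hashing); `TestRef` and its fields are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestRef:
    # Immutable, so eq/hash are derived from the field values.
    test_file: str
    test_function: str

# Duplicate instances collapse in a set without any manual "test_hash" tuple.
seen = {TestRef("t.py", "test_a"), TestRef("t.py", "test_a")}
```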
```python
try:
    script = jedi.Script(path=test_file, project=jedi_project)
    all_names = script.get_names(all_scopes=True, references=True)
```
If we're going for performance here, I wonder if we can do some 'caching'. For example, if two tests use the same function (in this case it'll be a jedi name in `all_names`), we shouldn't have to process it twice.
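A sketch of the caching idea, assuming the resolution can be keyed by something stable such as the module path plus qualified name; `resolve` is a hypothetical wrapper around whatever jedi machinery the real code calls. Note that with multiprocessing each worker process gets its own cache:

```python
from functools import lru_cache

resolution_calls = 0  # instrumentation to show the cache working

@lru_cache(maxsize=None)
def resolve(module_path: str, qualified_name: str) -> str:
    """Memoized name resolution: repeated references hit the cache."""
    global resolution_calls
    resolution_calls += 1
    # The real code would invoke jedi's resolution here; this is a stand-in.
    return f"{module_path}::{qualified_name}"

for _ in range(3):  # three tests referencing the same function
    resolve("src/calc.py", "calc.add")
```

Three references, one underlying resolution.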
```python
continue
scope = m.group(1)
indices = [i for i, x in enumerate(test_functions_raw) if x == scope]
for index in indices:
```
This seems inefficient: doing `name.goto` multiple times. Am I missing something here? @misrasaurabh1
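A sketch of the restructuring being suggested: resolve each scope name once, then fan the single result out to every matching index, rather than calling the expensive `goto`-equivalent inside the inner loop. `resolve_scope` is a hypothetical stand-in for `name.goto()`:

```python
def resolve_scope(scope: str) -> str:
    # Stand-in for the expensive per-name resolution (name.goto in the real code).
    return scope.upper()

test_functions_raw = ["test_a", "test_b", "test_a"]
scope = "test_a"
indices = [i for i, x in enumerate(test_functions_raw) if x == scope]

resolved = resolve_scope(scope)              # one resolution per scope name
matches = [(i, resolved) for i in indices]   # reused for every matching index
```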
Revisit this after #72 is done.
User description
Use multiprocessing in `process_test_files` to speed up discovery. A moderate speedup was seen on the dev server; running on systems with more CPU cores should yield larger benefits.
PR Type
Description
- Fall back to single-threaded processing for <25 files.
- Add a module-level worker for multiprocessing.
- Refactor test file processing for parallel discovery.
- Update version metadata.
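The dispatch pattern the description outlines can be sketched as follows, under stated assumptions: `_process_file` and `process_test_files` are hypothetical stand-ins for the real worker and driver, the threshold of 25 files comes from the description, and the pool sizing mirrors the quoted diff:

```python
from multiprocessing import Pool, cpu_count

SMALL_BATCH_THRESHOLD = 25  # below this, pool start-up cost outweighs the win

def _process_file(path: str) -> int:
    # Stand-in for the real per-file work; module-level so it is picklable.
    return len(path)

def process_test_files(paths: list) -> list:
    """Single-threaded for small batches, a process pool otherwise."""
    if len(paths) < SMALL_BATCH_THRESHOLD:
        return [_process_file(p) for p in paths]
    with Pool(processes=min(cpu_count() * 2, len(paths), 32)) as pool:
        return pool.map(_process_file, paths)
```

Keeping the worker at module level (rather than a closure) is what makes it usable with `multiprocessing`, which pickles the callable when sending work to child processes.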
Changes walkthrough 📝
discover_unit_tests.py (codeflash/discovery/discover_unit_tests.py)
Enhance test discovery with parallel processing

version.py (codeflash/version.py)
Update project version metadata