docs: Add parallel processing feature proposal for issue #8 #308
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: krrish175-byte. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Summary of Changes

Hello @krrish175-byte, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the Ianvs benchmarking system by introducing parallel processing capabilities for test case execution. Previously, test cases ran serially, which could lead to excessive execution times for complex or numerous tests. The changes enable concurrent execution across multiple CPU cores, drastically reducing overall benchmarking duration. This is achieved through new command-line interface options, corresponding configuration file parameters, and a refactored internal mechanism that leverages Python's multiprocessing to distribute workloads efficiently and robustly.

Highlights
Code Review
This pull request introduces parallel processing for test case execution, which is a great feature for improving performance on multi-core systems. The implementation using concurrent.futures.ProcessPoolExecutor is solid. I've made a few suggestions to improve robustness, efficiency, and maintainability. Specifically, I've pointed out a potential crash when determining the number of worker processes, suggested an optimization to reduce data transfer from worker processes, and recommended using the project's standard logger for consistency. Overall, this is a valuable addition.
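For orientation, the dispatch pattern the review describes (a `ProcessPoolExecutor` plus a `future_to_testcase` map, referenced in the comments below) might look roughly like the following sketch. `run_single_testcase` and `run_testcases_parallel` are illustrative names, not the PR's exact code:

```python
import concurrent.futures


def run_single_testcase(testcase, workspace):
    # Placeholder worker mirroring the hunk shown further down.
    return {"status": "success",
            "config": testcase.algorithm.name,
            "results": testcase.run(workspace)}


def run_testcases_parallel(test_cases, workspace, workers):
    # Sketch only: dispatch each test case to a worker process and
    # collect results as they complete.
    results = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        # Map each future back to its test case so the main process can
        # attribute results without shipping the object back from workers.
        future_to_testcase = {
            executor.submit(run_single_testcase, tc, workspace): tc
            for tc in test_cases
        }
        for future in concurrent.futures.as_completed(future_to_testcase):
            testcase = future_to_testcase[future]
            results.append((testcase, future.result()))
    return results
```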
```python
if parallel:
    # Determine number of workers
    if workers is None:
        workers = max(1, os.cpu_count() - 1)
```
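This is the spot the review summary's crash warning refers to: per the Python documentation, `os.cpu_count()` returns `None` when the count cannot be determined, so the subtraction would raise a `TypeError`. A minimal guard, as a sketch:

```python
import os

if workers is None:
    # os.cpu_count() may return None; fall back to one worker in that case.
    cpu_count = os.cpu_count() or 1
    workers = max(1, cpu_count - 1)
```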
core/cmd/obj/benchmarkingjob.py (Outdated)
| if k == "parallel_execution": | ||
| self.parallel = v | ||
| if k == "num_workers": | ||
| self.workers = v |
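Since the two keys are mutually exclusive, an `elif` avoids re-testing `k`. A hedged sketch of the same parsing with explicit defaults (the class shell, config dict name, and default values are assumptions; the attribute names come from the hunk):

```python
class BenchmarkingJob:  # illustrative shell; the real class lives in benchmarkingjob.py
    def _parse_config(self, config: dict):
        # Assumed defaults; the PR may choose different ones.
        self.parallel = False
        self.workers = None
        for k, v in config.items():
            if k == "parallel_execution":
                self.parallel = bool(v)
            elif k == "num_workers":  # mutually exclusive keys: elif avoids re-testing k
                self.workers = v
```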
```python
# Run training with config parameters
# Evaluate and collect metrics
res = testcase.run(workspace)
return {"status": "success", "config": testcase.algorithm.name, "results": res, "testcase": testcase}
```
Returning the full testcase object from the worker process is unnecessary and can be inefficient. The testcase object is already available in the main process through the future_to_testcase map. Removing it from the returned dictionary will reduce serialization/deserialization overhead.
| return {"status": "success", "config": testcase.algorithm.name, "results": res, "testcase": testcase} | |
| return {"status": "success", "config": testcase.algorithm.name, "results": res} |
```python
    res = testcase.run(workspace)
    return {"status": "success", "config": testcase.algorithm.name, "results": res, "testcase": testcase}
except Exception as e:
    return {"status": "failed", "config": testcase.algorithm.name, "error": str(e), "testcase": testcase}
```
As above: returning the full testcase object from the worker process is unnecessary, since the main process can recover it from the future_to_testcase map; dropping it from the returned dictionary reduces serialization/deserialization overhead.
| return {"status": "failed", "config": testcase.algorithm.name, "error": str(e), "testcase": testcase} | |
| return {"status": "failed", "config": testcase.algorithm.name, "error": str(e)} |
```python
if workers is None:
    workers = max(1, os.cpu_count() - 1)

print(f"Running {len(self.test_cases)} test cases on {workers} workers")
```
This uses `print` for logging, but the project seems to have a configured `LOGGER` (e.g., in `core/cmd/benchmarking.py`). For consistent and manageable logging, it's better to use the logger instance. This also applies to the `print` statements on lines 82 and 84. Consider using `LOGGER.info`, `LOGGER.warning`, or `LOGGER.error` as appropriate. You will need to import `LOGGER` from `core.common.log`.
| print(f"Running {len(self.test_cases)} test cases on {workers} workers") | |
| LOGGER.info(f"Running {len(self.test_cases)} test cases on {workers} workers") |
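Applied inside the method, the suggestion amounts to the following (assuming `core.common.log` exposes `LOGGER`, as the comment states; the %-style arguments are an optional variant that defers string formatting until the record is actually emitted):

```python
from core.common.log import LOGGER

LOGGER.info("Running %d test cases on %d workers", len(self.test_cases), workers)
```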
Force-pushed eace59c to fe28f17.
Hi @MooreZheng, can you please provide an update on this PR?
Welcome, Krrish. The work will be appreciated. This is related to issue #8. Please note that parallel processing is an important feature in the ianvs core code that will impact all ianvs developers' work, past, present, and future. For such an important feature, you need to provide a proposal for community reviewers showing how it would affect all currently running examples. A formal presentation is then needed to launch a review of the architecture design in the KubeEdge SIG AI before getting to any implementation. See a proposal example at https://github.com/kubeedge/ianvs/blob/main/docs/proposals/scenarios/GovDoc2Poster/GovDoc2Poster.md
Force-pushed da98e87 to 679c314.
Force-pushed 679c314 to 1297b27.
What type of PR is this?
/kind documentation
/kind proposal
What this PR does / why we need it:
This PR adds a comprehensive technical proposal for implementing parallel processing of test cases in Ianvs core, as requested in issue #8 and discussed in PR #308.
The proposal addresses the need to reduce benchmarking time when testing multiple parameter configurations or algorithms. Currently, Ianvs executes test cases serially, which can lead to excessive execution times (e.g., 5 test cases × 2 hours each = 10 hours total). This feature will enable concurrent execution across multiple CPU cores, significantly reducing total benchmarking time; in the example above, running all 5 test cases on 5 workers would take roughly the duration of the slowest case, about 2 hours instead of 10.
Proposal Contents:
Key Design Principles:
Which issue(s) this PR fixes:
Related to #8
Related to #308
Special notes for reviewers:
This proposal is intended for community review and architectural discussion before proceeding with implementation, as requested by @MooreZheng in PR #308.
The proposal will be presented at the next KubeEdge SIG AI meeting for formal architectural review. Feedback on the following aspects is particularly welcome:
Next Steps:
Note: This is a documentation-only PR. The actual implementation will be submitted separately after proposal approval.
/cc @MooreZheng @hsj576 @Poorunga
/assign @MooreZheng