
Conversation

@krrish175-byte commented Jan 29, 2026

What type of PR is this?

/kind documentation
/kind proposal

What this PR does / why we need it:

This PR adds a comprehensive technical proposal for implementing parallel processing of test cases in Ianvs core, as requested in issue #8 and discussed in PR #308.

The proposal addresses the need to reduce benchmarking time when testing multiple parameter configurations or algorithms. Currently, Ianvs executes test cases serially, which can lead to excessive execution times (e.g., 5 test cases × 2 hours each = 10 hours total). This feature will enable concurrent execution across multiple CPU cores, significantly reducing total benchmarking time.

Proposal Contents:

  • Motivation & Problem Statement: Why parallel processing is essential for all Ianvs developers
  • Architecture Design: Detailed technical design showing integration with Ianvs core components
  • Backward Compatibility Analysis: Comprehensive demonstration that all existing examples continue to work without modification
  • Impact Assessment: Analysis of how this affects current running examples across all scenarios
  • Testing & Validation Strategy: Plan for validating all existing examples in both serial and parallel modes
  • Implementation Roadmap: Phased 4-week approach with clear milestones
  • Risk Assessment: Identified risks and mitigation strategies
  • Performance Analysis: Expected speedup metrics (3-6x for typical workloads)

Key Design Principles:

  1. Backward Compatible: Parallel execution is opt-in; default behavior remains serial
  2. Zero Breaking Changes: All existing examples and workflows continue to function unchanged
  3. Robust Error Handling: Failures in one test case don't crash the entire benchmarking job
  4. User Control: Flexible configuration via CLI flags and YAML configuration (a sketch follows this list)
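
To make the opt-in behavior concrete, here is a minimal sketch of what the configuration could look like. The parallel_execution and num_workers keys and the -p/--parallel and -w/--workers flags are the ones named later in this thread; the surrounding file layout and exact key placement are assumptions for illustration only, not the approved design.

# benchmarkingjob.yaml (hypothetical placement of the proposed keys)
benchmarkingjob:
  name: "example-job"
  workspace: "./workspace"
  parallel_execution: true   # opt-in; omitted or false keeps today's serial behavior
  num_workers: 4             # optional; size of the worker pool

# equivalent invocation via the proposed CLI flags
ianvs -f benchmarkingjob.yaml -p -w 4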

Which issue(s) this PR fixes:

Related to #8
Related to #308

Special notes for reviewers:

This proposal is intended for community review and architectural discussion before proceeding with implementation, as requested by @MooreZheng in PR #308.

The proposal will be presented at the next KubeEdge SIG AI meeting for formal architectural review. Feedback on the following aspects is particularly welcome:

  • Overall architecture and design approach
  • Impact analysis completeness
  • Testing and validation strategy
  • Implementation timeline and phases
  • Any concerns about backward compatibility or existing examples

Next Steps:

  1. Community review of this proposal
  2. Presentation at KubeEdge SIG AI meeting (pending schedule)
  3. Revisions based on community feedback
  4. Approval for implementation
  5. Update PR #308 (docs: Add parallel processing feature proposal for issue #8) with the implementation based on the approved design

Note: This is a documentation-only PR. The actual implementation will be submitted separately after proposal approval.

/cc @MooreZheng @hsj576 @Poorunga
/assign @MooreZheng

@kubeedge-bot added the do-not-merge/invalid-commit-message and kind/feature labels on Jan 29, 2026
@kubeedge-bot (Collaborator) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: krrish175-byte
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist commented

Summary of Changes

Hello @krrish175-byte, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Ianvs benchmarking system by introducing parallel processing capabilities for test case execution. Previously, test cases ran serially, which could lead to excessive execution times for complex or numerous tests. The changes enable concurrent execution across multiple CPU cores, drastically reducing overall benchmarking duration. This is achieved through new command-line interface options, corresponding configuration file parameters, and a refactored internal mechanism that leverages Python's multiprocessing to distribute workloads efficiently and robustly.

Highlights

  • Parallel Execution: Implemented parallel processing for test case execution using concurrent.futures.ProcessPoolExecutor, allowing multiple test cases to run concurrently (a sketch of this pattern follows the list).
  • CLI Arguments: Introduced new command-line flags: -p / --parallel to enable parallel execution and -w / --workers to specify the number of worker processes.
  • Configuration Support: Added support for parallel_execution (boolean) and num_workers (integer) parameters within the benchmarking job configuration files.
  • Code Refactoring: Extracted the core test case execution logic into a new top-level, picklable function run_testcase_func to ensure compatibility with Python's multiprocessing module.
  • Robust Error Handling: Enhanced error handling during parallel execution, ensuring that a failure in one test case does not halt the entire benchmarking job, with errors being logged and other cases continuing to process.
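
To ground these highlights, below is a self-contained sketch of the dispatch pattern they describe, built around the names that appear in this thread (run_testcase_func, future_to_testcase). It illustrates the approach only; the run_all helper and its signature are invented for the example and are not the actual Ianvs code.

import os
from concurrent.futures import ProcessPoolExecutor, as_completed

def run_testcase_func(testcase, workspace):
    """Top-level (hence picklable) worker: run one test case and report its outcome."""
    try:
        res = testcase.run(workspace)
        return {"status": "success", "config": testcase.algorithm.name, "results": res}
    except Exception as e:  # a failure in one case must not abort the whole job
        return {"status": "failed", "config": testcase.algorithm.name, "error": str(e)}

def run_all(test_cases, workspace, workers=None):  # hypothetical helper, not Ianvs API
    workers = workers or max(1, (os.cpu_count() or 2) - 1)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        future_to_testcase = {
            executor.submit(run_testcase_func, tc, workspace): tc for tc in test_cases
        }
        return [future.result() for future in as_completed(future_to_testcase)]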


@kubeedge-bot added the size/L label on Jan 29, 2026
@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces parallel processing for test case execution, which is a great feature for improving performance on multi-core systems. The implementation using concurrent.futures.ProcessPoolExecutor is solid. I've made a few suggestions to improve robustness, efficiency, and maintainability. Specifically, I've pointed out a potential crash when determining the number of worker processes, suggested an optimization to reduce data transfer from worker processes, and recommended using the project's standard logger for consistency. Overall, this is a valuable addition.

if parallel:
    # Determine number of workers
    if workers is None:
        workers = max(1, os.cpu_count() - 1)


Severity: high

os.cpu_count() can return None on some platforms, which would cause a TypeError when you try to subtract 1 from it. You should handle this case to prevent the program from crashing.

Suggested change:
-        workers = max(1, os.cpu_count() - 1)
+        workers = max(1, (os.cpu_count() or 2) - 1)
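
Worth noting: with this change, when os.cpu_count() returns None the expression evaluates to max(1, (None or 2) - 1) = 1, so the job degrades gracefully to a single worker instead of raising a TypeError.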

Comment on lines 124 to 127
if k == "parallel_execution":
    self.parallel = v
if k == "num_workers":
    self.workers = v


Severity: medium

This logic for parsing parallel execution parameters is a bit disconnected from the main configuration parsing structure. For better readability and maintainability, consider integrating it into the if/elif/else block above. This would make the flow of control clearer.

# Run training with config parameters
# Evaluate and collect metrics
res = testcase.run(workspace)
return {"status": "success", "config": testcase.algorithm.name, "results": res, "testcase": testcase}


Severity: medium

Returning the full testcase object from the worker process is unnecessary and can be inefficient. The testcase object is already available in the main process through the future_to_testcase map. Removing it from the returned dictionary will reduce serialization/deserialization overhead.

Suggested change:
-    return {"status": "success", "config": testcase.algorithm.name, "results": res, "testcase": testcase}
+    return {"status": "success", "config": testcase.algorithm.name, "results": res}
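
For context, the main-process side already has each test case at hand through the map built at submission time; a sketch, assuming the future_to_testcase dict mentioned above:

for future in as_completed(future_to_testcase):
    testcase = future_to_testcase[future]  # recovered locally, never shipped back from the worker
    result = future.result()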

    res = testcase.run(workspace)
    return {"status": "success", "config": testcase.algorithm.name, "results": res, "testcase": testcase}
except Exception as e:
    return {"status": "failed", "config": testcase.algorithm.name, "error": str(e), "testcase": testcase}


Severity: medium

Returning the full testcase object from the worker process is unnecessary and can be inefficient. The testcase object is already available in the main process through the future_to_testcase map. Removing it from the returned dictionary will reduce serialization/deserialization overhead.

Suggested change:
-    return {"status": "failed", "config": testcase.algorithm.name, "error": str(e), "testcase": testcase}
+    return {"status": "failed", "config": testcase.algorithm.name, "error": str(e)}

if workers is None:
    workers = max(1, os.cpu_count() - 1)

print(f"Running {len(self.test_cases)} test cases on {workers} workers")


Severity: medium

This uses print for logging, but the project seems to have a configured LOGGER (e.g., in core/cmd/benchmarking.py). For consistent and manageable logging, it's better to use the logger instance. This also applies to the print statements on lines 82 and 84. Consider using LOGGER.info, LOGGER.warning, or LOGGER.error as appropriate. You will need to import LOGGER from core.common.log.

Suggested change:
-print(f"Running {len(self.test_cases)} test cases on {workers} workers")
+LOGGER.info(f"Running {len(self.test_cases)} test cases on {workers} workers")
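
For completeness, the corresponding import, as stated above (assuming the project's existing core.common.log module):

from core.common.log import LOGGER

LOGGER.info(f"Running {len(self.test_cases)} test cases on {workers} workers")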

@kubeedge-bot added the size/M label and removed the do-not-merge/invalid-commit-message and size/L labels on Jan 29, 2026
@krrish175-byte (Author) commented

Hi @MooreZheng, can you please provide an update on this PR?

@MooreZheng requested review from hsj576 and removed the request for Poorunga on February 2, 2026 01:21
@MooreZheng (Collaborator) commented Feb 2, 2026

> Hi @MooreZheng, can you please provide an update on this PR?

Welcome, Krrish. The work will be appreciated.

This is related to issue #8. Please note that parallel processing is an important feature in the ianvs core code that will impact all ianvs developers' work, past, present, and future. For such an important feature, you need to provide a proposal for community reviewers showing how it would affect all currently running examples. Then a formal presentation is needed to launch a review of the architecture design in the KubeEdge SIG AI before getting to any implementation.

See a proposal example in https://github.com/kubeedge/ianvs/blob/main/docs/proposals/scenarios/GovDoc2Poster/GovDoc2Poster.md

@kubeedge-bot added the size/XXL label and removed the size/M label on Feb 2, 2026
@krrish175-byte changed the title from "feat: Add parallel processing for multiple test case execution" to "feat: Proposal for parallel processing in Ianvs" on Feb 2, 2026
@krrish175-byte changed the title from "feat: Proposal for parallel processing in Ianvs" to "docs: Add parallel processing feature proposal for issue #8" on Feb 2, 2026