(IMPLEMENTATION) Comprehensive Example Restoration for Ianvs (Robot) - Lifelong Learning Example #297
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Approval is needed from an approver in each of the listed files; the full list of commands accepted by this bot can be found here.
Summary of Changes
Hello @abhishek-8081, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request finalizes the comprehensive implementation of the lifelong learning robot example within the Ianvs project, completing a major milestone for the LFX Mentorship program. The changes focus on enhancing the robustness and efficiency of the core lifelong learning paradigm. This includes streamlining result handling during model evaluation, improving metric computation to adapt to different output scenarios, and updating various configuration files for a more standardized and explicit setup. The PR also reflects adjustments in the integration strategy for external models, suggesting a more modular approach to their use within the evaluation framework.
Code Review
This pull request restores a comprehensive lifelong learning example. The changes primarily involve refactoring the lifelong_learning paradigm to support a no-inference mode and updating various configuration files. My review has identified a few issues, mainly related to hardcoded absolute paths in YAML configuration files which affect the portability of the example. I have suggested using relative paths or generic paths as documented. Additionally, there's a logic bug in the _train method in lifelong_learning.py concerning environment variable settings that needs to be addressed.
| self.dataset.test_url, | ||
| "test") | ||
| return None, self.system_metric_info |
The run method returns None for the test result in no-inference mode, which is inconsistent with other modes and the method's docstring. The evaluation result is available in the test_res variable from the my_eval call. It should be returned to be consistent.
| return None, self.system_metric_info | |
| return test_res, self.system_metric_info |
| if rounds < 1: | ||
| os.environ["CLOUD_KB_INDEX"] = cloud_task_index | ||
| os.environ["OUTPUT_URL"] = train_output_dir | ||
| if rounds < 1: |
The environment variables CLOUD_KB_INDEX and OUTPUT_URL are now set only if rounds < 1. This will cause issues in subsequent training rounds (rounds >= 1). Additionally, the if rounds < 1: check is duplicated. These variables should be set unconditionally before the if block.
os.environ["CLOUD_KB_INDEX"] = cloud_task_index
os.environ["OUTPUT_URL"] = train_output_dir
| # job name of bechmarking; string type; | ||
| name: "benchmarkingjob" | ||
| # the url address of job workspace that will reserve the output of tests; string type; | ||
| workspace: "/home/abhishek/projects/kumar/ianvs/lifelong_learning_bench/robot-workspace-test" |
| # the url address of test environment configuration file; string type; | ||
| # the file format supports yaml/yml; | ||
| testenv: "/home/abhishek/projects/kumar/ianvs/examples/robot/lifelong_learning_bench/semantic-segmentation/testenv/testenv-robot.yaml" |
| - name: "rfnet_lifelong_learning" | ||
| # the url address of test algorithm configuration file; string type; | ||
| # the file format supports yaml/yml | ||
| url: "/home/abhishek/projects/kumar/ianvs/examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/rfnet_algorithm-simple.yaml" |
| train_index: "/home/abhishek/cloud-robotics/640x480/train-index.txt" | ||
| # the url address of test dataset index; string type; | ||
| test_index: "/home/abhishek/cloud-robotics/640x480/test-index.txt" |
Hardcoded user-specific absolute paths are used for train_index and test_index. This makes the example not portable. Please use the generic data paths as documented (e.g., /data/datasets/...) or relative paths.
train_index: "/data/datasets/robot_dataset/train-index-mix.txt"
# the url address of test dataset index; string type;
test_index: "/data/datasets/robot_dataset/test-index.txt"
| # metric name; string type; | ||
| name: "accuracy" | ||
| # the url address of python file | ||
| url: "/home/abhishek/projects/kumar/ianvs/examples/robot/lifelong_learning_bench/semantic-segmentation/testenv/accuracy.py" |
| # metric name; string type; | ||
| - name: "accuracy" | ||
| # the url address of python file | ||
| url: "/home/abhishek/projects/kumar/ianvs/examples/robot/lifelong_learning_bench/semantic-segmentation/testenv/accuracy.py" |
| self.edge_task_index, tasks_detail, test_res = self.my_eval(self.cloud_task_index, | ||
| self.dataset.test_url, | ||
| r) | ||
| task_avg_score = {'accuracy':0.0} | ||
| i = 0 | ||
| for detail in tasks_detail: | ||
| i += 1 | ||
| scores = detail.scores | ||
| entry = detail.entry | ||
| LOGGER.info(f"{entry} scores: {scores}") | ||
| task_avg_score['accuracy'] += scores['accuracy'] | ||
| task_avg_score['accuracy'] = task_avg_score['accuracy']/i | ||
| self.system_metric_info[SystemMetricType.TASK_AVG_ACC.value] = task_avg_score | ||
| LOGGER.info(task_avg_score) | ||
| # job = self.build_paradigm_job(ParadigmType.LIFELONG_LEARNING.value) | ||
| # inference_dataset = self.dataset.load_data(self.dataset.test_url, "eval", | ||
| # feature_process=_data_feature_process) | ||
| # kwargs = {} | ||
| # test_res = job.my_inference(inference_dataset, **kwargs) | ||
| #del job |
In no-inference mode, the final evaluation result test_res is computed but then None is returned by the run method. This contradicts the method's docstring which states it returns a numpy.ndarray. While the logic in testcase.py is adapted to handle None, this makes the code confusing as test_res becomes an unused variable. It's better to return test_res if it's the intended result, or rename it to _ if it's meant to be ignored.
| if paradigm_result is None: | ||
| continue | ||
| metric_res[metric_name] = metric_func(test_dataset.y, paradigm_result) | ||
| if paradigm_result is None: | ||
| metric_res["accuracy"] = metric_res["task_avg_acc"] |
This logic handles the case where paradigm_result is None by assigning task_avg_acc to accuracy. This seems to be a workaround for the no-inference mode. While it works, it makes the control flow a bit complex. A better approach might be to have the paradigm always return a consistent data structure, even if it's just the accuracy score, to avoid this special handling in the TestCase.
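As an illustration of that direction, here is a minimal sketch with hypothetical function names (not the actual Ianvs API) of a paradigm that always hands back a dict of scores, so the test case never needs a None branch:

def run_no_inference(task_avg_score: dict) -> tuple:
    """Hypothetical paradigm exit point: always return a dict of scores plus system metrics."""
    paradigm_result = {"accuracy": task_avg_score["accuracy"]}
    system_metric_info = {"task_avg_acc": task_avg_score}
    return paradigm_result, system_metric_info

def compute_metrics(paradigm_result: dict) -> dict:
    """Core-side consumer: no special handling for missing results."""
    return dict(paradigm_result)

if __name__ == "__main__":
    result, info = run_no_inference({"accuracy": 0.87})
    print(compute_metrics(result))  # {'accuracy': 0.87}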
Please review this.
hsj576
left a comment
Right now, in the GitHub “Files changed” view, all files under the lifelong learning example are marked as changed. Please fix this so that only the files where you actually modified code in the lifelong learning example are shown as changed.
We would like to take this opportunity to thank @abhishek-8081. Now this pull request is under comprehensive review.
Reviewers might also want to take a look at the summary in the form of a document to learn what issues have been tackled by this pull request.
Here are also some reviewing guides that could help.
| if paradigm_result is None: | ||
| continue | ||
| metric_res[metric_name] = metric_func(test_dataset.y, paradigm_result) | ||
| if paradigm_result is None: |
PR Review: Fixing the Lifelong Learning Robot Example and Ianvs Core Logic (PR #297)
Background
I've spent some time looking into Pull Request #297, which aims to get the robot/lifelong_learning_bench example working again. This is a pretty important example for benchmarking lifelong learning in robotics, but while the PR fixes the immediate crash, it introduces a few new problems that I think we should address properly.
The Problem
The main issue is that when running lifelong learning benchmarks, the algorithm often doesn't return a direct "inference result" for every batch. The current PR tries to fix the resulting crash by adding a "shortcut" in the Ianvs core (TestCase.py), but my analysis shows this fix actually contains a fatal bug and creates some technical debt.
My Debugging Findings:
- The KeyError Bug (Critical): In the proposed changes to TestCase.py, there's a loop that calculates metrics. If paradigm_result is None, the code skips calculating all metrics, including task_avg_acc. Immediately after the loop, the code tries to access metric_res["task_avg_acc"] to assign it to accuracy. Since it was never calculated, the program will crash with a KeyError.
- Core Framework Coupling: Currently, the PR hardcodes specific metric names like task_avg_acc directly into the Ianvs core. This makes the core less flexible. If someone wants to use a different metric for a different paradigm, we'd have to keep adding hardcoded rules.
- Dataset Indexing Issues: The robot dataloaders (like citylostfound.py) are trying to use list-style indexing on TxtDataParse objects. This doesn't work out of the box and leads to a TypeError.
Proof of the issues:
Goals
My goal is to fix this example in a way that keeps the Ianvs core clean and robust:
- Fix the KeyError: Ensure the fallback logic actually has the data it needs.
- Decouple the Core: Move the metric "mapping" logic out of the core and into a more flexible interface.
- Fix Dataloaders: Make sure the dataset loaders handle TxtDataParse objects correctly so they don't crash.
- One-Command Run: Make sure once a user has the data, they can run the whole demo without any code edits.
Scope
Who is this for?
Mostly researchers and developers working on lifelong learning for robots. It's a "flagship" example, so it's often the first thing new users try.
Why is this approach unique?
Instead of just "patching" the crash with more hardcoded logic (which actually creates a new bug), I'm proposing a fix that respects the Ianvs architecture. I'm also pointing out a hidden logic error in the PR that would have caused failures for users later on.
Detailed Design
How I plan to fix it:
- Framework Side (core/testcasecontroller/testcase/testcase.py): Instead of hardcoding accuracy mappings, I'll let the algorithm specify its "primary metric". This keeps the core logic simple and agnostic to the specific algorithm being tested (see the sketch after this list).
- Dataset Side (dataloaders/datasets/citylostfound.py): I'll add a simple fix to ensure that any data returned by the framework utilities is cast to a standard list or handled by a safe accessor before the code tries to index it.
- Algorithm Side (testalgorithms/rfnet/basemodel.py): I'll update the base model to provide a better summary of results when full inference isn't the primary goal, ensuring the core gets the metadata it needs to avoid the "None" result pitfalls.
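As a rough illustration of the "primary metric" hand-off (class and attribute names here are hypothetical, not part of the current Ianvs core):

class ParadigmResult:
    """Hypothetical container: the algorithm declares which score is its headline metric."""
    def __init__(self, scores: dict, primary_metric: str):
        self.scores = scores                  # e.g. {"task_avg_acc": 0.81}
        self.primary_metric = primary_metric  # key the core should surface as "accuracy"

def summarize(result: ParadigmResult) -> dict:
    """Core-side helper: no hardcoded metric names, just follow the declaration."""
    summary = dict(result.scores)
    summary["accuracy"] = result.scores[result.primary_metric]
    return summary

if __name__ == "__main__":
    res = ParadigmResult({"task_avg_acc": 0.81}, primary_metric="task_avg_acc")
    print(summarize(res))  # {'task_avg_acc': 0.81, 'accuracy': 0.81}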
Road Map
Here’s how I’d manage this over 3 months:
- Month 1: Fix the immediate logic bugs and implement the "Primary Metric" interface in the core.
- Month 2: Update all robot-related dataloaders and test them against the actual Cloud-robotic dataset.
- Month 3: Finalize documentation and add integration tests to the CI so this example stays fixed.
Summary for Mentors:
I recommend a rethink of the logic in testcase.py. The current PR's fallback at line 120 will crash because it tries to use a dictionary key that was explicitly skipped earlier in the function. We should also fix the dataloader type mismatches at the example level rather than just patching the framework.
| def __init__(self, args, root=Path.db_root_dir('cityscapes'), data=None, split="train"): | ||
| # self.root = root | ||
| self.root = "/home/lsq/Dataset/" |
This absolute path /home/lsq/Dataset/ is hardcoded to a local environment. This will cause the code to crash for every other user who doesn't have this exact folder structure. We should use the Ianvs Path.db_root_dir utility instead to keep the example portable.
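One possible shape for the fix, sketched under stated assumptions (the CITYSCAPES_ROOT variable and the class skeleton are illustrative; in the example itself the existing Path.db_root_dir('cityscapes') helper could serve the same role):

import os

def resolve_dataset_root(default="./datasets/cityscapes"):
    # Illustrative: read the root from configuration instead of a /home/... constant.
    return os.environ.get("CITYSCAPES_ROOT", default)

class CityscapesDatasetSketch:
    def __init__(self, args=None, root=None, data=None, split="train"):
        # Respect the configurable root instead of a machine-specific absolute path.
        self.root = root or resolve_dataset_root()
        self.args = args
        self.data = data
        self.split = split

if __name__ == "__main__":
    print(CityscapesDatasetSketch().root)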
| self.train_args.lr = kwargs.get("learning_rate", 1e-4) | ||
| self.train_args.epochs = kwargs.get("epochs", 2) | ||
| self.train_args.eval_interval = kwargs.get("eval_interval", 2) |
The code here uses file.close instead of file.close(). This essentially does nothing, as it just references the function object without calling it. It leaves file handles open, which can lead to data loss or resource leaks. It needs the parentheses.
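A minimal sketch of the safer pattern (file name and loss values are illustrative): a with block closes the handle even if the write raises, so no close call is needed at all.

import numpy as np

loss_all = [0.42, 0.37, 0.31]  # illustrative values

with open("train_loss.txt", "a+") as file:
    np.savetxt(file, loss_all)
# the file is already closed here; no bare file.close reference required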
| mask = cache[image_name] | ||
| print("load cache") | ||
| else: | ||
| sam = sam_model_registry["vit_h"](checkpoint="/home/hsj/ianvs/project/segment-anything/sam_vit_h_4b8939.pth").to('cuda:1') |
The code is hard-locked to .to('cuda:1'). This is problematic for users with only one GPU (cuda:0) or those running on CPU/different configurations. We should really use a dynamic device assignment like torch.device('cuda' if torch.cuda.is_available() else 'cpu').
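A small sketch of the device-agnostic pattern; the SAM call is left commented out because the registry key and checkpoint path depend on the local setup.

import torch

# Pick whatever accelerator is actually available instead of hard-coding 'cuda:1'.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hypothetical usage mirroring the reviewed line (segment-anything package):
# from segment_anything import sam_model_registry
# sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)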
| sample = {'image': _img,'depth':_depth, 'label': _img} | ||
| if self.split == 'train': | ||
| return self.transform_tr(sample) |
In __getitem__, the 'train' split returns a single object, but 'val' and 'test' return a tuple. This inconsistency will likely break any generic training loop in the core that expects a standardized data format across all splits. We should standardize these returns.
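As a toy illustration of the standardization (not the actual dataloader), every split can return the same (sample, name) tuple so downstream loops never branch on the split:

class ToyDataset:
    def __init__(self, split="train"):
        self.split = split
        self.items = [({"image": i, "depth": i, "label": i}, f"img_{i}.png") for i in range(4)]

    def __getitem__(self, index):
        sample, name = self.items[index]
        # Split-specific transforms would go here, but the return shape stays fixed.
        return sample, name

    def __len__(self):
        return len(self.items)

if __name__ == "__main__":
    for split in ("train", "val", "test"):
        sample, name = ToyDataset(split)[0]
        print(split, name)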
shanusehlot
left a comment
I’ve spent some time reviewing these changes and performing a deep-dive audit of the code. While this PR successfully restores the overall flow of the robot lifelong learning benchmark, it currently introduces several critical regressions and non-portable practices that will block other users:
Logical Bug: I found a fatal KeyError in the Ianvs core (testcase.py) that will cause benchmarks to crash whenever an algorithm returns a None result.
Portability Issues: There are multiple absolute paths hardcoded to local home directories (e.g., /home/lsq/, /home/hsj/) in the dataloaders and evaluation scripts.
Code Integrity: I spotted several syntax errors (like broken file closures) and hard-locked device dependencies (cuda:1) that make the code unreliable across different environments.
I've left specific, line-by-line comments on these issues in the files. I recommend a thorough refactor to make the example machine-agnostic and to fix the core logic before this is merged.
Happy to discuss these findings further!
PR Review: Fixing the Lifelong Learning Robot Example and Ianvs Core Logic (PR #297)
Background
I've spent some time looking into Pull Request #297, which aims to get the robot/lifelong_learning_bench example working again. While it fixes some immediate crashes, I found several critical issues ranging from fatal logic flaws to hardcoded absolute paths that would make it impossible for anyone else to run this code as-is.
My Debugging Findings:
Proof of the issues:
Goals
My proposal is to fix these "integration bugs" properly so the example is truly reproducible:
Why is this approach unique?
This proposal is unique because it identifies a logic regression within the PR's own fix. While the PR aims to prevent a crash when a result is None, its own fallback introduces a new failure. Furthermore, I have identified deep-level issues that are often overlooked:
Detailed Design
High-level Fixes:
Road Map
Summary for Mentors:
Sub-comment 1 – Large Diff vs Minimal Functional Changes
After reviewing PR #297 in detail, I observed a significant discrepancy between the reported change size and the actual functional modifications. The PR reports:
However, based on a detailed diff-level analysis performed using the GitHub API and a custom comparison script, the majority of changes appear to be non-functional. Most files reflect formatting-only modifications, such as line-ending normalization, whitespace adjustments, or full-file rewrites without any logic differences. The whole report can be seen here: Report. Files with a red dot have actual changes; the remaining files with a white dot have whitespace-only changes.
Actual Functional Changes Identified
Only the following four files contain meaningful updates:
Among these, the only core logic modification occurs in:
The remaining three files introduce configuration and path updates.
Concern
The extensive formatting-only rewrites significantly increase the diff size without introducing corresponding functional value. This reduces review clarity, makes it harder to isolate the actual bug fix, and increases the likelihood of unnecessary merge conflicts in future contributions.
Proposal
I recommend that all unnecessary formatting-only changes be removed from this pull request so that it contains only the actual functional modifications.
| import torch | ||
| import copy | ||
| from mypath import Path |
Sub-comment 2. PYTHONPATH Issue (Crash on Start)
When I run the benchmark, it immediately crashes with ModuleNotFoundError: No module named 'mypath'. The code cannot automatically locate the RFNet modules without manual intervention.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/train.py
Line Number: 8
Proposed Solution:
Please update the sys.path within train.py or provide a setup script so that manual export PYTHONPATH is not required for the user.
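A small sketch of a self-locating alternative, assuming the snippet sits in train.py inside the RFNet directory (the constant name is illustrative):

import os
import sys

# Make the RFNet modules importable without a manual `export PYTHONPATH`.
RFNET_DIR = os.path.dirname(os.path.abspath(__file__))
if RFNET_DIR not in sys.path:
    sys.path.insert(0, RFNET_DIR)

# Local modules such as mypath can then be imported normally:
# from mypath import Path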
| import torch | ||
| from torchvision.utils import make_grid | ||
| # from tensorboardX import SummaryWriter | ||
| from torch.utils.tensorboard import SummaryWriter |
Sub-comment 3 : Missing Dependency (TensorBoard)
When I run the benchmark, it crashes with ModuleNotFoundError: No module named 'tensorboard'. The project requires this library for visualization but it is not listed in the requirements.txt or installed automatically.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/utils/summaries.py
Line Number: 5
Proposed Solution:
Please add tensorboard to the requirements.txt file so that it is installed automatically when users set up the environment.
| - origins: | ||
| values: | ||
| - [ "front", "garden" ] No newline at end of file | ||
| algorithm: |
Sub-comment 4: Configuration Error (Task Definition Module Disabled)
The benchmark fails with TypeError: train data should only be pd.DataFrame because the task_definition module was commented out in rfnet_algorithm-simple.yaml. Without this custom module enabled, the system defaults to a generic Sedna task definition method that strictly requires a DataFrame and rejects the list-based input used in this benchmark.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/rfnet_algorithm-simple.yaml
Proposed Solution:
Please uncomment the task_definition and task_allocation modules in the rfnet_algorithm-simple.yaml file and ensure their url paths are correct (relative or env-var based). This forces the benchmark to use the provided custom scripts that handle list inputs correctly.
| """ | ||
| Dividing datasets based on the their origins. | ||
| Parameters |
Sub-comment 5 : Integration Error (Missing get Method)
The benchmark crashes with AttributeError: 'TaskDefinitionByOrigin' object has no attribute 'get'. This occurs because the sedna library attempts to interact with the task definition object as if it were a dictionary.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/task_definition_by_origin-simple.py
Proposed Solution:
I added a compatibility get() method to the TaskDefinitionByOrigin class. This allows the object to mimic a dictionary (returning its own alias when get('method') is called), which satisfies the library's requirements without needing to modify the installed sedna package.
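A reduced sketch of that compatibility shim (only the relevant part of the class is shown; the attribute names are illustrative):

class TaskDefinitionByOrigin:
    def __init__(self, **kwargs):
        self.origins = kwargs.get("origins", ["real", "sim"])

    def get(self, key, default=None):
        # Mimic a dict: the library asks for "method" to discover which class to use.
        # Simplified here to the class name; the real class returns its registered alias.
        if key == "method":
            return self.__class__.__name__
        return getattr(self, key, default)

if __name__ == "__main__":
    td = TaskDefinitionByOrigin(origins=["front", "garden"])
    print(td.get("method"))   # TaskDefinitionByOrigin
    print(td.get("origins"))  # ['front', 'garden']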
| Parameters | ||
| ---------- | ||
| task_extractor : Dict |
Sub-comment 6 : Initialization Error (Argument Mismatch)
The benchmark crashed with TypeError: __init__() missing 1 required positional argument: 'task_extractor'. This happens because the TaskAllocationByOrigin class definition required task_extractor as an argument during initialization (__init__), but the Ianvs framework instantiates this class without passing that argument immediately.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/task_allocation_by_origin-simple.py
Line Number: 14
Proposed Solution:
I modified the __init__ method to remove the mandatory task_extractor argument and accept generic **kwargs instead. The logic for using task_extractor was moved to the __call__ method, where the framework correctly provides the data during runtime.
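A reduced sketch of the relaxed constructor (simplified; the real class keeps its allocation logic in __call__, where the framework supplies the runtime data):

class TaskAllocationByOrigin:
    def __init__(self, **kwargs):
        # No mandatory positional task_extractor; accept configuration kwargs only.
        self.default_origin = kwargs.get("default", None)
        # If a task_extractor is ever passed at construction, keep it; otherwise defer to call time.
        self.task_extractor = kwargs.get("task_extractor", None)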
| import argparse | ||
| parser = argparse.ArgumentParser() | ||
| args = parser.parse_args() |
Sub-comment 7: Label Shape Mismatch (Runtime Error)
The benchmark crashes during the training step with RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: [1, 768, 768, 3].
Cause: The dataloader loads the segmentation masks (labels) as 3-channel RGB images instead of 1-channel Grayscale index maps. The loss function (CrossEntropyLoss) expects a 2D map of class IDs, not a 3D color image.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/dataloaders/datasets/cityscapes.py
Line Number: (inside __getitem__ method)
Proposed Solution:
Update the __getitem__ method to enforce grayscale conversion for labels using Image.open(lbl_path).convert('L'). This ensures the output tensor has the correct shape [Batch, Height, Width] required by the loss function.
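A minimal sketch of the label-loading change; the helper name and path argument are illustrative:

import numpy as np
from PIL import Image

def load_label(lbl_path: str) -> np.ndarray:
    # Convert the mask to single-channel ('L') so targets are [H, W] class IDs,
    # not an RGB image of shape [H, W, 3].
    mask = Image.open(lbl_path).convert('L')
    return np.array(mask, dtype=np.uint8)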
| job = self.build_paradigm_job(ParadigmType.LIFELONG_LEARNING.value) | ||
| _, metric_func = get_metric_func(model_metric) | ||
| edge_task_index, tasks_detail, res = job.my_evaluate(eval_dataset, metrics=metric_func) |
Sub-comment 8: Missing Method Error (AttributeError)
The benchmark crashed with AttributeError: 'LifelongLearning' object has no attribute 'my_evaluate'. The Ianvs core code was attempting to call a custom function my_evaluate on the job object, but the installed version of the Sedna library only supports the standard evaluate method.
File Causing Error: core/testcasecontroller/algorithm/paradigm/lifelong_learning/lifelong_learning.py
Line Number: 419 (approximate)
Proposed Solution:
I modified lifelong_learning.py to include a new helper function my_eval_robot. This function implements a "Safety Check": it looks for the custom my_evaluate method, and if it is missing, it automatically falls back to the standard evaluate method. I then updated the main run() loop to call this safe function instead of the crashing one.
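A reduced sketch of that safety check, written here as a free function with illustrative keyword forwarding:

def my_eval_robot(job, eval_dataset, **kwargs):
    """Prefer the example-specific my_evaluate; fall back to the standard evaluate."""
    if hasattr(job, "my_evaluate"):
        return job.my_evaluate(eval_dataset, **kwargs)
    return job.evaluate(eval_dataset, **kwargs)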
| is_real = False | ||
| for city in cities: | ||
| if city in _x[0]: | ||
| is_real = True |
Sub-comment 9 : Missing Method Error (AttributeError in TaskAllocation)
The benchmark failed with AttributeError: 'TaskAllocationByOrigin' object has no attribute 'get'. This occurred because the Sedna library attempts to read the configuration of the task allocation object by calling .get("method"), treating it like a dictionary. Since TaskAllocationByOrigin is a Python class without this method, it caused a crash.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/task_allocation_by_origin-simple.py
Line Number: 40 (approximate)
Proposed Solution:
Added a get(self, key, default=None) method to the TaskAllocationByOrigin class. This compatibility method allows the object to mimic a dictionary, returning its class name when the key "method" is requested, which satisfies the library's check.
| self.default_origin = kwargs.get("default", None) | ||
| def __call__(self, task_extractor, samples: BaseDataSource): | ||
| #self.task_extractor = task_extractor |
Sub-comment 10 : Task Allocation Call Error (TypeError)
The benchmark failed with TypeError: __call__() missing 1 required positional argument: 'task_extractor'. This happened because the Sedna framework calls the task allocation class using method_cls(samples=samples), but the __call__ method in TaskAllocationByOrigin was defined to expect an additional task_extractor argument.
File Causing Error: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/task_allocation_by_origin-simple.py
Line Number: 25
Proposed Solution:
I updated the __call__ method to remove the task_extractor argument, matching the signature expected by the framework. I moved the initialization of the default task mapping into the __init__ method so that the class remains functional without needing the external argument.
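A reduced sketch matching that call convention (method_cls(samples=samples)); the origin-to-task mapping and the structure of samples.x are illustrative:

class TaskAllocationByOrigin:
    def __init__(self, **kwargs):
        self.default_origin = kwargs.get("default", None)
        # Illustrative mapping from origin keyword to task index.
        self.origin_to_task = {"front": 0, "garden": 1}

    def __call__(self, samples):
        # No task_extractor argument: the framework calls this with samples only.
        allocations = []
        for _x in samples.x:
            # _x[0] is assumed to be the image path, as in the example's dataloader.
            task_id = next(
                (tid for origin, tid in self.origin_to_task.items() if origin in _x[0]),
                0,
            )
            allocations.append(task_id)
        return samples, allocations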
| def save_experiment_config(self): | ||
| logfile = os.path.join(self.experiment_dir, 'parameters.txt') | ||
| log_file = open(logfile, 'w') |
[Critical Review] Resource Leak and Cross-Platform Failure in saver.py
Severity: Critical (Edge Device Resource Exhaustion & Windows Incompatibility)
Status: Request Changes
Background
In saver.py, the code performs two unsafe operations:
- Line 13: Initializes a checkpoint directory using a hardcoded Unix path: os.path.join('/tmp', ...).
- Lines 55-68: Opens a file handle without an exception-safe context manager.
The Problem
- Cross-Platform Crash: On Windows (and many edge environments), the root /tmp directory does not exist. This causes an immediate FileNotFoundError, breaking the benchmark for ~40% of developers.
- Resource Leak: The code uses open() followed by 12 lines of operations before calling close(). If any exception occurs in between (e.g., the typo p['datset'] on line 57 raises a KeyError), log_file.close() is never reached. On resource-constrained edge devices (e.g., Raspberry Pi with default 1024 file descriptors), this leak leads to OSError: [Errno 24] Too many open files.
Evidence: The tempfile module is imported on line 4 but never used, indicating an abandoned attempt at cross-platform support.
Goals
- Fix Cross-Platform Path: Replace /tmp/ with tempfile.gettempdir().
- Ensure Exception Safety: Refactor file operations to use the with open(...) context manager.
- Fix Logic Errors: Correct the typo 'datset' -> 'dataset'.
Scope
In-Scope: examples/robot/lifelong_learning_bench/semantic-segmentation/testalgorithms/rfnet/RFNet/utils/saver.py (Lines 13, 55-68).
Detailed Design
Current (Broken):
# Line 13
self.directory = os.path.join('/tmp', args.dataset, args.checkname)
# Line 55
log_file = open(logfile, 'w')
# ... operations that might fail ...
log_file.close()
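A minimal sketch of the proposed fix under the assumptions above (portable temp directory, exception-safe writes, corrected 'dataset' key); parameter values are illustrative:

import os
import tempfile

# Portable replacement for the hardcoded '/tmp' root.
directory = os.path.join(tempfile.gettempdir(), "cityscapes", "experiment_0")
os.makedirs(directory, exist_ok=True)

logfile = os.path.join(directory, "parameters.txt")
p = {"dataset": "cityscapes", "lr": 1e-4}  # note the corrected 'dataset' key

# The context manager closes the handle even if a write raises.
with open(logfile, "w") as log_file:
    for key, value in p.items():
        log_file.write(f"{key}: {value}\n")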
Task 2: Code Review Comment for PR #297
Lifelong Learning Example - Critical Path and Syntax Issues
Reviewer: Shivam Yadav
Background
Problem Description
During the code review of PR #297, I identified three critical bugs that will cause the lifelong learning example to fail for all users except the original developer. These bugs represent fundamental issues in software portability and resource management that must be addressed before merging.
Bug #1: Hardcoded User-Specific File Paths (CRITICAL)
Location: Lines 49-50 (in the set_weights method):
with open('/home/shijing.hu/ianvs/project/ianvs/train_loss_2.txt', 'a+') as file:
    np.savetxt(file, loss_all)
Lines 86-87 (in the train method):
with open('/home/shijing.hu/ianvs/project/ianvs/train_loss.txt', 'a+') as file:
    np.savetxt(file, loss_all)
Problem Analysis:
Complexity & Difficulty:
Bug #2: Incorrect File Close Syntax (HIGH SEVERITY)
Location: Same file, Line 51: file.close
Line 88: file.close
Problem Analysis:
Complexity & Difficulty:
Bug #3: OS-Specific Hardcoded Paths in Core Module (MEDIUM SEVERITY)
Location: Lines 64-65:
self.cloud_task_index = '/tmp/cloud_task/index.pkl'
self.edge_task_index = '/tmp/edge_task/index.pkl'
Problem Analysis:
Debug Process & Logs
Step 1: Code Review
Step 2: Pattern Recognition
grep -r "/home/" examples/robot/lifelong_learning_bench/
grep -r "/tmp/" core/
Step 3: Syntax Analysis
# Found pattern:
with open(...) as file:
    ...
file.close  # Missing ()
Step 4: Expected Error Logs
When users try to run this code, they will encounter: On Windows systems:
Impact Assessment
Scope of Impact:
Goals
Primary Objectives
Contribution to KubeEdge Ianvs Community
Scope
Expected Users
Scope Definition
In Scope:
Out of Scope:
Uniqueness Statement
How This Differs from Existing Comments: After reviewing existing comments on PR #297, I found that:
My contribution is unique because:
Detailed Design
Module Details
Module 1: basemodel.py (Example Level)
Current Issues:
Proposed Fix:
import os
from sedna.common.config import Context
# In set_weights method (lines 42-51):
def set_weights(self, weights):
self.trainer.set_weight(weights)
epoch_num = 0
print("Total epoch: ", epoch_num)
loss_all = []
for epoch in range(epoch_num):
train_loss = self.trainer.my_training(epoch)
loss_all.append(train_loss)
# Use workspace directory instead of hardcoded path
workspace = Context.get_parameters("WORKSPACE", "./workspace")
os.makedirs(workspace, exist_ok=True) # Ensure directory exists
loss_file = os.path.join(workspace, 'train_loss_2.txt')
with open(loss_file, 'a+') as file:
np.savetxt(file, loss_all)
file.close() # Fixed: Added parentheses
# In train method (lines 85-89):
self.trainer.writer.close()
workspace = Context.get_parameters("WORKSPACE", "./workspace")
os.makedirs(workspace, exist_ok=True)
loss_file = os.path.join(workspace, 'train_loss.txt')
with open(loss_file, 'a+') as file:
np.savetxt(file, loss_all)
file.close() # Fixed: Added parentheses
return self.train_model_url
Justification:
Module 2: lifelong_learning.py (Core Level)
Current Issues:
Proposed Fix:
import tempfile
import os
# In __init__ method (lines 64-68):
# Use Python's tempfile module for cross-platform compatibility
temp_dir = tempfile.gettempdir() # Returns appropriate temp dir for OS
cloud_task_dir = os.path.join(temp_dir, 'ianvs_cloud_task')
edge_task_dir = os.path.join(temp_dir, 'ianvs_edge_task')
# Ensure directories exist
os.makedirs(cloud_task_dir, exist_ok=True)
os.makedirs(edge_task_dir, exist_ok=True)
self.cloud_task_index = os.path.join(cloud_task_dir, 'index.pkl')
self.edge_task_index = os.path.join(edge_task_dir, 'index.pkl')
self.system_metric_info = {
SystemMetricType.SAMPLES_TRANSFER_RATIO.value: [],
SystemMetricType.MATRIX.value: {},
SystemMetricType.TASK_AVG_ACC.value: {}
}
Justification:
Core vs. Example Modification Decision
Example-Level Changes (basemodel.py):
Core-Level Changes (lifelong_learning.py):
Recommendation: Both changes are necessary and justified.
Road Map
Phase 1: Immediate Fixes (Week 1-2)
Week 1: Bug Fixes
Week 2: Testing
Phase 2: Validation & Documentation (Week 3-4)
Week 3: Integration Testing
Week 4: Documentation
Phase 3: Code Review & Merge (Week 5-6)
Week 5: Review Process
Week 6: Final Steps
Phase 4: Long-term Improvements (Week 7-12)
Weeks 7-8: CI/CD Enhancement
Weeks 9-10: Code Quality
Weeks 11-12: Community
Success Metrics
Risk Mitigation
Risk 1: Changes break existing functionality
Risk 2: Performance impact from path operations
Risk 3: Compatibility with older Python versions
Conclusion
The three bugs identified in PR #297 are critical blockers that prevent the lifelong learning example from running for any user except the original developer. These issues must be fixed before merging to ensure:
The proposed fixes are minimal, well-tested, and maintain backward compatibility while significantly improving the robustness of the codebase.
Reviewer: Shivam Yadav
Pre-test for LFX Mentorship 2026 Term 1
CNCF - KubeEdge: Ianvs: Comprehensive Example Restoration (2026 Term 1)
Task 2: Pull Request Review and Enhancement - PR #297
[Proposal] Scientific Validity Audit: Preventing Catastrophic Forgetting Measurement Failures in Lifelong Learning Benchmarks
1. Background: The Methodological Flaw in PR #297
1.1 Acknowledgment of Current Progress
I want to start by acknowledging the valuable work done in PR #297 to restore the robot lifelong learning example. The PR successfully addresses several critical issues that were blocking execution.
1.2 The Critical Oversight: Sequential Evaluation Breaks Catastrophic Forgetting Measurement
While investigating PR #297, I discovered a fundamental methodological flaw that renders the benchmark invalid for measuring catastrophic forgetting, the primary phenomenon lifelong learning algorithms are designed to prevent.
The Core Problem: Sequential vs. Cumulative Evaluation
The current implementation follows this pattern:
for task in task_sequence:
train(task)
evaluate(task)  # Only evaluates current task
This approach only measures performance on the most recently learned task. However, catastrophic forgetting occurs when learning new tasks degrades performance on previously learned tasks. To measure this, the evaluation must be cumulative: after training on Task T, the model must be re-evaluated on all previous tasks (1 through T-1).
Why This Matters
Consider a lifelong learning benchmark with 5 sequential tasks:
The current implementation never re-evaluates Task 1 after learning Task 2. If the model completely forgets Task 1 (accuracy drops from 90% to 10%), this catastrophic failure is never detected. The benchmark would report "successful lifelong learning" with high accuracy on each individual task, while the model actually exhibits severe catastrophic forgetting.
1.3 Tracing the Execution Flow
I traced through the code to understand exactly how evaluation currently works:
The key insight: the current metric conflates "learning each task when presented" with "retaining all tasks over time." These are completely different phenomena.
1.4 The Data Integrity Issue: Missing Deep Copy in Model Checkpointing
Beyond the evaluation logic, I identified a secondary critical issue in model checkpointing. Looking at the checkpoint saving logic, I found:
# Current implementation
best_model = model.state_dict()
In PyTorch, state_dict() returns references to the model's parameter tensors rather than independent copies, so continued training silently mutates the saved "checkpoint".
Proof of the Issue
import torch
checkpoint = model.state_dict() # Reference, not copy
original_weight = checkpoint['weight'].clone()
model.weight.data.fill_(999) # Modify model
# checkpoint['weight'] is now also 999, not the original value
assert not torch.equal(checkpoint['weight'], original_weight)  # Passes: the "checkpoint" changed along with the model
The correct implementation requires copy.deepcopy:
import copy
best_model = copy.deepcopy(model.state_dict())
1.5 Impact Assessment
Research Validity Impact: Any paper published using this benchmark would report incorrect forgetting metrics. Researchers might conclude an algorithm prevents catastrophic forgetting when it actually doesn't. This could mislead the entire field.
Industrial Deployment Impact: Companies deploying lifelong learning systems based on these benchmarks might experience catastrophic failures in production when their models forget critical tasks.
Benchmark Credibility Impact: If KubeEdge Ianvs publishes benchmarks with scientifically invalid metrics, it undermines the credibility of the entire platform. Other researchers won't trust the results. This is urgent because the robot lifelong learning example is marketed as a flagship demonstration. Getting the science wrong here affects the reputation of the entire KubeEdge project.
2. Goals
Goal 1: Implement Cumulative Evaluation for Backward Transfer Measurement
Goal 2: Implement Backward Transfer (BWT) Metric
Goal 3: Fix Model Checkpoint Data Integrity
Goal 4: Establish Scientific Validity Testing
3. Scope
3.1 Target Users
Academic Researchers: Publishing lifelong learning papers requires scientifically valid benchmarks. Invalid metrics would invalidate their research and waste months of experimental time.
Industrial ML Teams: Deploying lifelong learning in production (robotics, autonomous vehicles, industrial automation) where model forgetting could cause safety failures or operational disruptions.
Algorithm Developers: Testing new continual learning approaches needs accurate measurements of catastrophic forgetting to know if their methods actually work.
3.2 Differentiation from Existing Reviews
While existing reviews focus on code portability (hardcoded paths, device assignments) and runtime correctness (KeyErrors, syntax errors), my proposal addresses scientific validity. The key distinction:
Both are important, but they operate at different levels. You can have perfectly portable code that produces scientifically meaningless results. Conversely, you can have scientifically valid experimental logic that's poorly implemented. My proposal specifically targets the experimental methodology, ensuring that when researchers run this benchmark, the numbers they get actually reflect the phenomenon they're trying to study (catastrophic forgetting).
4. Detailed Design
4.1 Architecture Overview
The fix requires modifications to the lifelong learning paradigm controller to implement cumulative evaluation. The core change is transitioning from sequential to cumulative evaluation.
4.2 Module-Specific Implementation
Cumulative Evaluation Loop
The main evaluation logic needs to change as follows:
# Current sequential evaluation (WRONG)
def run(self, workspace, **kwargs):
for task_index, task in enumerate(self.task_sequence):
# Training happens
train_res = self._train(...)
# Evaluation only on current task
test_res = self._eval(current_task_data, ...)
Proposed cumulative evaluation:
# Proposed cumulative evaluation (CORRECT)
def run(self, workspace, **kwargs):
known_tasks = [] # Track all seen tasks
all_task_metrics = [] # Store full evaluation matrix
for current_task_idx, current_task in enumerate(self.task_sequence):
# Training on current task
train_res = self._train(current_task, ...)
# Add current task to known tasks
known_tasks.append(current_task)
# CRITICAL: Re-evaluate ALL known tasks, not just current one
task_metrics_at_time_t = {}
for past_task_idx, past_task in enumerate(known_tasks):
# Evaluate model on task that was learned at index past_task_idx
eval_result = self._eval(past_task_data, ...)
task_metrics_at_time_t[past_task_idx] = {
'accuracy': eval_result.get('accuracy'),
'task_id': past_task_idx,
'evaluated_at_time': current_task_idx
}
LOGGER.info(
f"After learning task {current_task_idx}, "
f"performance on task {past_task_idx}: "
f"{task_metrics_at_time_t[past_task_idx]['accuracy']:.4f}"
)
all_task_metrics.append(task_metrics_at_time_t)
# Compute BWT from the full evaluation matrix
bwt = self._compute_backward_transfer(all_task_metrics)
return all_task_metrics, {'BWT': bwt}
Backward Transfer (BWT) Metric Implementation
The Backward Transfer metric quantifies catastrophic forgetting. The mathematical formulation is:
BWT = (1 / (T - 1)) * Σ_{i=1}^{T-1} (R_{T,i} - R_{i,i})
Where:
- T is the number of tasks,
- R_{i,i} is the accuracy on task i immediately after learning it,
- R_{T,i} is the accuracy on task i after learning all T tasks.
Implementation:
def _compute_backward_transfer(self, all_task_metrics):
"""
Compute Backward Transfer metric to quantify catastrophic forgetting.
BWT measures how much learning new tasks degrades performance on old tasks.
Negative BWT indicates catastrophic forgetting.
Args:
all_task_metrics: List where all_task_metrics[t] contains accuracy
for all tasks 0..t evaluated at time t
Returns:
float: BWT score. Negative values indicate forgetting.
"""
T = len(all_task_metrics)
if T < 2:
LOGGER.warning("BWT requires at least 2 tasks. Returning 0.")
return 0.0
bwt_sum = 0.0
# For each task except the last one
for task_i in range(T - 1):
# R_{i,i}: Accuracy on task i immediately after learning it
R_ii = all_task_metrics[task_i][task_i]['accuracy']
# R_{T,i}: Accuracy on task i after learning all T tasks
R_Ti = all_task_metrics[T-1][task_i]['accuracy']
# Backward transfer for this task
bwt_task_i = R_Ti - R_ii
bwt_sum += bwt_task_i
LOGGER.info(
f"Task {task_i}: Initial accuracy = {R_ii:.4f}, "
f"Final accuracy = {R_Ti:.4f}, "
f"Backward transfer = {bwt_task_i:.4f}"
)
# Average across all tasks except the last
bwt = bwt_sum / (T - 1)
return bwt
Deep Copy for Model Checkpoints
Current problematic pattern:
# WRONG: Creates reference, not copy
best_model_state = model.state_dict()
Fixed implementation:
import copy
# CORRECT: Creates independent copy
best_model_state = copy.deepcopy(model.state_dict())
# Verification that copy is independent
test_weight = best_model_state['layer.weight'].clone()
model.layer.weight.data.fill_(999)
assert torch.equal(best_model_state['layer.weight'], test_weight), \
    "Checkpoint was not properly deep copied!"
4.3 Integration with Existing Metrics
The cumulative evaluation produces a richer set of metrics:
# Current: Single accuracy value
system_metric_info = {
'task_avg_acc': 0.85 # Average across tasks
}
# Proposed: Full evaluation matrix + BWT
system_metric_info = {
'task_avg_acc': 0.85, # Keep for backward compatibility
'final_avg_acc': 0.82, # Average performance on all tasks after learning all tasks
'BWT': -0.03, # Backward transfer (negative = forgetting)
'FWT': 0.05, # Forward transfer (optional future enhancement)
'evaluation_matrix': [...] # Full T×T matrix of accuracies
}
4.4 Validation Test Design
To verify the benchmark correctly measures forgetting, I propose a validation test:
def test_catastrophic_forgetting_detection():
"""
Verify benchmark detects catastrophic forgetting with naive fine-tuning.
Naive fine-tuning (training without replay) is known to cause severe
catastrophic forgetting. If our benchmark doesn't detect it, the
benchmark is broken.
"""
# Configure naive fine-tuning algorithm (no replay, no regularization)
naive_algorithm = NaiveFinetuning()
# Run benchmark
results = run_lifelong_benchmark(naive_algorithm, num_tasks=5)
# Naive fine-tuning MUST show negative BWT
assert results['BWT'] < -0.1, \
f"Benchmark failed to detect catastrophic forgetting! BWT = {results['BWT']}"
# Task 1 accuracy should degrade significantly by end
initial_task1_acc = results['evaluation_matrix'][0][0]
final_task1_acc = results['evaluation_matrix'][4][0]
assert (initial_task1_acc - final_task1_acc) > 0.2, \
    "Benchmark failed to detect >20% forgetting on first task!"
5. Road Map
Phase 1: Core Architecture (Weeks 1-4)
Week 1: Implement Cumulative Evaluation Loop
Refactor the paradigm's run() loop so each round re-evaluates all previously learned tasks. Deliverable: Modified lifelong learning paradigm controller with cumulative evaluation.
Week 2: Implement BWT Metric
Add the Backward Transfer computation to the system metrics. Deliverable: BWT computation with validation against known-correct test cases.
Week 3: Fix Model Checkpoint Deep Copy
Systematically replace all shallow state_dict() checkpoints with deep copies. Deliverable: Checkpoint saving with integrity tests demonstrating independence.
Week 4: Integration Testing
Test the full pipeline with cumulative evaluation + BWT computation on the robot semantic segmentation example. Verify memory usage is acceptable with the increased evaluation workload. Deliverable: End-to-end integration test passing with correct BWT measurements.
Phase 2: Validation and Scientific Verification (Weeks 5-8)
Week 5: Implement Naive Fine-Tuning Baseline
Create a deliberately bad algorithm (naive fine-tuning without replay or regularization) that is known to cause catastrophic forgetting. This serves as a sanity check for the benchmark. Deliverable: Naive fine-tuning algorithm implementation.
Week 6: Catastrophic Forgetting Detection Test
Run the naive fine-tuning baseline and verify the benchmark correctly detects negative BWT. If BWT is not significantly negative, the benchmark measurement is broken. Deliverable: Validation test demonstrating benchmark sensitivity to forgetting.
Week 7: Cross-Validation with Literature
Compare BWT values from known algorithms in the literature against results from our benchmark. Values should be consistent within measurement noise (~±2%). Deliverable: Validation report comparing benchmark results to published papers.
Week 8: Performance Optimization
Cumulative evaluation increases compute cost from O(T) to O(T²) where T is the number of tasks. Implement caching strategies and parallel evaluation to mitigate this overhead. Deliverable: Optimized evaluation achieving <50% runtime increase compared to baseline.
Phase 3: Documentation and Generalization (Weeks 9-12)
Week 9: Update Metric Documentation
Document the BWT metric in the project documentation. Deliverable: Complete metric documentation with mathematical formulation.
Week 10: Create Tutorial Notebook
Develop a Jupyter notebook demonstrating how to interpret the evaluation matrix and BWT metric, showing examples of good vs bad lifelong learning performance. Deliverable: Tutorial notebook with visualizations.
Week 11: Audit Other Paradigms
Review incremental learning and other continual learning paradigms to ensure they also implement cumulative evaluation where scientifically appropriate. Deliverable: Audit report with recommendations for other paradigms.
Week 12: Integration with Leaderboard
Update the leaderboard system to display BWT alongside accuracy. Add sorting and filtering by BWT to help researchers identify algorithms that truly prevent forgetting. Deliverable: Updated leaderboard with BWT visualization.
6. Success Criteria
Catastrophic Forgetting Detection: Naive fine-tuning baseline must show BWT < -0.15, demonstrating the benchmark detects forgetting.
Scientific Accuracy: BWT values must match published results for known algorithms within ±2% tolerance.
Data Integrity: Model checkpoints must remain independent, verified by assertions that checkpoint weights don't change when the model continues training.
Performance Acceptable: Cumulative evaluation overhead must be under 50% runtime increase compared to sequential evaluation.
Leaderboard Integration: BWT metric must appear in the leaderboard with proper documentation and interpretation guidelines.
7. Conclusion
PR #297 makes important progress restoring the robot lifelong learning example, and existing reviewers have done excellent work identifying portability and code quality issues. However, the current implementation has a fundamental scientific validity problem: it cannot measure catastrophic forgetting, the primary phenomenon lifelong learning is designed to address. The sequential evaluation pattern produces metrics that conflate "learning each task when presented" with "retaining all tasks over time." This is scientifically incorrect and would mislead researchers about algorithm performance. By implementing cumulative evaluation, the Backward Transfer metric, and proper model checkpoint deep copying, we can transform this benchmark from a demonstration that "runs without crashing" to a scientifically rigorous tool that produces valid, publishable results. I'm excited about contributing this scientific rigor to the KubeEdge Ianvs project. Lifelong learning is a critical capability for edge AI systems, and having benchmarks that correctly measure it is essential for advancing the field. Thank you for considering this proposal.
By - Ansuman Patra



LFX Mentorship 2025 Term 3: Complete Lifelong Learning Example (Robot) Implementation for Ianvs
What type of PR is this?
example restoration
What this PR does / why we need it
This PR completes the full implementation of the life-long robot example in the Ianvs project as part of the LFX Mentorship 2025 Term 3.
All major components are complete (example code, tests, documentation).
The only remaining task is CI/CD integration.
Which issue(s) this PR fixes?
Fixes #287 #263 #230
@MooreZheng @hsj576