⚠️ Caution: Review failed — the pull request is closed.

📝 Walkthrough

Adds a new `DOPMerging` class for data-free continual model merging using SVD-based, layer-wise optimization with optional MGDA/EMA, and centralizes leaf-module utilities in shared modules.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant DOPMerging as "DOPMerging"
    participant ModelPool as "ModelPool"
    participant SVDSolver as "SVD / MGDA Solver"
    participant Layer as "Layer (nn.Linear / other)"
    User->>DOPMerging: run(modelpool)
    DOPMerging->>ModelPool: fetch pretrained + finetuned models
    loop per model
        DOPMerging->>DOPMerging: _layer_wise_optimize()
        loop per leaf module
            alt module is nn.Linear
                DOPMerging->>SVDSolver: compute per-task SVD & gradients
                SVDSolver-->>DOPMerging: MGDA weights / merged projection
                DOPMerging->>Layer: update weight/bias
            else non-linear or excluded
                DOPMerging->>Layer: apply simple_average()
            end
        end
    end
    DOPMerging-->>User: return merged model
```
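The "MGDA weights" step in the diagram balances two per-layer losses (stability vs. plasticity). For exactly two tasks, the min-norm combination MGDA solves has a closed form; the following is a torch-free sketch of that math on plain Python lists, not the repo's `MinNormSolver` API:

```python
def min_norm_two_tasks(g1, g2):
    """Return gamma in [0, 1] minimizing ||gamma*g1 + (1 - gamma)*g2||^2.

    Closed form: gamma = <g2 - g1, g2> / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(g1, g2)]  # g1 - g2
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.5  # gradients coincide; any split yields the same vector
    gamma = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    return min(1.0, max(0.0, gamma))

# Orthogonal unit gradients are balanced 50/50.
print(min_norm_two_tasks([1.0, 0.0], [0.0, 1.0]))  # 0.5
```

When one gradient is much larger, gamma shifts toward the smaller one, which is the behavior that lets MGDA trade off the two objectives without a hand-tuned weight.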
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Pull request overview
This PR introduces an architecture-agnostic implementation of the DOP (Dual Projections for Balancing Stability and Plasticity) merging algorithm. The changes refactor common utility functions into shared modules and add a new general-purpose DOP implementation that can work with any model architecture, not just CLIP vision models.
Changes:

- Refactored utility functions (`is_leaf_module`, `named_leaf_modules`, `print_params`) from method-specific modules to shared utility modules for reusability
- Added new utility functions (`is_ray_available`, `dtype_support_svd`) to support future functionality and SVD operations
- Implemented a new architecture-agnostic DOP merging algorithm (`DOPMerging`) that works with any PyTorch model, complementing the existing CLIP-specific implementation
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| fusion_bench/utils/packages.py | Added is_ray_available() function for future Ray framework integration |
| fusion_bench/utils/dtype.py | Added dtype_support_svd() to check if a PyTorch dtype supports SVD operations |
| fusion_bench/models/utils.py | Added is_leaf_module() and named_leaf_modules() utility functions for traversing model architectures |
| fusion_bench/method/opcm/utils.py | Removed duplicate is_leaf_module() and updated to import from shared utils |
| fusion_bench/method/opcm/opcm.py | Updated imports to use is_leaf_module from shared utils |
| fusion_bench/method/dop/utils.py | Removed duplicate is_leaf_module(), added print_params() utility function |
| fusion_bench/method/dop/dop_general.py | New architecture-agnostic DOP implementation with layer-wise optimization for any model type |
| fusion_bench/method/dop/dop.py | Removed print_params() method (now in utils) |
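The leaf-module helpers listed above are not shown in this diff. As a torch-free sketch of the likely logic (the helper names match the PR; the `Node` stand-in and the "leaf = no children" criterion are assumptions — `nn.Module` exposes the analogous `children()` and `named_modules()`):

```python
class Node:
    """Minimal stand-in for nn.Module, just enough API for the traversal sketch."""

    def __init__(self, **children):
        self._children = children

    def named_modules(self, prefix=""):
        yield prefix, self
        for name, child in self._children.items():
            full = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(full)


def is_leaf_module(module):
    """A module is a leaf when it has no child modules."""
    return not module._children


def named_leaf_modules(model):
    return [(name, m) for name, m in model.named_modules() if is_leaf_module(m)]


model = Node(encoder=Node(fc1=Node(), fc2=Node()), head=Node())
print([name for name, _ in named_leaf_modules(model)])
# ['encoder.fc1', 'encoder.fc2', 'head']
```

This is the traversal `DOPMerging` needs: it visits only the modules that actually own parameters, so each `nn.Linear` can be optimized individually while containers are skipped.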
```python
# Scaled back-propagation
loss = 0
for i, finetuned_weight in enumerate(finetuned_weights.values()):
    # Comptue gradients of each loss function wrt parameters
```
Spelling error: "Comptue" should be "Compute".
```diff
-# Comptue gradients of each loss function wrt parameters
+# Compute gradients of each loss function wrt parameters
```
```python
def is_ray_available():
    return _is_package_available("ray")
```
The function is_ray_available() is added but doesn't appear to be used anywhere in the codebase. While it follows the established pattern of other availability check functions in this file, consider whether this addition is necessary for this PR or if it should be included when Ray functionality is actually implemented.
```python
def run(self, modelpool: BaseModelPool):
    """
    Execute the DOP merging algorithm on a pool of models.

    Merges models sequentially, where each new model is merged with the
    previously merged result. The first model is used as-is, and subsequent
    models are merged using layer-wise optimization.

    Args:
        modelpool: The model pool containing models to merge and the pretrained model.

    Returns:
        The final merged model after sequentially merging all models in the pool.
    """
    model_names = modelpool.model_names
    if self.shuffle_order:
        random.shuffle(model_names)

    pretrained_model = modelpool.load_pretrained_model()

    merged_model = None
    for model_idx, model_name in enumerate(model_names):
        print(
            f"--------- Optimizing {model_idx + 1}/{len(model_names)}-th with {model_name} ---------"
        )
        if model_idx == 0:
            merged_model = modelpool.load_model(model_names[0])
        else:
            merged_model = self._layer_wise_optimize(
                model_names=["merged", model_name],
                pretrained_model=deepcopy(pretrained_model),
                finetuned_models={
                    "merged": merged_model,
                    model_name: modelpool.load_model(model_name),
                },
                model_idx=model_idx,
            )

    return merged_model
```
The `run` method does not set the random seed despite accepting a `seed` parameter in `__init__`. The original CLIP-specific implementation (`ContinualDOPForCLIP`) in `dop.py` properly handles seeding with either `L.seed_everything(self.seed)` or `seed_everything_by_time(self.fabric)`. This architecture-agnostic version should implement similar seeding behavior for reproducibility.
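Seeding before the shuffle is enough to make the merge order reproducible. A hedged, stdlib-only sketch of the behavior being asked for (the project's `L.seed_everything` additionally seeds torch/numpy; `ordered_models` is an illustrative name, not a repo function):

```python
import random


def ordered_models(model_names, seed=None, shuffle_order=True):
    """Return a (possibly shuffled) copy of model_names, reproducibly when seeded."""
    names = list(model_names)
    if shuffle_order:
        if seed is not None:
            random.seed(seed)
        random.shuffle(names)
    return names


a = ordered_models(["svhn", "gtsrb", "eurosat", "mnist"], seed=42)
b = ordered_models(["svhn", "gtsrb", "eurosat", "mnist"], seed=42)
print(a == b)  # True: same seed, same order
```

Copying the list before shuffling also avoids mutating `modelpool.model_names` in place, a side effect the original code has.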
The save_on_every_step and evaluate_on_every_step parameters are accepted in __init__ but never used in the run method. In the original CLIP-specific implementation (ContinualDOPForCLIP), these parameters control intermediate model saving and evaluation during the merging process. This functionality should either be implemented or these parameters should be removed if they're not intended for this architecture-agnostic version.
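Independent of the unused flags, the control flow of `run` is a left fold: start from the first model and repeatedly merge the running result with the next one. A toy sketch, with element-wise averaging standing in for the real `_layer_wise_optimize`:

```python
from functools import reduce


def continual_merge(models, merge_pair):
    """Left fold: m1, then merge(m1, m2), then merge(that, m3), ..."""
    return reduce(merge_pair, models)


# Toy "models" as flat weight lists; pairwise averaging stands in for DOP's merge.
average_pair = lambda a, b: [(x + y) / 2 for x, y in zip(a, b)]
merged = continual_merge([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], average_pair)
print(merged)  # [3.5, 4.5]
```

Note the recency bias of naive pairwise averaging (the last model contributes 1/2, the first only 1/4); presumably this is part of what DOP's per-layer optimization is meant to control.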
```python
from fusion_bench.utils.json import save_to_json

from .min_norm_solvers import MinNormSolver, gradient_normalizers
from .utils import is_leaf_module, print_params, svd
```
The function is_leaf_module is imported from .utils but never used in this file. The code uses named_leaf_modules instead (which internally calls is_leaf_module). This unused import should be removed.
```diff
-from .utils import is_leaf_module, print_params, svd
+from .utils import print_params, svd
```
```python
from fusion_bench.models.utils import named_leaf_modules
from fusion_bench.utils import seed_everything_by_time
from fusion_bench.utils.dtype import dtype_support_svd
from fusion_bench.utils.json import save_to_json
```
The save_to_json function is imported but never used in this file. This unused import should be removed (though it would be needed if the save_on_every_step and evaluate_on_every_step features are properly implemented as suggested in another comment).
```diff
-from fusion_bench.utils.json import save_to_json
```
```python
optimizer = torch.optim.Adam([merged_weight], lr=self.lr)
all_losses = [[], []]
all_alphas = [[], []]
for step_idx in tqdm(
```
For loop variable 'step_idx' is not used in the loop body.
```diff
-for step_idx in tqdm(
+for _ in tqdm(
```
```python
else:
    # This is a naive weighted optimization
    optimizer = torch.optim.Adam([merged_weight], lr=self.lr)
    for step_idx in tqdm(
```
For loop variable 'step_idx' is not used in the loop body.
```diff
-for step_idx in tqdm(
+for _ in tqdm(
```
```diff
 from fusion_bench.utils.state_dict_arithmetic import state_dict_sub

-from .utils import frobenius_inner_product, get_task_vector_norm, is_leaf_module, svd
+from .utils import frobenius_inner_product, get_task_vector_norm, svd
```
Import of 'frobenius_inner_product' is not used.
```diff
-from .utils import frobenius_inner_product, get_task_vector_norm, svd
+from .utils import get_task_vector_norm, svd
```
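For reference, the `frobenius_inner_product` helper whose import is being trimmed computes a simple quantity; a torch-free sketch (the repo version presumably operates on tensors rather than nested lists):

```python
def frobenius_inner_product(a, b):
    """<A, B>_F: sum of element-wise products of two same-shape matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


a = [[1.0, 2.0], [3.0, 4.0]]
print(frobenius_inner_product(a, a))  # 30.0, i.e. ||A||_F^2
```

It is the natural inner product for comparing task vectors of weight matrices, which is why OPCM-style methods use it to measure alignment between updates.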
```python
import torch
from torch import Tensor, nn

from fusion_bench.models.utils import is_leaf_module
```
Import of 'is_leaf_module' is not used.
```diff
-from fusion_bench.models.utils import is_leaf_module
```
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@fusion_bench/method/dop/dop_general.py`:
- Around line 64-114: The constructor sets alpha: float = None but immediately
asserts numeric bounds, causing a crash when alpha is omitted; fix by giving
alpha a safe default (e.g., alpha: float = 0.5) in the __init__ signature or by
handling None before the assertions (e.g., if alpha is None: self.alpha = 0.5
else: self.alpha = alpha), then perform the existing bounds check against
self.alpha and update the docstring to reflect the new default; ensure the
associated symbols are the alpha parameter, self.alpha assignment, and the
subsequent assertions that check self.alpha and self.svd_epsilon.
- Around line 36-37: The class DOPMerging currently declares inheritance as
"class DOPMerging(BaseAlgorithm, LightningFabricMixin)" which places
LightningFabricMixin after BaseAlgorithm and can break method resolution order;
change the declaration to put the mixin first (i.e., inherit from
LightningFabricMixin before BaseAlgorithm) so the MRO calls mixin methods before
base class methods and ensure any super() calls resolve correctly.
- Around line 131-134: The shuffle of model_names currently happens without
applying self.seed, making runs nondeterministic; before calling
random.shuffle(model_names) (where model_names = modelpool.model_names and
guarded by self.shuffle_order), call random.seed(self.seed) (and if numpy or
torch RNGs are used later for optimizer initialization, seed them as well) so
the model order and subsequent optimizer initialization become reproducible;
ensure self.seed is optional and only seed when not None to preserve existing
behavior.
- Around line 286-292: The SVD rank selection can produce NaNs when all singular
values are zero (finetuned_tv == 0); guard the computation by checking the
singular-value sum before computing cumsum_ratio: compute total = s.sum() (or
use torch.isclose to 0) and if total is zero set split_rank = 0 (or another safe
default) rather than running cumsum/torch.searchsorted; then slice u into u_main
= u[:, :split_rank] as before. Apply this check in the loop handling
finetuned_weights/finetuned_tv (refer to finetuned_tv, svd(...)->u,s,v, s.sum(),
cumsum_ratio, split_rank, and u_main) so downstream code never receives NaN
split_rank or cumsum_ratio.
In `@fusion_bench/method/dop/utils.py`:
- Line 77: The loop currently uses an unused variable "module_name" in "for
module_name, module in model.named_modules()"; rename it to "_" (or
"_module_name") or switch to "for module in model.modules()" so the lint
no-unused-variable warning is resolved—update the loop header wherever "for
module_name, module in model.named_modules()" appears (keeping the "module"
variable intact).
- Around line 73-88: The function print_params computes linear_ratio and
linear_weight_ratio by dividing by total_params which can be zero; update
print_params to guard against division by zero by checking total_params (in
function print_params) before computing ratios and use 0.0 (or an appropriate
default) when total_params == 0, ensuring linear_ratio and linear_weight_ratio
are computed only when total_params > 0; reference variables total_params,
linear_params, linear_weight_params and the ratio calculations linear_ratio and
linear_weight_ratio in print_params.
In `@fusion_bench/method/opcm/opcm.py`:
- Around line 17-26: Remove unused imports from the top of opcm.py to resolve
F401: inspect and delete any names from the import block that are not referenced
in this module (candidates shown in the diff include BaseModelPool,
LightningFabricMixin, SimpleProfilerMixin, is_leaf_module,
CLIPVisionModelTaskPool, instantiate, load_from_json, save_to_json,
state_dict_to_vector, state_dict_sub) while keeping only the actually used
symbols (e.g., BaseAlgorithm and the .utils imports like
frobenius_inner_product, get_task_vector_norm, svd if they are used); update the
import statement(s) to import only the needed identifiers so linting no longer
reports unused imports.
In `@fusion_bench/method/opcm/utils.py`:
- Line 6: Remove the unused import is_leaf_module from
fusion_bench/method/opcm/utils.py to resolve the F401 lint error: locate the
import statement "from fusion_bench.models.utils import is_leaf_module" at the
top of that module and delete it (or replace it with a used symbol if intended),
ensuring no other references to is_leaf_module remain in the file.
🧹 Nitpick comments (1)
fusion_bench/method/dop/dop_general.py (1)
6-31: Remove unused imports and add `lightning` to the cleanup list.

The proposed diff should also include `import lightning as L` (line 14), which is imported but never used in the code. All other removals are correct.

```diff
 import logging
-import os
 import random
-import time
 from copy import deepcopy
-from pathlib import Path
-from typing import Dict, List, Literal, Optional, Tuple, cast
+from typing import Dict, List, Optional

-import lightning as L
 import numpy as np
 import torch
-from omegaconf import DictConfig
 from torch import Tensor, nn
 from torch.autograd import Variable
 from tqdm.auto import tqdm

 from fusion_bench import BaseAlgorithm, BaseModelPool, auto_register_config
 from fusion_bench.method.simple_average import simple_average
 from fusion_bench.mixins import LightningFabricMixin
 from fusion_bench.models.utils import named_leaf_modules
 from fusion_bench.utils import seed_everything_by_time
 from fusion_bench.utils.dtype import dtype_support_svd
-from fusion_bench.utils.json import save_to_json

 from .min_norm_solvers import MinNormSolver, gradient_normalizers
-from .utils import is_leaf_module, print_params, svd
+from .utils import svd
```

Per the coding guidelines: avoid importing PyTorch and Transformers at module level in method implementations; defer imports to function/method bodies.
```python
    alpha: float = None,
    svd_epsilon: float = 1.0,
    svd_proj_space: str = "uv",
    exclude_keys: List[str] | None = None,
    **kwargs,
):
    """
    Initialize the DOP merging algorithm.

    Args:
        seed: Random seed for reproducibility. If None, uses time-based seeding.
        shuffle_order: Whether to shuffle the order of models before merging.
        save_on_every_step: Whether to save the model after each merge step.
        evaluate_on_every_step: Whether to evaluate the model after each merge step.
        lr: Learning rate for the optimization process.
        num_steps: Number of optimization steps per layer.
        mgda: Whether to use Multi-Gradient Descent Algorithm for balancing losses.
        ema: Whether to use exponential moving average for MGDA weights.
        ema_beta: EMA decay rate for MGDA weights (only used if ema=True).
        alpha: Weight for balancing between stability and plasticity (0-1).
            When mgda=False, used as a fixed weight. When mgda=True with ema=True,
            used as initial weight.
        svd_epsilon: Threshold for SVD rank selection (0-1). Determines how much
            variance to preserve in the projection space.
        svd_proj_space: SVD projection space to use: 'u', 'v', or 'uv' (both).
        exclude_keys: List of module names to exclude from optimization.
        **kwargs: Additional arguments passed to BaseAlgorithm.
    """
    self.lr = lr
    self.num_steps = num_steps
    self.mgda = mgda
    self.ema = ema
    self.ema_beta = ema_beta
    self.alpha = alpha
    self.svd_epsilon = svd_epsilon
    self.svd_proj_space = svd_proj_space
    self.seed = seed
    self.shuffle_order = shuffle_order
    self.save_on_every_step = save_on_every_step
    self.evaluate_on_every_step = evaluate_on_every_step

    if exclude_keys is None:
        exclude_keys = []
    self.exclude_keys = exclude_keys

    assert (
        self.svd_epsilon >= 0 and self.svd_epsilon <= 1
    ), "The svd_epsilon should be in the range of [0, 1]"
    assert (
        self.alpha >= 0 and self.alpha <= 1
    ), "The alpha should be in the range of [0, 1]"
```
Fix alpha=None causing an assertion crash.
alpha defaults to None, but is immediately compared to numbers. This raises at runtime when alpha isn’t provided.
🛠️ Proposed fix

```diff
-    alpha: float = None,
+    alpha: float | None = None,
@@
-    assert (
-        self.alpha >= 0 and self.alpha <= 1
-    ), "The alpha should be in the range of [0, 1]"
+    if self.alpha is None:
+        self.alpha = 0.5
+    if not (0 <= self.alpha <= 1):
+        raise ValueError("alpha must be in the range of [0, 1]")
```
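The `ema_beta` parameter documented in the constructor implies the standard exponential-moving-average update for the MGDA weight; a minimal sketch under that assumption (variable names are illustrative, not the repo's):

```python
def ema_update(prev_alpha: float, new_alpha: float, beta: float) -> float:
    """Keep `beta` of the running weight, blend in `1 - beta` of the new MGDA weight."""
    return beta * prev_alpha + (1.0 - beta) * new_alpha


alpha = 0.5  # e.g. the initial alpha when mgda=True and ema=True
for mgda_alpha in (1.0, 1.0, 1.0):
    alpha = ema_update(alpha, mgda_alpha, beta=0.9)
print(round(alpha, 4))  # 0.6355: drifts slowly toward 1.0
```

A high `ema_beta` damps step-to-step noise in the per-layer MGDA solutions at the cost of reacting slowly when the balance between stability and plasticity genuinely shifts.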
🧰 Tools

🪛 Ruff (0.14.14)

[warning] 64-64: PEP 484 prohibits implicit Optional. Convert to `T | None` (RUF013)
```python
model_names = modelpool.model_names
if self.shuffle_order:
    random.shuffle(model_names)
```
🧩 Analysis chain
🏁 Scripts executed (in repository tanganke/fusion_bench):

```shell
cat -n fusion_bench/method/dop/dop_general.py | head -200
rg "self\.seed" fusion_bench/method/dop/dop_general.py -n
cat -n fusion_bench/method/dop/dop_general.py | sed -n '200,300p'
cat -n fusion_bench/method/dop/dop_general.py | sed -n '300,450p'
rg "L\.seed_everything|seed_everything" --type py | head -20
cat -n fusion_bench/method/opcm/opcm.py | grep -A 5 -B 5 "L.seed_everything"
cat -n fusion_bench/utils/fabric.py | grep -A 15 "def seed_everything_by_time"
cat -n fusion_bench/utils/fabric.py | sed -n '16,50p'
cat -n fusion_bench/utils/fabric.py | sed -n '16,70p'
rg "on_run_start|on_run_end" fusion_bench/method/dop/dop_general.py
rg "def on_run_start|def on_run_end" fusion_bench --type py | head -10
cat -n fusion_bench/method/base_algorithm.py | grep -A 20 "def on_run_start"
```
Apply seeding before shuffling to honor the reproducibility contract.
self.seed is never used, so runs are always nondeterministic even when a seed is provided. This affects both the model order shuffling and the subsequent optimizer initialization.
🎯 Proposed fix

```diff
+    if self.seed is not None:
+        L.seed_everything(self.seed)
+    else:
+        seed_everything_by_time()
     model_names = modelpool.model_names
     if self.shuffle_order:
         random.shuffle(model_names)
```
```python
for i, finetuned_weight in enumerate(finetuned_weights.values()):
    finetuned_tv = finetuned_weight - pretrained_weight
    u, s, v = svd(finetuned_tv, full_matrices=True)
    epsilon = 1.0 if self.svd_epsilon > 1.0 else self.svd_epsilon
    cumsum_ratio = s.cumsum(dim=0) / s.sum()
    split_rank = torch.searchsorted(cumsum_ratio, epsilon).item()
    u_main = u[:, :split_rank]
```
🧩 Analysis chain
🏁 Scripts executed (in repository tanganke/fusion_bench):

```shell
find . -type f -name "dop_general.py" | head -5
cat -n fusion_bench/method/dop/dop_general.py | sed -n '280,310p'
cat -n fusion_bench/method/dop/dop_general.py | sed -n '1,50p'
cat -n fusion_bench/method/dop/utils.py | grep -A 20 "def svd"
cat -n fusion_bench/method/dop/dop_general.py | sed -n '295,350p'
rg "s\.sum\(\)" fusion_bench/method/dop/
rg -A 10 "def cal_loss_i" fusion_bench/method/dop/dop_general.py
cat -n fusion_bench/method/dop/dop.py | sed -n '1,50p'
rg -B 5 -A 10 "cumsum_ratio = s.cumsum" fusion_bench/method/dop/dop.py
```

Probe for the NaN case:

```python
import torch

# Test 1: what happens when all singular values are zero?
s = torch.tensor([0.0, 0.0, 0.0])
cumsum_ratio = s.cumsum(dim=0) / s.sum()  # 0/0 -> NaN
epsilon = 0.95
result = torch.searchsorted(cumsum_ratio, epsilon)
print(f"cumsum_ratio = {cumsum_ratio}, searchsorted = {result.item()}")

# Test 2: an empty slice is well-defined
u = torch.randn(10, 10)
print(u[:, :0].shape)  # torch.Size([10, 0])

# Test 3: the normal case
s_normal = torch.tensor([5.0, 3.0, 1.0, 0.1])
cumsum_normal = s_normal.cumsum(dim=0) / s_normal.sum()
print(torch.searchsorted(cumsum_normal, epsilon).item())
```
Guard SVD rank selection when all singular values are zero.
When finetuned_weight == pretrained_weight, all singular values are zero and s.sum() evaluates to 0. This causes the cumsum ratio computation to produce NaN values, which propagates through torch.searchsorted() and poisons subsequent loss calculations in the optimization loop.
Proposed fix

```diff
-    cumsum_ratio = s.cumsum(dim=0) / s.sum()
-    split_rank = torch.searchsorted(cumsum_ratio, epsilon).item()
-    u_main = u[:, :split_rank]
-    v_main = v[:, :split_rank]
-    s_main = s[:split_rank]
+    s_sum = s.sum()
+    if s_sum == 0:
+        split_rank = 0
+        u_main = u[:, :0]
+        v_main = v[:, :0]
+        s_main = s[:0]
+    else:
+        cumsum_ratio = s.cumsum(dim=0) / s_sum
+        split_rank = torch.searchsorted(cumsum_ratio, epsilon).item()
+        u_main = u[:, :split_rank]
+        v_main = v[:, :split_rank]
+        s_main = s[:split_rank]
```
+ s_main = s[:split_rank]🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/dop_general.py` around lines 286 - 292, The SVD rank
selection can produce NaNs when all singular values are zero (finetuned_tv ==
0); guard the computation by checking the singular-value sum before computing
cumsum_ratio: compute total = s.sum() (or use torch.isclose to 0) and if total
is zero set split_rank = 0 (or another safe default) rather than running
cumsum/torch.searchsorted; then slice u into u_main = u[:, :split_rank] as
before. Apply this check in the loop handling finetuned_weights/finetuned_tv
(refer to finetuned_tv, svd(...)->u,s,v, s.sum(), cumsum_ratio, split_rank, and
u_main) so downstream code never receives NaN split_rank or cumsum_ratio.
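The guarded selection can also be sketched outside of torch. The helper below is hypothetical (plain Python, not part of fusion_bench); it mirrors the default `torch.searchsorted` behavior on the cumulative-energy ratio while returning rank 0 for an all-zero spectrum:

```python
def select_split_rank(singular_values, epsilon):
    """Count leading singular values whose cumulative energy ratio stays
    below epsilon; return 0 when the spectrum is all zeros (zero task vector)."""
    total = sum(singular_values)
    if total == 0:  # finetuned == pretrained: avoid 0/0 NaN, just rank 0
        return 0
    cumulative = 0.0
    rank = 0
    for s in singular_values:
        cumulative += s / total
        if cumulative >= epsilon:
            break
        rank += 1
    return rank


# Same spectrum as the "normal case" script above, with epsilon=0.9:
print(select_split_rank([5.0, 3.0, 1.0, 0.1], 0.9))  # 2
print(select_split_rank([0.0, 0.0, 0.0], 0.9))       # 0, no NaN involved
```

With the guard in place, a zero task vector simply contributes an empty main subspace instead of NaN-valued projections.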
def print_params(model):
    total_params = 0
    linear_params = 0
    linear_weight_params = 0
    for module_name, module in model.named_modules():
        if not is_leaf_module(module):
            continue
        if isinstance(module, nn.Linear):
            linear_params += sum(p.numel() for n, p in module.named_parameters())
            linear_weight_params += sum(
                p.numel() for n, p in module.named_parameters() if "weight" in n
            )
        total_params += sum(p.numel() for p in module.parameters())

    linear_ratio = linear_params / total_params * 100
    linear_weight_ratio = linear_weight_params / total_params * 100
Guard against zero-parameter models to avoid ZeroDivision.
If total_params is 0, the ratio computation will raise. A small guard keeps this safe for edge cases.
🔧 Proposed fix
-    linear_ratio = linear_params / total_params * 100
-    linear_weight_ratio = linear_weight_params / total_params * 100
+    if total_params == 0:
+        linear_ratio = 0.0
+        linear_weight_ratio = 0.0
+    else:
+        linear_ratio = linear_params / total_params * 100
+        linear_weight_ratio = linear_weight_params / total_params * 100

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def print_params(model):
    total_params = 0
    linear_params = 0
    linear_weight_params = 0
    for module_name, module in model.named_modules():
        if not is_leaf_module(module):
            continue
        if isinstance(module, nn.Linear):
            linear_params += sum(p.numel() for n, p in module.named_parameters())
            linear_weight_params += sum(
                p.numel() for n, p in module.named_parameters() if "weight" in n
            )
        total_params += sum(p.numel() for p in module.parameters())
    if total_params == 0:
        linear_ratio = 0.0
        linear_weight_ratio = 0.0
    else:
        linear_ratio = linear_params / total_params * 100
        linear_weight_ratio = linear_weight_params / total_params * 100
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 77-77: Loop control variable module_name not used within loop body
Rename unused module_name to _module_name
(B007)
🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/utils.py` around lines 73 - 88, The function
print_params computes linear_ratio and linear_weight_ratio by dividing by
total_params which can be zero; update print_params to guard against division by
zero by checking total_params (in function print_params) before computing ratios
and use 0.0 (or an appropriate default) when total_params == 0, ensuring
linear_ratio and linear_weight_ratio are computed only when total_params > 0;
reference variables total_params, linear_params, linear_weight_params and the
ratio calculations linear_ratio and linear_weight_ratio in print_params.
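The same zero-division guard can be factored into a tiny helper so every ratio in `print_params` goes through one code path. The name `param_ratio` is illustrative, not an existing fusion_bench function:

```python
def param_ratio(part_params: int, total_params: int) -> float:
    """Percentage of part_params within total_params; 0.0 for empty models."""
    if total_params == 0:
        return 0.0
    return part_params / total_params * 100


print(param_ratio(50, 200))  # 25.0
print(param_ratio(0, 0))     # 0.0 instead of ZeroDivisionError
```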
    total_params = 0
    linear_params = 0
    linear_weight_params = 0
    for module_name, module in model.named_modules():
Rename unused loop variable to satisfy lint.
🧹 Proposed fix
-    for module_name, module in model.named_modules():
+    for _module_name, module in model.named_modules():

📝 Committable suggestion
-    for module_name, module in model.named_modules():
+    for _module_name, module in model.named_modules():
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 77-77: Loop control variable module_name not used within loop body
Rename unused module_name to _module_name
(B007)
🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/utils.py` at line 77, The loop currently uses an
unused variable "module_name" in "for module_name, module in
model.named_modules()"; rename it to "_" (or "_module_name") or switch to "for
module in model.modules()" so the lint no-unused-variable warning is
resolved—update the loop header wherever "for module_name, module in
model.named_modules()" appears (keeping the "module" variable intact).
from fusion_bench import BaseAlgorithm, BaseModelPool
from fusion_bench.mixins import LightningFabricMixin, SimpleProfilerMixin
from fusion_bench.models.utils import is_leaf_module
from fusion_bench.taskpool import CLIPVisionModelTaskPool
from fusion_bench.utils import instantiate
from fusion_bench.utils.json import load_from_json, save_to_json
from fusion_bench.utils.parameters import state_dict_to_vector
from fusion_bench.utils.state_dict_arithmetic import state_dict_sub

-from .utils import frobenius_inner_product, get_task_vector_norm, is_leaf_module, svd
+from .utils import frobenius_inner_product, get_task_vector_norm, svd
Drop unused imports to satisfy F401.
🧹 Proposed fix
-from fusion_bench.utils import instantiate
-from fusion_bench.utils.json import load_from_json, save_to_json
-from fusion_bench.utils.parameters import state_dict_to_vector
-from fusion_bench.utils.state_dict_arithmetic import state_dict_sub
+from fusion_bench.utils.json import save_to_json
@@
-from .utils import frobenius_inner_product, get_task_vector_norm, svd
+from .utils import get_task_vector_norm, svd

🧰 Tools
🪛 Flake8 (7.3.0)
[error] 21-21: 'fusion_bench.utils.instantiate' imported but unused
(F401)
[error] 22-22: 'fusion_bench.utils.json.load_from_json' imported but unused
(F401)
[error] 23-23: 'fusion_bench.utils.parameters.state_dict_to_vector' imported but unused
(F401)
[error] 24-24: 'fusion_bench.utils.state_dict_arithmetic.state_dict_sub' imported but unused
(F401)
[error] 26-26: '.utils.frobenius_inner_product' imported but unused
(F401)
🤖 Prompt for AI Agents
In `@fusion_bench/method/opcm/opcm.py` around lines 17 - 26, Remove unused imports
from the top of opcm.py to resolve F401: inspect and delete any names from the
import block that are not referenced in this module (candidates shown in the
diff include BaseModelPool, LightningFabricMixin, SimpleProfilerMixin,
is_leaf_module, CLIPVisionModelTaskPool, instantiate, load_from_json,
save_to_json, state_dict_to_vector, state_dict_sub) while keeping only the
actually used symbols (e.g., BaseAlgorithm and the .utils imports like
frobenius_inner_product, get_task_vector_norm, svd if they are used); update the
import statement(s) to import only the needed identifiers so linting no longer
reports unused imports.
import torch
from torch import Tensor, nn

from fusion_bench.models.utils import is_leaf_module
Remove unused import to satisfy F401.
🧹 Proposed fix
-from fusion_bench.models.utils import is_leaf_module

📝 Committable suggestion
from fusion_bench.models.utils import is_leaf_module
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 6-6: 'fusion_bench.models.utils.is_leaf_module' imported but unused
(F401)
🤖 Prompt for AI Agents
In `@fusion_bench/method/opcm/utils.py` at line 6, Remove the unused import
is_leaf_module from fusion_bench/method/opcm/utils.py to resolve the F401 lint
error: locate the import statement "from fusion_bench.models.utils import
is_leaf_module" at the top of that module and delete it (or replace it with a
used symbol if intended), ensuring no other references to is_leaf_module remain
in the file.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@fusion_bench/method/dop/dop_general.py`:
- Around line 54-107: The constructor fails to set self.num_ray_actors, causing
AttributeError when run() or _layer_wise_optimize() access it; fix by assigning
the incoming num_ray_actors parameter to the instance (e.g., add
self.num_ray_actors = num_ray_actors inside __init__), ensuring the attribute
exists and matches the provided int value used by run() and
_layer_wise_optimize().
def __init__(
    self,
    seed: Optional[int] = None,
    shuffle_order: bool = False,
    save_on_every_step: bool = True,
    evaluate_on_every_step: bool = False,
    lr: float = 1e-4,
    num_steps: int = 200,
    mgda: bool = True,
    ema: bool = True,
    ema_beta: float = 0.99,
    alpha: float = None,
    svd_epsilon: float = 1.0,
    svd_proj_space: str = "uv",
    exclude_keys: List[str] | None = None,
    num_ray_actors: int = 0,
    **kwargs,
):
    """
    Initialize the DOP merging algorithm.

    Args:
        seed: Random seed for reproducibility. If None, uses time-based seeding.
        shuffle_order: Whether to shuffle the order of models before merging.
        save_on_every_step: Whether to save the model after each merge step.
        evaluate_on_every_step: Whether to evaluate the model after each merge step.
        lr: Learning rate for the optimization process.
        num_steps: Number of optimization steps per layer.
        mgda: Whether to use Multi-Gradient Descent Algorithm for balancing losses.
        ema: Whether to use exponential moving average for MGDA weights.
        ema_beta: EMA decay rate for MGDA weights (only used if ema=True).
        alpha: Weight for balancing between stability and plasticity (0-1).
            When mgda=False, used as a fixed weight. When mgda=True with ema=True,
            used as initial weight.
        svd_epsilon: Threshold for SVD rank selection (0-1). Determines how much
            variance to preserve in the projection space.
        svd_proj_space: SVD projection space to use: 'u', 'v', or 'uv' (both).
        exclude_keys: List of module names to exclude from optimization.
        num_ray_actors: Number of Ray actors to use for parallel processing. If 0, ray is not used.
        **kwargs: Additional arguments passed to BaseAlgorithm.
    """
    self.lr = lr
    self.num_steps = num_steps
    self.mgda = mgda
    self.ema = ema
    self.ema_beta = ema_beta
    self.alpha = alpha
    self.svd_epsilon = svd_epsilon
    self.svd_proj_space = svd_proj_space
    self.seed = seed
    self.shuffle_order = shuffle_order
    self.save_on_every_step = save_on_every_step
    self.evaluate_on_every_step = evaluate_on_every_step
Persist num_ray_actors to avoid AttributeError in run().
self.num_ray_actors is used in run() and _layer_wise_optimize() but never assigned, which will crash at runtime.
🛠️ Proposed fix
     self.seed = seed
     self.shuffle_order = shuffle_order
     self.save_on_every_step = save_on_every_step
     self.evaluate_on_every_step = evaluate_on_every_step
+    self.num_ray_actors = num_ray_actors

🧰 Tools
🪛 Ruff (0.14.14)
[warning] 65-65: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
[warning] 69-69: Unused method argument: num_ray_actors
(ARG002)
🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/dop_general.py` around lines 54 - 107, The
constructor fails to set self.num_ray_actors, causing AttributeError when run()
or _layer_wise_optimize() access it; fix by assigning the incoming
num_ray_actors parameter to the instance (e.g., add self.num_ray_actors =
num_ray_actors inside __init__), ensuring the attribute exists and matches the
provided int value used by run() and _layer_wise_optimize().
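The failure mode is easy to reproduce in isolation. The classes below are illustrative (not fusion_bench code): the buggy constructor accepts `num_ray_actors` but never stores it, so any later attribute access raises `AttributeError`:

```python
class BuggyMerger:
    def __init__(self, num_ray_actors: int = 0):
        pass  # argument accepted but never persisted on self


class FixedMerger:
    def __init__(self, num_ray_actors: int = 0):
        self.num_ray_actors = num_ray_actors  # the one-line fix

    def run(self) -> int:
        return self.num_ray_actors  # safe: attribute always exists


print(hasattr(BuggyMerger(num_ray_actors=2), "num_ray_actors"))  # False
print(FixedMerger(num_ray_actors=2).run())                       # 2
```

Because `run()` and `_layer_wise_optimize()` read the attribute long after construction, the missing assignment surfaces only at merge time rather than at `__init__`.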
…model loading with profiling; update LightningFabricMixin to default to 1 device if no configuration is found.
Summary by CodeRabbit
New Features
Refactor
Chores