⚠️ Caution: Review failed — the pull request is closed.

📝 Walkthrough

Adds a new `DOPMerging` class for data-free continual model merging using SVD-based, layer-wise optimization with optional MGDA/EMA, and centralizes leaf-module utilities in shared modules.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant DOPMerging as "DOPMerging"
    participant ModelPool as "ModelPool"
    participant SVDSolver as "SVD / MGDA Solver"
    participant Layer as "Layer (nn.Linear / other)"
    User->>DOPMerging: run(modelpool)
    DOPMerging->>ModelPool: fetch pretrained + finetuned models
    loop per model
        DOPMerging->>DOPMerging: _layer_wise_optimize()
        loop per leaf module
            alt module is nn.Linear
                DOPMerging->>SVDSolver: compute per-task SVD & gradients
                SVDSolver-->>DOPMerging: MGDA weights / merged projection
                DOPMerging->>Layer: update weight/bias
            else non-linear or excluded
                DOPMerging->>Layer: apply simple_average()
            end
        end
    end
    DOPMerging-->>User: return merged model
```
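The "MGDA weights" step in the diagram balances two per-layer losses (stability vs. plasticity). For exactly two tasks, the min-norm combination MGDA solves has a closed form; the following is a torch-free sketch of that math on plain Python lists, not the repo's `MinNormSolver` API:

```python
def min_norm_two_tasks(g1, g2):
    """Return gamma in [0, 1] minimizing ||gamma*g1 + (1 - gamma)*g2||^2.

    Closed form: gamma = <g2 - g1, g2> / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(g1, g2)]  # g1 - g2
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.5  # gradients coincide; any split yields the same vector
    gamma = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    return min(1.0, max(0.0, gamma))

# Orthogonal unit gradients are balanced 50/50.
print(min_norm_two_tasks([1.0, 0.0], [0.0, 1.0]))  # 0.5
```

When one gradient is much larger, gamma shifts toward the smaller one, which is the behavior that lets MGDA trade off the two objectives without a hand-tuned weight.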
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Pull request overview
This PR introduces an architecture-agnostic implementation of the DOP (Dual Projections for Balancing Stability and Plasticity) merging algorithm. The changes refactor common utility functions into shared modules and add a new general-purpose DOP implementation that can work with any model architecture, not just CLIP vision models.
Changes:

- Refactored utility functions (`is_leaf_module`, `named_leaf_modules`, `print_params`) from method-specific modules to shared utility modules for reusability
- Added new utility functions (`is_ray_available`, `dtype_support_svd`) to support future functionality and SVD operations
- Implemented a new architecture-agnostic DOP merging algorithm (`DOPMerging`) that works with any PyTorch model, complementing the existing CLIP-specific implementation
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| fusion_bench/utils/packages.py | Added is_ray_available() function for future Ray framework integration |
| fusion_bench/utils/dtype.py | Added dtype_support_svd() to check if a PyTorch dtype supports SVD operations |
| fusion_bench/models/utils.py | Added is_leaf_module() and named_leaf_modules() utility functions for traversing model architectures |
| fusion_bench/method/opcm/utils.py | Removed duplicate is_leaf_module() and updated to import from shared utils |
| fusion_bench/method/opcm/opcm.py | Updated imports to use is_leaf_module from shared utils |
| fusion_bench/method/dop/utils.py | Removed duplicate is_leaf_module(), added print_params() utility function |
| fusion_bench/method/dop/dop_general.py | New architecture-agnostic DOP implementation with layer-wise optimization for any model type |
| fusion_bench/method/dop/dop.py | Removed print_params() method (now in utils) |
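The leaf-module helpers listed above are not shown in this diff. As a torch-free sketch of the likely logic (the helper names match the PR; the `Node` stand-in and the "leaf = no children" criterion are assumptions — `nn.Module` exposes the analogous `children()` and `named_modules()`):

```python
class Node:
    """Minimal stand-in for nn.Module, just enough API for the traversal sketch."""

    def __init__(self, **children):
        self._children = children

    def named_modules(self, prefix=""):
        yield prefix, self
        for name, child in self._children.items():
            full = f"{prefix}.{name}" if prefix else name
            yield from child.named_modules(full)


def is_leaf_module(module):
    """A module is a leaf when it has no child modules."""
    return not module._children


def named_leaf_modules(model):
    return [(name, m) for name, m in model.named_modules() if is_leaf_module(m)]


model = Node(encoder=Node(fc1=Node(), fc2=Node()), head=Node())
print([name for name, _ in named_leaf_modules(model)])
# ['encoder.fc1', 'encoder.fc2', 'head']
```

This is the traversal `DOPMerging` needs: it visits only the modules that actually own parameters, so each `nn.Linear` can be optimized individually while containers are skipped.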
```python
# Scaled back-propagation
loss = 0
for i, finetuned_weight in enumerate(finetuned_weights.values()):
    # Comptue gradients of each loss function wrt parameters
```
Spelling error: "Comptue" should be "Compute".
```diff
-# Comptue gradients of each loss function wrt parameters
+# Compute gradients of each loss function wrt parameters
```
```python
def is_ray_available():
    return _is_package_available("ray")
```
The function is_ray_available() is added but doesn't appear to be used anywhere in the codebase. While it follows the established pattern of other availability check functions in this file, consider whether this addition is necessary for this PR or if it should be included when Ray functionality is actually implemented.
```python
def run(self, modelpool: BaseModelPool):
    """
    Execute the DOP merging algorithm on a pool of models.

    Merges models sequentially, where each new model is merged with the
    previously merged result. The first model is used as-is, and subsequent
    models are merged using layer-wise optimization.

    Args:
        modelpool: The model pool containing models to merge and the pretrained model.

    Returns:
        The final merged model after sequentially merging all models in the pool.
    """
    model_names = modelpool.model_names
    if self.shuffle_order:
        random.shuffle(model_names)

    pretrained_model = modelpool.load_pretrained_model()

    merged_model = None
    for model_idx, model_name in enumerate(model_names):
        print(
            f"--------- Optimizing {model_idx + 1}/{len(model_names)}-th with {model_name} ---------"
        )
        if model_idx == 0:
            merged_model = modelpool.load_model(model_names[0])
        else:
            merged_model = self._layer_wise_optimize(
                model_names=["merged", model_name],
                pretrained_model=deepcopy(pretrained_model),
                finetuned_models={
                    "merged": merged_model,
                    model_name: modelpool.load_model(model_name),
                },
                model_idx=model_idx,
            )

    return merged_model
```
The `run` method does not set the random seed despite accepting a `seed` parameter in `__init__`. The original CLIP-specific implementation (`ContinualDOPForCLIP`) in `dop.py` properly handles seeding with either `L.seed_everything(self.seed)` or `seed_everything_by_time(self.fabric)`. This architecture-agnostic version should implement similar seeding behavior for reproducibility.
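Seeding before the shuffle is enough to make the merge order reproducible. A hedged, stdlib-only sketch of the behavior being asked for (the project's `L.seed_everything` additionally seeds torch/numpy; `ordered_models` is an illustrative name, not a repo function):

```python
import random


def ordered_models(model_names, seed=None, shuffle_order=True):
    """Return a (possibly shuffled) copy of model_names, reproducibly when seeded."""
    names = list(model_names)
    if shuffle_order:
        if seed is not None:
            random.seed(seed)
        random.shuffle(names)
    return names


a = ordered_models(["svhn", "gtsrb", "eurosat", "mnist"], seed=42)
b = ordered_models(["svhn", "gtsrb", "eurosat", "mnist"], seed=42)
print(a == b)  # True: same seed, same order
```

Copying the list before shuffling also avoids mutating `modelpool.model_names` in place, a side effect the original code has.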
The save_on_every_step and evaluate_on_every_step parameters are accepted in __init__ but never used in the run method. In the original CLIP-specific implementation (ContinualDOPForCLIP), these parameters control intermediate model saving and evaluation during the merging process. This functionality should either be implemented or these parameters should be removed if they're not intended for this architecture-agnostic version.
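Independent of the unused flags, the control flow of `run` is a left fold: start from the first model and repeatedly merge the running result with the next one. A toy sketch, with element-wise averaging standing in for the real `_layer_wise_optimize`:

```python
from functools import reduce


def continual_merge(models, merge_pair):
    """Left fold: m1, then merge(m1, m2), then merge(that, m3), ..."""
    return reduce(merge_pair, models)


# Toy "models" as flat weight lists; pairwise averaging stands in for DOP's merge.
average_pair = lambda a, b: [(x + y) / 2 for x, y in zip(a, b)]
merged = continual_merge([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], average_pair)
print(merged)  # [3.5, 4.5]
```

Note the recency bias of naive pairwise averaging (the last model contributes 1/2, the first only 1/4); presumably this is part of what DOP's per-layer optimization is meant to control.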
```python
from fusion_bench.utils.json import save_to_json

from .min_norm_solvers import MinNormSolver, gradient_normalizers
from .utils import is_leaf_module, print_params, svd
```
The function is_leaf_module is imported from .utils but never used in this file. The code uses named_leaf_modules instead (which internally calls is_leaf_module). This unused import should be removed.
```diff
-from .utils import is_leaf_module, print_params, svd
+from .utils import print_params, svd
```
```python
from fusion_bench.models.utils import named_leaf_modules
from fusion_bench.utils import seed_everything_by_time
from fusion_bench.utils.dtype import dtype_support_svd
from fusion_bench.utils.json import save_to_json
```
The save_to_json function is imported but never used in this file. This unused import should be removed (though it would be needed if the save_on_every_step and evaluate_on_every_step features are properly implemented as suggested in another comment).
```diff
-from fusion_bench.utils.json import save_to_json
```
```python
optimizer = torch.optim.Adam([merged_weight], lr=self.lr)
all_losses = [[], []]
all_alphas = [[], []]
for step_idx in tqdm(
```
For loop variable 'step_idx' is not used in the loop body.
```diff
-for step_idx in tqdm(
+for _ in tqdm(
```
```python
else:
    # This is a naive weighted optimization
    optimizer = torch.optim.Adam([merged_weight], lr=self.lr)
    for step_idx in tqdm(
```
For loop variable 'step_idx' is not used in the loop body.
```diff
-for step_idx in tqdm(
+for _ in tqdm(
```
```diff
 from fusion_bench.utils.state_dict_arithmetic import state_dict_sub

-from .utils import frobenius_inner_product, get_task_vector_norm, is_leaf_module, svd
+from .utils import frobenius_inner_product, get_task_vector_norm, svd
```
Import of 'frobenius_inner_product' is not used.
```diff
-from .utils import frobenius_inner_product, get_task_vector_norm, svd
+from .utils import get_task_vector_norm, svd
```
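For reference, the `frobenius_inner_product` helper whose import is being trimmed computes a simple quantity; a torch-free sketch (the repo version presumably operates on tensors rather than nested lists):

```python
def frobenius_inner_product(a, b):
    """<A, B>_F: sum of element-wise products of two same-shape matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


a = [[1.0, 2.0], [3.0, 4.0]]
print(frobenius_inner_product(a, a))  # 30.0, i.e. ||A||_F^2
```

It is the natural inner product for comparing task vectors of weight matrices, which is why OPCM-style methods use it to measure alignment between updates.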
```python
import torch
from torch import Tensor, nn

from fusion_bench.models.utils import is_leaf_module
```
Import of 'is_leaf_module' is not used.
```diff
-from fusion_bench.models.utils import is_leaf_module
```
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@fusion_bench/method/dop/dop_general.py`:
- Around line 64-114: The constructor sets alpha: float = None but immediately
asserts numeric bounds, causing a crash when alpha is omitted; fix by giving
alpha a safe default (e.g., alpha: float = 0.5) in the __init__ signature or by
handling None before the assertions (e.g., if alpha is None: self.alpha = 0.5
else: self.alpha = alpha), then perform the existing bounds check against
self.alpha and update the docstring to reflect the new default; ensure the
associated symbols are the alpha parameter, self.alpha assignment, and the
subsequent assertions that check self.alpha and self.svd_epsilon.
- Around line 36-37: The class DOPMerging currently declares inheritance as
"class DOPMerging(BaseAlgorithm, LightningFabricMixin)" which places
LightningFabricMixin after BaseAlgorithm and can break method resolution order;
change the declaration to put the mixin first (i.e., inherit from
LightningFabricMixin before BaseAlgorithm) so the MRO calls mixin methods before
base class methods and ensure any super() calls resolve correctly.
- Around line 131-134: The shuffle of model_names currently happens without
applying self.seed, making runs nondeterministic; before calling
random.shuffle(model_names) (where model_names = modelpool.model_names and
guarded by self.shuffle_order), call random.seed(self.seed) (and if numpy or
torch RNGs are used later for optimizer initialization, seed them as well) so
the model order and subsequent optimizer initialization become reproducible;
ensure self.seed is optional and only seed when not None to preserve existing
behavior.
- Around line 286-292: The SVD rank selection can produce NaNs when all singular
values are zero (finetuned_tv == 0); guard the computation by checking the
singular-value sum before computing cumsum_ratio: compute total = s.sum() (or
use torch.isclose to 0) and if total is zero set split_rank = 0 (or another safe
default) rather than running cumsum/torch.searchsorted; then slice u into u_main
= u[:, :split_rank] as before. Apply this check in the loop handling
finetuned_weights/finetuned_tv (refer to finetuned_tv, svd(...)->u,s,v, s.sum(),
cumsum_ratio, split_rank, and u_main) so downstream code never receives NaN
split_rank or cumsum_ratio.
In `@fusion_bench/method/dop/utils.py`:
- Line 77: The loop currently uses an unused variable "module_name" in "for
module_name, module in model.named_modules()"; rename it to "_" (or
"_module_name") or switch to "for module in model.modules()" so the lint
no-unused-variable warning is resolved—update the loop header wherever "for
module_name, module in model.named_modules()" appears (keeping the "module"
variable intact).
- Around line 73-88: The function print_params computes linear_ratio and
linear_weight_ratio by dividing by total_params which can be zero; update
print_params to guard against division by zero by checking total_params (in
function print_params) before computing ratios and use 0.0 (or an appropriate
default) when total_params == 0, ensuring linear_ratio and linear_weight_ratio
are computed only when total_params > 0; reference variables total_params,
linear_params, linear_weight_params and the ratio calculations linear_ratio and
linear_weight_ratio in print_params.
In `@fusion_bench/method/opcm/opcm.py`:
- Around line 17-26: Remove unused imports from the top of opcm.py to resolve
F401: inspect and delete any names from the import block that are not referenced
in this module (candidates shown in the diff include BaseModelPool,
LightningFabricMixin, SimpleProfilerMixin, is_leaf_module,
CLIPVisionModelTaskPool, instantiate, load_from_json, save_to_json,
state_dict_to_vector, state_dict_sub) while keeping only the actually used
symbols (e.g., BaseAlgorithm and the .utils imports like
frobenius_inner_product, get_task_vector_norm, svd if they are used); update the
import statement(s) to import only the needed identifiers so linting no longer
reports unused imports.
In `@fusion_bench/method/opcm/utils.py`:
- Line 6: Remove the unused import is_leaf_module from
fusion_bench/method/opcm/utils.py to resolve the F401 lint error: locate the
import statement "from fusion_bench.models.utils import is_leaf_module" at the
top of that module and delete it (or replace it with a used symbol if intended),
ensuring no other references to is_leaf_module remain in the file.
🧹 Nitpick comments (1)
fusion_bench/method/dop/dop_general.py (1)
6-31: Remove unused imports and add `lightning` to the cleanup list.

The proposed diff should also include `import lightning as L` (line 14), which is imported but never used in the code. All other removals are correct.

```diff
 import logging
-import os
 import random
-import time
 from copy import deepcopy
-from pathlib import Path
-from typing import Dict, List, Literal, Optional, Tuple, cast
+from typing import Dict, List, Optional

-import lightning as L
 import numpy as np
 import torch
-from omegaconf import DictConfig
 from torch import Tensor, nn
 from torch.autograd import Variable
 from tqdm.auto import tqdm

 from fusion_bench import BaseAlgorithm, BaseModelPool, auto_register_config
 from fusion_bench.method.simple_average import simple_average
 from fusion_bench.mixins import LightningFabricMixin
 from fusion_bench.models.utils import named_leaf_modules
 from fusion_bench.utils import seed_everything_by_time
 from fusion_bench.utils.dtype import dtype_support_svd
-from fusion_bench.utils.json import save_to_json

 from .min_norm_solvers import MinNormSolver, gradient_normalizers
-from .utils import is_leaf_module, print_params, svd
+from .utils import svd
```

Per the coding guidelines: avoid importing PyTorch and Transformers at module level in method implementations; defer imports to function/method bodies.
```python
    alpha: float = None,
    svd_epsilon: float = 1.0,
    svd_proj_space: str = "uv",
    exclude_keys: List[str] | None = None,
    **kwargs,
):
    """
    Initialize the DOP merging algorithm.

    Args:
        seed: Random seed for reproducibility. If None, uses time-based seeding.
        shuffle_order: Whether to shuffle the order of models before merging.
        save_on_every_step: Whether to save the model after each merge step.
        evaluate_on_every_step: Whether to evaluate the model after each merge step.
        lr: Learning rate for the optimization process.
        num_steps: Number of optimization steps per layer.
        mgda: Whether to use Multi-Gradient Descent Algorithm for balancing losses.
        ema: Whether to use exponential moving average for MGDA weights.
        ema_beta: EMA decay rate for MGDA weights (only used if ema=True).
        alpha: Weight for balancing between stability and plasticity (0-1).
            When mgda=False, used as a fixed weight. When mgda=True with ema=True,
            used as initial weight.
        svd_epsilon: Threshold for SVD rank selection (0-1). Determines how much
            variance to preserve in the projection space.
        svd_proj_space: SVD projection space to use: 'u', 'v', or 'uv' (both).
        exclude_keys: List of module names to exclude from optimization.
        **kwargs: Additional arguments passed to BaseAlgorithm.
    """
    self.lr = lr
    self.num_steps = num_steps
    self.mgda = mgda
    self.ema = ema
    self.ema_beta = ema_beta
    self.alpha = alpha
    self.svd_epsilon = svd_epsilon
    self.svd_proj_space = svd_proj_space
    self.seed = seed
    self.shuffle_order = shuffle_order
    self.save_on_every_step = save_on_every_step
    self.evaluate_on_every_step = evaluate_on_every_step

    if exclude_keys is None:
        exclude_keys = []
    self.exclude_keys = exclude_keys

    assert (
        self.svd_epsilon >= 0 and self.svd_epsilon <= 1
    ), "The svd_epsilon should be in the range of [0, 1]"
    assert (
        self.alpha >= 0 and self.alpha <= 1
    ), "The alpha should be in the range of [0, 1]"
```
Fix alpha=None causing an assertion crash.
alpha defaults to None, but is immediately compared to numbers. This raises at runtime when alpha isn’t provided.
🛠️ Proposed fix

```diff
-    alpha: float = None,
+    alpha: float | None = None,
@@
-    assert (
-        self.alpha >= 0 and self.alpha <= 1
-    ), "The alpha should be in the range of [0, 1]"
+    if self.alpha is None:
+        self.alpha = 0.5
+    if not (0 <= self.alpha <= 1):
+        raise ValueError("alpha must be in the range of [0, 1]")
```
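The `ema_beta` parameter documented in the constructor implies the standard exponential-moving-average update for the MGDA weight; a minimal sketch under that assumption (variable names are illustrative, not the repo's):

```python
def ema_update(prev_alpha: float, new_alpha: float, beta: float) -> float:
    """Keep `beta` of the running weight, blend in `1 - beta` of the new MGDA weight."""
    return beta * prev_alpha + (1.0 - beta) * new_alpha


alpha = 0.5  # e.g. the initial alpha when mgda=True and ema=True
for mgda_alpha in (1.0, 1.0, 1.0):
    alpha = ema_update(alpha, mgda_alpha, beta=0.9)
print(round(alpha, 4))  # 0.6355: drifts slowly toward 1.0
```

A high `ema_beta` damps step-to-step noise in the per-layer MGDA solutions at the cost of reacting slowly when the balance between stability and plasticity genuinely shifts.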
🧰 Tools

🪛 Ruff (0.14.14)

[warning] 64-64: PEP 484 prohibits implicit Optional. Convert to `T | None` (RUF013)
```python
model_names = modelpool.model_names
if self.shuffle_order:
    random.shuffle(model_names)
```
🧩 Analysis chain
🏁 Scripts executed (in repository tanganke/fusion_bench):

```shell
cat -n fusion_bench/method/dop/dop_general.py | head -200
rg "self\.seed" fusion_bench/method/dop/dop_general.py -n
cat -n fusion_bench/method/dop/dop_general.py | sed -n '200,300p'
cat -n fusion_bench/method/dop/dop_general.py | sed -n '300,450p'
rg "L\.seed_everything|seed_everything" --type py | head -20
cat -n fusion_bench/method/opcm/opcm.py | grep -A 5 -B 5 "L.seed_everything"
cat -n fusion_bench/utils/fabric.py | grep -A 15 "def seed_everything_by_time"
cat -n fusion_bench/utils/fabric.py | sed -n '16,50p'
cat -n fusion_bench/utils/fabric.py | sed -n '16,70p'
rg "on_run_start|on_run_end" fusion_bench/method/dop/dop_general.py
rg "def on_run_start|def on_run_end" fusion_bench --type py | head -10
cat -n fusion_bench/method/base_algorithm.py | grep -A 20 "def on_run_start"
```
Apply seeding before shuffling to honor the reproducibility contract.
self.seed is never used, so runs are always nondeterministic even when a seed is provided. This affects both the model order shuffling and the subsequent optimizer initialization.
🎯 Proposed fix

```diff
+    if self.seed is not None:
+        L.seed_everything(self.seed)
+    else:
+        seed_everything_by_time()
     model_names = modelpool.model_names
     if self.shuffle_order:
         random.shuffle(model_names)
```
```python
for i, finetuned_weight in enumerate(finetuned_weights.values()):
    finetuned_tv = finetuned_weight - pretrained_weight
    u, s, v = svd(finetuned_tv, full_matrices=True)
    epsilon = 1.0 if self.svd_epsilon > 1.0 else self.svd_epsilon
    cumsum_ratio = s.cumsum(dim=0) / s.sum()
    split_rank = torch.searchsorted(cumsum_ratio, epsilon).item()
    u_main = u[:, :split_rank]
```
🧩 Analysis chain
🏁 Scripts executed (in repository tanganke/fusion_bench):

```shell
find . -type f -name "dop_general.py" | head -5
cat -n fusion_bench/method/dop/dop_general.py | sed -n '280,310p'
cat -n fusion_bench/method/dop/dop_general.py | sed -n '1,50p'
cat -n fusion_bench/method/dop/utils.py | grep -A 20 "def svd"
cat -n fusion_bench/method/dop/dop_general.py | sed -n '295,350p'
rg "s\.sum\(\)" fusion_bench/method/dop/
rg -A 10 "def cal_loss_i" fusion_bench/method/dop/dop_general.py
cat -n fusion_bench/method/dop/dop.py | sed -n '1,50p'
rg -B 5 -A 10 "cumsum_ratio = s.cumsum" fusion_bench/method/dop/dop.py
```

Probe for the NaN case:

```python
import torch

# Test 1: what happens when all singular values are zero?
s = torch.tensor([0.0, 0.0, 0.0])
cumsum_ratio = s.cumsum(dim=0) / s.sum()  # 0/0 -> NaN
epsilon = 0.95
result = torch.searchsorted(cumsum_ratio, epsilon)
print(f"cumsum_ratio = {cumsum_ratio}, searchsorted = {result.item()}")

# Test 2: an empty slice is well-defined
u = torch.randn(10, 10)
print(u[:, :0].shape)  # torch.Size([10, 0])

# Test 3: the normal case
s_normal = torch.tensor([5.0, 3.0, 1.0, 0.1])
cumsum_normal = s_normal.cumsum(dim=0) / s_normal.sum()
print(torch.searchsorted(cumsum_normal, epsilon).item())
```
Guard SVD rank selection when all singular values are zero.
When finetuned_weight == pretrained_weight, all singular values are zero and s.sum() evaluates to 0. This causes the cumsum ratio computation to produce NaN values, which propagates through torch.searchsorted() and poisons subsequent loss calculations in the optimization loop.
Proposed fix

```diff
-    cumsum_ratio = s.cumsum(dim=0) / s.sum()
-    split_rank = torch.searchsorted(cumsum_ratio, epsilon).item()
-    u_main = u[:, :split_rank]
-    v_main = v[:, :split_rank]
-    s_main = s[:split_rank]
+    s_sum = s.sum()
+    if s_sum == 0:
+        split_rank = 0
+        u_main = u[:, :0]
+        v_main = v[:, :0]
+        s_main = s[:0]
+    else:
+        cumsum_ratio = s.cumsum(dim=0) / s_sum
+        split_rank = torch.searchsorted(cumsum_ratio, epsilon).item()
+        u_main = u[:, :split_rank]
+        v_main = v[:, :split_rank]
+        s_main = s[:split_rank]
```
+ s_main = s[:split_rank]🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/dop_general.py` around lines 286 - 292, The SVD rank
selection can produce NaNs when all singular values are zero (finetuned_tv ==
0); guard the computation by checking the singular-value sum before computing
cumsum_ratio: compute total = s.sum() (or use torch.isclose to 0) and if total
is zero set split_rank = 0 (or another safe default) rather than running
cumsum/torch.searchsorted; then slice u into u_main = u[:, :split_rank] as
before. Apply this check in the loop handling finetuned_weights/finetuned_tv
(refer to finetuned_tv, svd(...)->u,s,v, s.sum(), cumsum_ratio, split_rank, and
u_main) so downstream code never receives NaN split_rank or cumsum_ratio.
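The guarded selection can also be sketched outside of torch. The helper below is hypothetical (plain Python, not part of fusion_bench); it mirrors the default `torch.searchsorted` behavior on the cumulative-energy ratio while returning rank 0 for an all-zero spectrum:

```python
def select_split_rank(singular_values, epsilon):
    """Count leading singular values whose cumulative energy ratio stays
    below epsilon; return 0 when the spectrum is all zeros (zero task vector)."""
    total = sum(singular_values)
    if total == 0:  # finetuned == pretrained: avoid 0/0 NaN, just rank 0
        return 0
    cumulative = 0.0
    rank = 0
    for s in singular_values:
        cumulative += s / total
        if cumulative >= epsilon:
            break
        rank += 1
    return rank


# Same spectrum as the "normal case" script above, with epsilon=0.9:
print(select_split_rank([5.0, 3.0, 1.0, 0.1], 0.9))  # 2
print(select_split_rank([0.0, 0.0, 0.0], 0.9))       # 0, no NaN involved
```

With the guard in place, a zero task vector simply contributes an empty main subspace instead of NaN-valued projections.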
def print_params(model):
    total_params = 0
    linear_params = 0
    linear_weight_params = 0
    for module_name, module in model.named_modules():
        if not is_leaf_module(module):
            continue
        if isinstance(module, nn.Linear):
            linear_params += sum(p.numel() for n, p in module.named_parameters())
            linear_weight_params += sum(
                p.numel() for n, p in module.named_parameters() if "weight" in n
            )
        total_params += sum(p.numel() for p in module.parameters())

    linear_ratio = linear_params / total_params * 100
    linear_weight_ratio = linear_weight_params / total_params * 100
Guard against zero-parameter models to avoid ZeroDivision.
If total_params is 0, the ratio computation will raise. A small guard keeps this safe for edge cases.
🔧 Proposed fix
-    linear_ratio = linear_params / total_params * 100
-    linear_weight_ratio = linear_weight_params / total_params * 100
+    if total_params == 0:
+        linear_ratio = 0.0
+        linear_weight_ratio = 0.0
+    else:
+        linear_ratio = linear_params / total_params * 100
+        linear_weight_ratio = linear_weight_params / total_params * 100

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def print_params(model):
    total_params = 0
    linear_params = 0
    linear_weight_params = 0
    for module_name, module in model.named_modules():
        if not is_leaf_module(module):
            continue
        if isinstance(module, nn.Linear):
            linear_params += sum(p.numel() for n, p in module.named_parameters())
            linear_weight_params += sum(
                p.numel() for n, p in module.named_parameters() if "weight" in n
            )
        total_params += sum(p.numel() for p in module.parameters())
    if total_params == 0:
        linear_ratio = 0.0
        linear_weight_ratio = 0.0
    else:
        linear_ratio = linear_params / total_params * 100
        linear_weight_ratio = linear_weight_params / total_params * 100
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 77-77: Loop control variable module_name not used within loop body
Rename unused module_name to _module_name
(B007)
🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/utils.py` around lines 73 - 88, The function
print_params computes linear_ratio and linear_weight_ratio by dividing by
total_params which can be zero; update print_params to guard against division by
zero by checking total_params (in function print_params) before computing ratios
and use 0.0 (or an appropriate default) when total_params == 0, ensuring
linear_ratio and linear_weight_ratio are computed only when total_params > 0;
reference variables total_params, linear_params, linear_weight_params and the
ratio calculations linear_ratio and linear_weight_ratio in print_params.
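The same zero-division guard can be factored into a tiny helper so every ratio in `print_params` goes through one code path. The name `param_ratio` is illustrative, not an existing fusion_bench function:

```python
def param_ratio(part_params: int, total_params: int) -> float:
    """Percentage of part_params within total_params; 0.0 for empty models."""
    if total_params == 0:
        return 0.0
    return part_params / total_params * 100


print(param_ratio(50, 200))  # 25.0
print(param_ratio(0, 0))     # 0.0 instead of ZeroDivisionError
```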
    total_params = 0
    linear_params = 0
    linear_weight_params = 0
    for module_name, module in model.named_modules():
Rename unused loop variable to satisfy lint.
🧹 Proposed fix
-    for module_name, module in model.named_modules():
+    for _module_name, module in model.named_modules():

📝 Committable suggestion
-    for module_name, module in model.named_modules():
+    for _module_name, module in model.named_modules():
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 77-77: Loop control variable module_name not used within loop body
Rename unused module_name to _module_name
(B007)
🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/utils.py` at line 77, The loop currently uses an
unused variable "module_name" in "for module_name, module in
model.named_modules()"; rename it to "_" (or "_module_name") or switch to "for
module in model.modules()" so the lint no-unused-variable warning is
resolved—update the loop header wherever "for module_name, module in
model.named_modules()" appears (keeping the "module" variable intact).
from fusion_bench import BaseAlgorithm, BaseModelPool
from fusion_bench.mixins import LightningFabricMixin, SimpleProfilerMixin
from fusion_bench.models.utils import is_leaf_module
from fusion_bench.taskpool import CLIPVisionModelTaskPool
from fusion_bench.utils import instantiate
from fusion_bench.utils.json import load_from_json, save_to_json
from fusion_bench.utils.parameters import state_dict_to_vector
from fusion_bench.utils.state_dict_arithmetic import state_dict_sub

-from .utils import frobenius_inner_product, get_task_vector_norm, is_leaf_module, svd
+from .utils import frobenius_inner_product, get_task_vector_norm, svd
Drop unused imports to satisfy F401.
🧹 Proposed fix
-from fusion_bench.utils import instantiate
-from fusion_bench.utils.json import load_from_json, save_to_json
-from fusion_bench.utils.parameters import state_dict_to_vector
-from fusion_bench.utils.state_dict_arithmetic import state_dict_sub
+from fusion_bench.utils.json import save_to_json
@@
-from .utils import frobenius_inner_product, get_task_vector_norm, svd
+from .utils import get_task_vector_norm, svd

🧰 Tools
🪛 Flake8 (7.3.0)
[error] 21-21: 'fusion_bench.utils.instantiate' imported but unused
(F401)
[error] 22-22: 'fusion_bench.utils.json.load_from_json' imported but unused
(F401)
[error] 23-23: 'fusion_bench.utils.parameters.state_dict_to_vector' imported but unused
(F401)
[error] 24-24: 'fusion_bench.utils.state_dict_arithmetic.state_dict_sub' imported but unused
(F401)
[error] 26-26: '.utils.frobenius_inner_product' imported but unused
(F401)
🤖 Prompt for AI Agents
In `@fusion_bench/method/opcm/opcm.py` around lines 17 - 26, Remove unused imports
from the top of opcm.py to resolve F401: inspect and delete any names from the
import block that are not referenced in this module (candidates shown in the
diff include BaseModelPool, LightningFabricMixin, SimpleProfilerMixin,
is_leaf_module, CLIPVisionModelTaskPool, instantiate, load_from_json,
save_to_json, state_dict_to_vector, state_dict_sub) while keeping only the
actually used symbols (e.g., BaseAlgorithm and the .utils imports like
frobenius_inner_product, get_task_vector_norm, svd if they are used); update the
import statement(s) to import only the needed identifiers so linting no longer
reports unused imports.
import torch
from torch import Tensor, nn

from fusion_bench.models.utils import is_leaf_module
Remove unused import to satisfy F401.
🧹 Proposed fix
-from fusion_bench.models.utils import is_leaf_module

📝 Committable suggestion
from fusion_bench.models.utils import is_leaf_module
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 6-6: 'fusion_bench.models.utils.is_leaf_module' imported but unused
(F401)
🤖 Prompt for AI Agents
In `@fusion_bench/method/opcm/utils.py` at line 6, Remove the unused import
is_leaf_module from fusion_bench/method/opcm/utils.py to resolve the F401 lint
error: locate the import statement "from fusion_bench.models.utils import
is_leaf_module" at the top of that module and delete it (or replace it with a
used symbol if intended), ensuring no other references to is_leaf_module remain
in the file.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@fusion_bench/method/dop/dop_general.py`:
- Around line 54-107: The constructor fails to set self.num_ray_actors, causing
AttributeError when run() or _layer_wise_optimize() access it; fix by assigning
the incoming num_ray_actors parameter to the instance (e.g., add
self.num_ray_actors = num_ray_actors inside __init__), ensuring the attribute
exists and matches the provided int value used by run() and
_layer_wise_optimize().
def __init__(
    self,
    seed: Optional[int] = None,
    shuffle_order: bool = False,
    save_on_every_step: bool = True,
    evaluate_on_every_step: bool = False,
    lr: float = 1e-4,
    num_steps: int = 200,
    mgda: bool = True,
    ema: bool = True,
    ema_beta: float = 0.99,
    alpha: float = None,
    svd_epsilon: float = 1.0,
    svd_proj_space: str = "uv",
    exclude_keys: List[str] | None = None,
    num_ray_actors: int = 0,
    **kwargs,
):
    """
    Initialize the DOP merging algorithm.

    Args:
        seed: Random seed for reproducibility. If None, uses time-based seeding.
        shuffle_order: Whether to shuffle the order of models before merging.
        save_on_every_step: Whether to save the model after each merge step.
        evaluate_on_every_step: Whether to evaluate the model after each merge step.
        lr: Learning rate for the optimization process.
        num_steps: Number of optimization steps per layer.
        mgda: Whether to use Multi-Gradient Descent Algorithm for balancing losses.
        ema: Whether to use exponential moving average for MGDA weights.
        ema_beta: EMA decay rate for MGDA weights (only used if ema=True).
        alpha: Weight for balancing between stability and plasticity (0-1).
            When mgda=False, used as a fixed weight. When mgda=True with ema=True,
            used as initial weight.
        svd_epsilon: Threshold for SVD rank selection (0-1). Determines how much
            variance to preserve in the projection space.
        svd_proj_space: SVD projection space to use: 'u', 'v', or 'uv' (both).
        exclude_keys: List of module names to exclude from optimization.
        num_ray_actors: Number of Ray actors to use for parallel processing. If 0, ray is not used.
        **kwargs: Additional arguments passed to BaseAlgorithm.
    """
    self.lr = lr
    self.num_steps = num_steps
    self.mgda = mgda
    self.ema = ema
    self.ema_beta = ema_beta
    self.alpha = alpha
    self.svd_epsilon = svd_epsilon
    self.svd_proj_space = svd_proj_space
    self.seed = seed
    self.shuffle_order = shuffle_order
    self.save_on_every_step = save_on_every_step
    self.evaluate_on_every_step = evaluate_on_every_step
Persist num_ray_actors to avoid AttributeError in run().
self.num_ray_actors is used in run() and _layer_wise_optimize() but never assigned, which will crash at runtime.
🛠️ Proposed fix
     self.seed = seed
     self.shuffle_order = shuffle_order
     self.save_on_every_step = save_on_every_step
     self.evaluate_on_every_step = evaluate_on_every_step
+    self.num_ray_actors = num_ray_actors

🧰 Tools
🪛 Ruff (0.14.14)
[warning] 65-65: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
[warning] 69-69: Unused method argument: num_ray_actors
(ARG002)
🤖 Prompt for AI Agents
In `@fusion_bench/method/dop/dop_general.py` around lines 54 - 107, The
constructor fails to set self.num_ray_actors, causing AttributeError when run()
or _layer_wise_optimize() access it; fix by assigning the incoming
num_ray_actors parameter to the instance (e.g., add self.num_ray_actors =
num_ray_actors inside __init__), ensuring the attribute exists and matches the
provided int value used by run() and _layer_wise_optimize().
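The failure mode is easy to reproduce in isolation. The classes below are illustrative (not fusion_bench code): the buggy constructor accepts `num_ray_actors` but never stores it, so any later attribute access raises `AttributeError`:

```python
class BuggyMerger:
    def __init__(self, num_ray_actors: int = 0):
        pass  # argument accepted but never persisted on self


class FixedMerger:
    def __init__(self, num_ray_actors: int = 0):
        self.num_ray_actors = num_ray_actors  # the one-line fix

    def run(self) -> int:
        return self.num_ray_actors  # safe: attribute always exists


print(hasattr(BuggyMerger(num_ray_actors=2), "num_ray_actors"))  # False
print(FixedMerger(num_ray_actors=2).run())                       # 2
```

Because `run()` and `_layer_wise_optimize()` read the attribute long after construction, the missing assignment surfaces only at merge time rather than at `__init__`.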
…model loading with profiling; update LightningFabricMixin to default to 1 device if no configuration is found.
Summary by CodeRabbit
New Features
Refactor
Chores