Fix fedprox MNIST tutorial for pytorch #1634 (base: develop)
Conversation
Pull Request Overview
This PR fixes a bug in the FedProx MNIST tutorial for PyTorch by correcting the timing of the old weights update in the FedProx optimizers and by ensuring proper initialization of the reference weights. Key changes include:
- Initializing "w_old" in both FedProxOptimizer and FedProxAdam constructors.
- Guarding the application of the proximal term in the optimizer step using a new "apply_proximal" flag.
- Updating the tutorial notebook to set the old weights once at the beginning of training rather than repeatedly per batch.
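The timing fix in the last bullet can be demonstrated with a toy, pure-Python stand-in for the FedProx update (the `ToySGDWithProx` class and the quadratic loss below are illustrative, not the tutorial's actual code): if `set_old_weights` is called every batch, `w_old` always equals the current weights, the proximal term vanishes, and `mu` has no effect.

```python
# Minimal sketch (no PyTorch) of why the timing of set_old_weights matters.
# ToySGDWithProx and the quadratic "loss" are stand-ins, not OpenFL code.

class ToySGDWithProx:
    def __init__(self, lr=0.1, mu=1.0):
        self.lr, self.mu, self.w_old = lr, mu, None

    def set_old_weights(self, w):
        self.w_old = w

    def step(self, w, grad):
        # FedProx update: gradient plus proximal pull toward w_old
        return w - self.lr * (grad + self.mu * (w - self.w_old))

def train(per_batch, mu, steps=20):
    opt = ToySGDWithProx(mu=mu)
    w = 0.0
    opt.set_old_weights(w)          # correct: once, at the start of the round
    for _ in range(steps):
        if per_batch:
            opt.set_old_weights(w)  # bug: w_old == w, proximal term is zero
        grad = 2 * (w - 5.0)        # d/dw of (w - 5)^2
        w = opt.step(w, grad)
    return w

# With the bug, mu has no effect: both runs reduce to plain SGD.
buggy_mu0, buggy_mu1 = train(True, 0.0), train(True, 1.0)
# With the fix, a larger mu keeps w closer to w_old = 0.
fixed_mu0, fixed_mu1 = train(False, 0.0), train(False, 1.0)
```

With the per-batch bug, `mu = 0.0` and `mu = 1.0` produce identical final weights, which matches the symptom reported in the linked issue.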
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openfl/utilities/optimizers/torch/fedprox.py | Added initialization for "w_old" and applied conditional proximal updates in both step and adam functions. |
| openfl-tutorials/experimental/workflow/403_Federated_FedProx_PyTorch_MNIST_Workflow_Tutorial.ipynb | Revised execution counts and repositioned the set_old_weights call to occur only once at training start. |
kminhta left a comment:
Good catch @nisha987! Thanks for taking up that long-standing issue.
| "dampening": dampening, | ||
| "lr": lr, | ||
| "momentum": momentum, | ||
| "mu": mu, |
General: we should probably rename FedProxOptimizer to FedProxSGD to maintain the naming convention, but maybe we can save that for a future PR.
Thanks @nisha987, this is looking good. I had one more minor comment regarding ...
Signed-off-by: Shekhawat, Nisha <[email protected]>
Added signoff to all commits.
```python
    raise ValueError(f"Invalid learning rate: {lr}")
if weight_decay < 0.0:
    raise ValueError(f"Invalid weight_decay value: {weight_decay}")
if mu < 0.0:
```
A one-liner `assert mu >= 0.0, f"FedProx regularizer coefficient must be greater than or equal to 0, got {mu}"` is sufficient.
I don't think there are any scenarios where a negative mu is used. It is OK to raise an exception here instead of a warning.
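As a sketch, the suggested assert-style validation might look like the following in the constructor (`validate_hparams` is a hypothetical helper written for illustration, not OpenFL code):

```python
# Hypothetical helper illustrating the reviewer's suggested one-liner
# validation; in the real optimizer this would live in __init__.

def validate_hparams(lr, weight_decay, mu):
    assert lr >= 0.0, f"Invalid learning rate: {lr}"
    assert weight_decay >= 0.0, f"Invalid weight_decay value: {weight_decay}"
    assert mu >= 0.0, (
        f"FedProx regularizer coefficient must be greater than or equal to 0, got {mu}"
    )

validate_hparams(0.01, 0.0, 0.1)   # valid hyperparameters pass silently

try:
    validate_hparams(0.01, 0.0, -1.0)
    msg = ""
except AssertionError as e:
    msg = str(e)                   # negative mu raises immediately
```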
IMPORTANT: This optimizer requires a reference to the original (global) model parameters to calculate the proximal term. These must be set explicitly using the set_old_weights() method before training begins. The old weights (w_old) must match the order and structure of the model's parameters. Typically, w_old should be set to the initial global model parameters received from the aggregator at the beginning of each round. If mu > 0 and w_old is not set, the optimizer will raise a ValueError.
Is it possible to record the first value of the weights supplied to the optimizer as w_old at the start of the first iteration? The user would not need to set it explicitly.
This is a good suggestion. I hadn't considered this.
@nisha987 - the optimizer is supplied with the model parameters when it is first initialized. These parameters are updated during optimizer.step(). This is the global model, and it should be possible to also record these weights (before updating) as w_old internally rather than having the user explicitly do it, as @MasterSkepticista suggests.
Then we won't risk running into the issue that we saw in the FedProx MNIST tutorial.
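A minimal sketch of that idea, with plain Python lists standing in for tensors (`LazyProxOptimizer` is hypothetical, not the PR's code): the optimizer snapshots its parameters as w_old on the first call to step(), before applying any update, so set_old_weights() is never needed.

```python
# Hypothetical sketch of lazily capturing w_old inside the optimizer.
# Lists of floats stand in for tensors; a real implementation would
# clone torch parameters instead.

class LazyProxOptimizer:
    def __init__(self, params, lr=0.1, mu=0.5):
        self.params = params       # stand-in for the model's parameters
        self.lr, self.mu = lr, mu
        self.w_old = None

    def step(self, grads):
        if self.w_old is None:
            # First step: record the incoming (global) weights
            # before any update is applied.
            self.w_old = list(self.params)
        for i, g in enumerate(grads):
            prox = self.mu * (self.params[i] - self.w_old[i])
            self.params[i] -= self.lr * (g + prox)

opt = LazyProxOptimizer([1.0, 2.0])
opt.step([0.0, 0.0])   # snapshots w_old = [1.0, 2.0], zero grads leave params unchanged
opt.step([1.0, 1.0])   # params[0]: 1.0 - 0.1 * (1.0 + 0.0) = 0.9
```

This removes the footgun entirely: the reference weights are guaranteed to be the round's initial global weights, which is exactly the invariant the tutorial bug violated.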
```python
if mu > 0 and w_old is None:
    raise ValueError(
        "FedProx requires old weights to be set when mu > 0. "
        "Please call set_old_weights() before optimization step."
    )
```
This is assumed to be verified during init, no?
```python
self._validate_old_weights(mu, w_old)

# Apply proximal term when mu != 0
apply_proximal = w_old is not None
```
Overall, consider a simpler approach: apply_proximal should always be true. We guarantee that mu will be >= 0.0 during initialization, which means negative values never appear. As for the 0.0 case, it means no contribution from the regularizer term, so it is implied that it has no effect on the weights.
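A quick numeric check of this point, using toy scalar updates rather than the real optimizer code: with mu == 0.0, an unconditionally applied proximal term contributes nothing, so the step is identical to plain SGD and the apply_proximal flag is unnecessary.

```python
# Toy scalar updates (not the real optimizer) showing that applying the
# proximal term unconditionally is safe when mu >= 0: mu == 0.0 is a no-op.

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def fedprox_step(w, grad, w_old, lr=0.1, mu=0.0):
    # Proximal term applied unconditionally; mu == 0 contributes nothing.
    return w - lr * (grad + mu * (w - w_old))

w, grad, w_old = 2.0, 0.5, 1.0
plain = sgd_step(w, grad)
prox0 = fedprox_step(w, grad, w_old, mu=0.0)   # identical to plain SGD
prox1 = fedprox_step(w, grad, w_old, mu=1.0)   # pulled toward w_old
```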
```python
if weight_decay != 0:
    d_p = d_p.add(p, alpha=weight_decay)
```
Similar case: weight_decay can be guaranteed to be >= 0.0 during initialization, and this check can be avoided.
Summary
Models trained with different mu values in the FedProx optimizer appear to have identical weights, despite the expectation that different mu values should produce different models. This suggests that the proximal term in the FedProx algorithm is not being applied correctly.
Description (Mandatory)
1. Incorrect Timing of the set_old_weights Call

In the current notebook implementation, set_old_weights is called after gradient computation but right before the optimizer step. This is problematic because:
- It causes the reference weights (w_old) to be nearly identical to the current weights
- The proximal term mu * (param - w_old_param) becomes effectively zero

2. Missing Initial w_old Value in the Optimizers

The optimizers (FedProxOptimizer and FedProxAdam) don't initialize the w_old parameter in their constructors. If step() is called before set_old_weights, there may be an error accessing the uninitialized w_old.
Testing