[Research] Update fedumm by ZiyueXu77 · Pull Request #4390 · NVIDIA/NVFlare

ZiyueXu77 · 2026-04-01T19:12:41Z

Fixes # .

Description

Simplify the example to match the paper's experiments; remove JanusPro, which the paper does not mention.
Use the Recipe API (`FedAvgRecipe`) in place of the `FedJob` config.
Add TensorBoard logging.
Restructure the code.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

Copilot

Pull request overview

This PR streamlines the research/fedumm example to match the FedUMM paper’s BLIP-focused experiment flow, migrating the simulator job to NVFlare’s FedAvgRecipe API and adding TensorBoard logging, while removing the JanusPro backend and related env/scripts.

Changes:

Remove JanusPro backend and multi-env launch scripts; simplify backend registration around BLIP-VQA only.
Switch job.py to FedAvgRecipe + SimEnv execution and add experiment tracking.
Add step-level training logging + TensorBoard scalar recording in the shared training loop and client/baseline scripts.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
research/fedumm/src/model_registry.py	Simplifies registry module imports/docs.
research/fedumm/src/januspro_backend.py	Removes JanusPro backend implementation.
research/fedumm/src/common.py	Adds logging/TensorBoard hooks to training loop; formatting cleanup.
research/fedumm/src/blip_backend.py	Adds import fallback and introduces `BLIPLoRAModel` for recipe model init.
research/fedumm/src/__init__.py	Removes auto-registration behavior.
research/fedumm/scripts/slurm_run.sh	Removes SLURM runner script.
research/fedumm/scripts/setup_envs.sh	Removes conda env setup helper.
research/fedumm/scripts/launch_januspro.sh	Removes JanusPro env wrapper script.
research/fedumm/scripts/launch_blip.sh	Removes BLIP env wrapper script.
research/fedumm/requirements.txt	Pins datasets and adds TensorBoard/scipy dependencies.
research/fedumm/README.md	Rewrites README to the simplified simulator-based workflow.
research/fedumm/job.py	Replaces `FedJob` config with `FedAvgRecipe` + TensorBoard tracking.
research/fedumm/envs/env_januspro.yml	Removes JanusPro conda env file.
research/fedumm/envs/env_blip.yml	Removes BLIP conda env file.
research/fedumm/client.py	Simplifies to BLIP-only client; adds TensorBoard logging.
research/fedumm/centralized_baseline.py	Simplifies to BLIP-only baseline; adds TensorBoard logging.

Comments suppressed due to low confidence (3)

research/fedumm/client.py:87

load_dataset(..., trust_remote_code=True) enables execution of arbitrary code from the dataset repository. If this isn’t strictly required for HuggingFaceM4/VQAv2, it should be removed; otherwise consider gating it behind an explicit CLI flag / environment variable and defaulting to False to reduce the security risk.
research/fedumm/centralized_baseline.py:60
load_dataset(..., trust_remote_code=True) enables execution of arbitrary code from the dataset repository. If this isn’t strictly required for HuggingFaceM4/VQAv2, it should be removed; otherwise gate it behind an explicit opt-in flag to reduce the security risk.
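
One way to gate `trust_remote_code` behind an explicit opt-in, as the two comments above suggest, is a CLI flag that defaults to off. This is a minimal sketch, not the code in this PR: the flag name `--trust-remote-code` and the helpers `build_parser` and `dataset_kwargs` are hypothetical, and whether `HuggingFaceM4/VQAv2` loads without the option depends on the installed `datasets` version.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical opt-in flag for remote dataset code."""
    parser = argparse.ArgumentParser(description="Dataset loading options (illustrative)")
    parser.add_argument(
        "--trust-remote-code",
        action="store_true",  # defaults to False unless the flag is passed
        help="Allow the dataset repository to execute its own loading script.",
    )
    return parser


def dataset_kwargs(args):
    """Return extra keyword arguments for load_dataset based on the opt-in flag."""
    kwargs = {}
    if args.trust_remote_code:
        # Only forwarded when the user asked for it explicitly.
        kwargs["trust_remote_code"] = True
    return kwargs


# Intended use inside a client script (assumption, shown as a comment):
#   args = build_parser().parse_args()
#   ds = load_dataset("HuggingFaceM4/VQAv2", **dataset_kwargs(args))
```

Leaving the key out entirely when the flag is off lets the library apply its own default instead of the script overriding it.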
research/fedumm/client.py:97
The job enables TensorBoard tracking via add_experiment_tracking(...), but the client uses torch.utils.tensorboard.SummaryWriter, which won’t integrate with NVFlare’s tracking pipeline (and may write to colliding default runs/ directories across simulated clients). Consider using nvflare.client.tracking.SummaryWriter or explicitly setting a per-site log_dir and closing the writer on shutdown.
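
The second option in the comment above, a per-site `log_dir` plus an explicit close, could look roughly like this. It is a sketch under assumptions: `site_log_dir`, `open_writer`, and the `fedumm` run tag are hypothetical names, and the site name is assumed to be available before the writer is created.

```python
from pathlib import Path


def site_log_dir(base_dir, site_name, run_tag="fedumm"):
    """Return a log directory unique to one simulated site (names are illustrative)."""
    return Path(base_dir) / run_tag / site_name


def open_writer(base_dir, site_name):
    """Create a TensorBoard writer that logs under the site's own directory."""
    # Imported lazily so the path helper stays usable without torch installed.
    from torch.utils.tensorboard import SummaryWriter

    return SummaryWriter(log_dir=str(site_log_dir(base_dir, site_name)))


# Intended use (assumption, shown as a comment):
#   writer = open_writer("runs", "site-1")
#   try:
#       writer.add_scalar("train/loss", loss_value, global_step)
#   finally:
#       writer.close()  # flush events when the client shuts down
```

Separate directories keep simulated clients from writing into the same default `runs/` location.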

greptile-apps · 2026-04-01T19:17:33Z

Greptile Summary

This PR simplifies the FedUMM research example to align with the paper: JanusPro is removed, the implementation is hardcoded to the BLIP-VQA backend, FedAvgRecipe replaces the manual job setup, and TensorBoard logging is added to both the FL client and the centralized baseline. Prior review concerns (missing cur_round guard, writer.close(), and save_pretrained for LoRA sub-modules) have all been addressed.

Confidence Score: 4/5

PR is safe to merge after confirming whether the per-round optimizer reset is intentional.

All prior P1 concerns have been addressed. One new P1 flag — optimizer state discarded each FL round — may be the intended FedAvg behavior but is undocumented and worth confirming with the author before merge.

research/fedumm/client.py — optimizer instantiation inside the FL loop (lines 155–160).
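
To make the flagged behavior concrete, the toy below compares building the optimizer inside the round loop with building it once. It is an illustration only: `MomentumSGD` is a hypothetical scalar stand-in for a stateful optimizer, not the optimizer used in `client.py`, and the loss is a toy quadratic.

```python
class MomentumSGD:
    """Toy scalar optimizer standing in for any optimizer that keeps state."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0  # state carried from one step to the next

    def step(self, weight, grad):
        self.velocity = self.momentum * self.velocity + grad
        return weight - self.lr * self.velocity


def velocity_at_round_start(num_rounds, steps_per_round, reset_each_round):
    """Return the optimizer velocity observed at the start of every round."""
    weight = 1.0
    optimizer = MomentumSGD()
    seen = []
    for _ in range(num_rounds):
        if reset_each_round:
            # Fresh state, as when the optimizer is instantiated inside the FL loop.
            optimizer = MomentumSGD()
        seen.append(optimizer.velocity)
        for _ in range(steps_per_round):
            grad = 2.0 * weight  # gradient of the toy loss weight**2
            weight = optimizer.step(weight, grad)
    return seen


reset = velocity_at_round_start(3, 3, reset_each_round=True)
persistent = velocity_at_round_start(3, 3, reset_each_round=False)
```

With the reset, every round starts from zero velocity; without it, state from the previous round carries over. Which of the two is wanted here is the question the review asks the author to confirm.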

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename	Overview
research/fedumm/client.py	FL client for federated BLIP-VQA fine-tuning; hardcoded to blip_vqa backend, TensorBoard added, cur_round fallback fixed. Minor: optimizer re-created each round (possibly intentional), SummaryWriter initialized slightly before site name is available.
research/fedumm/job.py	Switched to FedAvgRecipe-based job; model_name_or_path is conditionally forwarded to client scripts; clean refactor.
research/fedumm/centralized_baseline.py	Centralized baseline now saves each PEFT sub-module individually and includes a TensorBoard writer; looks correct.
research/fedumm/src/common.py	Shared helpers; empty-dataloader guard raises ValueError as per custom rule; Dirichlet partition logic unchanged and correct.
research/fedumm/src/blip_backend.py	BLIP-VQA backend including BLIPLoRAModel server wrapper; LoRA applied to text_encoder/text_decoder; evaluate raises on empty loader.
research/fedumm/README.md	README significantly simplified to match paper scope; JanusPro removed; setup instructions updated.
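
For readers unfamiliar with the Dirichlet partition mentioned for `common.py`, a label-based split is commonly built as sketched below. This is a generic pure-Python illustration and not the code in `research/fedumm/src/common.py`; the function name, signature, and rounding scheme are assumptions.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    parts = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start = 0
        cumulative = 0.0
        for client, gamma in enumerate(gammas):
            cumulative += gamma / total
            # The last client takes the remainder so every index is assigned exactly once.
            if client == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            parts[client].extend(indices[start:end])
            start = end
    return parts


toy_labels = [i % 4 for i in range(200)]
toy_parts = dirichlet_partition(toy_labels, num_clients=3, alpha=0.5)
```

Smaller `alpha` values concentrate each class on fewer clients, which gives a more heterogeneous split; larger values approach a uniform split.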

Sequence Diagram

sequenceDiagram
    participant job.py
    participant Server (FedAvgRecipe)
    participant client.py (site-N)

    job.py->>Server (FedAvgRecipe): FedAvgRecipe.execute(SimEnv)
    Server (FedAvgRecipe)->>Server (FedAvgRecipe): BLIPLoRAModel init (LoRA on CPU)
    loop num_rounds
        Server (FedAvgRecipe)->>client.py (site-N): flare.send(FLModel with LoRA params)
        client.py (site-N)->>client.py (site-N): load_trainable_params(model, params)
        client.py (site-N)->>client.py (site-N): train_one_epoch (local_epochs) + TensorBoard
        client.py (site-N)->>client.py (site-N): backend.evaluate → val acc + TensorBoard
        client.py (site-N)->>Server (FedAvgRecipe): flare.send(FLModel with LoRA updates + metrics)
        Server (FedAvgRecipe)->>Server (FedAvgRecipe): FedAvg aggregate LoRA deltas
    end
    Server (FedAvgRecipe)->>job.py: run.get_result()
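
The aggregation step in the diagram is a sample-weighted average of the client updates. The sketch below shows that arithmetic on plain Python lists; it is not NVFlare's aggregator, and `fedavg` and its input layout are hypothetical.

```python
def fedavg(updates):
    """Weighted average of client parameters.

    updates: list of (params, num_samples), where params maps a name to a list of floats.
    """
    total = sum(num_samples for _, num_samples in updates)
    if total <= 0:
        raise ValueError("FedAvg needs at least one training sample across clients")

    first_params = updates[0][0]
    aggregated = {}
    for name, values in first_params.items():
        aggregated[name] = [
            sum(params[name][i] * num_samples for params, num_samples in updates) / total
            for i in range(len(values))
        ]
    return aggregated


# Two clients: one contributes 1 sample, the other 3, so the second has 3x the weight.
result = fedavg([
    ({"lora_A": [1.0, 2.0]}, 1),
    ({"lora_A": [3.0, 4.0]}, 3),
])
```

Here `result["lora_A"]` works out to `[2.5, 3.5]`, since (1·1 + 3·3)/4 = 2.5 and (2·1 + 4·3)/4 = 3.5.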

Reviews (8): Last reviewed commit: "Merge branch 'main' into fed_umm"

ZiyueXu77 · 2026-04-01T20:00:35Z

/build

ZiyueXu77 · 2026-04-01T20:53:22Z

/build

holgerroth · 2026-04-02T15:06:01Z

@greptileai review the latest changes.

holgerroth · 2026-04-02T15:13:08Z

/build

holgerroth

Sensible changes that simplify the example.

ZiyueXu77 · 2026-04-09T19:03:28Z

/build

update fedumm example (7febbf4)

Copilot AI review requested due to automatic review settings April 1, 2026 19:12

Merge branch 'main' into fed_umm (93f9903)

Copilot started reviewing on behalf of ZiyueXu77 April 1, 2026 19:13

Copilot AI reviewed Apr 1, 2026

research/fedumm/job.py (resolved)

research/fedumm/src/blip_backend.py (resolved)

greptile-apps bot reviewed Apr 1, 2026

research/fedumm/client.py (outdated, resolved)

research/fedumm/client.py (resolved)

research/fedumm/centralized_baseline.py (outdated, resolved)

ZiyueXu77 added 2 commits April 1, 2026 15:38

address comments (4f66666)

improvements (f6ea2fb)

ZiyueXu77 requested a review from holgerroth April 1, 2026 20:00

holgerroth reviewed Apr 1, 2026

research/fedumm/requirements.txt (outdated, resolved)

Update nvflare version in requirements.txt (169b382)

holgerroth reviewed Apr 1, 2026

research/fedumm/src/blip_backend.py (resolved)

holgerroth reviewed Apr 1, 2026

research/fedumm/client.py (resolved)

ZiyueXu77 added 3 commits April 1, 2026 16:38

address the model. wrapper (6eba90d)

remove experiment_tracking, keep tb local (a32e428)

Merge branch 'main' into fed_umm (98fa786)

greptile-apps bot reviewed Apr 1, 2026

research/fedumm/job.py (resolved)

ZiyueXu77 and others added 3 commits April 1, 2026 17:02

necessary changes for -100 update (ae48926)

Merge branch 'main' into fed_umm (e36ab73)

Merge branch 'main' into fed_umm (873bd00)

holgerroth approved these changes Apr 2, 2026

ZiyueXu77 enabled auto-merge (squash) April 2, 2026 15:53

Merge branch 'main' into fed_umm (97028c4)

ZiyueXu77 merged commit 8cc09d7 into NVIDIA:main Apr 9, 2026
29 checks passed
