[Research] Update fedumm by ZiyueXu77 · Pull Request #4390 · NVIDIA/NVFlare

ZiyueXu77 · 2026-04-01T19:12:41Z

Fixes # .

Description

Simplify the whole example to align with the paper's experiment; remove JanusPro, which is not mentioned in the paper.
Use the Recipe API instead of the Job API.
Add TensorBoard recording.
Restructure the code.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

Copilot

Pull request overview

This PR streamlines the research/fedumm example to match the FedUMM paper’s BLIP-focused experiment flow, migrating the simulator job to NVFlare’s FedAvgRecipe API and adding TensorBoard logging, while removing the JanusPro backend and related env/scripts.

Changes:

Remove JanusPro backend and multi-env launch scripts; simplify backend registration around BLIP-VQA only.
Switch job.py to FedAvgRecipe + SimEnv execution and add experiment tracking.
Add step-level training logging + TensorBoard scalar recording in the shared training loop and client/baseline scripts.
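
To make the step-level logging item concrete, here is a minimal sketch of a training loop that records a scalar every few steps. It is not the code from `src/common.py`: the function name `train_one_epoch_logged`, the `step_fn` callback, and the `ScalarWriter` protocol are hypothetical, and the only assumption about the writer is that it exposes a TensorBoard-style `add_scalar(tag, scalar_value, global_step)` method.

```python
from typing import Callable, Iterable, Optional, Protocol


class ScalarWriter(Protocol):
    """Anything with a TensorBoard-style add_scalar method (hypothetical protocol)."""

    def add_scalar(self, tag: str, scalar_value: float, global_step: int) -> None: ...


def train_one_epoch_logged(
    batches: Iterable,
    step_fn: Callable[[object], float],
    writer: Optional[ScalarWriter] = None,
    prefix: str = "train",
    log_interval: int = 10,
    global_step_offset: int = 0,
) -> int:
    """Run step_fn on every batch and log its loss every log_interval steps.

    step_fn stands in for forward pass, backward pass, and optimizer step.
    Returns the number of steps taken so the caller can advance the offset.
    """
    steps = 0
    for batch in batches:
        loss = step_fn(batch)
        steps += 1
        if writer is not None and steps % log_interval == 0:
            # The offset keeps the x-axis increasing across FL rounds.
            writer.add_scalar(f"{prefix}/loss", loss, global_step_offset + steps)
    return steps
```

A caller would typically add the returned step count to its running offset before the next round, so that curves from consecutive rounds do not overlap.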

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| research/fedumm/src/model_registry.py | Simplifies registry module imports/docs. |
| research/fedumm/src/januspro_backend.py | Removes JanusPro backend implementation. |
| research/fedumm/src/common.py | Adds logging/TensorBoard hooks to training loop; formatting cleanup. |
| research/fedumm/src/blip_backend.py | Adds import fallback and introduces `BLIPLoRAModel` for recipe model init. |
| research/fedumm/src/`__init__.py` | Removes auto-registration behavior. |
| research/fedumm/scripts/slurm_run.sh | Removes SLURM runner script. |
| research/fedumm/scripts/setup_envs.sh | Removes conda env setup helper. |
| research/fedumm/scripts/launch_januspro.sh | Removes JanusPro env wrapper script. |
| research/fedumm/scripts/launch_blip.sh | Removes BLIP env wrapper script. |
| research/fedumm/requirements.txt | Pins datasets and adds TensorBoard/scipy dependencies. |
| research/fedumm/README.md | Rewrites README to the simplified simulator-based workflow. |
| research/fedumm/job.py | Replaces `FedJob` config with `FedAvgRecipe` + TensorBoard tracking. |
| research/fedumm/envs/env_januspro.yml | Removes JanusPro conda env file. |
| research/fedumm/envs/env_blip.yml | Removes BLIP conda env file. |
| research/fedumm/client.py | Simplifies to BLIP-only client; adds TensorBoard logging. |
| research/fedumm/centralized_baseline.py | Simplifies to BLIP-only baseline; adds TensorBoard logging. |

Comments suppressed due to low confidence (3)

research/fedumm/client.py:87

load_dataset(..., trust_remote_code=True) enables execution of arbitrary code from the dataset repository. If this isn’t strictly required for HuggingFaceM4/VQAv2, it should be removed; otherwise consider gating it behind an explicit CLI flag / environment variable and defaulting to False to reduce the security risk.
research/fedumm/centralized_baseline.py:60
load_dataset(..., trust_remote_code=True) enables execution of arbitrary code from the dataset repository. If this isn’t strictly required for HuggingFaceM4/VQAv2, it should be removed; otherwise gate it behind an explicit opt-in flag to reduce the security risk.
research/fedumm/client.py:97
The job enables TensorBoard tracking via add_experiment_tracking(...), but the client uses torch.utils.tensorboard.SummaryWriter, which won’t integrate with NVFlare’s tracking pipeline (and may write to colliding default runs/ directories across simulated clients). Consider using nvflare.client.tracking.SummaryWriter or explicitly setting a per-site log_dir and closing the writer on shutdown.
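
The two mitigations suggested in the suppressed comments above (an explicit opt-in for remote code, and a per-site TensorBoard directory) could look roughly like the sketch below. The flag names `--trust_remote_code`, `--site_name`, and `--tb_root`, and the helpers `dataset_kwargs` and `site_log_dir`, are hypothetical and not taken from the PR; the actual `load_dataset` and `SummaryWriter` calls appear only in comments so the block runs without those packages installed.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical client options")
    # Remote dataset code is off unless the user asks for it explicitly.
    parser.add_argument("--trust_remote_code", action="store_true",
                        help="allow the dataset repository to execute its loading script")
    parser.add_argument("--site_name", default="site-1")
    parser.add_argument("--tb_root", default="tb_logs")
    return parser.parse_args(argv)


def dataset_kwargs(args):
    """Keyword arguments to forward to datasets.load_dataset."""
    return {"trust_remote_code": bool(args.trust_remote_code)}


def site_log_dir(args):
    """One TensorBoard directory per simulated site, to avoid colliding runs."""
    return os.path.join(args.tb_root, args.site_name)


# Intended use (not executed here):
#   from datasets import load_dataset
#   ds = load_dataset("HuggingFaceM4/VQAv2", **dataset_kwargs(args))
#   from torch.utils.tensorboard import SummaryWriter
#   writer = SummaryWriter(log_dir=site_log_dir(args))
#   ...
#   writer.close()
```

Whether the dataset still loads with the flag off depends on the dataset repository and the installed `datasets` version, so the default may need to be revisited for this example.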

greptile-apps · 2026-04-01T19:17:33Z

Greptile Summary

This PR simplifies the FedUMM research example to focus solely on the BLIP-VQA backbone used in the paper, removing JanusPro support, migrating from FedJob/FedAvg to FedAvgRecipe, and adding TensorBoard logging throughout. Previous review concerns (null `cur_round`, missing `writer.close()`, LoRA sub-module save, `model.` prefix round-trip) are all addressed.

Confidence Score: 5/5

Safe to merge; all prior P1 issues are resolved and only minor P2 style suggestions remain.
All blocking issues from the previous review round have been addressed. The three remaining findings are P2: a missing lora_dropout in the server model dict (dropout doesn't affect param structure so aggregation is unaffected), an exact datasets pin that could cause install conflicts, and a broad ImportError catch that could obscure dependency errors. None of these block correct FL execution.
research/fedumm/job.py (lora_dropout omission), research/fedumm/requirements.txt (exact datasets pin)
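
One way to address the broad `ImportError` catch mentioned above is to fall back only when the primary module itself is missing, and to re-raise when one of its dependencies is. The helper `import_with_fallback` below is a hypothetical sketch, not the code in `blip_backend.py`; it relies on the standard `name` attribute of `ModuleNotFoundError`.

```python
import importlib


def import_with_fallback(primary: str, fallback: str):
    """Import primary; use fallback only if primary (or a parent package) is absent."""
    try:
        return importlib.import_module(primary)
    except ModuleNotFoundError as err:
        missing = err.name or ""
        # err.name is the module that could not be found. If it is the primary
        # module or one of its parent packages, the fallback is appropriate.
        if primary == missing or primary.startswith(missing + "."):
            return importlib.import_module(fallback)
        # Otherwise a dependency of the primary module is missing: surface it.
        raise
```

With this shape, a missing third-party package produces its usual error message instead of being silently masked by the fallback import.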

Important Files Changed

| Filename | Overview |
| --- | --- |
| research/fedumm/job.py | Switched from FedJob+FedAvg to FedAvgRecipe; lora_dropout forwarded to clients via script_args but omitted from server-side BLIPLoRAModel config dict |
| research/fedumm/client.py | Renamed from src/fl_client.py; hardcoded blip_vqa backend, added TensorBoard logging, fixed cur_round fallback to 0, symmetric `model.` prefix round-trip, writer.close() added |
| research/fedumm/centralized_baseline.py | Renamed from src/local_train.py; hardcoded blip_vqa, added TensorBoard writer with explicit log_dir, fixed LoRA save via sub-module save_pretrained, writer.close() added |
| research/fedumm/src/blip_backend.py | Added decoder_input_ids field, proper label masking for padding tokens, BLIPLoRAModel server wrapper, ValueError on empty eval dataloader |
| research/fedumm/src/common.py | train_one_epoch extended with TensorBoard writer, prefix, log_interval, and global_step_offset parameters; backward() and optimizer step logic unchanged |
| research/fedumm/requirements.txt | Bumped nvflare to >=2.7.2, pinned datasets==2.19.2 exactly, added tensorboard and scipy |
| research/fedumm/README.md | Simplified README to match paper scope; removed JanusPro references and detailed CLI docs, added GPU requirements table |
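
The `lora_dropout` omission noted for `job.py` is the kind of drift that a single source of truth can prevent. The sketch below builds the LoRA hyperparameters once and derives both the client `script_args` string and the server-side config dict from it. The helper names and the default values are hypothetical; how the dict is actually handed to `BLIPLoRAModel` or `FedAvgRecipe` is not shown in this page and is not assumed here.

```python
import shlex


def build_lora_config(lora_r=16, lora_alpha=32, lora_dropout=0.05):
    """Single dict of LoRA hyperparameters (default values are placeholders)."""
    return {"lora_r": lora_r, "lora_alpha": lora_alpha, "lora_dropout": lora_dropout}


def to_script_args(config, extra=None):
    """Render a config dict as '--key value' pairs for the client script."""
    merged = {**config, **(extra or {})}
    parts = []
    for key, value in merged.items():
        parts += [f"--{key}", shlex.quote(str(value))]
    return " ".join(parts)


lora_cfg = build_lora_config(lora_r=8, lora_alpha=16, lora_dropout=0.1)
client_args = to_script_args(lora_cfg, extra={"num_clients": 2})
server_model_config = dict(lora_cfg)  # same keys on the server side, dropout included
```

Because both sides read from `lora_cfg`, adding or renaming a hyperparameter changes the server and client configuration together.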

Sequence Diagram

sequenceDiagram
    participant job as job.py (FedAvgRecipe)
    participant server as NVFlare Server (BLIPLoRAModel)
    participant client as client.py × N

    job->>server: init BLIPLoRAModel(lora_r, lora_alpha)
    job->>client: launch with script_args (num_clients, lr, lora_*, …)

    loop FL rounds
        server->>client: FLModel(params={model.* keys})
        client->>client: strip "model." prefix → load_trainable_params
        client->>client: train_one_epoch (TensorBoard loss/step)
        client->>client: evaluate → TensorBoard val/acc
        client->>server: FLModel(params={"model."+k: v}, metrics)
        server->>server: FedAvg aggregate
    end

    server-->>job: run.get_status() / get_result()
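
The prefix handling in the diagram can be illustrated with plain dictionaries. This is a sketch of the round-trip only, with hypothetical helper names; it assumes the server-side wrapper exposes parameters under a `model.` prefix, as the diagram indicates, and that a missing round number is treated as round 0, as the file overview describes for `client.py`.

```python
PREFIX = "model."


def strip_prefix(params, prefix=PREFIX):
    """Server -> client: drop the wrapper prefix before loading trainable params."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in params.items()
    }


def add_prefix(params, prefix=PREFIX):
    """Client -> server: restore the prefix so keys match the server model."""
    return {prefix + key: value for key, value in params.items()}


def resolve_round(current_round):
    """Treat a missing round number as round 0."""
    return 0 if current_round is None else current_round


received = {"model.lora_A.weight": [0.1, 0.2], "model.lora_B.weight": [0.3]}
local = strip_prefix(received)   # keys as the local model expects them
returned = add_prefix(local)     # keys as the server expects them
```

The two helpers are inverses for keys that carry the prefix, which is what keeps aggregation on the server aligned with the keys it sent out.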

Reviews (7): Last reviewed commit: "Merge branch 'main' into fed_umm"

ZiyueXu77 · 2026-04-01T20:00:35Z

/build

ZiyueXu77 · 2026-04-01T20:53:22Z

/build

holgerroth · 2026-04-02T15:06:01Z

@greptileai review the latest changes.

holgerroth · 2026-04-02T15:13:08Z

/build

holgerroth

Sensible changes that simplify the example.

update fedumm example

7febbf4

Copilot AI review requested due to automatic review settings April 1, 2026 19:12

Merge branch 'main' into fed_umm

93f9903

Copilot started reviewing on behalf of ZiyueXu77 April 1, 2026 19:13

Copilot AI reviewed Apr 1, 2026

research/fedumm/job.py (resolved)

research/fedumm/src/blip_backend.py (resolved)

greptile-apps bot reviewed Apr 1, 2026

research/fedumm/client.py (outdated, resolved)

research/fedumm/client.py (resolved)

research/fedumm/centralized_baseline.py (outdated, resolved)

ZiyueXu77 added 2 commits April 1, 2026 15:38

address comments

4f66666

improvements

f6ea2fb

ZiyueXu77 requested a review from holgerroth April 1, 2026 20:00

holgerroth reviewed Apr 1, 2026

research/fedumm/requirements.txt (outdated, resolved)

Update nvflare version in requirements.txt

169b382

holgerroth reviewed Apr 1, 2026

research/fedumm/src/blip_backend.py (resolved)

holgerroth reviewed Apr 1, 2026

research/fedumm/client.py (resolved)

ZiyueXu77 added 3 commits April 1, 2026 16:38

address the model. wrapper

6eba90d

remove experiment_tracking, keep tb local

a32e428

Merge branch 'main' into fed_umm

98fa786

greptile-apps bot reviewed Apr 1, 2026

research/fedumm/job.py (resolved)

ZiyueXu77 and others added 3 commits April 1, 2026 17:02

necessary changes for -100 update

ae48926

Merge branch 'main' into fed_umm

e36ab73

Merge branch 'main' into fed_umm

873bd00

holgerroth approved these changes Apr 2, 2026

ZiyueXu77 enabled auto-merge (squash) April 2, 2026 15:53

ZiyueXu77 wants to merge 11 commits into NVIDIA:main from ZiyueXu77:fed_umm

3 participants
