Conversation
There was a problem hiding this comment.
Pull request overview
This PR streamlines the research/fedumm example to match the FedUMM paper’s BLIP-focused experiment flow, migrating the simulator job to NVFlare’s FedAvgRecipe API and adding TensorBoard logging, while removing the JanusPro backend and related env/scripts.
Changes:
- Remove JanusPro backend and multi-env launch scripts; simplify backend registration around BLIP-VQA only.
- Switch
job.pytoFedAvgRecipe+SimEnvexecution and add experiment tracking. - Add step-level training logging + TensorBoard scalar recording in the shared training loop and client/baseline scripts.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| research/fedumm/src/model_registry.py | Simplifies registry module imports/docs. |
| research/fedumm/src/januspro_backend.py | Removes JanusPro backend implementation. |
| research/fedumm/src/common.py | Adds logging/TensorBoard hooks to training loop; formatting cleanup. |
| research/fedumm/src/blip_backend.py | Adds import fallback and introduces BLIPLoRAModel for recipe model init. |
| research/fedumm/src/init.py | Removes auto-registration behavior. |
| research/fedumm/scripts/slurm_run.sh | Removes SLURM runner script. |
| research/fedumm/scripts/setup_envs.sh | Removes conda env setup helper. |
| research/fedumm/scripts/launch_januspro.sh | Removes JanusPro env wrapper script. |
| research/fedumm/scripts/launch_blip.sh | Removes BLIP env wrapper script. |
| research/fedumm/requirements.txt | Pins datasets and adds TensorBoard/scipy dependencies. |
| research/fedumm/README.md | Rewrites README to the simplified simulator-based workflow. |
| research/fedumm/job.py | Replaces FedJob config with FedAvgRecipe + TensorBoard tracking. |
| research/fedumm/envs/env_januspro.yml | Removes JanusPro conda env file. |
| research/fedumm/envs/env_blip.yml | Removes BLIP conda env file. |
| research/fedumm/client.py | Simplifies to BLIP-only client; adds TensorBoard logging. |
| research/fedumm/centralized_baseline.py | Simplifies to BLIP-only baseline; adds TensorBoard logging. |
Comments suppressed due to low confidence (3)
research/fedumm/client.py:87
load_dataset(..., trust_remote_code=True)enables execution of arbitrary code from the dataset repository. If this isn’t strictly required forHuggingFaceM4/VQAv2, it should be removed; otherwise consider gating it behind an explicit CLI flag / environment variable and defaulting toFalseto reduce the security risk.
research/fedumm/centralized_baseline.py:60load_dataset(..., trust_remote_code=True)enables execution of arbitrary code from the dataset repository. If this isn’t strictly required forHuggingFaceM4/VQAv2, it should be removed; otherwise gate it behind an explicit opt-in flag to reduce the security risk.
research/fedumm/client.py:97- The job enables TensorBoard tracking via
add_experiment_tracking(...), but the client usestorch.utils.tensorboard.SummaryWriter, which won’t integrate with NVFlare’s tracking pipeline (and may write to colliding defaultruns/directories across simulated clients). Consider usingnvflare.client.tracking.SummaryWriteror explicitly setting a per-sitelog_dirand closing the writer on shutdown.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Greptile SummaryThis PR simplifies the FedUMM research example to align with the paper: JanusPro is removed, the implementation is hardcoded to the BLIP-VQA backend, Confidence Score: 4/5PR is safe to merge after confirming whether the per-round optimizer reset is intentional. All prior P1 concerns have been addressed. One new P1 flag — optimizer state discarded each FL round — may be the intended FedAvg behavior but is undocumented and worth confirming with the author before merge. research/fedumm/client.py — optimizer instantiation inside the FL loop (lines 155–160).
|
| Filename | Overview |
|---|---|
| research/fedumm/client.py | FL client for federated BLIP-VQA fine-tuning; hardcoded to blip_vqa backend, TensorBoard added, cur_round fallback fixed. Minor: optimizer re-created each round (possibly intentional), SummaryWriter initialized slightly before site name is available. |
| research/fedumm/job.py | Switched to FedAvgRecipe-based job; model_name_or_path is conditionally forwarded to client scripts; clean refactor. |
| research/fedumm/centralized_baseline.py | Centralized baseline now saves each PEFT sub-module individually and includes a TensorBoard writer; looks correct. |
| research/fedumm/src/common.py | Shared helpers; empty-dataloader guard raises ValueError as per custom rule; Dirichlet partition logic unchanged and correct. |
| research/fedumm/src/blip_backend.py | BLIP-VQA backend including BLIPLoRAModel server wrapper; LoRA applied to text_encoder/text_decoder; evaluate raises on empty loader. |
| research/fedumm/README.md | README significantly simplified to match paper scope; JanusPro removed; setup instructions updated. |
Sequence Diagram
sequenceDiagram
participant job.py
participant Server (FedAvgRecipe)
participant client.py (site-N)
job.py->>Server (FedAvgRecipe): FedAvgRecipe.execute(SimEnv)
Server (FedAvgRecipe)->>Server (FedAvgRecipe): BLIPLoRAModel init (LoRA on CPU)
loop num_rounds
Server (FedAvgRecipe)->>client.py (site-N): flare.send(FLModel with LoRA params)
client.py (site-N)->>client.py (site-N): load_trainable_params(model, params)
client.py (site-N)->>client.py (site-N): train_one_epoch (local_epochs) + TensorBoard
client.py (site-N)->>client.py (site-N): backend.evaluate → val acc + TensorBoard
client.py (site-N)->>Server (FedAvgRecipe): flare.send(FLModel with LoRA updates + metrics)
Server (FedAvgRecipe)->>Server (FedAvgRecipe): FedAvg aggregate LoRA deltas
end
Server (FedAvgRecipe)->>job.py: run.get_result()
Reviews (8): Last reviewed commit: "Merge branch 'main' into fed_umm" | Re-trigger Greptile
|
/build |
|
/build |
|
@greptileai review the latest changes. |
|
/build |
holgerroth
left a comment
There was a problem hiding this comment.
Sensible changes that simplify the example.
|
/build |
|
Tip: Greploop — Automatically fix all review issues by running Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal. |
Fixes # .
Description
Types of changes
./runtest.sh.