Pull request overview
This PR streamlines the research/fedumm example to match the FedUMM paper’s BLIP-focused experiment flow, migrating the simulator job to NVFlare’s FedAvgRecipe API and adding TensorBoard logging, while removing the JanusPro backend and related env/scripts.
Changes:
- Remove JanusPro backend and multi-env launch scripts; simplify backend registration around BLIP-VQA only.
- Switch job.py to FedAvgRecipe + SimEnv execution and add experiment tracking.
- Add step-level training logging + TensorBoard scalar recording in the shared training loop and client/baseline scripts.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| research/fedumm/src/model_registry.py | Simplifies registry module imports/docs. |
| research/fedumm/src/januspro_backend.py | Removes JanusPro backend implementation. |
| research/fedumm/src/common.py | Adds logging/TensorBoard hooks to training loop; formatting cleanup. |
| research/fedumm/src/blip_backend.py | Adds import fallback and introduces BLIPLoRAModel for recipe model init. |
| research/fedumm/src/__init__.py | Removes auto-registration behavior. |
| research/fedumm/scripts/slurm_run.sh | Removes SLURM runner script. |
| research/fedumm/scripts/setup_envs.sh | Removes conda env setup helper. |
| research/fedumm/scripts/launch_januspro.sh | Removes JanusPro env wrapper script. |
| research/fedumm/scripts/launch_blip.sh | Removes BLIP env wrapper script. |
| research/fedumm/requirements.txt | Pins datasets and adds TensorBoard/scipy dependencies. |
| research/fedumm/README.md | Rewrites README to the simplified simulator-based workflow. |
| research/fedumm/job.py | Replaces FedJob config with FedAvgRecipe + TensorBoard tracking. |
| research/fedumm/envs/env_januspro.yml | Removes JanusPro conda env file. |
| research/fedumm/envs/env_blip.yml | Removes BLIP conda env file. |
| research/fedumm/client.py | Simplifies to BLIP-only client; adds TensorBoard logging. |
| research/fedumm/centralized_baseline.py | Simplifies to BLIP-only baseline; adds TensorBoard logging. |
Comments suppressed due to low confidence (3)
research/fedumm/client.py:87
load_dataset(..., trust_remote_code=True) enables execution of arbitrary code from the dataset repository. If this isn’t strictly required for HuggingFaceM4/VQAv2, it should be removed; otherwise consider gating it behind an explicit CLI flag / environment variable and defaulting to False to reduce the security risk.
research/fedumm/centralized_baseline.py:60
load_dataset(..., trust_remote_code=True) enables execution of arbitrary code from the dataset repository. If this isn’t strictly required for HuggingFaceM4/VQAv2, it should be removed; otherwise gate it behind an explicit opt-in flag to reduce the security risk.
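One way to apply the opt-in gating suggested in the two comments above is sketched here. The flag name `--trust-remote-code`, the environment variable `FEDUMM_TRUST_REMOTE_CODE`, and both helper functions are hypothetical and not part of this PR; the sketch also assumes the pinned `datasets` version still accepts the `trust_remote_code` argument.

```python
import argparse
import os


def resolve_trust_remote_code(argv=None, environ=None):
    """Return True only when the user explicitly opts in."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; stays False unless passed explicitly.
    parser.add_argument("--trust-remote-code", action="store_true")
    args, _ = parser.parse_known_args(argv)
    # Hypothetical environment variable; only the literal "1" opts in.
    env_opt_in = environ.get("FEDUMM_TRUST_REMOTE_CODE", "0") == "1"
    return args.trust_remote_code or env_opt_in


def load_vqa(argv=None, environ=None):
    # Imported lazily so the opt-in logic can be checked without `datasets`.
    from datasets import load_dataset

    return load_dataset(
        "HuggingFaceM4/VQAv2",
        trust_remote_code=resolve_trust_remote_code(argv, environ),
    )
```

With this shape, remote code runs only when someone deliberately asks for it, and the default invocation stays on the safe path.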
research/fedumm/client.py:97
The job enables TensorBoard tracking via add_experiment_tracking(...), but the client uses torch.utils.tensorboard.SummaryWriter, which won’t integrate with NVFlare’s tracking pipeline (and may write to colliding default runs/ directories across simulated clients). Consider using nvflare.client.tracking.SummaryWriter or explicitly setting a per-site log_dir and closing the writer on shutdown.
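The second option in the comment above (a per-site `log_dir` plus an explicit close) could look roughly like the following. The helper names, the `runs` base directory, and the `train/loss` tag are illustrative assumptions, not code from this PR; this keeps the plain PyTorch writer, so it avoids directory collisions but still does not route metrics through NVFlare's tracking pipeline.

```python
from pathlib import Path


def site_log_dir(base_dir, site_name):
    """Build a log directory unique to one simulated client."""
    return Path(base_dir) / site_name


def log_losses(base_dir, site_name, losses):
    # Imported lazily; requires torch and tensorboard to be installed.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir=str(site_log_dir(base_dir, site_name)))
    try:
        for step, loss in enumerate(losses):
            writer.add_scalar("train/loss", loss, global_step=step)
    finally:
        # Flush and release the event file even if logging fails midway.
        writer.close()
```

For example, `log_losses("runs", "site-1", [0.9, 0.7])` would write its events under `runs/site-1`, separate from any other site's output.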
Greptile Summary
This PR simplifies the FedUMM research example to focus solely on the BLIP-VQA backbone used in the paper, removing JanusPro support, migrating from
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant job as job.py (FedAvgRecipe)
participant server as NVFlare Server (BLIPLoRAModel)
participant client as client.py × N
job->>server: init BLIPLoRAModel(lora_r, lora_alpha)
job->>client: launch with script_args (num_clients, lr, lora_*, …)
loop FL rounds
server->>client: FLModel(params={model.* keys})
client->>client: strip "model." prefix → load_trainable_params
client->>client: train_one_epoch (TensorBoard loss/step)
client->>client: evaluate → TensorBoard val/acc
client->>server: FLModel(params={"model."+k: v}, metrics)
server->>server: FedAvg aggregate
end
server-->>job: run.get_status() / get_result()
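The "model." prefix round trip in the diagram above can be sketched with plain dictionaries. The function names and the example key are hypothetical; the actual client code in this PR may handle the prefix differently.

```python
PREFIX = "model."


def strip_prefix(params, prefix=PREFIX):
    """Drop the wrapper prefix from keys received from the server."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in params.items()
    }


def add_prefix(params, prefix=PREFIX):
    """Restore the wrapper prefix before sending params back."""
    return {prefix + key: value for key, value in params.items()}


# Hypothetical key name; values would be tensors in a real run.
received = {"model.lora_A.weight": [0.1, 0.2]}
local = strip_prefix(received)   # keys as the local model expects them
returned = add_prefix(local)     # keys as the server-side wrapper expects them
```

Keeping the two directions as a matched pair makes it easier to verify that the keys returned to the server line up with the ones it sent.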
Reviews (7): Last reviewed commit: "Merge branch 'main' into fed_umm"
/build
/build
@greptileai review the latest changes.
/build
holgerroth left a comment
Sensible changes that simplify the example.
Fixes # .
Description
Types of changes
./runtest.sh.