Convert job to recipe for LLM_HF example #3888
Conversation
Greptile Summary
This PR successfully converts the LLM HuggingFace example from the low-level
Key Changes:
Benefits:
Testing considerations:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant job.py
participant FedAvgRecipe
participant SimEnv/ProdEnv
participant FL_Server
participant FL_Client
participant client.py
User->>job.py: python job.py --client_ids dolly --data_path ./dataset
job.py->>job.py: Parse arguments and setup per_site_config
job.py->>FedAvgRecipe: Create recipe with initial_model, per_site_config
FedAvgRecipe->>FedAvgRecipe: Configure FedAvg controller, model persistor
job.py->>FedAvgRecipe: Add quantization filters (if enabled)
job.py->>FedAvgRecipe: Add client timeouts and wrapper script (multi-node)
job.py->>FedAvgRecipe: recipe.export(job_dir)
job.py->>SimEnv/ProdEnv: Create execution environment
job.py->>FedAvgRecipe: recipe.execute(env)
FedAvgRecipe->>FL_Server: Start FL server
FedAvgRecipe->>FL_Client: Start FL client with per_site_config
loop Each FL Round
FL_Server->>FL_Client: Send global model
FL_Client->>client.py: Launch training script (via wrapper if multi-node)
client.py->>client.py: flare.init(rank=rank)
client.py->>FL_Client: flare.receive() - get global model
client.py->>client.py: Load global model weights
client.py->>client.py: Evaluate global model
client.py->>client.py: Train locally (SFTTrainer)
client.py->>client.py: Extract updated weights
client.py->>FL_Client: flare.send(output_model)
FL_Client->>FL_Server: Send trained model
FL_Server->>FL_Server: Aggregate models (FedAvg)
end
FL_Server->>job.py: Training complete
job.py->>User: Print job status and results
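The flow in the diagram can be sketched in plain Python to make the round loop concrete. This is a toy stand-in under stated assumptions, not the code from this PR: it does not import NVFlare, the `--num_rounds` flag and the `data_dir` key in the per-site config are hypothetical, and `local_train` replaces the SFTTrainer step in `client.py` with a simple numeric nudge. Only the `--client_ids` and `--data_path` flags come from the command shown in the diagram.

```python
import argparse


def parse_args(argv=None):
    # --client_ids and --data_path mirror the command in the diagram.
    parser = argparse.ArgumentParser()
    parser.add_argument("--client_ids", nargs="+", default=["dolly"])
    parser.add_argument("--data_path", default="./dataset")
    parser.add_argument("--num_rounds", type=int, default=2)  # hypothetical flag
    return parser.parse_args(argv)


def build_per_site_config(client_ids, data_path):
    # Hypothetical layout: one entry per site pointing at its own data folder.
    # The real per_site_config schema used by the recipe may differ.
    return {cid: {"data_dir": f"{data_path}/{cid}"} for cid in client_ids}


def local_train(global_weights, site_index):
    # Stand-in for client.py: receive the global weights, "train" by nudging
    # each value, and report a sample count so the server can weight the update.
    step = 0.1 * (site_index + 1)
    updated = {name: value + step for name, value in global_weights.items()}
    num_samples = 100 * (site_index + 1)
    return updated, num_samples


def fedavg(updates):
    # Sample-weighted average of the client weights (the aggregation step).
    total = sum(num_samples for _, num_samples in updates)
    names = updates[0][0].keys()
    return {
        name: sum(weights[name] * n for weights, n in updates) / total
        for name in names
    }


def run(argv=None):
    args = parse_args(argv)
    per_site_config = build_per_site_config(args.client_ids, args.data_path)
    global_weights = {"w": 0.0, "b": 1.0}  # toy "model"
    for _ in range(args.num_rounds):
        # Each round: send the global model out, collect updates, aggregate.
        updates = [
            local_train(global_weights, index)
            for index, _site in enumerate(per_site_config)
        ]
        global_weights = fedavg(updates)
    return per_site_config, global_weights
```

Calling `run(["--client_ids", "dolly", "--data_path", "./dataset"])` returns the per-site dictionary and the aggregated toy weights. In the actual example, the server/client orchestration and aggregation are handled by the recipe and the FL runtime rather than by a local loop like this one.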
1 file reviewed, no comments
1 file reviewed, no comments
convert to draft, wait for per-site config feature
@ZiyueXu77 what is happening with this example? Why is it still in draft status one month later?
It depends on Yuanting's multi-GPU/multi-node functionality being updated, so it has been waiting for that.
(Sent as an email reply to chesterxgchen's comment of Saturday, January 10, 2026, 9:15:31 PM.)
3 files reviewed, 1 comment
YuanTingHsieh
left a comment
I have some questions about how to set client IDs and site names; otherwise LGTM.
holgerroth
left a comment
Looks good. I made some minor comments.
6 files reviewed, 2 comments
/build
/build
Fixes # .
Description
Converts the LLM_HF example from the Job API to a recipe.
Types of changes
./runtest.sh.