Collab API user guide and specification [skip ci] #4196
Open
chesterxgchen wants to merge 11 commits into NVIDIA:main from
CollabRecipe is just another recipe implementing the Recipe API spec. Its parameter details belong in CollabRecipe's own documentation, not in this API specification document.
- Fix incorrect dist.broadcast(state_dict) to broadcast per-parameter
- Use CollabSimEnv (not SimEnv) consistently in all Collab API examples
- SimEnv remains for standard recipe examples (FedAvgRecipe, etc.)

- Fix FedAvgRecipe import: nvflare.app_opt.pt.recipes.fedavg (not app_common.np)
- Fix SimEnv import: nvflare.recipe.sim_env (not nvflare.recipe.spec)
- Fix dist.broadcast: use net.parameters() with param.data (not state_dict)
- Non-existent Recipe methods already removed with Section 7
- CollabRecipe import already consistent (Section 7 removal)
- CollabSimEnv already in Key Concepts table
Using Collab API on the server and Client API on the client requires wiring them together with a recipe -- this IS developing a new recipe. There is no distinction between staying with CollabRecipe and building a new recipe. Merged into a single Path A (novel algorithm -> new recipe).
- Add Section 7 with 11 development requirements, implementation order, and open design questions
- Additional requirements beyond user's 7: Collab Sim/FLARE Backends, decorator/runtime infrastructure, multi-GPU/DDP support
- Clarify Section 5: no auto-matching or auto-detection magic. Collab API usage must be explicitly specified by the user, not inferred.
There is no separate CollabPocEnv/CollabProdEnv. The Collab FLARE Backend IS what makes PocEnv/ProdEnv work with Collab API.
There is no separate "Collab Simulation Backend" -- CollabSimEnv itself IS the simulation backend, just like the Collab FLARE Backend IS what makes PocEnv/ProdEnv Collab-aware (#2). Updated dependencies and implementation order accordingly.
- Unify SimEnv: remove CollabSimEnv as user-facing class. SimEnv selects Collab or standard backend internally based on recipe type.
- Fix DDP example: move DDP wrapping outside loop, add device_ids, add comment about load_state_dict after DDP wrap.
- Fix PocEnv import path: nvflare.collab.sys -> nvflare.recipe.poc_env
- Add __init__ to MyAlgorithm, MyAggregation, SplitLearningServer
- Clarify undefined helpers (weighted_avg, SimpleModel, loss, trainer) with comments indicating they are user-defined or computed in omitted code.
- Remove undocumented @collab.init from requirement #8
- Fix implementation order: #6 (CollabRecipe) has phased dependencies (sim-only in Phase 2, full in Phase 4)
- Simplify collab.clients.execute() call, remove misleading params
- Rename Section 3 to 'Hybrid Patterns: Collab Server + Client API'
- Add PocEnv import in Section 4.2 Step 4
- Make Execution envs table row explicit in both columns
- Add blank line before --- separator after Section 5.4 table
- Update Section 5.1: recipe indicates API type (not user choosing separate env classes)
- Harmonize Section 5.2 table header format ('e.g.' in both columns)
- Make loss assignment explicit in standalone function example
Previously addressed (verified):
- MyAggregation.__init__ already added
- _aggregate already has 'user-defined; omitted for brevity' comment
- Phase 2 CollabRecipe already marked as 'initial sim-only'
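
The unified `SimEnv` described in the commits above (one user-facing class, backend chosen internally from the recipe's declared API type, nothing inferred from user code) can be pictured with a minimal sketch. Every name below (`ApiType`, the `api_type` field, `StandardSimBackend`, `CollabSimBackend`) is a hypothetical stand-in for illustration and is not taken from the NVFlare code base; the real selection mechanism is defined in the design spec.

```python
from dataclasses import dataclass
from enum import Enum


class ApiType(Enum):
    """Hypothetical marker a recipe uses to declare which API it is built on."""
    STANDARD = "standard"
    COLLAB = "collab"


@dataclass
class Recipe:
    """Stand-in for a recipe; only the declared API type matters here."""
    name: str
    api_type: ApiType = ApiType.STANDARD


class StandardSimBackend:
    def run(self, recipe: Recipe) -> str:
        return f"standard simulator ran {recipe.name}"


class CollabSimBackend:
    def run(self, recipe: Recipe) -> str:
        return f"collab simulator ran {recipe.name}"


class SimEnv:
    """Single user-facing environment; the backend choice stays internal."""

    _BACKENDS = {
        ApiType.STANDARD: StandardSimBackend,
        ApiType.COLLAB: CollabSimBackend,
    }

    def __init__(self, num_clients: int = 2):
        self.num_clients = num_clients

    def run(self, recipe: Recipe) -> str:
        # The recipe states its API type explicitly; this lookup does not
        # inspect user code or guess.
        backend_cls = self._BACKENDS[recipe.api_type]
        return backend_cls().run(recipe)


env = SimEnv(num_clients=2)
fedavg_result = env.run(Recipe("fedavg"))
collab_result = env.run(Recipe("my_algorithm", api_type=ApiType.COLLAB))
print(fedavg_result)
print(collab_result)
```

The point of the sketch is that user code constructs the same `SimEnv` in both cases; only the recipe differs.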
- collab_api_spec.md: User-facing spec for FL researchers. Clean API examples, usage patterns, progression paths, recipe relationships. No bridge internals or implementation details.
- collab_api_design_spec.md: Internal design spec for FLARE engineers. CollabClientAPI bridge architecture, queue mechanics, contextvars, execution environment backend selection, decorator/runtime infrastructure, development requirements, implementation order, and open design questions.

Each document cross-references the other.
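
The design spec's bridge internals are not reproduced on this page. As a generic illustration only of how queues and `contextvars` can combine to let a Client-API-style script run in-process, here is a standard-library sketch. The names `Channel`, `receive`, `send`, `train_script`, and `run_client` are hypothetical and do not describe the actual CollabClientAPI implementation.

```python
import contextvars
import queue
import threading

# Hypothetical context variable holding the channel of the current client.
_current_channel = contextvars.ContextVar("current_channel")


class Channel:
    """Pair of queues: server -> client tasks and client -> server results."""

    def __init__(self):
        self.to_client = queue.Queue()
        self.to_server = queue.Queue()


def receive():
    """Client-style call: block until the server side provides a model."""
    return _current_channel.get().to_client.get()


def send(result):
    """Client-style call: hand a result back to the server side."""
    _current_channel.get().to_server.put(result)


def train_script():
    """User code written against receive()/send(), unaware of the queues."""
    model = receive()
    send({"weights": model["weights"] + 1})


def run_client(channel):
    # Bind this client's channel inside its own context so concurrent
    # clients never see each other's queues.
    ctx = contextvars.copy_context()

    def _entry():
        _current_channel.set(channel)
        train_script()

    ctx.run(_entry)


channels = [Channel() for _ in range(2)]
threads = [threading.Thread(target=run_client, args=(ch,)) for ch in channels]
for t in threads:
    t.start()

# "Server" side: push one model per client, then collect the results.
for i, ch in enumerate(channels):
    ch.to_client.put({"weights": i * 10})
results = [ch.to_server.get(timeout=5)["weights"] for ch in channels]

for t in threads:
    t.join()
print(results)
```

Because the channel is looked up through a context variable rather than a global, the same `train_script` can serve several simulated clients at once.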
Summary
Add the Collab API (code name Fox) specification as two documents, split by target audience:
- docs/design/collab_api_spec.md (436 lines) -- User-facing API spec for FL algorithm researchers. Covers usage patterns, hybrid patterns (Collab server + Client API), progression paths from prototype to production, execution environments, and recipe relationships.
- docs/design/collab_api_design_spec.md (227 lines) -- Internal design spec for FLARE engineers. Covers CollabClientAPI bridge architecture (queue mechanics, contextvars, subprocess mode), execution environment backend selection internals, decorator/runtime infrastructure, 9 development requirements with phased implementation order, and 3 open design questions.

Key design decisions documented
- SimEnv: No separate CollabSimEnv -- the Collab simulation backend is an internal detail; users write the same SimEnv/PocEnv for all recipes
- @collab.publish (in-process, fast prototyping) and Client API via train_script (multi-GPU, production)

Open design questions
Test plan
nvflare.collab.sys.recipe, nvflare.recipe.sim_env, nvflare.app_opt.pt.recipes.fedavg)