
Commit 39a9763

[Doc] How-to: Doc update part 2 [skip-ci] (#3931)

Description: add new how-to guide documentation structure; add how-to: develop covering the different FLARE APIs; add a FLARE API evolution documentation page; add how to calculate Fed Stats; add how to convert ML/DL to FL; add how to do simulations; add a production section (deployments in AWS, Azure, monitoring, interaction with FLARE).

Types of changes:

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

1 parent 8387440, commit 39a9763

File tree: 14 files changed, +2554 additions, -69 deletions

docs/how-to-guide/develop.rst

.. _how_to_develop_guide:

####################
How To Develop Guide
####################

How to install and turn existing stand-alone, centralized applications into federated learning applications.

Installation Guide
------------------

.. toctree::
   :maxdepth: 1

   ../installation

Federated Application
---------------------

.. toctree::
   :maxdepth: 1
   :glob:

   develop/*
.. _diff_api_guide:

#####################
How to Use FLARE APIs
#####################

When getting started with NVIDIA FLARE, one of the first decisions new users face is choosing which APIs to use.
FLARE provides multiple API layers that have evolved over time to support different levels of abstraction, from
high-level, ready-to-use workflows for common federated learning and analytics tasks to lower-level, highly
customizable APIs for advanced control over execution, orchestration, and security. These APIs are reflected
across different FLARE examples, which can sometimes be confusing for new users when deciding where to start.
This guide helps clarify the evolution of FLARE APIs and provides guidance on selecting the most appropriate
API for your use case and development goals.

The newer FLARE APIs (Client API, Job Recipe API, and the upcoming Collab API) represent the latest stage in the
evolution of the platform. They are designed primarily for data scientists and researchers, providing a simplified
and intuitive interface that is sufficient for most common federated learning and federated analytics use cases.
In contrast, the Controller/Executor APIs operate at a lower level and are intended for system integration,
advanced customization, and platform-level extensions, where fine-grained control over execution flow, policies,
and orchestration is required.


Evolution of FLARE APIs
=======================

Before deciding which API layer to use, it helps to understand the available options.
The diagrams below provide an overview. For a detailed history, see :ref:`api_evolution`.

Server-side APIs
----------------

.. image:: ../../resources/server_side_apis.jpg
   :height: 400

Client-side APIs
----------------

.. image:: ../../resources/client_side_apis.jpg
   :height: 400

Client-Server Wiring APIs
-------------------------

.. image:: ../../resources/client_server_wiring_apis.jpg
   :height: 400


Which APIs to Use?
==================

We recommend the following APIs depending on your role:

Applied Data Scientists
-----------------------

For users focused on applying FL to their ML workflows with minimal complexity:

- **Client**: Client API
- **Server**: Built-in algorithms (FedAvg, FedProx, etc.)
- **Wiring**: Job Recipe with built-in FL algorithms

FL Researchers
--------------

For users developing new FL algorithms or customizing training logic:

- **Client**: Collab API, Client API
- **Server**: Collab API
- **Wiring**: Job Recipe

System Integrators
------------------

For users building custom integrations or extending the platform:

- **Client**: Collab API, Executor API
- **Server**: Collab API, Controller API
- **Wiring**: Job Recipe


Deprecated APIs
===============

The following APIs are deprecated and should be avoided in new projects:

- **LearnerExecutor and Learner**: Use the Client API instead
- **ModelController**: Will be superseded by the Collab API
- **Job Template & CLI Job Template API**: Use the Job Recipe API instead


API References
==============

For detailed documentation on each API:

- :ref:`client_api` - Client-side API for ML code integration
- :ref:`job_recipe` - Programmatic job definition
- :ref:`api_evolution` - Complete API evolution history
- Collab API - Coming soon
.. _dl_to_fl_guide:

##################################################
How to Convert Deep Learning to Federated Learning
##################################################

This guide uses deep learning code as an example. Other traditional ML examples can be found in the
hello-world example series. Assuming the dataset is ready for training, converting an existing stand-alone,
centralized deep learning script to federated learning with NVIDIA FLARE involves the following steps:

- **Step 1**: Decide what type of training workflow to use: round-robin (cyclic weight transfer) or
  scatter-and-gather (weighted average). For example, we can use federated weighted averaging (FedAvg).
  This determines what server-side code to run.

- **Step 2**: Convert the training scripts into client-side local training scripts so that they can receive
  a global model, perform local training, and send the newly updated local model back to the server.

- **Step 3**: Connect the server aggregation algorithm (FedAvg) and the client training code together to
  perform the federated learning job.
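The weighted-averaging idea behind Step 1 can be sketched in a few lines of plain Python. This is only an illustration of the FedAvg math, not FLARE's actual implementation, and the helper name ``fedavg_aggregate`` is hypothetical:

```python
# Minimal sketch of federated weighted averaging (FedAvg).
# Each client reports its updated parameters plus the number of local
# training samples; the server averages parameters weighted by sample count.

def fedavg_aggregate(client_updates):
    """client_updates: list of (params_dict, num_samples) tuples."""
    total = sum(n for _, n in client_updates)
    keys = client_updates[0][0].keys()
    return {
        k: sum(params[k] * n for params, n in client_updates) / total
        for k in keys
    }

# Two clients with different amounts of data: the client with 300
# samples pulls the global value toward its own update.
updates = [({"w": 1.0}, 100), ({"w": 3.0}, 300)]
global_params = fedavg_aggregate(updates)  # {"w": 2.5}
```

In a real FLARE job this aggregation is performed by the server-side workflow; the sketch only shows why per-client sample counts matter.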
NVIDIA FLARE provides a set of API stacks for data scientists to perform the above steps:

- **Step 1**: Data scientists can use existing federated learning algorithms such as FedAvg. For custom
  algorithms, FLARE offers the Collab API to simplify development (coming soon). The ModelController API
  is also available for advanced customization.

- **Step 2**: You can use the FLARE Client API to convert DL to FL with just a few lines of code changes.

- **Step 3**: The Job Recipe API connects steps 1 and 2, for example, ``FedAvgRecipe``.

Here are some code snippets from the :ref:`hello_pt` example. See the complete code and description there.

Client Code with Client API
---------------------------

On the client side, the training workflow is:

1. Receive the model from the FL server
2. Perform local training on the received global model and/or evaluate it for model selection
3. Send the updated model back to the FL server

The client code (``client.py``) implements this workflow. The training code is almost identical to
standard PyTorch training code; the only difference is a few lines to receive data from and send data to the server.

Using NVIDIA FLARE's Client API, you can easily adapt centralized ML code for federated scenarios.
The three essential methods are:

- ``init()``: Initializes the NVIDIA FLARE Client API environment.
- ``receive()``: Receives a model from the FL server.
- ``send()``: Sends the model to the FL server.

With these simple methods, developers can use the Client API
to change their centralized training code to an FL scenario with
five lines of code changes, as shown below:

.. code-block:: python

    import nvflare.client as flare

    flare.init()                   # 1. Initialize FLARE Client API
    input_model = flare.receive()  # 2. Receive model from FL server
    params = input_model.params    # 3. Extract model parameters

    # Original local training code
    new_params = local_train(params)

    output_model = flare.FLModel(params=new_params)  # 4. Wrap results in FLModel
    flare.send(output_model)       # 5. Send model to FL server


Job Recipe
----------

The Job Recipe connects the client training script with the built-in federated averaging algorithm:

.. code-block:: python

    from nvflare.job_config.fed_avg_recipe import FedAvgRecipe
    from nvflare.simulation import SimEnv

    recipe = FedAvgRecipe(
        name="hello-pt",
        min_clients=n_clients,
        num_rounds=num_rounds,
        initial_model=SimpleNetwork(),
        train_script="client.py",
        train_args=f"--batch_size {batch_size}",
    )

    env = SimEnv(num_clients=n_clients, num_threads=n_clients)
    recipe.execute(env=env)

Save this code to ``job.py``.

Run the Job
-----------

From the terminal, run the job script to execute it in a simulation environment:

.. code-block:: bash

    python job.py


References
----------

- :ref:`hello_pt` - Complete example with full source code
- :ref:`client_api` - Detailed Client API documentation
- :ref:`job_recipe` - Job Recipe API documentation
.. _fed_analytics_guide:

####################################
How to Calculate Federated Analytics
####################################

NVIDIA FLARE enables collaborative data analysis across multiple sites without sharing raw data. Federated Analytics
focuses on computing global statistics (such as counts, distributions, and means) by aggregating local analytics results
computed at each participant.

When to Use Federated Analytics
===============================

Use Federated Analytics when you want to:

- Understand data distribution and quality across institutions
- Perform cohort discovery or feasibility analysis
- Validate dataset compatibility before federated training

Common outputs include:

- Counts and histograms
- Summary statistics (mean, sum, standard deviation)
- Label or class distributions


Overview
========

NVIDIA FLARE provides built-in federated statistics operators that generate global statistics based on local
client-side statistics. At each client site, you can have one or more datasets (such as "train" and "test"),
and each dataset may have many features. For each feature, the system calculates local statistics and then
combines them to produce global statistics for all numeric features. The output includes complete statistics
for all datasets across all clients, as well as global aggregates.

The supported statistics include commonly used measures:

- **count** - Number of samples
- **sum** - Sum of values
- **mean** - Average value (calculated from count and sum if both are selected)
- **stddev** - Standard deviation
- **histogram** - Distribution of values across bins
- **quantile** - Percentile values (requires an additional dependency)

.. note::

   We do not include min and max values to avoid data privacy concerns.
   Only numerical features are supported; non-numerical features are removed automatically.
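To see why count and sum are enough to recover a global mean (and, together with a sum of squares, a global standard deviation), here is a minimal plain-Python sketch of the aggregation. It illustrates the math only; FLARE's operators and wire format differ, and the function name is hypothetical:

```python
import math

# Each client reports only local aggregates for a feature, never raw values:
# a (count, sum, sum_of_squares) triple.
def global_mean_stddev(local_stats):
    n = sum(st[0] for st in local_stats)      # total sample count
    total = sum(st[1] for st in local_stats)  # total sum
    sq = sum(st[2] for st in local_stats)     # total sum of squares
    mean = total / n
    # Population variance from aggregated moments: E[x^2] - (E[x])^2
    variance = sq / n - mean**2
    return mean, math.sqrt(variance)

# Site A holds [1, 2, 3]; site B holds [4, 5]
site_a = (3, 6.0, 14.0)   # count, sum, sum of squares
site_b = (2, 9.0, 41.0)
mean, stddev = global_mean_stddev([site_a, site_b])
# mean == 3.0; stddev == sqrt(55/5 - 9) == sqrt(2)
```

Because only these aggregates cross site boundaries, the server never sees individual records.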
Steps to Implement
==================

1. Provide the target data source names (such as "train" and "test") and the feature names
2. Configure the target statistics metrics and the output location
3. Implement the local statistics generator using the ``Statistics`` spec and configure the client-side data input location

For detailed instructions, see the :ref:`hello_tabular_stats` example.

Example Configuration
---------------------

Here is an example statistics configuration:

.. code-block:: python

    statistic_configs = {
        "count": {},
        "mean": {},
        "sum": {},
        "stddev": {},
        "histogram": {"*": {"bins": 20}, "Age": {"bins": 20, "range": [0, 100]}},
        "quantile": {"*": [0.1, 0.5, 0.9]},
    }

This configuration specifies:

- Calculate count, mean, sum, and stddev for all features
- For histograms, use 20 bins with an auto-calculated range, except "Age", which uses a fixed range of 0-100
- For quantiles, calculate the 10%, 50% (median), and 90% percentiles for all features
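Fixing the bin count and range (as done for "Age" above) is what makes per-client histograms mergeable: with identical bin edges, the global histogram is simply the elementwise sum of the local ones. A minimal sketch of that idea (an illustration with hypothetical helper names, not FLARE's implementation):

```python
def local_histogram(values, bins, value_range):
    """Count values into equal-width bins over a fixed [lo, hi) range."""
    lo, hi = value_range
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def merge_histograms(local_hists):
    """Global histogram = elementwise sum of local histograms (same bin edges)."""
    return [sum(col) for col in zip(*local_hists)]

# Two sites, ages binned into 4 bins over [0, 100)
site_a = local_histogram([12, 35, 37, 88], bins=4, value_range=(0, 100))  # [1, 2, 0, 1]
site_b = local_histogram([5, 60], bins=4, value_range=(0, 100))           # [1, 0, 1, 0]
merged = merge_histograms([site_a, site_b])                               # [2, 2, 1, 1]
```

This is also why the range for "Age" is pinned to [0, 100]: auto-calculated ranges could differ per site, making bin edges incompatible without an extra coordination round.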
For tabular data, many required functions are implemented in ``DFStatisticsCore``, so data scientists
only need to provide the data loader and configure the Job Recipe.


Client Code
-----------

The local statistics generator ``AdultStatistics`` implements the ``Statistics`` spec:

.. literalinclude:: ../../../examples/hello-world/hello-tabular-stats/client.py
    :language: python
    :linenos:
    :caption: client.py
    :lines: 14-

The ``AdultStatistics`` class extends ``DFStatisticsCore`` and provides:

- ``data_features``: Array of feature names
- ``load_data()``: Returns a dictionary of Pandas DataFrames (one per data source)
- ``data_path``: Path in the format ``<data_root_dir>/<site-name>/<filename>``


Job Recipe
----------

The job is defined via a recipe and runs in the simulation environment:

.. literalinclude:: ../../../examples/hello-world/hello-tabular-stats/job.py
    :language: python
    :linenos:
    :caption: job.py
    :lines: 14-


Run the Job
-----------

From the terminal, run the job script:

.. code-block:: bash

    python job.py


References
----------

- :ref:`hello_tabular_stats` - Complete tabular statistics example
- :ref:`federated_statistics` - Detailed federated statistics documentation
