This repository was archived by the owner on Feb 26, 2026. It is now read-only.

Updated the documentation to clarify the architecture and motivation for Propeller's Federated Machine Learning system, enhancing explanations of components and processes.

File changed: docs/fml.md (+27 −29)
# Federated Machine Learning in Propeller

Propeller implements Federated Machine Learning (FML) as a workload-agnostic federated learning framework that enables distributed machine learning training across multiple edge devices without centralizing raw data. This document explains the motivation for federated learning, the high-level architecture of Propeller's FML system, and how the components interact during a training round.
- **Reduced Server Load**: The central coordinator only aggregates updates, not raw data, significantly reducing computational and storage requirements.

- **Incremental Learning**: New devices can join the federation without retraining from scratch, and models can be updated incrementally as new data becomes available.
## Architecture

Propeller's FML system is built on a workload-agnostic design where the core orchestration layer (Manager) has no FL-specific logic. Instead, FL-specific functionality is handled by an external Coordinator service that manages rounds, aggregation, and model versioning. This separation of concerns allows Propeller to support federated learning while remaining flexible enough to orchestrate other types of distributed workloads.
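The aggregation step the Coordinator performs can be illustrated with a generic weighted-average (FedAvg-style) sketch. The function name and update format below are illustrative assumptions, not Propeller's actual API, and the document does not specify which aggregation algorithm is used.

```python
# Illustrative FedAvg-style aggregation: weight each client's update by its
# local sample count. Names and data structures here are assumptions.

def aggregate(updates):
    """updates: list of (weights: list[float], num_samples: int) pairs."""
    total = sum(n for _, n in updates)
    merged = [0.0] * len(updates[0][0])
    for weights, n in updates:
        for i, w in enumerate(weights):
            merged[i] += w * (n / total)
    return merged

# Two clients with unequal data volumes: the larger client dominates.
print(aggregate([([1.0, 2.0], 3), ([3.0, 4.0], 1)]))  # [1.5, 2.5]
```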
4. **WASM-Based Training**: Training workloads execute as WebAssembly modules, providing portability, security isolation, and consistent execution across different device architectures.
The following diagram illustrates the architecture and message flow of Propeller's federated learning system:

```text
┌──────────────────────┐
```
## System Components

Propeller's FML system consists of the following components that work together to enable federated learning:
### Manager Service

- **Proplet Management**: The Manager maintains awareness of available proplets and their health status, ensuring tasks are only created for active, reachable devices.

**Key Design**: Propeller's Manager remains completely workload-agnostic. It doesn't understand federated learning semantics, model structures, or aggregation logic. This separation allows Propeller's Manager to orchestrate other types of distributed workloads beyond federated learning.
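The proplet-management behavior described above can be sketched as a simple liveness filter. The `last_seen` heartbeat field and the 30-second liveness window are assumptions for illustration, not Propeller's actual data model.

```python
# Sketch of selecting only active, reachable proplets for task creation.
# The `last_seen` field and 30 s window are illustrative assumptions.

def active_proplets(proplets, now, liveness_window=30.0):
    return [p["id"] for p in proplets if now - p["last_seen"] <= liveness_window]

fleet = [
    {"id": "proplet-a", "last_seen": 100.0},
    {"id": "proplet-b", "last_seen": 40.0},   # stale: missed heartbeats
]
print(active_proplets(fleet, now=110.0))  # ['proplet-a']
```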
### FML Coordinator

### Client WASM Module

The Client WASM module is the portable training workload that runs on each proplet.

- **Output Generation**: The module outputs a JSON-formatted update message to stdout, which is captured by the proplet runtime and submitted to the Coordinator.

**Portability**: In Propeller, because the training logic is compiled to WebAssembly, the same WASM module can run on different proplet types (Rust or embedded) without modification, as long as the data access interface (environment variables or host functions) is consistent.
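The stdout contract described above can be sketched as follows. The exact field names in the update message are assumptions, since the real schema is not shown in this document.

```python
import json

# Sketch of a client module's final step: emit a JSON update on stdout so
# the proplet runtime can capture and forward it. Field names are assumed.
update = {
    "round_id": 7,
    "weights": [0.12, -0.34, 0.56],
    "num_samples": 128,
}
print(json.dumps(update))
```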
### SuperMQ MQTT Infrastructure

- **Quality of Service**: MQTT's QoS levels ensure reliable message delivery even in unreliable network conditions, critical for distributed edge deployments.

## Training Round Lifecycle

The following diagram shows the complete message flow during a federated learning round:

```text
1. Round Start
```
## Communication Patterns

Propeller's FML system uses several communication patterns to coordinate distributed training:

### Communication Flow

Propeller combines MQTT publish-subscribe and HTTP request-response patterns:

```text
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
```
### Hybrid Approach

Propeller uses a hybrid approach that combines the strengths of both patterns:

- **MQTT for Orchestration**: MQTT's asynchronous, topic-based routing is ideal for coordinating distributed rounds across many devices.
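The hybrid pattern can be sketched as a small dispatch rule: asynchronous orchestration messages fan out over MQTT, while bulk model fetches and update submissions use HTTP. The message categories below are illustrative assumptions, not Propeller's actual message taxonomy.

```python
# Illustrative transport dispatch for a hybrid MQTT/HTTP design.
# Orchestration fan-out suits pub/sub; bulk transfer suits request-response.

MQTT_KINDS = {"round_start", "task_assignment", "status"}
HTTP_KINDS = {"model_fetch", "update_submit"}

def choose_transport(kind):
    if kind in MQTT_KINDS:
        return "mqtt"
    if kind in HTTP_KINDS:
        return "http"
    raise ValueError(f"unknown message kind: {kind}")

print(choose_transport("round_start"))  # mqtt
print(choose_transport("model_fetch"))  # http
```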
## Model Lifecycle and Versioning

Models in Propeller's FML system progress through versions as training rounds complete:
### Initial Model

Each round incrementally improves the model by incorporating knowledge from participants.

The Model Registry maintains a version history, allowing:

- **Rollback**: If a new model version performs poorly, Propeller can roll back to a previous version.

- **Analysis**: Researchers and operators can compare model versions to understand how the model evolved over time.

- **Reproducibility**: Specific model versions can be referenced and reproduced for testing and validation.
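The version-history behavior above can be sketched with a minimal in-memory registry supporting publish and rollback. The class and its structure are illustrative assumptions, not Propeller's Model Registry API.

```python
# Minimal sketch of a model registry with version history and rollback.
# The in-memory structure is an assumption for illustration only.

class ModelRegistry:
    def __init__(self):
        self.versions = []   # ordered history of (version, artifact)
        self.current = None

    def publish(self, version, artifact):
        self.versions.append((version, artifact))
        self.current = version

    def rollback(self):
        """Revert to the previous version if the newest performs poorly."""
        if len(self.versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.versions.pop()
        self.current = self.versions[-1][0]

reg = ModelRegistry()
reg.publish("v1", b"model-bytes-1")
reg.publish("v2", b"model-bytes-2")
reg.rollback()
print(reg.current)  # v1
```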
## Scalability and Performance Considerations

Propeller's FML architecture is designed to scale across several dimensions:

### Horizontal Scaling

- **Multiple Proplets**: Propeller naturally scales to support hundreds or thousands of proplets participating in a single round. The Manager can create tasks for all participants in parallel.

- **Multiple Coordinators**: While Propeller's current implementation uses a single Coordinator, the architecture supports multiple Coordinators with consistent hashing or round assignment to distribute load.

- **Distributed Model Registry**: The Model Registry can be replicated or sharded to handle high request volumes from many proplets fetching models simultaneously.
### Network Efficiency

- **Chunked Transport**: Propeller automatically chunks large model artifacts for efficient MQTT transport, allowing models to be distributed even over bandwidth-constrained networks.

- **Retained Messages**: Propeller uses MQTT retained messages to allow proplets to immediately receive the latest model when they subscribe, reducing latency and avoiding missed updates.

- **Asynchronous Communication**: Propeller leverages MQTT's asynchronous nature to allow proplets to submit updates without blocking, and the Coordinator can process updates as they arrive.
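The chunked-transport idea can be sketched as splitting a model blob into fixed-size pieces and reassembling them in order on the receiver. The chunk size is an arbitrary illustrative value, not Propeller's actual setting.

```python
# Sketch of chunked model transport: split a model blob into fixed-size
# chunks (e.g. for MQTT publishing), then reassemble them in order.

def chunk(blob, size):
    return [blob[i:i + size] for i in range(0, len(blob), size)]

def reassemble(chunks):
    return b"".join(chunks)

model = bytes(range(10))
parts = chunk(model, 4)          # chunks of 4, 4, and 2 bytes
print(len(parts))                # 3
print(reassemble(parts) == model)  # True
```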
### Fault Tolerance

- **Timeout Handling**: Propeller completes rounds even when some proplets fail to submit updates, preserving progress despite device failures or network issues.

- **Update Thresholds**: The k-of-n parameter allows rounds to complete with a subset of participants, providing resilience to device failures.

- **Fallback Mechanisms**: Propeller's proplets can fall back from HTTP to MQTT if network conditions degrade, ensuring updates are delivered even in challenging network environments.
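The k-of-n completion rule can be sketched as a predicate the Coordinator might evaluate as updates arrive. The function signature and the timeout behavior shown are assumptions for illustration.

```python
# Sketch of k-of-n round completion: a round closes once at least k of the
# n selected proplets have submitted updates, or a timeout forces a
# decision using whatever partial progress exists. Names are illustrative.

def round_complete(received, k, timed_out=False):
    if len(received) >= k:
        return True                           # threshold met: aggregate now
    return timed_out and len(received) > 0    # partial progress on timeout

print(round_complete({"p1", "p2", "p3"}, k=3))       # True
print(round_complete({"p1"}, k=3))                   # False
print(round_complete({"p1"}, k=3, timed_out=True))   # True
```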
## Security and Privacy

Federated learning inherently provides privacy benefits, but Propeller includes additional security considerations:

### Data Privacy

- **No Raw Data Transmission**: Only model weight updates are transmitted, never raw training data. This provides strong privacy guarantees even if messages are intercepted.

- **Local Training**: In Propeller, all training happens on-device within the WASM sandbox, ensuring that raw data never leaves the device's secure execution environment.

- **Isolated Execution**: Propeller leverages WASM's sandboxing to provide isolation between the training workload and the proplet's host system, preventing data leakage through side channels.
### Communication Security

- **SuperMQ Authentication**: In Propeller, all MQTT communication is authenticated via SuperMQ's client authentication system, ensuring only authorized components can participate.

- **Encrypted Transport**: MQTT connections can use TLS to encrypt messages in transit, protecting updates from interception or tampering.

- **Topic Access Control**: Propeller uses SuperMQ's topic-based access control to ensure that proplets can only publish to their designated update topics and cannot access other proplets' updates.
### Model Security

- **Model Integrity**: Propeller cryptographically hashes and versions each model artifact, allowing detection of tampering or corruption.

- **Access Control**: The Model Registry can implement access control to ensure only authorized proplets can fetch models.
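The integrity check described above can be sketched as hashing the artifact at publish time and verifying the digest on fetch. SHA-256 is shown as one reasonable choice; the document does not specify which hash Propeller actually uses.

```python
import hashlib

# Sketch of model-integrity checking: record a digest when a model version
# is published, and verify fetched bytes against it. Hash choice assumed.

def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()

published = b"model-v3-weights"
recorded = digest(published)

# A proplet verifies the artifact it fetched against the recorded digest.
print(digest(published) == recorded)    # True
print(digest(b"tampered") == recorded)  # False
```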