This repository was archived by the owner on Feb 26, 2026. It is now read-only.

Commit 5fd5381

Revise FML documentation for clarity and detail
Updated the documentation to clarify the architecture and motivation for Propeller's Federated Machine Learning system, enhancing explanations of components and processes.
1 parent 8e62e4c commit 5fd5381

File tree

1 file changed

+27
-29
lines changed


docs/fml.md

Lines changed: 27 additions & 29 deletions
@@ -1,6 +1,6 @@
 # Federated Machine Learning in Propeller
 
-Propeller implements Federated Machine Learning (FML) as a workload-agnostic federated learning framework that enables distributed machine learning training across multiple edge devices without centralizing raw data. This document explains the architecture of Propeller's FML system and how the components interact.
+Propeller implements Federated Machine Learning (FML) as a workload-agnostic federated learning framework that enables distributed machine learning training across multiple edge devices without centralizing raw data. This document explains the motivation for federated learning, the high-level architecture of Propeller's FML system, and how the components interact during a training round.
 
 ## Motivation for Federated Learning
 
@@ -34,7 +34,7 @@ Federated learning provides natural scalability advantages:
 - **Reduced Server Load**: The central coordinator only aggregates updates, not raw data, significantly reducing computational and storage requirements.
 - **Incremental Learning**: New devices can join the federation without retraining from scratch, and models can be updated incrementally as new data becomes available.
 
-## Propeller's FML system Architecture
+## Architecture
 
 Propeller's FML system is built on a workload-agnostic design where the core orchestration layer (Manager) has no FL-specific logic. Instead, FL-specific functionality is handled by an external Coordinator service that manages rounds, aggregation, and model versioning. This separation of concerns allows Propeller to support federated learning while remaining flexible enough to orchestrate other types of distributed workloads.
 
@@ -48,9 +48,7 @@ Propeller's FML system is built on a workload-agnostic design where the core orc
 
 4. **WASM-Based Training**: Training workloads execute as WebAssembly modules, providing portability, security isolation, and consistent execution across different device architectures.
 
-### System Architecture Overview
-
-The following diagram illustrates the high-level architecture and message flow of Propeller's federated learning system:
+The following diagram illustrates the architecture and message flow of Propeller's federated learning system:
 
 ```text
 ┌──────────────────────┐
@@ -118,7 +116,7 @@ The following diagram illustrates the high-level architecture and message flow o
 
 ## System Components
 
-The FML system consists of the following components that work together to enable federated learning:
+Propeller's FML system consists of the following components that work together to enable federated learning:
 
 ### Manager Service
 
@@ -132,7 +130,7 @@ The Manager is Propeller's core orchestration component. In the context of feder
 
 - **Proplet Management**: The Manager maintains awareness of available proplets and their health status, ensuring tasks are only created for active, reachable devices.
 
-**Key Design**: The Manager remains completely workload-agnostic. It doesn't understand federated learning semantics, model structures, or aggregation logic. This separation allows the same Manager to orchestrate other types of distributed workloads beyond federated learning.
+**Key Design**: Propeller's Manager remains completely workload-agnostic. It doesn't understand federated learning semantics, model structures, or aggregation logic. This separation allows Propeller's Manager to orchestrate other types of distributed workloads beyond federated learning.
 
 ### FML Coordinator
 
@@ -204,7 +202,7 @@ The Client WASM module is the portable training workload that runs on each propl
 
 - **Output Generation**: The module outputs a JSON-formatted update message to stdout, which is captured by the proplet runtime and submitted to the Coordinator.
 
-**Portability**: Because the training logic is compiled to WebAssembly, the same WASM module can run on different proplet types (Rust or embedded) without modification, as long as the data access interface (environment variables or host functions) is consistent.
+**Portability**: In Propeller, because the training logic is compiled to WebAssembly, the same WASM module can run on different proplet types (Rust or embedded) without modification, as long as the data access interface (environment variables or host functions) is consistent.
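The stdout-based update flow above can be sketched in miniature. This Python snippet only illustrates the shape of such a message — the field names (`round_id`, `proplet_id`, `weights`, `num_samples`) are invented for the example and are not Propeller's actual update schema:

```python
import json
import sys

# Hypothetical update payload; field names are illustrative assumptions,
# not Propeller's real schema.
update = {
    "round_id": 3,
    "proplet_id": "proplet-edge-01",
    "weights": [0.12, -0.07, 0.33],  # flattened model weight update
    "num_samples": 128,              # local examples used this round
}

# The proplet runtime captures stdout and forwards it to the Coordinator.
json.dump(update, sys.stdout)
```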

 ### SuperMQ MQTT Infrastructure
 
@@ -220,9 +218,9 @@ SuperMQ provides the underlying MQTT messaging infrastructure that enables async
 
 - **Quality of Service**: MQTT's QoS levels ensure reliable message delivery even in unreliable network conditions, critical for distributed edge deployments.
 
-## Propeller Training Round Lifecycle
+## Training Round Lifecycle
 
-The following diagram shows the complete message flow in Propeller during a federated learning round:
+The following diagram shows the complete message flow during a federated learning round:
 
 ```text
 1. Round Start
@@ -343,11 +341,11 @@ The Coordinator handles round completion:
 
 ## Communication Patterns
 
-The FML system uses several communication patterns to coordinate distributed training:
+Propeller's FML system uses several communication patterns to coordinate distributed training:
 
 ### Communication Flow
 
-The system combines MQTT publish-subscribe and HTTP request-response patterns:
+Propeller combines MQTT publish-subscribe and HTTP request-response patterns:
 
 ```text
 ┌─────────────┐              ┌─────────────┐              ┌─────────────┐
@@ -401,7 +399,7 @@ Some interactions use HTTP for direct, synchronous communication:
 
 ### Hybrid Approach
 
-The system uses a hybrid approach that combines the strengths of both patterns:
+Propeller uses a hybrid approach that combines the strengths of both patterns:
 
 - **MQTT for Orchestration**: MQTT's asynchronous, topic-based routing is ideal for coordinating distributed rounds across many devices.
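To make the publish-subscribe half of this hybrid concrete, here is a toy in-memory broker in Python. It is a stand-in for SuperMQ's MQTT layer, not its API; the topic name and payload fields are invented for the sketch:

```python
from collections import defaultdict
from typing import Callable


class ToyBroker:
    """Minimal in-memory stand-in for an MQTT broker (illustration only)."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str, dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str, dict], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # A real broker fans out asynchronously; here handlers run inline.
        for handler in self._subs[topic]:
            handler(topic, payload)


broker = ToyBroker()
received = []
# A proplet subscribes to round-start announcements on a topic.
broker.subscribe("fl/rounds/start", lambda t, p: received.append(p["round_id"]))
# The Coordinator announces a new round; every subscriber is notified.
broker.publish("fl/rounds/start", {"round_id": 7, "model_version": 3})
```

The HTTP request-response half is simply a synchronous fetch (for example, pulling a model artifact from the registry) and needs no broker.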

@@ -411,7 +409,7 @@ The system uses a hybrid approach that combines the strengths of both patterns:
 
 ## Model Lifecycle and Versioning
 
-Models in the FML system progress through versions as training rounds complete:
+Models in Propeller's FML system progress through versions as training rounds complete:
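A minimal sketch of such a version history, assuming a simple append-only list of weight vectors (the class and method names are illustrative, not Propeller's Model Registry API):

```python
class ToyModelRegistry:
    """Illustrative append-only version history with rollback.

    Not Propeller's actual Model Registry API; names are assumptions.
    """

    def __init__(self, initial_weights: list[float]) -> None:
        self._versions: list[list[float]] = [initial_weights]  # version 0 = initial model

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def publish(self, weights: list[float]) -> int:
        """Store the aggregated model from a completed round as a new version."""
        self._versions.append(weights)
        return self.latest_version

    def get(self, version: int) -> list[float]:
        """Fetch any stored version; rollback is just re-reading an older one."""
        return self._versions[version]


registry = ToyModelRegistry([0.0, 0.0])
v1 = registry.publish([0.1, 0.2])  # aggregated model after round 1
```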

 ### Initial Model
 
@@ -447,63 +445,63 @@ Each round incrementally improves the model by incorporating knowledge from part
 
 The Model Registry maintains a version history, allowing:
 
-- **Rollback**: If a new model version performs poorly, the system can roll back to a previous version.
+- **Rollback**: If a new model version performs poorly, Propeller can roll back to a previous version.
 
 - **Analysis**: Researchers and operators can compare model versions to understand how the model evolved over time.
 
 - **Reproducibility**: Specific model versions can be referenced and reproduced for testing and validation.
 
 ## Scalability and Performance Considerations
 
-The FML architecture is designed to scale across several dimensions:
+Propeller's FML architecture is designed to scale across several dimensions:
 
 ### Horizontal Scaling
 
-- **Multiple Proplets**: The system naturally scales to support hundreds or thousands of proplets participating in a single round. The Manager can create tasks for all participants in parallel.
+- **Multiple Proplets**: Propeller naturally scales to support hundreds or thousands of proplets participating in a single round. The Manager can create tasks for all participants in parallel.
 
-- **Multiple Coordinators**: While the current implementation uses a single Coordinator, the architecture supports multiple Coordinators with consistent hashing or round assignment to distribute load.
+- **Multiple Coordinators**: While Propeller's current implementation uses a single Coordinator, the architecture supports multiple Coordinators with consistent hashing or round assignment to distribute load.
 
 - **Distributed Model Registry**: The Model Registry can be replicated or sharded to handle high request volumes from many proplets fetching models simultaneously.
 
 ### Network Efficiency
 
-- **Chunked Transport**: Large model artifacts are automatically chunked for efficient MQTT transport, allowing models to be distributed even over bandwidth-constrained networks.
+- **Chunked Transport**: Propeller automatically chunks large model artifacts for efficient MQTT transport, allowing models to be distributed even over bandwidth-constrained networks.
 
-- **Retained Messages**: MQTT retained messages allow proplets to immediately receive the latest model when they subscribe, reducing latency and avoiding missed updates.
+- **Retained Messages**: Propeller uses MQTT retained messages to allow proplets to immediately receive the latest model when they subscribe, reducing latency and avoiding missed updates.
 
-- **Asynchronous Communication**: MQTT's asynchronous nature allows proplets to submit updates without blocking, and the Coordinator can process updates as they arrive.
+- **Asynchronous Communication**: Propeller leverages MQTT's asynchronous nature to allow proplets to submit updates without blocking, and the Coordinator can process updates as they arrive.
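The chunking idea can be illustrated with a short sketch. The chunk envelope (`seq`, `total`, `data`) is an assumption for the example; Propeller's actual chunk format is not shown here:

```python
def chunk_model(blob: bytes, chunk_size: int) -> list[dict]:
    """Split a model artifact into ordered chunks for MQTT transport (sketch)."""
    total = (len(blob) + chunk_size - 1) // chunk_size  # ceiling division
    return [
        {"seq": i, "total": total, "data": blob[i * chunk_size:(i + 1) * chunk_size]}
        for i in range(total)
    ]


def reassemble(chunks: list[dict]) -> bytes:
    """Reorder by sequence number and concatenate on the receiving proplet."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["seq"]))


blob = bytes(range(10)) * 100       # 1000-byte stand-in for a model artifact
chunks = chunk_model(blob, 256)     # 1000 bytes at 256 bytes/chunk -> 4 chunks
assert reassemble(chunks) == blob   # lossless round trip, even if reordered
```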

 ### Fault Tolerance
 
-- **Timeout Handling**: Rounds complete even if some proplets fail to submit updates, ensuring progress despite device failures or network issues.
+- **Timeout Handling**: Propeller ensures rounds complete even if some proplets fail to submit updates, preserving progress despite device failures or network issues.
 
 - **Update Thresholds**: The k-of-n parameter allows rounds to complete with a subset of participants, providing resilience to device failures.
 
-- **Fallback Mechanisms**: Proplets can fall back from HTTP to MQTT if network conditions degrade, ensuring updates are delivered even in challenging network environments.
+- **Fallback Mechanisms**: Propeller's proplets can fall back from HTTP to MQTT if network conditions degrade, ensuring updates are delivered even in challenging network environments.
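The k-of-n threshold, together with an aggregation step, can be sketched as follows. The sample-weighted averaging (FedAvg-style) is an assumption — the text above does not specify Propeller's aggregation rule:

```python
def round_complete(updates: list[dict], k: int) -> bool:
    """A round can close once at least k of the n invited proplets report."""
    return len(updates) >= k


def aggregate(updates: list[dict]) -> list[float]:
    """Sample-weighted average of weight vectors (FedAvg-style; an assumption,
    not Propeller's documented aggregation rule)."""
    total = sum(u["num_samples"] for u in updates)
    dim = len(updates[0]["weights"])
    return [
        sum(u["weights"][j] * u["num_samples"] for u in updates) / total
        for j in range(dim)
    ]


updates = [
    {"weights": [1.0, 0.0], "num_samples": 30},
    {"weights": [0.0, 1.0], "num_samples": 10},
]
assert round_complete(updates, k=2)        # 2-of-n threshold met
assert aggregate(updates) == [0.75, 0.25]  # weighted toward the larger dataset
```

A proplet that misses the round deadline simply contributes nothing; the round still closes once the threshold is reached.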

 ## Security and Privacy
 
-Federated learning inherently provides privacy benefits, but the system includes additional security considerations:
+Federated learning inherently provides privacy benefits, but Propeller includes additional security considerations:
 
 ### Data Privacy
 
 - **No Raw Data Transmission**: Only model weight updates are transmitted, never raw training data. This provides strong privacy guarantees even if messages are intercepted.
 
-- **Local Training**: All training happens on-device within the WASM sandbox, ensuring that raw data never leaves the device's secure execution environment.
+- **Local Training**: In Propeller, all training happens on-device within the WASM sandbox, ensuring that raw data never leaves the device's secure execution environment.
 
-- **Isolated Execution**: WASM's sandboxing provides isolation between the training workload and the proplet's host system, preventing data leakage through side channels.
+- **Isolated Execution**: Propeller leverages WASM's sandboxing to provide isolation between the training workload and the proplet's host system, preventing data leakage through side channels.
 
 ### Communication Security
 
-- **SuperMQ Authentication**: All MQTT communication is authenticated via SuperMQ's client authentication system, ensuring only authorized components can participate.
+- **SuperMQ Authentication**: In Propeller, all MQTT communication is authenticated via SuperMQ's client authentication system, ensuring only authorized components can participate.
 
 - **Encrypted Transport**: MQTT connections can use TLS to encrypt messages in transit, protecting updates from interception or tampering.
 
-- **Topic Access Control**: SuperMQ's topic-based access control ensures that proplets can only publish to their designated update topics and cannot access other proplets' updates.
+- **Topic Access Control**: Propeller uses SuperMQ's topic-based access control to ensure that proplets can only publish to their designated update topics and cannot access other proplets' updates.
 
 ### Model Security
 
-- **Model Integrity**: Model versions are cryptographically hashed and versioned, allowing detection of tampering or corruption.
+- **Model Integrity**: Propeller cryptographically hashes and versions model artifacts, allowing detection of tampering or corruption.
 
 - **Access Control**: The Model Registry can implement access control to ensure only authorized proplets can fetch models.
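A hash-based integrity check of this kind can be sketched in a few lines; SHA-256 is an illustrative choice, as the text does not name the hash Propeller uses:

```python
import hashlib


def model_digest(artifact: bytes) -> str:
    """Content hash for detecting tampering or corruption (SHA-256 chosen
    for illustration; the source does not specify Propeller's hash)."""
    return hashlib.sha256(artifact).hexdigest()


published = b"\x00\x01\x02-model-weights"
digest = model_digest(published)  # recorded alongside the version in the registry

# A proplet verifies a fetched model against the registry's recorded digest.
fetched = published
assert model_digest(fetched) == digest

# Any modification, even one byte, changes the digest.
tampered = published + b"\xff"
assert model_digest(tampered) != digest
```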
