GitHub - UFFeScience/Flower-PROV: Provenance-based Dynamic Fine-Tuning of Federated Machine Learning Framework

Flower-PROV: Adding Provenance Features in Federated Learning

 ________   __                                                    _______    _______        ___     ____   ____
|_   __  | [  |                                                  |_   __ \  |_   __ \     .'   `.  |_  _| |_  _|
  | |_ \_|  | |    .--.    _   _   __   .---.   _ .--.   ______    | |__) |   | |__) |   /  .-.  \   \ \   / /
  |  _|     | |  / .'`\ \ [ \ [ \ [  ] / /__\\ [ `/'`\] |______|   |  ___/    |  __ /    | |   | |    \ \ / /
 _| |_      | |  | \__. |  \ \/\ \/ /  | \__.,  | |               _| |_      _| |  \ \_  \  `-'  /     \ ' /
|_____|    [___]  '.__.'    \__/\__/    '.__.' [___]             |_____|    |____| |___|  `.___.'       \_/

Motivation

Federated Learning (FL) has emerged as a privacy-preserving paradigm that enables distributed model training without sharing raw data. However, ensuring traceability in FL workflows remains a challenge because of FL's inherently distributed nature: data is spread across many clients (from a few to thousands), and the workflow does not make explicit which steps consume or produce which data.

Tracking and understanding model evolution in this scenario requires insights into the data derivation path and key evaluation metrics (e.g., accuracy, silhouette score) for both local and aggregated models. This information is essential not only for understanding and explaining the training process but also for dynamically adjusting the FL workflow.

Since each training round can take minutes to hours, the ability to monitor and fine-tune the process in real time is critical. Poor hyperparameter configurations waste time and computational resources, especially in existing FL frameworks, where users often evaluate model performance only after training has completed.

This project addresses these challenges by improving traceability and monitoring capabilities in FL workflows.

🌻 Flower-PROV: Provenance-Aware Federated Learning 🌻

Flower-PROV is an extension of the open-source Flower Federated Learning (FL) framework, designed to integrate provenance tracking as a core component of FL workflows to enhance reproducibility and analysis.

🔍 Key Features

Flower-PROV enables the automatic and distributed capture of:

- Retrospective provenance (r-prov): logs details about the actual FL workflow execution.
- Prospective provenance (p-prov): represents the FL workflow specification.

The captured provenance data includes:

✅ Participating clients
✅ Hyperparameter values
✅ Accuracy metrics
✅ Model versions and checkpoints
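To make the r-prov/p-prov distinction concrete, here is a minimal sketch. It models a prospective specification (the attributes a workflow step is expected to record) and a retrospective record (the values one execution actually recorded) as plain Python dataclasses, then checks one against the other. The class names, the `validate` helper and the `learning_rate` attribute are hypothetical. This is not DfAnalyzer's API or schema. Only `client_id`, `server_round` and `accuracy` echo column names from the example queries in this README.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ProspectiveStep:
    """p-prov (hypothetical model): attributes a workflow step should record."""
    name: str
    attributes: tuple[str, ...]


@dataclass
class RetrospectiveRecord:
    """r-prov (hypothetical model): values a step actually recorded in one run."""
    step: str
    values: dict[str, Any] = field(default_factory=dict)


def validate(record: RetrospectiveRecord, spec: ProspectiveStep) -> list[str]:
    """Return the attributes declared in the spec but missing from the record."""
    if record.step != spec.name:
        raise ValueError(f"record for {record.step!r} checked against spec {spec.name!r}")
    return [attr for attr in spec.attributes if attr not in record.values]


# Hypothetical spec for a client training step and one observed execution.
client_training = ProspectiveStep(
    name="ClientTraining",
    attributes=("client_id", "server_round", "learning_rate", "accuracy"),
)
observed = RetrospectiveRecord(
    step="ClientTraining",
    values={"client_id": 3, "server_round": 5, "learning_rate": 0.01},
)
print(validate(observed, client_training))  # ['accuracy']
```

Keeping the specification separate from the records is what lets a provenance system detect gaps like the missing `accuracy` value above.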

Beyond simply collecting provenance data, Flower-PROV actively uses it to:

- Dynamically adjust model hyperparameters during training.
- Enable clients to recover previously trained models as a starting point for local training, avoiding redundant computation.
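The adjustment rule itself is not spelled out here, so the following is only a sketch of the general idea under our own assumptions. After each round, it looks at the aggregated accuracy history. If the best accuracy of the last `patience` rounds is not at least `min_delta` better than the best before them, it scales the learning rate down. The function name, its parameters and the halve-the-learning-rate policy are hypothetical, not Flower-PROV's actual logic. The accuracy values are made up.

```python
def adjust_learning_rate(
    accuracies: list[float],
    learning_rate: float,
    patience: int = 2,
    min_delta: float = 0.005,
    factor: float = 0.5,
) -> tuple[float, bool]:
    """Return (new_learning_rate, adjusted) from per-round aggregated accuracies.

    Hypothetical rule: if the best accuracy of the last `patience` rounds is not
    at least `min_delta` higher than the best accuracy before them, multiply the
    learning rate by `factor`.
    """
    if len(accuracies) <= patience:
        return learning_rate, False  # not enough history to judge a plateau
    best_before = max(accuracies[:-patience])
    best_recent = max(accuracies[-patience:])
    if best_recent - best_before < min_delta:
        return learning_rate * factor, True
    return learning_rate, False


# Made-up history: accuracy plateaus after round 3.
history = [0.41, 0.52, 0.58, 0.581, 0.579]
print(adjust_learning_rate(history, learning_rate=0.01))  # (0.005, True)
```

The returned flag plays a role similar to the `dynamically_adjusted` column queried from `oTrainingConfig`, although how Flower-PROV actually sets that column is not described here.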

⚙️ Software requirements

The following list of software has to be configured/installed to run Flower-PROV.

🐳 Running an Example in a Docker Environment 🐳

We provide a pre-built Docker image that includes the DfAnalyzer provenance library, the Python library, and the provenance database (MonetDB):

docker pull nymeria0042/dfanalyzer

We also provide a docker-compose.yaml that we will use to launch our containers.

This guide demonstrates how to run Flower-PROV in Docker containers using the CIFAR-10 dataset. We begin by splitting the dataset in a balanced manner using the dataset-splitter component.

Navigate to the dataset-splitter directory and execute the following command:

python splitter.py --dataset_splitter_config_file config/dataset_splitter.cfg

By default, this creates five folders, each holding the data for one client.
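The splitter's internals are not described here. As an illustration of what a balanced split means, the sketch below shuffles the sample indices of each class and deals them round-robin across `num_clients` partitions, so every client receives roughly the same number of samples per class. The function name and this particular strategy are assumptions, not necessarily what `splitter.py` does. The labels are synthetic rather than loaded from CIFAR-10.

```python
import random
from collections import defaultdict


def balanced_split(labels: list[int], num_clients: int = 5, seed: int = 0) -> list[list[int]]:
    """Return one list of sample indices per client, stratified by class label."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    partitions: list[list[int]] = [[] for _ in range(num_clients)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize which samples each client receives
        for position, index in enumerate(indices):
            # Continue the round-robin where the previous class stopped, so
            # leftover samples do not always land on the first client.
            partitions[(offset + position) % num_clients].append(index)
        offset += len(indices)
    return partitions


# Synthetic labels: 10 classes x 50 samples, standing in for CIFAR-10.
toy_labels = [i % 10 for i in range(500)]
parts = balanced_split(toy_labels, num_clients=5)
print([len(p) for p in parts])  # [100, 100, 100, 100, 100]
```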

Next, we can start the DfAnalyzer container, which runs in the background to capture all provenance data for the experiment:

docker compose up dfanalyzer

Once the DfAnalyzer service is running, execute the prospective provenance script, which defines the structure and parameters to be captured. Additionally, start the MongoDB service, which stores the model weights for fault tolerance.
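The checkpointing pattern behind fault tolerance and model recovery can be sketched with an in-memory dictionary standing in for MongoDB. Weights are serialized with `pickle` and stored under a generated id (in the spirit of the `weights_mongo_id` column). The most recent checkpoint can then be loaded to resume or warm-start training. The class and method names are hypothetical. A real deployment would talk to MongoDB through a client library such as pymongo, whose API is not reproduced here.

```python
import pickle
import uuid
from typing import Any, Optional


class InMemoryCheckpointStore:
    """Hypothetical stand-in for a document store that holds model weights."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._by_round: dict[int, str] = {}

    def save(self, server_round: int, weights: list[Any]) -> str:
        """Serialize the weights, store them, and return the generated checkpoint id."""
        checkpoint_id = uuid.uuid4().hex
        self._blobs[checkpoint_id] = pickle.dumps(weights)
        self._by_round[server_round] = checkpoint_id
        return checkpoint_id

    def latest(self) -> Optional[tuple[int, list[Any]]]:
        """Return (round, weights) of the newest checkpoint, or None if empty."""
        if not self._by_round:
            return None
        last_round = max(self._by_round)
        # Only unpickle data you wrote yourself: pickle is unsafe for untrusted input.
        return last_round, pickle.loads(self._blobs[self._by_round[last_round]])


store = InMemoryCheckpointStore()
store.save(1, [[0.1, 0.2], [0.3]])
store.save(2, [[0.15, 0.25], [0.35]])
print(store.latest())  # (2, [[0.15, 0.25], [0.35]])
```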

Once these steps are complete, start the server:

docker compose up server

Then start the clients (five in this demonstration):

docker compose up client1 client2 client3 client4 client5
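For readers new to FL, the sketch below shows the core of what a server typically does with the client results of one round. It computes a sample-weighted average of the clients' parameters, as in FedAvg, which is also the idea behind Flower's FedAvg strategy. Whether Flower-PROV's server uses exactly this strategy is an assumption here. The function name and the toy inputs are hypothetical, and parameters are flattened to plain lists for simplicity.

```python
def weighted_average(client_results: list[tuple[list[float], int]]) -> list[float]:
    """FedAvg-style aggregation over flat parameter vectors.

    client_results: one (parameters, num_training_examples) pair per client.
    """
    total_examples = sum(num_examples for _, num_examples in client_results)
    if total_examples == 0:
        raise ValueError("no training examples reported")
    num_params = len(client_results[0][0])
    aggregated = [0.0] * num_params
    for params, num_examples in client_results:
        if len(params) != num_params:
            raise ValueError("clients returned parameter vectors of different sizes")
        weight = num_examples / total_examples  # clients with more data count more
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated


# Two hypothetical clients; the second trained on three times as much data.
print(weighted_average([([1.0, 2.0], 100), ([3.0, 6.0], 300)]))  # [2.5, 5.0]
```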

🔍 Submitting queries

Once the experiment is running, you can submit queries to the provenance database (MonetDB) to monitor metrics and parameter/hyperparameter configurations.

First, we connect to the provenance database, running in the DfAnalyzer container:

docker exec -it dfanalyzer mclient -u monetdb -d dataflow_analyzer

The default password is monetdb.

Then submit queries such as:

-- Which clients participated in round 5
SELECT client_id FROM oClientTraining WHERE server_round = 5;

-- How the aggregated accuracy evolves across rounds
SELECT server_round, accuracy FROM oServerTrainingAggregation;

-- In which rounds the dynamic adjustment was triggered
SELECT server_round, dynamically_adjusted FROM oTrainingConfig;

-- Monitor the insertion of the checkpoints
SELECT server_round, insertion_time, weights_mongo_id, checkpoint_time FROM oServerTrainingAggregation;
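The same queries can also be issued from Python. The sketch below is written against the generic Python DB-API (PEP 249), so it should work with any compliant driver. For MonetDB, the pymonetdb package provides such a connection, for example `pymonetdb.connect(username="monetdb", password="monetdb", hostname="localhost", database="dataflow_analyzer")`. That call assumes MonetDB's port is reachable from the host, which depends on `docker-compose.yaml`. To keep the example runnable anywhere, an in-memory sqlite3 table with made-up rows stands in for `oServerTrainingAggregation`. The helper name is hypothetical.

```python
import sqlite3


def accuracy_by_round(conn) -> list[tuple[int, float, float]]:
    """Return (server_round, accuracy, change vs. previous round) tuples.

    `conn` is any DB-API 2.0 connection (e.g. from pymonetdb or sqlite3).
    """
    cursor = conn.cursor()
    cursor.execute(
        "SELECT server_round, accuracy FROM oServerTrainingAggregation "
        "ORDER BY server_round"
    )
    result = []
    previous = None
    for server_round, accuracy in cursor.fetchall():
        delta = 0.0 if previous is None else accuracy - previous
        result.append((server_round, accuracy, delta))
        previous = accuracy
    return result


# Stand-in database with made-up values, only so the example runs without MonetDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oServerTrainingAggregation (server_round INTEGER, accuracy REAL)")
conn.executemany(
    "INSERT INTO oServerTrainingAggregation VALUES (?, ?)",
    [(2, 0.55), (1, 0.40), (3, 0.50)],
)
for server_round, accuracy, delta in accuracy_by_round(conn):
    print(f"round {server_round}: accuracy={accuracy:.2f} ({delta:+.2f})")
```

A negative change, as in round 3 here, is the kind of signal that would prompt a look at the hyperparameter configuration for that round.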

🖇️ Provenance Graph

You can also open http://localhost:22000 in a browser to view the provenance graph and follow each step of the FL workflow.
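Conceptually, a provenance graph links each aggregated model to the client models it was derived from, and those in turn to the previous round's global model. The sketch below encodes such derivation edges in a plain dictionary and walks them breadth-first to recover a model's full lineage. The node names and graph shape are hypothetical and chosen only for illustration. They are not the schema DfAnalyzer uses.

```python
from collections import deque

# Hypothetical derivation edges: node -> nodes it was derived from.
derived_from: dict[str, list[str]] = {
    "global_r2": ["client1_r2", "client2_r2"],
    "client1_r2": ["global_r1"],
    "client2_r2": ["global_r1"],
    "global_r1": ["client1_r1", "client2_r1"],
    "client1_r1": ["initial_model"],
    "client2_r1": ["initial_model"],
}


def lineage(node: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every ancestor of `node`, breadth-first, each listed once."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # shared ancestors (e.g. a global model) are reported once
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


print(lineage("global_r2", derived_from))
# ['client1_r2', 'client2_r2', 'global_r1', 'client1_r1', 'client2_r1', 'initial_model']
```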

📈 Monitoring

To monitor the metrics, run the Streamlit app locally:

streamlit run monitoring/Flower-PROV_Monitor.py

🤝 Team

Current Members
- Camila Lopes
- Aline Paes
- Daniel de Oliveira
Former Members
- Alan Lira
- Cristina Boeres
- Lucia Drummond


References

Beutel, D. J., et al. Flower: A Friendly Federated Learning Framework. 2020.
Lopes, C., et al. Provenance-Based Dynamic Fine-Tuning of Cross-Silo Federated Learning. CARLA, 2023.

📜 License

This project is licensed under the Apache License 2.0. See the LICENSE file for more details.
