Baseline functions for aggregator & collaborator #23
-
What are the aggregator & collaborator functions that you are going to use as a baseline?
Replies: 12 comments
-
Hi @dskhanirfan, I am not sure what you mean by "aggregator & collaborator functions". Could you please elaborate?
-
There are options for aggregation_function (e.g., weighted_average_aggregation and clipped_aggregation), options for choose_training_collaborators (e.g., all_collaborators_train and one_collaborator_on_odd_rounds), and options for training_hyper_parameters_for_round (e.g., constant_hyper_parameter, train_less_each_round, and fixed_number_of_batches). Which functions will be used as the baseline for performance evaluation?
-
There is no baseline per se for participant ranking and final selection; rather, participants will be ranked against each other based on their performance scores using whatever customizations they choose. If you are asking what could be considered a baseline to compare against your own customizations, appropriate functions would be weighted_average_aggregation, all_collaborators_train, and constant_hyper_parameter, as these represent simple yet reasonable first passes at running a federation.
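For anyone looking for a concrete starting point, here is a rough sketch of what those baseline customizations might look like. The parameter lists and hyper-parameter values below are assumptions for illustration, not the official signatures from the challenge notebook, so check the notebook itself before copying anything.

```python
import numpy as np

# Illustrative sketches only -- parameter lists and default values are
# assumptions, not the official FeTS Challenge signatures.

def all_collaborators_train(collaborators, db_iterator, fl_round,
                            collaborators_chosen_each_round,
                            collaborator_times_per_round):
    """Baseline collaborator selection: every collaborator trains every round."""
    return collaborators

def constant_hyper_parameter(collaborators, db_iterator, fl_round,
                             collaborators_chosen_each_round,
                             collaborator_times_per_round):
    """Baseline hyper-parameter schedule: the same values every round."""
    learning_rate = 5e-5      # assumed value, for illustration only
    epochs_per_round = 1.0
    batches_per_round = None  # None -> train by epochs rather than a batch count
    return learning_rate, epochs_per_round, batches_per_round

def weighted_average_aggregation(local_tensors, db_iterator, tensor_name,
                                 fl_round, collaborators_chosen_each_round,
                                 collaborator_times_per_round):
    """Baseline aggregation: FedAvg-style weighted mean of collaborator tensors."""
    tensors = [t.tensor for t in local_tensors]
    weights = [t.weight for t in local_tensors]
    return np.average(tensors, weights=weights, axis=0)
```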
-
Thanks. How many iterations (rounds) are considered the baseline? The default is 5 rounds with 1 epoch each.
-
The total number of rounds was set to 5 only as a value for a short test. To run a complete test, set it to something very large (like 1000 rounds); the experiment.py script will exit once the simulated time exceeds 1 week and will return a dataframe of your results that you can use for a plot (a simulated week corresponds to a complete run; see the top-level README to find out more about simulated time). If you wish to run a shorter test, you can keep the number of rounds small. In that case the results of the experiment are calculated by projecting your training curve out to one simulated week, using the maximum performance metric value over the rounds you completed. In general, stopping your experiment short of the week of simulated training time will under-estimate the final score achievable by your method.
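To make the projection idea concrete, here is a toy sketch. The dataframe columns ("round", "simulated_time", "val_dice") and the numbers are made up for illustration; inspect the dataframe actually returned by experiment.py to see its real contents and the official scoring logic.

```python
import pandas as pd

# Hypothetical results dataframe -- column names and values are invented
# purely to illustrate the projection described above.
results = pd.DataFrame({
    "round": [1, 2, 3, 4, 5],
    "simulated_time": [12.0, 24.5, 36.1, 49.0, 61.7],  # hours, made-up values
    "val_dice": [0.51, 0.62, 0.68, 0.71, 0.73],
})

one_week_hours = 7 * 24
if results["simulated_time"].max() >= one_week_hours:
    # Complete run: the experiment reached a full simulated week.
    projected_score = results["val_dice"].iloc[-1]
else:
    # Short run: project using the best metric seen so far, which will
    # generally under-estimate what a full simulated week would achieve.
    projected_score = results["val_dice"].max()

print(f"Projected score: {projected_score:.3f}")
```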
-
Hello @brandon-edwards, I ran 5 rounds and it took me 30-40 hours; running 1000 seems like mission impossible. Currently it seems that the framework does not support multi-GPU and multi-CPU?
-
Hi @zhanghaoyue, please see the words following my suggestion to set 1000 rounds: early exit will occur, though 1 week of simulated time will indeed take a good deal of time. The OpenFL framework allows a model writer to train their model in whatever way they wish, including multi-GPU, etc. However, for the challenge, yes, the model we are using does not have data-parallel support. The primary issue is that the code producing collaborator model updates needs to result in exactly the same collaborator updates (for a given setting of the training parameters) for all participants in the challenge. Holding the model code constant (as far as what collaborator-side model updates are produced for a given setting of the training parameters) is a critical feature of this challenge, and data-parallel training (for example) generally changes the data science (it results in different collaborator-trained updates). What participants are supposed to demonstrate is improved FL logic (holding the collaborator model update creation constant but changing the four functions in the notebook). Every participant faces the same difficulty of long experiment completion times.
-
Which ML model is used in Task 1, for example a U-Net? And what do WT, ET, and TC stand for in DICE WT, DICE ET, and DICE TC?
-
It is a U-Net with residual connections.
WT, ET, and TC follow the BraTS convention and stand for the following: WT = whole tumor, ET = enhancing tumor, and TC = tumor core.
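For reference, these regions are composed from the BraTS labels, and a Dice score over a binary region mask can be computed along the following lines (a generic sketch, not code from the challenge repository):

```python
import numpy as np

def brats_region_masks(label_map: np.ndarray) -> dict:
    """Compose the standard BraTS regions from a label map containing labels 1, 2, 4.

    WT (whole tumor)     = labels 1 + 2 + 4
    TC (tumor core)      = labels 1 + 4
    ET (enhancing tumor) = label 4
    """
    return {
        "WT": np.isin(label_map, [1, 2, 4]),
        "TC": np.isin(label_map, [1, 4]),
        "ET": label_map == 4,
    }

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Plain Dice coefficient between two binary masks."""
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```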
-
What is meant by label 1, label 2, and label 4? Can you also confirm that the data for the following 28 patients out of 369 is missing: 149, 248, 249, 252, 254, 255, 256, 258, 259, 262, 263, 267, 268, 271, 281, 284, 287, 289, 292, 305, 307, 314, 316, 317, 318, 320, 324, 335?
-
We follow the BraTS convention: label 1 = necrotic and non-enhancing tumor core (NCR/NET), label 2 = peritumoral edema (ED), and label 4 = GD-enhancing tumor (ET).
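If you want to double-check which labels are present in your local copy of a segmentation volume, something like the following works (the file path is hypothetical and nibabel is assumed to be installed):

```python
import numpy as np
import nibabel as nib  # assumes nibabel is installed

# Hypothetical path -- adjust to wherever your BraTS-style data lives.
seg_path = "Patient_001/Patient_001_seg.nii.gz"

seg = nib.load(seg_path).get_fdata()
labels, counts = np.unique(seg.astype(int), return_counts=True)

# Expect a subset of {0, 1, 2, 4}: background, NCR/NET, edema, enhancing tumor.
for label, count in zip(labels, counts):
    print(f"label {label}: {count} voxels")
```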
-
I believe the original query has been addressed. If you have any further questions, please comment and/or open a new discussion.