Generic questions about the challenge #40
-
Hello, thank you for organizing this nice challenge. I have a few generic questions about Task_1.

3b) How can I cut down the wall time spent per round? Running run_challenge_experiment was very slow, even when I restricted collaborator training selection to just one collaborator at every round. For example, if there is a problem with my setup (e.g., not enough disk space to save checkpoints), I don't want to wait 48 hours to find that out, and testing it shouldn't require training and validating with all collaborators.

3c) How do I plot the metrics displayed by collaborator.py (in openfl) during training? Can we only visualize the results after the entire experiment has finished? Again, this is problematic because even the simplest test takes a very long time to train.

Thank you
-
Hi @danpak94, thank you for your interest in the FeTS Challenge!

3a. You can ask OpenFL/GaNDLF-related questions right here, and the organizers will either answer directly or point you to the relevant documentation, whenever appropriate.

3b & 3c: tagging @brandon-edwards @msheller @psfoley @alexey-gruzdev

Cheers,
-
Hi danpak94,
For testing purposes (i.e., something like making sure you have enough disk space to save the checkpoints), you can set 'challenge_metrics_validation_interval' to a large value in order to skip the validation step for most rounds (assuming your test does not require much model validation). Limiting training to one collaborator is also a good idea, and when doing so try to pick a small one. You can also make your own partitioning CSV, in order to create institutions of whatever size you wish for such tests. Running on GPU, I get through a single round this way (single small institution) in minutes. A configuration sketch is given below.

In addition, evaluating the Hausdorff distance is computationally expensive and so adds quite a bit to the runtime. To address this (for those who do not care to collect this aspect of validation), we will soon be pushing a change that allows removing Hausdorff from validation altogether.
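To make the above concrete, here is a minimal smoke-test sketch. The import and the rounds_to_train / challenge_metrics_validation_interval keywords follow this thread; the remaining keyword names, the callback signature, and the CSV column layout are assumptions, so check the Task_1 example script in your checkout for the exact interface (other required arguments, such as your aggregation function and hyperparameter callbacks, are omitted for brevity).

```python
# Sketch of a fast smoke-test setup for Task_1. Names marked
# "assumed" are illustrative and may differ in your version.
from fets_challenge import run_challenge_experiment  # as in the Task_1 example

def pick_one_small_collaborator(collaborators, db_iterator, fl_round,
                                collaborators_chosen_each_round,
                                collaborator_times_per_round):
    """Train a single small institution every round to keep rounds short.

    Signature assumed to match the choose_training_collaborators hook
    from the Task_1 example script.
    """
    return [collaborators[0]]  # assumes collaborators[0] is a small institution

# A hand-made partitioning CSV lets you define a tiny institution for
# tests, e.g. (column names assumed):
#
#   Partition_ID,Subject_ID
#   1,FeTS_001
#   1,FeTS_002

metrics = run_challenge_experiment(
    # ... plus your aggregation_function and hyperparameter callbacks ...
    choose_training_collaborators=pick_one_small_collaborator,
    institution_split_csv_filename='tiny_split.csv',       # hypothetical file
    brats_training_data_parent_dir='/path/to/brats_data',  # assumed keyword
    rounds_to_train=2,                           # just enough to exercise I/O
    challenge_metrics_validation_interval=1000,  # skip validation on most rounds
    save_checkpoints=True,                       # assumed keyword; tests disk usage
    device='cuda',                               # assumed keyword
)
```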
For plotting during training, I would suggest using checkpointing and running your experiment with progressively greater values of rounds_to_train (e.g., a first run with rounds_to_train=5 and checkpoint saving enabled, then a run restoring from that checkpoint with rounds_to_train=10, and so on). Each run returns a dataframe from run_challenge_experiment, which you can plot as the results come in. A short script wrapping run_challenge_experiment could automate this; a sketch follows.
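The loop below sketches that idea. That run_challenge_experiment returns a dataframe is taken from the reply above; the checkpoint-related keywords (save_checkpoints, restore_from_checkpoint_folder), the checkpoint folder name, and the metric column names are assumptions to adapt to your setup.

```python
# Sketch: progressively extend training and plot metrics between runs.
# Checkpoint keywords and metric column names are assumptions.
import matplotlib.pyplot as plt
from fets_challenge import run_challenge_experiment

checkpoint_folder = None  # first run starts from scratch
for total_rounds in (5, 10, 15):
    metrics_df = run_challenge_experiment(
        # ... same arguments as your main experiment ...
        rounds_to_train=total_rounds,
        save_checkpoints=True,                             # assumed keyword
        restore_from_checkpoint_folder=checkpoint_folder,  # assumed keyword
    )
    checkpoint_folder = 'checkpoint'  # wherever your run saved its state

    # Plot whichever validation columns your dataframe carries;
    # 'round' and 'binary_DICE' are placeholder column names.
    metrics_df.plot(x='round', y='binary_DICE')
    plt.savefig(f'metrics_through_round_{total_rounds}.png')
    plt.close()
```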
Hi danpak94,
Thanks for your questions. Please find responses to 3b and 3c below.