Fix: Remove duplicate W&B initialization in offline mode (#3818) #3886
What does this PR do?
This PR addresses an issue where, when using `accelerate` with Weights & Biases (W&B) in offline mode, duplicate W&B runs were initialized. As a result, two "offline-run-..." directories were created for a single training process.

The problem stemmed from the `WandBTracker.store_init_configuration` method. In offline mode, this method explicitly called `wandb.init()` again to include the run's configuration, even though `wandb.init()` had already been called by `WandBTracker.start()`. This redundant initialization created a new W&B run, duplicating the logging process.

This PR resolves the issue by removing the second, offline-mode-specific `wandb.init()` call within `WandBTracker.store_init_configuration`. The method now always uses `wandb.config.update(values, allow_val_change=True)` to update the run's configuration, which adds the configuration to the existing W&B run without triggering a new initialization. Only a single W&B run is now initialized and maintained throughout training in offline mode, so only one run directory is created per training process.

Fixes #3818
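The difference between the two code paths can be illustrated with a minimal sketch. It does not import `wandb` or reproduce the actual `accelerate` source: `FakeWandb`, `FakeConfig`, `store_config_old`, `store_config_new`, and `count_runs` are hypothetical names invented for this illustration, and the stub only counts `init()` calls and mimics the `config.update(values, allow_val_change=True)` call mentioned above.

```python
class FakeConfig(dict):
    """Hypothetical stand-in for wandb.config (not the real W&B class)."""

    def update(self, values, allow_val_change=False):
        # Reject changes to existing keys unless explicitly allowed.
        if not allow_val_change:
            clashes = [k for k in values if k in self and self[k] != values[k]]
            if clashes:
                raise ValueError(f"config keys already set: {clashes}")
        super().update(values)


class FakeWandb:
    """Hypothetical stand-in for the wandb module; counts init() calls."""

    def __init__(self):
        self.init_calls = 0
        self.config = FakeConfig()

    def init(self, project, config=None):
        # Each call represents one new run (one "offline-run-..." directory).
        self.init_calls += 1
        self.config = FakeConfig(config or {})
        return self


def store_config_old(wandb, project, values, offline):
    # Simplified version of the behavior described above: offline mode
    # re-initializes the run in order to attach the configuration.
    if offline:
        wandb.init(project=project, config=values)
    else:
        wandb.config.update(values, allow_val_change=True)


def store_config_new(wandb, project, values, offline):
    # Behavior after this PR: always update the existing run's config.
    wandb.config.update(values, allow_val_change=True)


def count_runs(store_fn, offline=True):
    wandb = FakeWandb()
    wandb.init(project="demo")  # stands in for WandBTracker.start()
    store_fn(wandb, "demo", {"lr": 1e-3}, offline)
    return wandb.init_calls, dict(wandb.config)


if __name__ == "__main__":
    print("old:", count_runs(store_config_old))
    print("new:", count_runs(store_config_new))
```

In this sketch the old path records two `init()` calls in offline mode, while the new path records one; both end with the same configuration attached to the run.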
Before submitting