Fixed the bug in Bottleneck when using the adapter interface for multi-GPU training of custom models. #823
This code causes the `layer.output_adapters` of `cuda:n` to always point to the `layer.output_adapters` of `cuda:0` during multi-GPU training with the default distributed settings of the Hugging Face Trainer, even though the model itself is properly distributed across the GPUs. I suspect this is due to `partial`, so I tried to save variables like `layer.xxx` and `layer` in the context so that the forward pass can run on multiple GPUs.

Variables like `residual` and `hidden_state` are both shown to be on `cuda:1` during debugging, but `layer` is shown to be on `cuda:0`. I printed the addresses of the `layer` variable on the two GPUs: the address of `layer` on `cuda:1` is the same as that on `cuda:0`. Since my GPU can't handle models like Qwen, and it's not easy to provide data for my own model, could you please test whether this problem occurs in multi-GPU training? Thank you! I followed the process of adapters-for-any-transformer.
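For reviewers, here is a minimal, self-contained sketch of the suspected mechanism. It is not the adapters code itself: `ToyLayer` and `_adapter_hook` are invented names, and it only illustrates how binding a module instance into `functools.partial` can pin every `nn.DataParallel` replica (which the Hugging Face Trainer uses by default for single-node multi-GPU) to the original `cuda:0` object:

```python
import functools

import torch
import torch.nn as nn


class ToyLayer(nn.Module):
    """Hypothetical layer mimicking the suspected pattern (not adapters code)."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        # The partial captures *this* instance. nn.DataParallel gives each
        # replica a shallow copy of __dict__, so every replica still carries
        # a partial that points back to the original (cuda:0) object.
        self.adapter_forward = functools.partial(ToyLayer._adapter_hook, self)

    @staticmethod
    def _adapter_hook(layer, hidden_states):
        # `layer` is whatever the partial captured, not the per-GPU replica:
        # its parameters stay on cuda:0 even when the inputs are on cuda:1.
        print("bound layer device:", next(layer.parameters()).device,
              "| input device:", hidden_states.device)
        return hidden_states  # skip the cross-device matmul; only report devices

    def forward(self, hidden_states):
        return self.adapter_forward(hidden_states)


if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(ToyLayer().cuda())
    model(torch.randn(8, 8).cuda())
    # Expected: "bound layer device" prints cuda:0 for every replica,
    # while "input device" differs per GPU.
```

Because each replica only gets a shallow copy of the module's `__dict__`, a callable bound at construction time keeps referencing the original module; looking up the needed attributes from the forward context at call time, as this PR does, avoids that stale reference.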