feat: PyTorch Extras Container #26
Merged
## `torch-extras` Container

This PR adds a new container named `ml-containers/torch-extras`, which is `ml-containers/torch` with the supplementary libraries DeepSpeed and flash-attention.

The code is originally based off of #21, but significantly more generalized and with the finetuner application-specific parts removed.
## Rationale
DeepSpeed and flash-attention both require the CUDA development tools to install properly. This complicates using them with anything but an `nvidia/cuda:...-devel`-based image. Optionally including them with our `ml-containers/torch` containers allows for still-lightweight images that can use those powerful libraries without the full CUDA development toolkit. It also reduces compile time for downstream Dockerfiles, since flash-attention takes a long time to compile at whatever step it is included.
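As a rough illustration of this layering, a minimal sketch is shown below; the registry path, tag, and install commands are placeholders, not the actual Dockerfile added in this PR:

```Dockerfile
# Minimal sketch of the torch-extras layering -- the registry path, tag,
# and exact install commands are placeholders, not the Dockerfile in this PR.
ARG BASE_IMAGE=ghcr.io/coreweave/ml-containers/torch:base
FROM ${BASE_IMAGE}

# Both libraries compile CUDA extensions at install time, so this step
# assumes the CUDA build toolchain is available during the image build
# (e.g. supplied by the base image or a preceding builder stage).
RUN python3 -m pip install --no-cache-dir deepspeed && \
    python3 -m pip install --no-cache-dir flash-attn --no-build-isolation
```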
## Structure

`ml-containers/torch-extras` is separated out as its own container, unlike the tag-differentiated `torch:base` and `torch:nccl` flavours of the baseline torch image. The `torch-extras` images are simply layers on top of the `torch:base` and `torch:nccl` images, and are built as a second CI step immediately after either of those two is built.

Since DeepSpeed and flash-attention compatibility may lag behind PyTorch releases themselves, the secondary step that builds these images can be temporarily disabled via flags in `torch-base.yml` and `torch-nccl.yml` until the libraries become compatible.
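For illustration only, the gated second step might look roughly like the following; the job names, workflow paths, output name, and the `if:` flag are hypothetical, not the actual contents of `torch-base.yml` or `torch-nccl.yml`:

```yaml
# Hypothetical sketch of the two-step build -- job names, workflow paths,
# and the output name are placeholders, not the real torch-base.yml.
name: torch-base
on:
  push:
    paths: ["torch/**"]

jobs:
  build-torch:
    uses: ./.github/workflows/build.yml
    with:
      image-name: torch

  build-torch-extras:
    needs: build-torch
    # Flip this to false to temporarily skip the extras build when DeepSpeed
    # or flash-attention have not caught up with a new PyTorch release.
    if: true
    uses: ./.github/workflows/torch-extras.yml
    with:
      base-image: ${{ needs.build-torch.outputs.image-tag }}
```

Whatever concrete form the flag takes in the real workflows, toggling it is the temporary disable mechanism described above.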
I welcome comments and suggestions on this build process and structure, because it requires tradeoffs. It guarantees that the `torch-extras` containers are always built, whenever possible, on new `torch` image updates, but it makes it more difficult to build the `torch-extras` containers standalone, if desired.