Merged
Changes from 8 commits
6 changes: 6 additions & 0 deletions .azure/gpu-tests-fabric.yml
@@ -100,6 +100,12 @@ jobs:
       pip list
   displayName: "Image info & NVIDIA"

+- bash: |
+    python .actions/assistant.py replace_oldest_ver
+    pip install "cython<3.0" wheel # for compatibility
+  condition: contains(variables['Agent.JobName'], 'oldest')
+  displayName: "setting oldest dependencies"
+
 - bash: |
     PYTORCH_VERSION=$(python -c "import torch; print(torch.__version__.split('+')[0])")
     pip install -q wget packaging
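The added step runs `replace_oldest_ver` only for jobs whose name contains `oldest`, so those jobs test against the lowest supported versions. As a rough illustration of what pinning to the oldest version means, here is a minimal sketch. It is not the actual `.actions/assistant.py` implementation; `pin_to_oldest` is a hypothetical helper, and the assumption is that a lower bound `>=X` is rewritten to an exact pin `==X`.

```python
import re

# Hypothetical helper, NOT the real `.actions/assistant.py` code:
# turn a lower bound such as `pkg >=1.2, <2.0` into an exact pin `pkg ==1.2`.
_LOWER_BOUND = re.compile(r"^(?P<name>[A-Za-z0-9_.\-\[\]]+)\s*>=\s*(?P<low>[^,;#\s]+)")


def pin_to_oldest(line: str) -> str:
    """Pin a requirement line to its lower bound, keeping markers and comments."""
    match = _LOWER_BOUND.match(line.strip())
    if match is None:
        # Comments, blank lines, and lines without a `>=` bound are left alone.
        return line
    # Keep an environment marker (`; ...`) or trailing comment (`# ...`) as-is.
    tail_start = re.search(r"\s*[;#]", line)
    tail = line[tail_start.start():] if tail_start else ""
    return f"{match['name']} =={match['low']}{tail}"
```

With this sketch, `pin_to_oldest("typing-extensions >=4.5.0, <4.15.0")` returns `typing-extensions ==4.5.0`, while comment lines pass through unchanged.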
6 changes: 6 additions & 0 deletions .azure/gpu-tests-pytorch.yml
@@ -104,6 +104,12 @@ jobs:
       pip list
   displayName: "Image info & NVIDIA"

+- bash: |
+    python .actions/assistant.py replace_oldest_ver
+    pip install "cython<3.0" wheel # for compatibility
+  condition: contains(variables['Agent.JobName'], 'oldest')
+  displayName: "setting oldest dependencies"
+
 - bash: |
     PYTORCH_VERSION=$(python -c "import torch; print(torch.__version__.split('+')[0])")
     pip install -q wget packaging
2 changes: 1 addition & 1 deletion requirements/fabric/base.txt
@@ -4,5 +4,5 @@
 torch >=2.1.0, <2.8.0
 fsspec[http] >=2022.5.0, <2025.6.0
 packaging >=20.0, <=25.0
-typing-extensions >=4.4.0, <4.15.0
+typing-extensions >=4.5.0, <4.15.0
 lightning-utilities >=0.10.0, <0.15.0
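To make the effect of raising the lower bound concrete, the following is a small sketch of a range check for `typing-extensions >=4.5.0, <4.15.0`. The helpers `parse_version` and `in_range` are hypothetical and handle only plain dotted releases; real tooling follows the full version-specifier rules, which this sketch does not attempt.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '4.15.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper, comparing numerically."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


print(in_range("4.4.0", "4.5.0", "4.15.0"))   # False: excluded by the new bound
print(in_range("4.5.0", "4.5.0", "4.15.0"))   # True: the new oldest version
print(in_range("4.14.1", "4.5.0", "4.15.0"))  # True
```

The versions are compared as integer tuples rather than strings because a plain string comparison would order `"4.15.0"` before `"4.5.0"`.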
2 changes: 1 addition & 1 deletion requirements/fabric/strategies.txt
@@ -5,6 +5,6 @@

 # note: is a bug around 0.10 with `MPS_Accelerator must implement all abstract methods`
 # shall be resolved by https://github.com/microsoft/DeepSpeed/issues/4372
-deepspeed >=0.8.2, <=0.9.3; platform_system != "Windows" and platform_system != "Darwin" # strict
+deepspeed >=0.9.3, <=0.9.3; platform_system != "Windows" and platform_system != "Darwin" # strict
Collaborator Author

Dropping 0.8.x, since it would need to be compiled from source.
Also, note that we are quite far behind the latest release, 0.17 🤔

Why upgrade? Moving to v0.17 should bring performance, stability, and integration improvements, which matter when training larger models efficiently and reliably.

Collaborator Author

  • ZeRO Optimizations:
      • v0.9: Early experiments in partitioning model states for memory savings.
      • v0.17: Advanced refinements (ZeRO-Offload and improved stage 3) enable training massively scaled models.

  • Performance Enhancements:
      • Upgraded distributed communication and fused operations.
      • Better mixed precision (fp16/bf16) support for faster training and efficient hardware usage.

  • Stability & API Maturation:
      • Streamlined configuration, enhanced documentation, and robust testing.
      • Fewer bugs and smoother integration with frameworks like HuggingFace Transformers.

  • Inference Improvements:
      • Expanded inference API with support for quantization.
      • Optimized runtime strategies for production deployment.

  • Ecosystem Integration:
      • Broader compatibility with modern AI tools and libraries.
      • Simplifies building and deploying complex deep learning workflows.

# skip bitsandbytes==0.46, due to ValueError: too many values to unpack (expected 2)
bitsandbytes >=0.45.2,!=0.46,<0.47.0; platform_system != "Darwin"
2 changes: 1 addition & 1 deletion requirements/pytorch/base.txt
@@ -7,5 +7,5 @@ PyYAML >=5.4, <6.1.0
 fsspec[http] >=2022.5.0, <2025.6.0
 torchmetrics >=0.7.0, <1.8.0
 packaging >=20.0, <=25.0
-typing-extensions >=4.4.0, <4.15.0
+typing-extensions >=4.5.0, <4.15.0
 lightning-utilities >=0.10.0, <0.15.0
2 changes: 1 addition & 1 deletion requirements/pytorch/strategies.txt
@@ -3,4 +3,4 @@

 # note: is a bug around 0.10 with `MPS_Accelerator must implement all abstract methods`
 # shall be resolved by https://github.com/microsoft/DeepSpeed/issues/4372
-deepspeed >=0.8.2, <=0.9.3; platform_system != "Windows" and platform_system != "Darwin" # strict
+deepspeed >=0.9.3, <=0.9.3; platform_system != "Windows" and platform_system != "Darwin" # strict
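The environment marker on the `deepspeed` line restricts the requirement to platforms other than Windows and macOS. The sketch below mirrors that condition with the standard library, where `platform.system()` supplies the value that the `platform_system` marker compares against. The helper name `deepspeed_requirement_applies` is hypothetical and only for illustration; installers evaluate markers themselves.

```python
import platform

# Platforms excluded by the marker on the deepspeed requirement.
EXCLUDED_SYSTEMS = {"Windows", "Darwin"}


def deepspeed_requirement_applies(system: str = "") -> bool:
    """Mirror `platform_system != "Windows" and platform_system != "Darwin"`."""
    # Fall back to the current machine when no system name is given.
    system = system or platform.system()
    return system not in EXCLUDED_SYSTEMS
```

For example, `deepspeed_requirement_applies("Linux")` is true, while passing `"Darwin"` or `"Windows"` gives false, so the pin to 0.9.3 only takes effect on the remaining platforms.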