Commit 66fb3c9

Merge branch 'master' into deepspeed_exclude_frozen

2 parents 43090f8 + e55650d
File tree: 14 files changed, +107 −50 lines

docs/source-fabric/advanced/compile.rst

Lines changed: 1 addition & 1 deletion
@@ -417,7 +417,7 @@ Additional Resources
 
 Here are a few resources for further reading after you complete this tutorial:
 
-- `PyTorch 2.0 Paper <https://pytorch.org/blog/pytorch-2-paper-tutorial/>`_
+- `PyTorch 2.0 Paper <https://pytorch.org/get-started/pytorch-2-x/>`_
 - `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai-4/>`_
 - `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
 - `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_

docs/source-pytorch/advanced/compile.rst

Lines changed: 1 addition & 1 deletion
@@ -396,7 +396,7 @@ Additional Resources
 
 Here are a few resources for further reading after you complete this tutorial:
 
-- `PyTorch 2.0 Paper <https://pytorch.org/blog/pytorch-2-paper-tutorial/>`_
+- `PyTorch 2.0 Paper <https://pytorch.org/get-started/pytorch-2-x/>`_
 - `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai-4/>`_
 - `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
 - `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_

docs/source-pytorch/versioning.rst

Lines changed: 37 additions & 38 deletions
@@ -53,12 +53,8 @@ API Evolution
 
 Lightning's development is driven by research and best practices in a rapidly developing field of AI and machine learning. Change is inevitable and when it happens, the Lightning team is committed to minimizing user friction and maximizing ease of transition from one version to the next. We take backwards compatibility and reproducibility very seriously.
 
-For API removal, renaming or other forms of backwards-incompatible changes, the procedure is:
-
-#. A deprecation process is initiated at a minor version ``MAJOR.MINOR.PATCH`` (e.g. ``1.5.0``), producing a deprecation warning at runtime and removing it from the documentation.
-#. The deprecated API remains unchanged during the deprecation phase for two minor versions or the next major update, whichever comes first.
-#. The breaking change is done in version ``MAJOR.(MINOR+2).0`` (e.g. ``1.7.0``), or ``(MAJOR+1).0.0`` (e.g. ``2.0.0``), whichever comes first.
-#. From that version onward, the deprecation warning gets converted into a helpful error, which will remain until next major release.
+Excepting extenuating circumstances (e.g. a critical bug), API removal, renaming or other forms of backwards-incompatible changes are limited to major version upgrades — that is ``(MAJOR+1).0.0``.
+Concretely, a breaking change for an API introduced in ``2.x.x`` can be introduced with Lightning ``3.0.0``.
 
 This policy is not strict. Shorter or longer deprecation cycles may apply to some cases.
 For example, in the past DDP2 was removed without a deprecation process because the feature was broken and unusable beyond fixing as discussed in `#12584 <https://github.com/Lightning-AI/pytorch-lightning/issues/12584>`_.
@@ -69,6 +65,7 @@ Compatibility matrix
 
 PyTorch Lightning follows `NEP 29 <https://numpy.org/neps/nep-0029-deprecation_policy.html>`_ which PyTorch also follows (`#74203 <https://github.com/pytorch/pytorch/issues/74203>`_).
 The table below indicates the coverage of tested versions in our CI. Versions outside the ranges may unofficially work in some cases.
+Since the release of PyTorch `2.0`, Lightning strives to officially support the latest 5 PyTorch minor releases with no breaking changes within major versions [1]_.
 
 .. list-table::
    :header-rows: 1
@@ -82,102 +79,104 @@ The table below indicates the coverage of tested versions in our CI. Versions outside the ranges may unofficially work in some cases.
    * - 2.5
      - 2.5
      - 2.5
-     - ≥2.1, ≤2.7
+     - ≥2.1, (last tested 2.8)
      - ≥0.7.0
-     - ≥3.9, 3.12
+     - ≥3.9, (last tested 3.12)
    * - 2.4
      - 2.4
      - 2.4
-     - ≥2.1, 2.6
+     - ≥2.1, (last tested 2.6)
      - ≥0.7.0
-     - ≥3.9, 3.12
+     - ≥3.9, (last tested 3.12)
    * - 2.3
      - 2.3
      - 2.3
-     - ≥2.0, 2.3
+     - ≥2.0, (last tested 2.3)
      - ≥0.7.0
-     - ≥3.8, 3.11
+     - ≥3.8, (last tested 3.11)
    * - 2.2
      - 2.2
      - 2.2
-     - ≥1.13, 2.2
+     - ≥1.13, (last tested 2.2)
      - ≥0.7.0
-     - ≥3.8, 3.11
+     - ≥3.8, (last tested 3.11)
    * - 2.1
      - 2.1
      - 2.1
-     - ≥1.12, 2.1
+     - ≥1.12, (last tested 2.1)
      - ≥0.7.0
-     - ≥3.8, 3.11
+     - ≥3.8, (last tested 3.11)
    * - 2.0
      - 2.0
      - 2.0 (GA)
-     - ≥1.11, 2.0
+     - ≥1.11, (last tested 2.0)
      - ≥0.7.0
-     - ≥3.8, 3.10
+     - ≥3.8, (last tested 3.10)
    * - 1.9
      - 1.9
      - 1.9 (experimental)
-     - ≥1.10, 1.13
+     - ≥1.10, (last tested 1.13)
      - ≥0.7.0
-     - ≥3.7, 3.10
+     - ≥3.7, (last tested 3.10)
    * - 1.8**
      - 1.8
      - n/a***
-     - ≥1.10, 1.13
+     - ≥1.10, (last tested 1.13)
      - ≥0.7.0
-     - ≥3.7, 3.10
+     - ≥3.7, (last tested 3.10)
    * - n/a
      - 1.7
      - n/a***
-     - ≥1.9, 1.12
+     - ≥1.9, (last tested 1.12)
      - ≥0.7.0
-     - ≥3.7, 3.10
+     - ≥3.7, (last tested 3.10)
    * - n/a
      - 1.6
      - n/a***
-     - ≥1.8, 1.11
+     - ≥1.8, (last tested 1.11)
      - ≥0.4.1
-     - ≥3.7, 3.9
+     - ≥3.7, (last tested 3.9)
    * - n/a
      - 1.5
      - n/a***
-     - ≥1.7, 1.10
+     - ≥1.7, (last tested 1.10)
      - ≥0.4.1
-     - ≥3.6, 3.9
+     - ≥3.6, (last tested 3.9)
    * - n/a
      - 1.4
      - n/a
-     - ≥1.6, 1.9
+     - ≥1.6, (last tested 1.9)
      - ≥0.4.0
-     - ≥3.6, 3.9
+     - ≥3.6, (last tested 3.9)
    * - n/a
      - 1.3
      - n/a
-     - ≥1.4, 1.8
+     - ≥1.4, (last tested 1.8)
      - ≥0.2.0
-     - ≥3.6, 3.9
+     - ≥3.6, (last tested 3.9)
    * - n/a
      - 1.2
      - n/a
-     - ≥1.4, 1.8
+     - ≥1.4, (last tested 1.8)
      - n/a*
-     - ≥3.6, 3.8
+     - ≥3.6, (last tested 3.8)
    * - n/a
      - 1.1
      - n/a
-     - ≥1.3, 1.8
+     - ≥1.3, (last tested 1.8)
      - n/a*
-     - ≥3.6, 3.8
+     - ≥3.6, (last tested 3.8)
    * - n/a
      - 1.0
      - n/a
-     - ≥1.3, 1.7
+     - ≥1.3, (last tested 1.7)
      - n/a*
-     - ≥3.6, 3.8
+     - ≥3.6, (last tested 3.8)
 
 \* ``torchmetrics`` was part of ``pytorch_lightning`` at the time and was decoupled to a separate package in v1.3.
 
 \*\* The joint ``lightning`` package was first published in version 1.8
 
 \*\*\* Fabric is the evolution of ``LightningLite`` which was released inside ``pytorch_lightning`` 1.5 and was decoupled to a separate package in v1.9
+
+.. [1] See `this community discussion <https://github.com/Lightning-AI/pytorch-lightning/issues/21073#issuecomment-3201706857>`_.
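The "latest 5 PyTorch minor releases" policy added above can be sketched as a small helper. This is an illustration only, not a Lightning API: `supported_torch_minors` is a hypothetical name, the sketch assumes plain `MAJOR.MINOR` strings, ignores patch releases, and does not walk below the `2.0` major-version boundary.

```python
# Hypothetical sketch (not part of Lightning): enumerate the PyTorch minor
# releases covered by a "latest N minors" support window ending at `latest`.
def supported_torch_minors(latest: str, count: int = 5) -> list[str]:
    """Return up to `count` most recent PyTorch minor versions, newest first."""
    major, minor = (int(part) for part in latest.split("."))
    versions = []
    for _ in range(count):
        versions.append(f"{major}.{minor}")
        if minor == 0:
            break  # stop at the major-version boundary (e.g. do not go below 2.0)
        minor -= 1
    return versions

print(supported_torch_minors("2.8"))  # ['2.8', '2.7', '2.6', '2.5', '2.4']
```

With `latest="2.8"` this matches the current matrix row: Lightning 2.5 is tested against ≥2.1 with 2.8 as the last tested release.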

requirements/docs.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 sphinx >5.0, <6.0
-myst-parser >=0.18.1, <4.0.0
+myst-parser >=0.18.1, <5.0.0
 nbsphinx >=0.8.5, <=0.9.7
 nbconvert >7.14, <7.17
 pandoc >=1.0, <=2.4
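To see what the relaxed `myst-parser` pin admits, here is a minimal hand-rolled range check. It is a sketch under stated assumptions: plain `X.Y.Z` versions only, no pre-release or wildcard handling; real tooling should use `packaging.specifiers` instead.

```python
# Toy pip-style specifier check (illustrative only; assumes numeric X.Y.Z versions).
import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}

def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))

def satisfies(version: str, spec: str) -> bool:
    """Check `version` against a comma-separated spec like '>=0.18.1, <5.0.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try the longest operators first so ">=" is not matched as ">".
        for op in sorted(OPS, key=len, reverse=True):
            if clause.startswith(op):
                if not OPS[op](parse(version), parse(clause[len(op):].strip())):
                    return False
                break
    return True

print(satisfies("4.0.0", ">=0.18.1, <5.0.0"))  # True under the new pin
print(satisfies("4.0.0", ">=0.18.1, <4.0.0"))  # False under the old pin
```

The point of the change is visible in the two calls: a hypothetical `myst-parser` 4.x release is excluded by the old `<4.0.0` cap and allowed by the new `<5.0.0` one.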

requirements/fabric/test.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-coverage ==7.10.4
+coverage ==7.10.5
 numpy >=1.21.0, <1.27.0
 pytest ==8.4.1
 pytest-cov ==6.2.1

requirements/pytorch/docs.txt

Lines changed: 1 addition & 1 deletion
@@ -4,6 +4,6 @@ nbformat # used for generate empty notebook
 ipython[notebook] <9.5.0
 setuptools<81.0 # workaround for `error in ipython setup command: use_2to3 is invalid.`
 
-onnxscript >= 0.2.2, <0.4.0
+onnxscript >= 0.2.2, < 0.5.0
 
 #-r ../../_notebooks/.actions/requires.txt

requirements/pytorch/test.txt

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-coverage ==7.10.4
+coverage ==7.10.5
 pytest ==8.4.1
 pytest-cov ==6.2.1
 pytest-timeout ==2.4.0
@@ -11,7 +11,7 @@ scikit-learn >0.22.1, <1.8.0
 numpy >1.20.0, <1.27.0
 onnx >1.12.0, <1.19.0
 onnxruntime >=1.12.0, <1.23.0
-onnxscript >= 0.1.0, <0.4.0
+onnxscript >= 0.1.0, < 0.5.0
 psutil <7.0.1 # for `DeviceStatsMonitor`
 pandas >2.0, <2.4.0 # needed in benchmarks
 fastapi # for `ServableModuleValidator` # not setting version as re-defined in App

src/lightning/fabric/CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added `exclude_frozen_parameters` to `DeepSpeedStrategy` ([#21060](https://github.com/Lightning-AI/pytorch-lightning/pull/21060))
 
 
+- Added support for NVIDIA H200 GPUs in `get_available_flops` ([#20913](https://github.com/Lightning-AI/pytorch-lightning/pull/21119))
+
+
 ### Removed
 
 -
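The `exclude_frozen_parameters` option this branch adds can be illustrated with a torch-free sketch of the underlying idea: when enabled, parameters that are frozen (not trainable) are left out of the saved state. `Param` and `checkpoint_names` are illustrative names, not Lightning or DeepSpeed APIs; the real flag is forwarded to DeepSpeed's checkpoint saving.

```python
# Conceptual sketch only: which parameter names end up in a checkpoint when
# frozen (requires_grad=False) parameters are excluded.
from dataclasses import dataclass

@dataclass
class Param:
    name: str
    requires_grad: bool  # False means the parameter is frozen

def checkpoint_names(params: list[Param], exclude_frozen_parameters: bool) -> list[str]:
    """Names that would be written to the checkpoint."""
    if not exclude_frozen_parameters:
        return [p.name for p in params]
    return [p.name for p in params if p.requires_grad]

params = [Param("backbone.weight", False), Param("head.weight", True)]
print(checkpoint_names(params, exclude_frozen_parameters=True))   # ['head.weight']
print(checkpoint_names(params, exclude_frozen_parameters=False))  # both names
```

This matters for fine-tuning workflows where a large frozen backbone would otherwise dominate checkpoint size.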

src/lightning/fabric/utilities/throughput.py

Lines changed: 23 additions & 1 deletion
@@ -304,6 +304,23 @@ def measure_flops(
 
 _CUDA_FLOPS: dict[str, dict[Union[str, torch.dtype], float]] = {
     # Hopper
+    # source: https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200-datasheet-3002446
+    "h200 sxm1": {
+        torch.float64: 3.4e13,
+        torch.float32: 6.7e13,
+        "tfloat32": 9.9e14,
+        torch.bfloat16: 2.0e15,
+        torch.float16: 2.0e15,
+        torch.int8: 4.0e15,
+    },
+    "h200 nvl1": {
+        torch.float64: 3.0e13,
+        torch.float32: 6.0e13,
+        "tfloat32": 8.4e14,
+        torch.bfloat16: 1.7e15,
+        torch.float16: 1.7e15,
+        torch.int8: 3.3e15,
+    },
     # source: https://resources.nvidia.com/en-us-tensor-core
     "h100 nvl": {
         torch.float64: 67e12,
@@ -536,7 +553,12 @@ def get_available_flops(device: torch.device, dtype: Union[torch.dtype, str]) ->
     if device.type == "cuda":
         device_name = torch.cuda.get_device_name(device)
         chip = device_name.lower()
-        if "h100" in chip:
+        if "h200" in chip:
+            if "sxm1" in chip:
+                chip = "h200 sxm1"
+            elif "nvl1" in chip:
+                chip = "h200 nvl1"
+        elif "h100" in chip:
             if "hbm3" in chip:
                 chip = "h100 sxm"
             elif "nvl" in chip:
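The chip-matching logic added above can be exercised standalone. This is a torch-free sketch, not the real `get_available_flops`: dtype keys are plain strings here (the actual table keys on `torch` dtypes), `available_flops` is an illustrative name, and only a few entries of the table are reproduced. Note the ordering: H200 variants are checked before H100 so that an "H200 ..." device name cannot fall through to an H100 entry via substring overlap.

```python
# Illustrative peak-FLOPS lookup mirroring the substring matching in the diff.
from typing import Optional

_CUDA_FLOPS = {
    "h200 sxm1": {"bfloat16": 2.0e15, "int8": 4.0e15},
    "h200 nvl1": {"bfloat16": 1.7e15, "int8": 3.3e15},
}

def available_flops(device_name: str, dtype: str) -> Optional[float]:
    """Map a CUDA device name to a peak-FLOPS table entry, or None if unknown."""
    chip = device_name.lower()
    # Check "h200" before any "h100" handling, exactly as the new branch does.
    if "h200" in chip:
        if "sxm1" in chip:
            chip = "h200 sxm1"
        elif "nvl1" in chip:
            chip = "h200 nvl1"
    entry = _CUDA_FLOPS.get(chip)
    return None if entry is None else entry.get(dtype)

print(available_flops("NVIDIA H200 SXM1", "bfloat16"))  # 2e+15
print(available_flops("Some Unknown GPU", "int8"))      # None
```

Returning `None` for unmatched names stands in for the real function's behavior of warning and reporting no available FLOPS when the chip is not in the table.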

src/lightning/pytorch/CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 - Fixed misalignment column while using rich model summary in `DeepSpeedstrategy` ([#21100](https://github.com/Lightning-AI/pytorch-lightning/pull/21100))
 
+
+- Fixed `RichProgressBar` crashing when sanity checking using val dataloader with 0 len ([#21108](https://github.com/Lightning-AI/pytorch-lightning/pull/21108))
+
 ---
 
 ## [2.5.3] - 2025-08-13
