
Commit 523c8dd

Merge branch 'master' into device-enhance
2 parents dee68b1 + a944e77

295 files changed: 792 additions, 565 deletions


.github/CONTRIBUTING.md
Lines changed: 1 addition & 1 deletion

@@ -182,7 +182,7 @@ We welcome any useful contribution! For your convenience here's a recommended wo
 1. Use tags in PR name for the following cases:

    - **\[blocked by #<number>\]** if your work is dependent on other PRs.
-   - **\[wip\]** when you start to re-edit your work, mark it so no one will accidentally merge it in meantime.
+   - **[wip]** when you start to re-edit your work, mark it so no one will accidentally merge it in meantime.

 ### Question & Answer
.github/checkgroup.yml
Lines changed: 18 additions & 18 deletions

@@ -23,26 +23,26 @@ subprojects:
       - "pl-cpu (macOS-14, lightning, 3.10, 2.1)"
       - "pl-cpu (macOS-14, lightning, 3.11, 2.2.2)"
       - "pl-cpu (macOS-14, lightning, 3.11, 2.3)"
-      - "pl-cpu (macOS-14, lightning, 3.12, 2.4.1)"
-      - "pl-cpu (macOS-14, lightning, 3.12, 2.5.1)"
+      - "pl-cpu (macOS-14, lightning, 3.12.7, 2.4.1)"
+      - "pl-cpu (macOS-14, lightning, 3.12.7, 2.5.1)"
       - "pl-cpu (ubuntu-20.04, lightning, 3.9, 2.1, oldest)"
       - "pl-cpu (ubuntu-20.04, lightning, 3.10, 2.1)"
       - "pl-cpu (ubuntu-20.04, lightning, 3.11, 2.2.2)"
       - "pl-cpu (ubuntu-20.04, lightning, 3.11, 2.3)"
-      - "pl-cpu (ubuntu-22.04, lightning, 3.12, 2.4.1)"
-      - "pl-cpu (ubuntu-22.04, lightning, 3.12, 2.5.1)"
+      - "pl-cpu (ubuntu-22.04, lightning, 3.12.7, 2.4.1)"
+      - "pl-cpu (ubuntu-22.04, lightning, 3.12.7, 2.5.1)"
       - "pl-cpu (windows-2022, lightning, 3.9, 2.1, oldest)"
       - "pl-cpu (windows-2022, lightning, 3.10, 2.1)"
       - "pl-cpu (windows-2022, lightning, 3.11, 2.2.2)"
       - "pl-cpu (windows-2022, lightning, 3.11, 2.3)"
-      - "pl-cpu (windows-2022, lightning, 3.12, 2.4.1)"
-      - "pl-cpu (windows-2022, lightning, 3.12, 2.5.1)"
+      - "pl-cpu (windows-2022, lightning, 3.12.7, 2.4.1)"
+      - "pl-cpu (windows-2022, lightning, 3.12.7, 2.5.1)"
       - "pl-cpu (macOS-14, pytorch, 3.9, 2.1)"
       - "pl-cpu (ubuntu-20.04, pytorch, 3.9, 2.1)"
       - "pl-cpu (windows-2022, pytorch, 3.9, 2.1)"
-      - "pl-cpu (macOS-14, pytorch, 3.12, 2.5.1)"
-      - "pl-cpu (ubuntu-22.04, pytorch, 3.12, 2.5.1)"
-      - "pl-cpu (windows-2022, pytorch, 3.12, 2.5.1)"
+      - "pl-cpu (macOS-14, pytorch, 3.12.7, 2.5.1)"
+      - "pl-cpu (ubuntu-22.04, pytorch, 3.12.7, 2.5.1)"
+      - "pl-cpu (windows-2022, pytorch, 3.12.7, 2.5.1)"

   - id: "pytorch_lightning: Azure GPU"
     paths:
@@ -176,26 +176,26 @@ subprojects:
       - "fabric-cpu (macOS-14, lightning, 3.10, 2.1)"
       - "fabric-cpu (macOS-14, lightning, 3.11, 2.2.2)"
       - "fabric-cpu (macOS-14, lightning, 3.11, 2.3)"
-      - "fabric-cpu (macOS-14, lightning, 3.12, 2.4.1)"
-      - "fabric-cpu (macOS-14, lightning, 3.12, 2.5.1)"
+      - "fabric-cpu (macOS-14, lightning, 3.12.7, 2.4.1)"
+      - "fabric-cpu (macOS-14, lightning, 3.12.7, 2.5.1)"
       - "fabric-cpu (ubuntu-20.04, lightning, 3.9, 2.1, oldest)"
       - "fabric-cpu (ubuntu-20.04, lightning, 3.10, 2.1)"
       - "fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.2.2)"
       - "fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.3)"
-      - "fabric-cpu (ubuntu-22.04, lightning, 3.12, 2.4.1)"
-      - "fabric-cpu (ubuntu-22.04, lightning, 3.12, 2.5.1)"
+      - "fabric-cpu (ubuntu-22.04, lightning, 3.12.7, 2.4.1)"
+      - "fabric-cpu (ubuntu-22.04, lightning, 3.12.7, 2.5.1)"
       - "fabric-cpu (windows-2022, lightning, 3.9, 2.1, oldest)"
       - "fabric-cpu (windows-2022, lightning, 3.10, 2.1)"
       - "fabric-cpu (windows-2022, lightning, 3.11, 2.2.2)"
       - "fabric-cpu (windows-2022, lightning, 3.11, 2.3)"
-      - "fabric-cpu (windows-2022, lightning, 3.12, 2.4.1)"
-      - "fabric-cpu (windows-2022, lightning, 3.12, 2.5.1)"
+      - "fabric-cpu (windows-2022, lightning, 3.12.7, 2.4.1)"
+      - "fabric-cpu (windows-2022, lightning, 3.12.7, 2.5.1)"
       - "fabric-cpu (macOS-14, fabric, 3.9, 2.1)"
       - "fabric-cpu (ubuntu-20.04, fabric, 3.9, 2.1)"
       - "fabric-cpu (windows-2022, fabric, 3.9, 2.1)"
-      - "fabric-cpu (macOS-14, fabric, 3.12, 2.5.1)"
-      - "fabric-cpu (ubuntu-22.04, fabric, 3.12, 2.5.1)"
-      - "fabric-cpu (windows-2022, fabric, 3.12, 2.5.1)"
+      - "fabric-cpu (macOS-14, fabric, 3.12.7, 2.5.1)"
+      - "fabric-cpu (ubuntu-22.04, fabric, 3.12.7, 2.5.1)"
+      - "fabric-cpu (windows-2022, fabric, 3.12.7, 2.5.1)"

   - id: "lightning_fabric: Azure GPU"
     paths:

.github/workflows/README.md
Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ Brief description of all our automation tools used for boosting development perf
 | .azure-pipelines/gpu-benchmarks.yml | Run speed/memory benchmarks for parity with vanila PyTorch. | GPU |
 | .github/workflows/ci-flagship-apps.yml | Run end-2-end tests with full applications, including deployment to the production cloud. | CPU |
 | .github/workflows/ci-tests-pytorch.yml | Run all tests except for accelerator-specific, standalone and slow tests. | CPU |
-| .github/workflows/tpu-tests.yml | Run only TPU-specific tests. Requires that the PR title contains '\[TPU\]' | TPU |
+| .github/workflows/tpu-tests.yml | Run only TPU-specific tests. Requires that the PR title contains '[TPU]' | TPU |

 \* Each standalone test needs to be run in separate processes to avoid unwanted interactions between test cases.

.github/workflows/docs-build.yml
Lines changed: 15 additions & 8 deletions

@@ -174,6 +174,21 @@ jobs:
         with:
           project_id: ${{ secrets.GCS_PROJECT }}

+      # Uploading docs as archive to GCS, so they can be as backup
+      - name: Upload docs as archive to GCS 🪣
+        if: startsWith(github.ref, 'refs/tags/') || github.event_name == 'workflow_dispatch'
+        working-directory: docs/build
+        run: |
+          zip ${{ env.VERSION }}.zip -r html/
+          gsutil cp ${{ env.VERSION }}.zip ${GCP_TARGET}
+
+      - name: Inject version selector
+        working-directory: docs/build
+        run: |
+          pip install -q wget
+          python -m wget https://raw.githubusercontent.com/Lightning-AI/utilities/main/scripts/inject-selector-script.py
+          python inject-selector-script.py html ${{ matrix.pkg-name }}
+
       # Uploading docs to GCS, so they can be served on lightning.ai
       - name: Upload docs/${{ matrix.pkg-name }}/stable to GCS 🪣
         if: startsWith(github.ref, 'refs/heads/release/') && github.event_name == 'push'
@@ -188,11 +203,3 @@ jobs:
       - name: Upload docs/${{ matrix.pkg-name }}/release to GCS 🪣
         if: startsWith(github.ref, 'refs/tags/') || github.event_name == 'workflow_dispatch'
         run: gsutil -m rsync -d -R docs/build/html/ ${GCP_TARGET}/${{ env.VERSION }}
-
-      # Uploading docs as archive to GCS, so they can be as backup
-      - name: Upload docs as archive to GCS 🪣
-        if: startsWith(github.ref, 'refs/tags/') || github.event_name == 'workflow_dispatch'
-        working-directory: docs/build
-        run: |
-          zip ${{ env.VERSION }}.zip -r html/
-          gsutil cp ${{ env.VERSION }}.zip ${GCP_TARGET}
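The relocated backup step bundles the built HTML into a single versioned archive before any per-version upload runs. As a rough illustration only (not part of the workflow), the `zip ${{ env.VERSION }}.zip -r html/` command corresponds to this hedged Python sketch, with the version string hard-coded as a stand-in for the workflow's env value:

    import shutil

    # Stand-in value; the real archive name comes from ${{ env.VERSION }}.
    version = "2.5.1"

    # Equivalent of `zip 2.5.1.zip -r html/` run from docs/build:
    # pack the html/ tree into a single versioned archive for backup.
    shutil.make_archive(version, "zip", root_dir=".", base_dir="html")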

.pre-commit-config.yaml
Lines changed: 4 additions & 4 deletions

@@ -23,7 +23,7 @@ ci:

 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
+    rev: v5.0.0
     hooks:
       - id: end-of-file-fixer
       - id: trailing-whitespace
@@ -65,12 +65,12 @@ repos:
         args: ["--in-place"]

   - repo: https://github.com/sphinx-contrib/sphinx-lint
-    rev: v0.9.1
+    rev: v1.0.0
     hooks:
       - id: sphinx-lint

   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.5.0
+    rev: v0.8.6
     hooks:
       # try to fix what is possible
       - id: ruff
@@ -81,7 +81,7 @@ repos:
       - id: ruff

   - repo: https://github.com/executablebooks/mdformat
-    rev: 0.7.17
+    rev: 0.7.21
     hooks:
       - id: mdformat
         additional_dependencies:

docs/source-pytorch/common/tbptt.rst
Lines changed: 64 additions & 21 deletions

@@ -12,48 +12,91 @@ hidden states should be kept in-between each time-dimension split.
 .. code-block:: python

     import torch
+    import torch.nn as nn
+    import torch.nn.functional as F
     import torch.optim as optim
-    import pytorch_lightning as pl
-    from pytorch_lightning import LightningModule
+    from torch.utils.data import Dataset, DataLoader

-    class LitModel(LightningModule):
+    import lightning as L
+
+
+    class AverageDataset(Dataset):
+        def __init__(self, dataset_len=300, sequence_len=100):
+            self.dataset_len = dataset_len
+            self.sequence_len = sequence_len
+            self.input_seq = torch.randn(dataset_len, sequence_len, 10)
+            top, bottom = self.input_seq.chunk(2, -1)
+            self.output_seq = top + bottom.roll(shifts=1, dims=-1)
+
+        def __len__(self):
+            return self.dataset_len
+
+        def __getitem__(self, item):
+            return self.input_seq[item], self.output_seq[item]
+
+
+    class LitModel(L.LightningModule):

         def __init__(self):
             super().__init__()

+            self.batch_size = 10
+            self.in_features = 10
+            self.out_features = 5
+            self.hidden_dim = 20
+
             # 1. Switch to manual optimization
             self.automatic_optimization = False
-
             self.truncated_bptt_steps = 10
-            self.my_rnn = ParityModuleRNN() # Define RNN model using ParityModuleRNN
+
+            self.rnn = nn.LSTM(self.in_features, self.hidden_dim, batch_first=True)
+            self.linear_out = nn.Linear(in_features=self.hidden_dim, out_features=self.out_features)
+
+        def forward(self, x, hs):
+            seq, hs = self.rnn(x, hs)
+            return self.linear_out(seq), hs

         # 2. Remove the `hiddens` argument
         def training_step(self, batch, batch_idx):
-
             # 3. Split the batch in chunks along the time dimension
-            split_batches = split_batch(batch, self.truncated_bptt_steps)
-
-            batch_size = 10
-            hidden_dim = 20
-            hiddens = torch.zeros(1, batch_size, hidden_dim, device=self.device)
-            for split_batch in range(split_batches):
-                # 4. Perform the optimization in a loop
-                loss, hiddens = self.my_rnn(split_batch, hiddens)
-                self.backward(loss)
-                self.optimizer.step()
-                self.optimizer.zero_grad()
+            x, y = batch
+            split_x, split_y = [
+                x.tensor_split(self.truncated_bptt_steps, dim=1),
+                y.tensor_split(self.truncated_bptt_steps, dim=1)
+            ]
+
+            hiddens = None
+            optimizer = self.optimizers()
+            losses = []
+
+            # 4. Perform the optimization in a loop
+            for x, y in zip(split_x, split_y):
+                y_pred, hiddens = self(x, hiddens)
+                loss = F.mse_loss(y_pred, y)
+
+                optimizer.zero_grad()
+                self.manual_backward(loss)
+                optimizer.step()

                 # 5. "Truncate"
-                hiddens = hiddens.detach()
+                hiddens = [h.detach() for h in hiddens]
+                losses.append(loss.detach())
+
+            avg_loss = sum(losses) / len(losses)
+            self.log("train_loss", avg_loss, prog_bar=True)

             # 6. Remove the return of `hiddens`
             # Returning loss in manual optimization is not needed
             return None

         def configure_optimizers(self):
-            return optim.Adam(self.my_rnn.parameters(), lr=0.001)
+            return optim.Adam(self.parameters(), lr=0.001)
+
+        def train_dataloader(self):
+            return DataLoader(AverageDataset(), batch_size=self.batch_size)
+

     if __name__ == "__main__":
         model = LitModel()
-        trainer = pl.Trainer(max_epochs=5)
-        trainer.fit(model, train_dataloader) # Define your own dataloader
+        trainer = L.Trainer(max_epochs=5)
+        trainer.fit(model)
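Two details in the rewritten example are worth isolating: `tensor_split` with an integer argument divides the time dimension into that many chunks, and the LSTM hidden state is a `(h_n, c_n)` tuple, which is why step 5 detaches each tensor individually. A minimal standalone sketch of both behaviors (toy shapes chosen to match the docs example, not part of the docs page itself):

    import torch
    import torch.nn as nn

    # Toy batch shaped like the docs example: (batch=10, time=100, features=10).
    x = torch.randn(10, 100, 10)

    # Step 3: an int argument splits dim=1 into that many near-equal chunks.
    chunks = x.tensor_split(10, dim=1)
    print(len(chunks), chunks[0].shape)  # 10 torch.Size([10, 10, 10])

    # Step 5: nn.LSTM returns hiddens as a (h_n, c_n) tuple, so the truncation
    # detaches each tensor to cut the autograd graph between chunks.
    lstm = nn.LSTM(10, 20, batch_first=True)
    _, hiddens = lstm(chunks[0])
    hiddens = [h.detach() for h in hiddens]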

docs/source-pytorch/tuning/profiler_intermediate.rst
Lines changed: 2 additions & 2 deletions

@@ -55,7 +55,7 @@ The profiler will generate an output like this:
 Self CPU time total: 1.681ms

 .. note::
-    When using the PyTorch Profiler, wall clock time will not not be representative of the true wall clock time.
+    When using the PyTorch Profiler, wall clock time will not be representative of the true wall clock time.
     This is due to forcing profiled operations to be measured synchronously, when many CUDA ops happen asynchronously.
     It is recommended to use this Profiler to find bottlenecks/breakdowns, however for end to end wall clock time use
     the ``SimpleProfiler``.
@@ -142,7 +142,7 @@ This profiler will record ``training_step``, ``validation_step``, ``test_step``,
 The output above shows the profiling for the action ``training_step``.

 .. note::
-    When using the PyTorch Profiler, wall clock time will not not be representative of the true wall clock time.
+    When using the PyTorch Profiler, wall clock time will not be representative of the true wall clock time.
     This is due to forcing profiled operations to be measured synchronously, when many CUDA ops happen asynchronously.
     It is recommended to use this Profiler to find bottlenecks/breakdowns, however for end to end wall clock time use
     the ``SimpleProfiler``.
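As a usage aside: both profilers mentioned in the corrected note are selected through the ``Trainer`` constructor. A minimal hedged sketch using the standard string shortcuts (not taken from this diff):

    import lightning as L

    # Per-operator breakdowns; wall clock is skewed by forced synchronous timing.
    trainer = L.Trainer(profiler="pytorch", max_epochs=1)

    # Representative end-to-end wall clock time, as the note recommends.
    trainer = L.Trainer(profiler="simple", max_epochs=1)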

examples/fabric/build_your_own_trainer/run.py
Lines changed: 2 additions & 1 deletion

@@ -1,8 +1,9 @@
-import lightning as L
 import torch
 from torchmetrics.functional.classification.accuracy import accuracy
 from trainer import MyCustomTrainer

+import lightning as L
+

 class MNISTModule(L.LightningModule):
     def __init__(self) -> None:

examples/fabric/build_your_own_trainer/trainer.py
Lines changed: 4 additions & 3 deletions

@@ -3,15 +3,16 @@
 from functools import partial
 from typing import Any, Literal, Optional, Union, cast

-import lightning as L
 import torch
+from lightning_utilities import apply_to_collection
+from tqdm import tqdm
+
+import lightning as L
 from lightning.fabric.accelerators import Accelerator
 from lightning.fabric.loggers import Logger
 from lightning.fabric.strategies import Strategy
 from lightning.fabric.wrappers import _unwrap_objects
 from lightning.pytorch.utilities.model_helpers import is_overridden
-from lightning_utilities import apply_to_collection
-from tqdm import tqdm


 class MyCustomTrainer:

examples/fabric/dcgan/train_fabric.py
Lines changed: 2 additions & 1 deletion

@@ -16,9 +16,10 @@
 import torch.utils.data
 import torchvision.transforms as transforms
 import torchvision.utils
-from lightning.fabric import Fabric, seed_everything
 from torchvision.datasets import CelebA

+from lightning.fabric import Fabric, seed_everything
+
 # Root directory for dataset
 dataroot = "data/"
 # Number of workers for dataloader

0 commit comments

Comments
 (0)