
Commit 2bda743

Merge remote-tracking branch 'upstream/master' into fabric_callback_filtering
Parents: 73a65b6 + e21b172

17 files changed (+99, -36 lines)

.github/checkgroup.yml

Lines changed: 6 additions & 6 deletions

@@ -47,9 +47,9 @@ subprojects:
 - "!*.md"
 - "!**/*.md"
 checks:
- - "pytorch.yml / Lit Job (nvidia/cuda:12.1.1-runtime-ubuntu22.04, pytorch, 3.10, L4_X_2)"
- - "pytorch.yml / Lit Job (nvidia/cuda:12.6.3-runtime-ubuntu22.04, lightning, 3.12, L4_X_2)"
- - "pytorch.yml / Lit Job (nvidia/cuda:12.6.3-runtime-ubuntu22.04, pytorch, 3.12, L4_X_2)"
+ - "pytorch.yml / Lit Job (nvidia/cuda:12.1.1-runtime-ubuntu22.04, pytorch, 3.10)"
+ - "pytorch.yml / Lit Job (lightning, 3.12)"
+ - "pytorch.yml / Lit Job (pytorch, 3.12)"
 
 - id: "Benchmarks"
 paths:
@@ -148,9 +148,9 @@ subprojects:
 - "!*.md"
 - "!**/*.md"
 checks:
- - "fabric.yml / Lit Job (nvidia/cuda:12.1.1-runtime-ubuntu22.04, fabric, 3.10, L4_X_2)"
- - "fabric.yml / Lit Job (nvidia/cuda:12.6.3-runtime-ubuntu22.04, fabric, 3.12, L4_X_2)"
- - "fabric.yml / Lit Job (nvidia/cuda:12.6.3-runtime-ubuntu22.04, lightning, 3.12, L4_X_2)"
+ - "fabric.yml / Lit Job (nvidia/cuda:12.1.1-runtime-ubuntu22.04, fabric, 3.10)"
+ - "fabric.yml / Lit Job (fabric, 3.12)"
+ - "fabric.yml / Lit Job (lightning, 3.12)"
 
 # Temporarily disabled
 # - id: "lightning_fabric: TPU workflow"

.github/workflows/probot-check-group.yml

Lines changed: 2 additions & 2 deletions

@@ -12,14 +12,14 @@ jobs:
 required-jobs:
 runs-on: ubuntu-latest
 if: github.event.pull_request.draft == false
-timeout-minutes: 61 # in case something is wrong with the internal timeout
+timeout-minutes: 71 # in case something is wrong with the internal timeout
 steps:
 - uses: Lightning-AI/[email protected]
 env:
 GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
 with:
 job: check-group
 interval: 180 # seconds
-timeout: 60 # minutes
+timeout: 70 # minutes
 maintainers: "Lightning-AI/lai-frameworks"
 owner: "carmocca"
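This hunk raises two related limits. `timeout` (minutes) and `interval` (seconds) are inputs to the check-group action, which polls the required checks until its own deadline. `timeout-minutes` is the GitHub Actions limit on the whole job and stays one minute above the action's timeout, which is the safety margin the inline comment describes. The sketch below works through that arithmetic. It assumes two things: the action polls once per `interval`, and the Lit jobs' own `timeout` (raised from 55 to 60 minutes in the `.lightning/workflows` files of this commit) should finish inside the check-group window. `poll_budget` and `timeouts_are_nested` are hypothetical helpers, not part of the action.

```python
# Hypothetical consistency check for the timeouts touched by this commit.
# Values are copied from the diffs; the ordering constraint is an assumption.

LIT_JOB_TIMEOUT_MIN = 60      # .lightning/workflows/*.yml: timeout: "60"
CHECK_GROUP_TIMEOUT_MIN = 70  # check-group action input: timeout: 70 (minutes)
RUNNER_TIMEOUT_MIN = 71       # GitHub Actions job-level timeout-minutes
POLL_INTERVAL_S = 180         # check-group action input: interval: 180 (seconds)


def poll_budget(timeout_min: int, interval_s: int) -> int:
    """Number of full polling intervals that fit inside the action's timeout."""
    return (timeout_min * 60) // interval_s


def timeouts_are_nested(lit: int, check_group: int, runner: int) -> bool:
    """Assumed invariant: each outer guard outlives the thing it waits on."""
    return lit < check_group < runner


if __name__ == "__main__":
    print(poll_budget(60, POLL_INTERVAL_S))  # before this commit: 3600 // 180 = 20
    print(poll_budget(CHECK_GROUP_TIMEOUT_MIN, POLL_INTERVAL_S))  # after: 4200 // 180 = 23
    print(timeouts_are_nested(LIT_JOB_TIMEOUT_MIN, CHECK_GROUP_TIMEOUT_MIN, RUNNER_TIMEOUT_MIN))
```

Under these assumptions, the old values (55 < 60 < 61) and the new values (60 < 70 < 71) preserve the same ordering. The new values also give the poller three extra intervals.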

.lightning/workflows/fabric.yml

Lines changed: 5 additions & 8 deletions

@@ -4,25 +4,22 @@ trigger:
 pull_request:
 branches: ["master", "release/stable"]
 
-timeout: "55" # minutes
+timeout: "60" # minutes
+machine: "L4_X_2"
+image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
 parametrize:
 matrix: {}
 include:
 # note that this is setting also all oldest requirements which is linked to python == 3.10
 - image: "nvidia/cuda:12.1.1-runtime-ubuntu22.04"
 PACKAGE_NAME: "fabric"
 python_version: "3.10"
-machine: "L4_X_2"
-- image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
-PACKAGE_NAME: "fabric"
+- PACKAGE_NAME: "fabric"
 python_version: "3.12"
-machine: "L4_X_2"
 # - image: "nvidia/cuda:12.6-runtime-ubuntu22.04"
 # PACKAGE_NAME: "fabric"
-- image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
-PACKAGE_NAME: "lightning"
+- PACKAGE_NAME: "lightning"
 python_version: "3.12"
-machine: "L4_X_2"
 exclude: []
 
 env:
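This refactor moves `machine` and `image` to the top level and removes them from most `include` entries. That is consistent with the check names in `.github/checkgroup.yml` losing their image and `L4_X_2` fields. The sketch below models the change under three assumptions, inferred only from the diffs in this commit:

- top-level keys act as defaults;
- keys in an entry override those defaults;
- the displayed job name lists the values written in the entry itself, in order.

`resolve` and `display_name` are hypothetical helpers, not part of the Lightning CI tooling.

```python
# Sketch of how the refactored fabric.yml matrix is assumed to resolve.
# Assumptions: top-level `machine`/`image` are defaults, entry keys override
# them, and the job name shows only the values written in the entry.

DEFAULTS = {"machine": "L4_X_2", "image": "nvidia/cuda:12.6.3-runtime-ubuntu22.04"}

INCLUDE = [
    {"image": "nvidia/cuda:12.1.1-runtime-ubuntu22.04", "PACKAGE_NAME": "fabric", "python_version": "3.10"},
    {"PACKAGE_NAME": "fabric", "python_version": "3.12"},
    {"PACKAGE_NAME": "lightning", "python_version": "3.12"},
]


def resolve(entry: dict) -> dict:
    """Effective job parameters: defaults first, entry-specific keys win."""
    return {**DEFAULTS, **entry}


def display_name(workflow: str, entry: dict) -> str:
    """Check name in the format used by .github/checkgroup.yml (assumed rule)."""
    return f"{workflow} / Lit Job ({', '.join(entry.values())})"


if __name__ == "__main__":
    for entry in INCLUDE:
        print(display_name("fabric.yml", entry), "->", resolve(entry))
```

Under this model, only the Python 3.10 entry still carries an image in its name, because it is the only entry that overrides the default image. That matches the new `checks` list.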

.lightning/workflows/pytorch.yml

Lines changed: 5 additions & 8 deletions

@@ -4,25 +4,22 @@ trigger:
 pull_request:
 branches: ["master", "release/stable"]
 
-timeout: "55" # minutes
+timeout: "60" # minutes
+machine: "L4_X_2"
+image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
 parametrize:
 matrix: {}
 include:
 # note that this also sets oldest requirements which are linked to Python == 3.10
 - image: "nvidia/cuda:12.1.1-runtime-ubuntu22.04"
 PACKAGE_NAME: "pytorch"
 python_version: "3.10"
-machine: "L4_X_2"
-- image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
-PACKAGE_NAME: "pytorch"
+- PACKAGE_NAME: "pytorch"
 python_version: "3.12"
-machine: "L4_X_2"
 # - image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
 # PACKAGE_NAME: "pytorch"
-- image: "nvidia/cuda:12.6.3-runtime-ubuntu22.04"
-PACKAGE_NAME: "lightning"
+- PACKAGE_NAME: "lightning"
 python_version: "3.12"
-machine: "L4_X_2"
 exclude: []
 
 env:

docs/source-pytorch/common/checkpointing_intermediate.rst

Lines changed: 7 additions & 1 deletion

@@ -21,7 +21,13 @@ For fine-grained control over checkpointing behavior, use the :class:`~lightning
 checkpoint_callback = ModelCheckpoint(dirpath="my/path/", save_top_k=2, monitor="val_loss")
 trainer = Trainer(callbacks=[checkpoint_callback])
 trainer.fit(model)
-checkpoint_callback.best_model_path
+
+# Access best and last model checkpoint directly from the callback
+print(checkpoint_callback.best_model_path)
+print(checkpoint_callback.last_model_path)
+# Or via the trainer
+print(trainer.checkpoint_callback.best_model_path)
+print(trainer.checkpoint_callback.last_model_path)
 
 Any value that has been logged via *self.log* in the LightningModule can be monitored.
 

docs/source-pytorch/conf.py

Lines changed: 1 addition & 0 deletions

@@ -645,6 +645,7 @@ def package_list_from_file(file):
 r"installation.html$",
 r"starter/installation.html$",
 r"^../common/trainer.html#trainer-flags$",
+"https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a",
 "https://deepgenerativemodels.github.io/assets/slides/cs236_lecture11.pdf",
 "https://developer.habana.ai", # returns 403 error but redirects to intel.com documentation
 "https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html",
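The hunk does not show this list's variable name. The patterns and the comment about a 403 response suggest it is Sphinx's `linkcheck_ignore`, and this sketch assumes so. In that case each entry is a regular expression that Sphinx matches against the start of a URI, so a plain URL such as the added Medium link works as a pattern. Its unescaped dots match any character, including a literal dot. The sketch below reproduces that prefix matching with `re.match`. `is_ignored` is a hypothetical helper, not Sphinx's internal code.

```python
import re

# Subset of the patterns from the hunk above (assumed to be `linkcheck_ignore`).
IGNORE_PATTERNS = [
    r"installation.html$",
    r"^../common/trainer.html#trainer-flags$",
    "https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a",
    "https://developer.habana.ai",
]
COMPILED = [re.compile(p) for p in IGNORE_PATTERNS]


def is_ignored(uri: str) -> bool:
    """True if any pattern matches at the start of the URI (re.match semantics)."""
    return any(p.match(uri) for p in COMPILED)


if __name__ == "__main__":
    print(is_ignored("https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a"))
    print(is_ignored("https://developer.habana.ai/some/page"))  # prefix match
    print(is_ignored("https://medium.com/other-post"))
```

Because matching is anchored at the start, an entry like `r"installation.html$"` only catches relative links that begin with `installation`, not absolute URLs that end with it.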

docs/source-pytorch/data/alternatives.rst

Lines changed: 6 additions & 1 deletion

@@ -99,7 +99,12 @@ The webdataset library contains a small wrapper (``WebLoader``) that adds a flui
 import lightning as L
 import webdataset as wds
 
-dataset = wds.WebDataset(urls)
+dataset = wds.WebDataset(
+urls,
+# needed for multi-gpu or multi-node training
+workersplitter=wds.shardlists.split_by_worker,
+nodesplitter=wds.shardlists.split_by_node,
+)
 train_dataloader = wds.WebLoader(dataset)
 
 model = ...
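The two splitters added here partition the shard list twice: first across distributed processes (ranks), then across the `DataLoader` workers inside each process. The goal is that no two workers read the same shard. The sketch below assumes a strided assignment (`urls[rank::world_size]`, then `[worker::num_workers]`), which is assumed to mirror what `split_by_node` and `split_by_worker` do. Instead of querying `torch.distributed` and the worker info, it takes the rank and worker indices as explicit arguments. `assign_shards` and the shard names are hypothetical.

```python
from itertools import islice

# Hypothetical shard names; real code would pass URLs or brace patterns.
URLS = [f"shard-{i:06d}.tar" for i in range(8)]


def assign_shards(urls, rank, world_size, worker, num_workers):
    """Shards read by one DataLoader worker on one rank (assumed strided split)."""
    # Node split: each rank takes every world_size-th shard, starting at its rank.
    per_node = list(islice(urls, rank, None, world_size)) if world_size > 1 else list(urls)
    # Worker split: each worker takes every num_workers-th shard of its rank's share.
    if num_workers > 1:
        return list(islice(per_node, worker, None, num_workers))
    return per_node


if __name__ == "__main__":
    world_size, num_workers = 2, 2
    for rank in range(world_size):
        for worker in range(num_workers):
            print(rank, worker, assign_shards(URLS, rank, world_size, worker, num_workers))
```

In this model, applying only the worker split would give every rank the same shards. The node split is what keeps the ranks disjoint.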

requirements/docs.txt

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ myst-parser >=0.18.1, <5.0.0
 nbsphinx >=0.8.5, <=0.9.7
 nbconvert >7.14, <7.17
 pandoc >=1.0, <=2.4
-docutils>=0.18.1,<=0.22
+docutils>=0.18.1,<=0.22.2
 sphinxcontrib-fulltoc >=1.0, <=1.2.0
 sphinxcontrib-mockautodoc
 sphinx-autobuild

requirements/fabric/test.txt

Lines changed: 1 addition & 1 deletion

@@ -6,5 +6,5 @@ pytest-timeout ==2.4.0
 pytest-rerunfailures ==16.0.1
 pytest-random-order ==1.2.0
 click ==8.1.8; python_version < "3.11"
-click ==8.2.1; python_version > "3.10"
+click ==8.3.0; python_version > "3.10"
 tensorboardX >=2.6, <2.7.0 # todo: relax it back to `>=2.2` after fixing tests
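The two `click` pins are selected by PEP 508 environment markers on `python_version`. That marker holds the interpreter's `major.minor` version and is compared as a version, not as a plain string. The conditions `< "3.11"` and `> "3.10"` are complementary for minor releases, so exactly one pin applies on any interpreter. The stdlib-only sketch below models that selection with tuple comparison. `applicable_pins` is a hypothetical helper; real tooling evaluates markers with the `packaging` library.

```python
import sys

# Pins copied from requirements/fabric/test.txt after this commit.
# Tuples compare numerically, so (3, 10) < (3, 11), unlike the strings "3.10" < "3.9".
PINS = [
    ("click ==8.1.8", lambda v: v < (3, 11)),  # python_version < "3.11"
    ("click ==8.3.0", lambda v: v > (3, 10)),  # python_version > "3.10"
]


def applicable_pins(python_version: tuple) -> list:
    """Requirement lines whose marker holds for a (major, minor) interpreter version."""
    return [req for req, marker in PINS if marker(python_version)]


if __name__ == "__main__":
    for version in [(3, 10), (3, 12), tuple(sys.version_info[:2])]:
        print(version, applicable_pins(version))
```

With this commit, only Python 3.11 and newer move to `click ==8.3.0`. The Python 3.10 job, which also tests the oldest requirements, keeps `click ==8.1.8`.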

requirements/pytorch/test.txt

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ numpy >1.20.0, <1.27.0
 onnx >1.12.0, <1.20.0
 onnxruntime >=1.12.0, <1.23.0
 onnxscript >= 0.1.0, < 0.5.0
-psutil <7.0.1 # for `DeviceStatsMonitor`
+psutil <7.1.1 # for `DeviceStatsMonitor`
 pandas >2.0, <2.4.0 # needed in benchmarks
 fastapi # for `ServableModuleValidator` # not setting version as re-defined in App
 uvicorn # for `ServableModuleValidator` # not setting version as re-defined in App
