
Commit 051a86b

Merge branch 'master' into docs/links
2 parents 6c148ca + 4e3cf67 commit 051a86b

File tree

7 files changed: +30 -8 lines changed


.github/markdown-links-config.json

Lines changed: 5 additions & 1 deletion
@@ -22,5 +22,9 @@
       "Accept-Encoding": "zstd, br, gzip, deflate"
     }
   }
-  ]
+  ],
+  "timeout": "20s",
+  "retryOn429": true,
+  "retryCount": 5,
+  "fallbackRetryDelay": "20s"
 }
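The keys added here (`timeout`, `retryOn429`, `retryCount`, `fallbackRetryDelay`) are retry and timeout options in the markdown-link-check style of config that this file follows. As a rough sketch of what they mean in practice (not part of this commit; the `check_link` helper and the use of `requests` are purely illustrative), a checker honoring this config would retry 429 responses up to `retryCount` times, preferring the server's `Retry-After` header and otherwise waiting `fallbackRetryDelay`:

```python
# Illustrative sketch only: how a link checker might apply the retry settings above.
# Assumes the `requests` package and a numeric Retry-After header.
import json
import time

import requests

with open(".github/markdown-links-config.json") as f:
    config = json.load(f)

retry_count = config.get("retryCount", 0)                                   # e.g. 5
fallback_delay = int(config.get("fallbackRetryDelay", "20s").rstrip("s"))   # e.g. 20 seconds
timeout = int(config.get("timeout", "20s").rstrip("s"))


def check_link(url: str) -> int:
    """Return the final HTTP status for `url`, retrying 429 (rate-limited) responses."""
    for _ in range(retry_count + 1):
        response = requests.head(url, timeout=timeout, allow_redirects=True)
        if response.status_code != 429 or not config.get("retryOn429", False):
            return response.status_code
        # Prefer the server-provided Retry-After delay, else fall back to the configured one.
        delay = int(response.headers.get("Retry-After", fallback_delay))
        time.sleep(delay)
    return response.status_code
```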

README.md

Lines changed: 6 additions & 0 deletions
@@ -55,6 +55,12 @@ ______________________________________________________________________
 
 
 
+# Why PyTorch Lightning?
+
+Training models in plain PyTorch is tedious and error-prone - you have to manually handle things like backprop, mixed precision, multi-GPU, and distributed training, often rewriting code for every new project. PyTorch Lightning organizes PyTorch code to automate those complexities so you can focus on your model and data, while keeping full control and scaling from CPU to multi-node without changing your core code.
+
+Fun analogy: If PyTorch is Javascript, PyTorch Lightning is ReactJS or NextJS.
+
 # Lightning has 2 core packages
 
 [PyTorch Lightning: Train and deploy PyTorch at scale](#why-pytorch-lightning).
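For readers landing on this new README section, a minimal sketch of the workflow it describes may help; the model, data, and hyperparameters below are placeholders, not part of the commit:

```python
# Minimal sketch: the LightningModule holds model/optimizer logic, and the Trainer
# automates the training loop, device placement, and scaling.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L


class LitRegressor(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.model(x), y)
        self.log("train_loss", loss)
        return loss  # Lightning handles backward(), optimizer.step(), and zero_grad()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
    trainer = L.Trainer(max_epochs=2, accelerator="auto", devices="auto")
    trainer.fit(LitRegressor(), DataLoader(dataset, batch_size=32))
```

The same script scales from CPU to multi-GPU or multi-node by changing Trainer arguments such as `accelerator`, `devices`, and `strategy`, rather than rewriting the training loop.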

_notebooks

requirements/pytorch/base.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # NOTE: the upper bound for the package version is only set for CI stability, and it is dropped while installing this package
 # in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment
 
-torch >=2.1.0, <2.8.0
+torch >=2.1.0, <=2.8.0
 tqdm >=4.57.0, <4.68.0
 PyYAML >5.4, <6.1.0
 fsspec[http] >=2022.5.0, <2025.6.0

src/lightning/pytorch/callbacks/model_checkpoint.py

Lines changed: 8 additions & 2 deletions
@@ -133,9 +133,15 @@ class ModelCheckpoint(Checkpoint):
             will only save checkpoints at epochs 0 < E <= N
             where both values for ``every_n_epochs`` and ``check_val_every_n_epoch`` evenly divide E.
         save_on_train_epoch_end: Whether to run checkpointing at the end of the training epoch.
-            If this is ``False``, then the check runs at the end of the validation.
+            If ``True``, checkpoints are saved at the end of every training epoch.
+            If ``False``, checkpoints are saved at the end of validation.
+            If ``None`` (default), checkpointing behavior is determined based on training configuration.
+            If ``check_val_every_n_epoch != 1``, checkpointing will not be performed at the end of
+            every training epoch. If there are no validation batches of data, checkpointing will occur at the
+            end of the training epoch. If there is a non-default number of validation runs per training epoch
+            (``val_check_interval != 1``), checkpointing is performed after validation.
         enable_version_counter: Whether to append a version to the existing file name.
-            If this is ``False``, then the checkpoint files will be overwritten.
+            If ``False``, then the checkpoint files will be overwritten.
 
     Note:
         For extra customization, ModelCheckpoint includes the following attributes:
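To make the expanded docstring concrete, here is a minimal usage sketch of `save_on_train_epoch_end`; the directory, metric name, and surrounding Trainer setup are placeholders, not part of the commit:

```python
# Usage sketch for the documented flag (paths and metric names are placeholders).
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import ModelCheckpoint

# save_on_train_epoch_end=False: checkpoints are written after validation, so a
# monitored validation metric is up to date when the checkpoint is saved.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",
    monitor="val_loss",
    save_top_k=3,
    save_on_train_epoch_end=False,
)

# save_on_train_epoch_end=None (default): Lightning derives the behavior from the
# training configuration, e.g. check_val_every_n_epoch and val_check_interval.
trainer = Trainer(max_epochs=10, callbacks=[checkpoint_cb])
```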

src/lightning/pytorch/trainer/connectors/accelerator_connector.py

Lines changed: 4 additions & 3 deletions
@@ -453,10 +453,11 @@ def _check_strategy_and_fallback(self) -> None:
 
         if (
             strategy_flag in FSDPStrategy.get_registered_strategies() or type(self._strategy_flag) is FSDPStrategy
-        ) and self._accelerator_flag not in ("cuda", "gpu"):
+        ) and not (self._accelerator_flag in ("cuda", "gpu") or isinstance(self._accelerator_flag, CUDAAccelerator)):
             raise ValueError(
-                f"The strategy `{FSDPStrategy.strategy_name}` requires a GPU accelerator, but got:"
-                f" {self._accelerator_flag}"
+                f"The strategy `{FSDPStrategy.strategy_name}` requires a GPU accelerator, but received "
+                f"`accelerator={self._accelerator_flag!r}`. Please set `accelerator='cuda'`, `accelerator='gpu'`,"
+                " or pass a `CUDAAccelerator()` instance to use FSDP."
             )
         if strategy_flag in _DDP_FORK_ALIASES and "fork" not in torch.multiprocessing.get_all_start_methods():
             raise ValueError(
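A short sketch of the behavior this change enables, assuming at least one CUDA GPU is available; the surrounding script is illustrative and not part of the commit:

```python
# Sketch of the new FSDP accelerator check (requires a CUDA GPU).
from lightning.pytorch import Trainer
from lightning.pytorch.accelerators import CUDAAccelerator

# Previously, passing a CUDAAccelerator instance raised ValueError even though it is
# a GPU accelerator; with this change an instance is accepted alongside the strings.
Trainer(strategy="fsdp", accelerator=CUDAAccelerator())
Trainer(strategy="fsdp", accelerator="cuda")

# A non-GPU accelerator still fails, now with a message that suggests valid options.
try:
    Trainer(strategy="fsdp", accelerator="cpu")
except ValueError as err:
    print(err)  # "The strategy `fsdp` requires a GPU accelerator, but received ..."
```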

tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py

Lines changed: 5 additions & 0 deletions
@@ -582,6 +582,11 @@ class AcceleratorSubclass(CPUAccelerator):
     Trainer(accelerator=AcceleratorSubclass(), strategy=FSDPStrategySubclass())
 
 
+@RunIf(min_cuda_gpus=1)
+def test_check_fsdp_strategy_and_fallback_with_cudaaccelerator():
+    Trainer(strategy="fsdp", accelerator=CUDAAccelerator())
+
+
 @mock.patch.dict(os.environ, {}, clear=True)
 def test_unsupported_tpu_choice(xla_available, tpu_available):
     # if user didn't set strategy, _Connector will choose the SingleDeviceXLAStrategy or XLAStrategy
