
Commit da1b0d2

Merge branch 'master' into ci/lit
2 parents: 3c736e4 + 791753b

File tree: 17 files changed, 202 additions & 31 deletions


.github/CONTRIBUTING.md

Lines changed: 18 additions & 4 deletions
@@ -113,14 +113,28 @@ ______________________________________________________________________

 To set up a local development environment, we recommend using `uv`, which can be installed following their [instructions](https://docs.astral.sh/uv/getting-started/installation/).

-Once `uv` has been installed, begin by cloning the repository:
+Once `uv` has been installed, begin by cloning the forked repository:

 ```bash
-git clone https://github.com/Lightning-AI/lightning.git
-cd lightning
+git clone https://github.com/{YOUR_GITHUB_USERNAME}/pytorch-lightning.git
+cd pytorch-lightning
 ```

-Once in root level of the repository, create a new virtual environment and install the project dependencies.
+> If you're using [Lightning Studio](https://lightning.ai) or already have your `uv venv` activated, you can quickly set up the project by running:
+
+```bash
+make setup
+```
+
+This will:
+
+- Install all required dependencies.
+- Perform an editable install of the `pytorch-lightning` project.
+- Install and configure `pre-commit`.
+
+#### Manual Setup (Optional)
+
+If you prefer more fine-grained control over the dependencies, you can set up the environment manually:

 ```bash
 uv venv

Makefile

Lines changed: 18 additions & 1 deletion
@@ -1,4 +1,4 @@
-.PHONY: test clean docs
+.PHONY: test clean docs setup

 # to imitate SLURM set only single node
 export SLURM_LOCALID=0
@@ -7,6 +7,23 @@ export SPHINX_MOCK_REQUIREMENTS=1
 # install only Lightning Trainer packages
 export PACKAGE_NAME=pytorch

+setup:
+	uv pip install -r requirements.txt \
+		-r requirements/pytorch/base.txt \
+		-r requirements/pytorch/test.txt \
+		-r requirements/pytorch/extra.txt \
+		-r requirements/pytorch/strategies.txt \
+		-r requirements/fabric/base.txt \
+		-r requirements/fabric/test.txt \
+		-r requirements/fabric/strategies.txt \
+		-r requirements/typing.txt \
+		-e ".[all]" \
+		pre-commit
+	pre-commit install
+	@echo "-----------------------------"
+	@echo "✅ Environment setup complete. Ready to Contribute ⚡️!"
+
+
 clean:
 	# clean all temp runs
 	rm -rf $(shell find . -name "mlruns")
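As a quick way to confirm the `make setup` target completed, a minimal sanity check can be run from the activated environment. This is only a sketch and assumes the editable install exposes both the `lightning` and `pytorch_lightning` import paths:

```python
# Minimal post-setup sanity check (assumption: run inside the activated `uv` venv
# after `make setup`; both import paths are provided by the editable install).
import lightning
import pytorch_lightning

print("lightning:", lightning.__version__)
print("pytorch_lightning:", pytorch_lightning.__version__)
```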

docs/source-pytorch/common/checkpointing_basic.rst

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ The LightningModule also has access to the Hyperparameters
 .. code-block:: python

     model = MyLightningModule.load_from_checkpoint("/path/to/checkpoint.ckpt")
-    print(model.learning_rate)
+    print(model.hparams.learning_rate)

 ----
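The corrected line reflects how restored hyperparameters are exposed: arguments captured with `self.save_hyperparameters()` are reloaded onto the module's `hparams` namespace rather than as top-level attributes. A minimal sketch of the pattern the docs snippet assumes (module definition and checkpoint path are illustrative):

```python
import torch
import lightning.pytorch as L


class MyLightningModule(L.LightningModule):
    def __init__(self, learning_rate: float = 1e-3):
        super().__init__()
        # Stores the init arguments in the checkpoint so they can be restored later.
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(4, 1)


# After loading, the saved arguments live under `model.hparams`.
model = MyLightningModule.load_from_checkpoint("/path/to/checkpoint.ckpt")
print(model.hparams.learning_rate)
```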

examples/fabric/image_classifier/train_fabric.py

Lines changed: 1 addition & 1 deletion
@@ -158,7 +158,7 @@ def run(hparams):
     # When using distributed training, use `fabric.save`
     # to ensure the current process is allowed to save a checkpoint
     if hparams.save_model:
-        fabric.save(model.state_dict(), "mnist_cnn.pt")
+        fabric.save(path="mnist_cnn.pt", state=model.state_dict())


 if __name__ == "__main__":
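`Fabric.save` takes the checkpoint path as its first parameter and the state as its second, so the keyword form above also corrects the reversed positional arguments in the old call. A hedged sketch of the save/restore round-trip with a raw `state_dict` (toy model and file name are illustrative; the load-back assumes `Fabric.load` returns the checkpoint contents when no target state is passed):

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(4, 2)

# Collective call: every process must reach it; only the responsible rank writes.
fabric.save(path="mnist_cnn.pt", state=model.state_dict())

# Assumption: with no `state` argument, `Fabric.load` returns what was saved.
state_dict = fabric.load("mnist_cnn.pt")
model.load_state_dict(state_dict)
```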

examples/fabric/kfold_cv/train_fabric.py

Lines changed: 1 addition & 1 deletion
@@ -161,7 +161,7 @@ def run(hparams):
     # When using distributed training, use `fabric.save`
     # to ensure the current process is allowed to save a checkpoint
     if hparams.save_model:
-        fabric.save(model.state_dict(), "mnist_cnn.pt")
+        fabric.save(path="mnist_cnn.pt", state=model.state_dict())


 if __name__ == "__main__":

examples/fabric/tensor_parallel/train.py

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ def train():
     # See `fabric consolidate --help` if you need to convert the checkpoint to a single file
     fabric.print("Saving a (distributed) checkpoint ...")
     state = {"model": model, "optimizer": optimizer, "iteration": i}
-    fabric.save("checkpoint.pt", state)
+    fabric.save(path="checkpoint.pt", state=state)

     fabric.print("Training successfully completed!")
     fabric.print(f"Peak memory usage: {torch.cuda.max_memory_allocated() / 1e9:.02f} GB")
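In this example the state holds live objects (the model and optimizer), which Fabric serializes via their `state_dict()`s. A hedged sketch of restoring such a checkpoint in place, using a toy single-device setup in place of the tensor-parallel one from the example:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model, optimizer = fabric.setup(model, optimizer)

state = {"model": model, "optimizer": optimizer, "iteration": 10}
fabric.save(path="checkpoint.pt", state=state)

# On restore, module and optimizer weights are loaded in place; plain values such as
# "iteration" are written back into the passed-in dict (assumption based on Fabric's
# documented in-place loading of the given state).
restore_state = {"model": model, "optimizer": optimizer, "iteration": 0}
fabric.load(path="checkpoint.pt", state=restore_state)
print("resuming from iteration", restore_state["iteration"])
```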

src/lightning/fabric/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -16,6 +16,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 -

+### Changed
+
+- Raise a `ValueError` when the seed is out of bounds or cannot be cast to an int ([#21029](https://github.com/Lightning-AI/pytorch-lightning/pull/21029))
+

 ---

src/lightning/fabric/utilities/seed.py

Lines changed: 4 additions & 5 deletions
@@ -27,7 +27,8 @@ def seed_everything(seed: Optional[int] = None, workers: bool = False, verbose:
     Args:
         seed: the integer value seed for global random state in Lightning.
             If ``None``, it will read the seed from ``PL_GLOBAL_SEED`` env variable. If ``None`` and the
-            ``PL_GLOBAL_SEED`` env variable is not set, then the seed defaults to 0.
+            ``PL_GLOBAL_SEED`` env variable is not set, then the seed defaults to 0. If seed is
+            not in bounds or cannot be cast to int, a ValueError is raised.
         workers: if set to ``True``, will properly configure all dataloaders passed to the
             Trainer with a ``worker_init_fn``. If the user already provides such a function
             for their dataloaders, setting this argument will have no influence. See also:
@@ -44,14 +45,12 @@ def seed_everything(seed: Optional[int] = None, workers: bool = False, verbose:
            try:
                seed = int(env_seed)
            except ValueError:
-                seed = 0
-                rank_zero_warn(f"Invalid seed found: {repr(env_seed)}, seed set to {seed}")
+                raise ValueError(f"Invalid seed specified via PL_GLOBAL_SEED: {repr(env_seed)}")
    elif not isinstance(seed, int):
        seed = int(seed)

    if not (min_seed_value <= seed <= max_seed_value):
-        rank_zero_warn(f"{seed} is not in bounds, numpy accepts from {min_seed_value} to {max_seed_value}")
-        seed = 0
+        raise ValueError(f"{seed} is not in bounds, numpy accepts from {min_seed_value} to {max_seed_value}")

    if verbose:
        log.info(rank_prefixed_message(f"Seed set to {seed}", _get_rank()))
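With this change, an unusable seed now fails fast instead of being silently replaced with 0. A short sketch of the new contract (the error messages are those raised in the diff above):

```python
import os

from lightning.fabric.utilities.seed import seed_everything

seed_everything(42)  # valid: seeds Python, NumPy, and torch global RNGs

try:
    seed_everything(2**64)  # outside the uint32 range accepted by NumPy
except ValueError as err:
    print("rejected:", err)

os.environ["PL_GLOBAL_SEED"] = "not-a-number"
try:
    seed_everything()  # falls back to PL_GLOBAL_SEED, which cannot be cast to int
except ValueError as err:
    print("rejected:", err)
```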

src/lightning/pytorch/CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

--
+- Added support for general mappings being returned from `training_step` when using manual optimization ([#21011](https://github.com/Lightning-AI/pytorch-lightning/pull/21011))
+


 ### Changed
@@ -26,6 +27,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 ### Fixed

 - fix progress bar console clearing for Rich `14.1+` ([#21016](https://github.com/Lightning-AI/pytorch-lightning/pull/21016))
+- fix `AdvancedProfiler` to handle nested profiling actions for Python 3.12+ ([#20809](https://github.com/Lightning-AI/pytorch-lightning/pull/20809))


 ---
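The new `Added` entry above refers to manual optimization: `training_step` may now return any `Mapping`, not only a plain `dict`. A hedged sketch of what that enables (the model, data, and the `MappingProxyType` return value are illustrative):

```python
from types import MappingProxyType

import torch
from torch.utils.data import DataLoader, TensorDataset

import lightning.pytorch as L


class ManualOptModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        # Any Mapping is accepted here, not only a plain dict (per #21011).
        return MappingProxyType({"loss": loss.detach()})

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(32, 8), torch.randn(32, 1))
    trainer = L.Trainer(max_epochs=1, logger=False, enable_checkpointing=False)
    trainer.fit(ManualOptModule(), DataLoader(dataset, batch_size=8))
```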

src/lightning/pytorch/loops/loop.py

Lines changed: 8 additions & 0 deletions
@@ -23,6 +23,7 @@ class _Loop:
     def __init__(self, trainer: "pl.Trainer") -> None:
         self._restarting = False
         self._loaded_from_state_dict = False
+        self._resuming_from_checkpoint = False
         self.trainer = trainer

     @property
@@ -38,6 +39,11 @@ def restarting(self, restarting: bool) -> None:
             if isinstance(loop, _Loop):
                 loop.restarting = restarting

+    @property
+    def is_resuming(self) -> bool:
+        """Indicates whether training is being resumed from a checkpoint."""
+        return self._resuming_from_checkpoint
+
     def reset_restart_stage(self) -> None:
         pass

@@ -87,6 +93,7 @@ def load_state_dict(
                 v.load_state_dict(state_dict.copy(), prefix + k + ".")
         self.restarting = True
         self._loaded_from_state_dict = True
+        self._resuming_from_checkpoint = True

     def _load_from_state_dict(self, state_dict: dict, prefix: str) -> None:
         for k, v in self.__dict__.items():
@@ -102,4 +109,5 @@ def _load_from_state_dict(self, state_dict: dict, prefix: str) -> None:
     def on_iteration_done(self) -> None:
         self._restarting = False
         self._loaded_from_state_dict = False
+        self._resuming_from_checkpoint = False
         self.reset_restart_stage()
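The new `is_resuming` property gives user code a way to distinguish a fresh run from one resumed via `ckpt_path`. A hedged sketch of consuming it from a callback (assumes the property is reachable as `trainer.fit_loop.is_resuming`, per the diff above):

```python
import lightning.pytorch as L


class ResumeAwareCallback(L.Callback):
    def on_train_start(self, trainer: "L.Trainer", pl_module: "L.LightningModule") -> None:
        # True only when loop state was restored from a checkpoint and the first
        # iteration after the restore has not yet completed.
        if trainer.fit_loop.is_resuming:
            print("Resuming from a checkpoint; skipping one-time warm-up work.")
        else:
            print("Starting a fresh training run.")
```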
