
Commit 00c6640

awaelchli and justusschock authored and committed
1.4.6 release
Co-authored-by: Justus Schock <[email protected]>
1 parent 3e6df2f commit 00c6640

18 files changed: 113 additions & 288 deletions

CHANGELOG.md

Lines changed: 4 additions & 21 deletions
@@ -7,40 +7,23 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ## [1.4.6] - 2021-09-07

-- Fixed signature of `Timer.on_train_epoch_end` and `StochasticWeightAveraging.on_train_epoch_end` to prevent unwanted deprecation warnings ([#9347](https://github.com/PyTorchLightning/pytorch-lightning/pull/9347))
-
-
-## [1.4.5] - 2021-08-31
-
 - Fixed an issues with export to ONNX format when a model has multiple inputs ([#8800](https://github.com/PyTorchLightning/pytorch-lightning/pull/8800))
-
 - Removed deprecation warnings being called for `on_{task}_dataloader` ([#9279](https://github.com/PyTorchLightning/pytorch-lightning/pull/9279))
-
 - Fixed save/load/resume from checkpoint for DeepSpeed Plugin (
     [#8397](https://github.com/PyTorchLightning/pytorch-lightning/pull/8397),
     [#8644](https://github.com/PyTorchLightning/pytorch-lightning/pull/8644),
     [#8627](https://github.com/PyTorchLightning/pytorch-lightning/pull/8627))
-
-
 - Fixed `EarlyStopping` running on train epoch end when `check_val_every_n_epoch>1` is set ([#9156](https://github.com/PyTorchLightning/pytorch-lightning/pull/9156))
-
-
 - Fixed an issue with logger outputs not being finalized correctly after prediction runs ([#8333](https://github.com/PyTorchLightning/pytorch-lightning/issues/8333))
-
-
+- Fixed the Apex and DeepSpeed plugin closure running after the `on_before_optimizer_step` hook ([#9288](https://github.com/PyTorchLightning/pytorch-lightning/issues/9288))
+- Fixed the Native AMP plugin closure not running with manual optimization ([#9288](https://github.com/PyTorchLightning/pytorch-lightning/issues/9288))
 - Fixed bug where data-loading functions where not getting the correct running stage passed ([#8858](https://github.com/PyTorchLightning/pytorch-lightning/pull/8858))
-
-
 - Fixed intra-epoch evaluation outputs staying in memory when the respective `*_epoch_end` hook wasn't overridden ([#9261](https://github.com/PyTorchLightning/pytorch-lightning/pull/9261))
-
-
 - Fixed error handling in DDP process reconciliation when `_sync_dir` was not initialized ([#9267](https://github.com/PyTorchLightning/pytorch-lightning/pull/9267))
-
-
 - Fixed PyTorch Profiler not enabled for manual optimization ([#9316](https://github.com/PyTorchLightning/pytorch-lightning/pull/9316))
-
-
 - Fixed inspection of other args when a container is specified in `save_hyperparameters` ([#9125](https://github.com/PyTorchLightning/pytorch-lightning/pull/9125))
+- Fixed signature of `Timer.on_train_epoch_end` and `StochasticWeightAveraging.on_train_epoch_end` to prevent unwanted deprecation warnings ([#9347](https://github.com/PyTorchLightning/pytorch-lightning/pull/9347))
+

 ## [1.4.5] - 2021-08-31

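The #9347 entry above concerns the `on_train_epoch_end` hook signature. A minimal sketch, assuming a hypothetical user callback named `MyEpochLogger`, of the (trainer, pl_module)-only signature that the built-in Timer and StochasticWeightAveraging callbacks were aligned with to avoid the deprecation warning:

import pytorch_lightning as pl


class MyEpochLogger(pl.Callback):
    # hypothetical callback for illustration; no deprecated `outputs` argument
    def on_train_epoch_end(self, trainer, pl_module):
        print(f"finished epoch {trainer.current_epoch}")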
docs/source/extensions/callbacks.rst

Lines changed: 0 additions & 1 deletion
@@ -108,7 +108,6 @@ Lightning has a few built-in callbacks.
     ModelPruning
     ProgressBar
     ProgressBarBase
-    RichProgressBar
     QuantizationAwareTraining
     StochasticWeightAveraging
     XLAStatsMonitor

pytorch_lightning/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 import time

 _this_year = time.strftime("%Y")
-__version__ = "1.4.5"
+__version__ = "1.4.6"
 __author__ = "William Falcon et al."
 __author_email__ = "[email protected]"
 __license__ = "Apache-2.0"
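The bumped attribute can be read directly to confirm which release is installed:

import pytorch_lightning as pl

print(pl.__version__)  # expected to print "1.4.6" for this release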

pytorch_lightning/callbacks/early_stopping.py

Lines changed: 3 additions & 3 deletions
@@ -91,7 +91,7 @@ def __init__(
         check_finite: bool = True,
         stopping_threshold: Optional[float] = None,
         divergence_threshold: Optional[float] = None,
-        check_on_train_epoch_end: bool = True,
+        check_on_train_epoch_end: Optional[bool] = None,
     ):
         super().__init__()
         self.min_delta = min_delta
@@ -201,7 +201,7 @@ def _run_early_stopping_check(self, trainer: "pl.Trainer") -> None:
         # when in dev debugging
         trainer.dev_debugger.track_early_stopping_history(self, current)

-        should_stop, reason = self._evalute_stopping_criteria(current)
+        should_stop, reason = self._evaluate_stopping_criteria(current)

         # stop every ddp process if any world process decides to stop
         should_stop = trainer.training_type_plugin.reduce_boolean_decision(should_stop)
@@ -211,7 +211,7 @@ def _run_early_stopping_check(self, trainer: "pl.Trainer") -> None:
         if reason and self.verbose:
            self._log_info(trainer, reason)

-    def _evalute_stopping_criteria(self, current: torch.Tensor) -> Tuple[bool, str]:
+    def _evaluate_stopping_criteria(self, current: torch.Tensor) -> Tuple[bool, str]:
         should_stop = False
         reason = None
         if self.check_finite and not torch.isfinite(current):
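The new `Optional[bool] = None` default lets the callback infer when to run its check, which ties into the `check_val_every_n_epoch>1` fix listed in the changelog. A minimal usage sketch, assuming a LightningModule that logs `val_loss` during validation:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# leaving check_on_train_epoch_end at its new default (None) lets Lightning
# decide to evaluate the stopping criteria at validation time
early_stop = EarlyStopping(monitor="val_loss", patience=3)

# with validation running only every 2 epochs, the stopping check follows
# the validation cadence instead of firing on every train epoch end
trainer = Trainer(max_epochs=20, check_val_every_n_epoch=2, callbacks=[early_stop])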

pytorch_lightning/core/lightning.py

Lines changed: 2 additions & 4 deletions
@@ -669,10 +669,8 @@ def training_step(self, *args, **kwargs) -> STEP_OUTPUT:

            - :class:`~torch.Tensor` - The loss tensor
            - ``dict`` - A dictionary. Can include any keys, but must include the key ``'loss'``
-           - ``None`` - Training will skip to the next batch
-
-           Note:
-               Returning ``None`` is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.
+           - ``None`` - Training will skip to the next batch. This is only for automatic optimization.
+             This is not supported for multi-GPU or TPU, or using ``DeepSpeed``.

         In this step you'd normally do the forward pass and calculate the loss for a batch.
         You can also do fancier things like multiple forward passes or something model specific.
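A minimal sketch of the behaviour the updated docstring describes: returning ``None`` from ``training_step`` skips the batch, but only under automatic optimization (and not on multi-GPU, TPU, or DeepSpeed). The skip condition here is purely illustrative:

import torch
from torch import nn
from pytorch_lightning import LightningModule


class SkippingModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        # hypothetical guard: skip batches that produced a non-finite loss
        if not torch.isfinite(loss):
            return None  # Lightning moves on to the next batch (automatic optimization only)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)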

pytorch_lightning/plugins/precision/apex_amp.py

Lines changed: 6 additions & 3 deletions
@@ -97,10 +97,13 @@ def pre_optimizer_step(
         **kwargs: Any,
     ) -> bool:
         """Hook to do something before each optimizer step."""
+        result = lambda_closure()  # APEX amp does not support closures
         super().pre_optimizer_step(model, optimizer, optimizer_idx, lambda_closure, **kwargs)
-        # the following should be in a `optimizer_step` hook but we don't have one in the precision plugin.
-        lambda_closure()  # APEX amp does not support closures
-        optimizer.step(**kwargs)
+        skipped_backward = result is None
+        # in manual optimization, the closure does not return a value
+        if not model.automatic_optimization or not skipped_backward:
+            # the following should be in a `optimizer_step` hook but we don't have one in the precision plugin.
+            optimizer.step(**kwargs)
         return False

     def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
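The pattern introduced above (run the closure first, then decide whether to step) can be restated in isolation. A framework-free sketch, with `closure`, `step`, and `automatic_optimization` standing in for the plugin's collaborators:

from typing import Callable, Optional


def maybe_step(
    closure: Callable[[], Optional[float]],
    step: Callable[[], None],
    automatic_optimization: bool,
) -> None:
    # run the training closure up front, mirroring `result = lambda_closure()`
    result = closure()
    skipped_backward = result is None
    # in manual optimization the closure returns nothing, so never skip the step;
    # in automatic optimization a `None` result means `training_step` skipped the batch
    if not automatic_optimization or not skipped_backward:
        step()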

pytorch_lightning/plugins/precision/deepspeed_precision.py

Lines changed: 7 additions & 1 deletion
@@ -20,6 +20,7 @@
 import pytorch_lightning as pl
 from pytorch_lightning.plugins.precision.precision_plugin import PrecisionPlugin
 from pytorch_lightning.utilities import GradClipAlgorithmType
+from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from pytorch_lightning.utilities.model_helpers import is_overridden
 from pytorch_lightning.utilities.warnings import WarningCache

@@ -42,9 +43,14 @@ def pre_optimizer_step(
         **kwargs: Any,
     ) -> bool:
         """Hook to do something before each optimizer step."""
+        result = lambda_closure()  # DeepSpeed does not support closures
         super().pre_optimizer_step(model, optimizer, optimizer_idx, lambda_closure, **kwargs)
+        # in manual optimization, the closure does not return a value
+        if model.automatic_optimization and result is None:
+            raise MisconfigurationException(
+                "Skipping backward by returning `None` from your `training_step` is not supported by `DeepSpeed`"
+            )
         # the following should be in a `optimizer_step` hook but we don't have one in the precision plugin.
-        lambda_closure()  # DeepSpeed does not support closures
         deepspeed_engine = model.trainer.model
         deepspeed_engine.step()
         return False
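Because the DeepSpeed engine owns the optimizer step, a skipped backward cannot be honoured, so the new guard fails fast. A framework-free restatement of that guard, with `result` and `automatic_optimization` standing in for the plugin's inputs:

from typing import Optional

from pytorch_lightning.utilities.exceptions import MisconfigurationException


def check_closure_result(result: Optional[float], automatic_optimization: bool) -> None:
    # mirrors the new check: under automatic optimization a `None` closure result
    # means `training_step` tried to skip the batch, which DeepSpeed cannot honour
    if automatic_optimization and result is None:
        raise MisconfigurationException(
            "Skipping backward by returning `None` from your `training_step` is not supported by `DeepSpeed`"
        )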

pytorch_lightning/plugins/precision/native_amp.py

Lines changed: 5 additions & 5 deletions
@@ -54,13 +54,13 @@ def pre_optimizer_step(
                 f"native PyTorch amp and lbfgs are not compatible (optimizer {optimizer_idx})."
                 " To request, please file a Github issue in PyTorch and tag @mcarilli"
             )
-        result = True
-        if model.automatic_optimization:
-            result = lambda_closure()
+        result = lambda_closure()  # native amp does not support closures
         self.scaler.unscale_(optimizer)
         super().pre_optimizer_step(model, optimizer, optimizer_idx, lambda_closure, **kwargs)
-        # lambda_closure returning None indicates that backward has been skipped
-        if result is not None:
+        skipped_backward = result is None
+        # in manual optimization, the closure does not return a value
+        if not model.automatic_optimization or not skipped_backward:
+            # note: the scaler will skip the `optimizer.step` if nonfinite gradients are found
             self.scaler.step(optimizer)
             self.scaler.update()
         return False
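Previously the closure only ran under automatic optimization; it now runs unconditionally, which is the "Native AMP plugin closure not running with manual optimization" fix from the changelog. A hedged sketch of the kind of manual-optimization module this path serves, assumed to be run under something like `Trainer(gpus=1, precision=16)`:

import torch
from torch import nn
from pytorch_lightning import LightningModule


class ManualAMPModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization: the closure returns no value
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()
        x, y = batch
        loss = nn.functional.mse_loss(self.layer(x), y)
        self.manual_backward(loss)  # loss scaling is delegated to the precision plugin
        opt.step()                  # routed through pre_optimizer_step shown above

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)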

pytorch_lightning/plugins/training_type/deepspeed.py

Lines changed: 13 additions & 14 deletions
@@ -35,7 +35,7 @@
 from pytorch_lightning.utilities.exceptions import MisconfigurationException
 from pytorch_lightning.utilities.imports import _DEEPSPEED_AVAILABLE
 from pytorch_lightning.utilities.types import LRSchedulerTypeTuple
-from pytorch_lightning.utilities.warnings import _warn, LightningDeprecationWarning, warning_cache
+from pytorch_lightning.utilities.warnings import _warn, LightningDeprecationWarning

 if _DEEPSPEED_AVAILABLE:
     import deepspeed
@@ -671,19 +671,18 @@ def save_checkpoint(self, checkpoint: Dict, filepath: str) -> None:
             checkpoint: The checkpoint state dictionary
             filepath: write-target file's path
         """
-        if self.zero_stage_3 and self._multi_device and self.is_global_zero:
-            warning_cache.warn(
-                "When saving the DeepSpeed Stage 3 checkpoint, "
-                "each worker will save a shard of the checkpoint within a directory. "
-                "If a single file is required after training, "
-                "see https://pytorch-lightning.readthedocs.io/en/latest/advanced/advanced_gpu.html#"
-                "deepspeed-zero-stage-3-single-file for instructions."
-            )
-        # Use deepspeed's internal checkpointing function to handle partitioned weights across processes
-        # dump states as a checkpoint dictionary object
-        _exclude_keys = ["state_dict", "optimizer_states", "lr_schedulers"]
-        checkpoint = {k: v for k, v in checkpoint.items() if k not in _exclude_keys}
-        self.deepspeed_engine.save_checkpoint(filepath, client_state=checkpoint)
+        if self.world_size > 1 and self.zero_stage_3:
+            if self.save_full_weights:
+                # todo: expose this as general function in deepspeed
+                state_dict = self.deepspeed_engine._zero3_consolidated_fp16_state_dict()
+                if self.is_global_zero:
+                    # State dict keys will include reference to wrapper LightningDeepSpeedModule
+                    # Delete `module` prefix before saving.
+                    state_dict = {k.partition("module.")[2]: state_dict[k] for k in state_dict.keys()}
+                    checkpoint["state_dict"] = state_dict
+                    return super().save_checkpoint(checkpoint, filepath)
+                return
+
             # Use deepspeed's internal checkpointing function to handle partitioned weights across processes
             # dump states as a checkpoint dictionary object
             save_dir = self._filepath_to_dir(filepath)
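The restored `save_checkpoint` branch above only triggers for multi-process ZeRO Stage 3 runs. A hedged configuration sketch of a setup that exercises it, assuming the `save_full_weights` flag referenced above is exposed on the plugin constructor in this release (GPU count and flag value are illustrative):

from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin

# ZeRO Stage 3 across multiple GPUs; with full-weight saving enabled the plugin
# consolidates the fp16 state dict on rank zero before delegating to the default
# checkpointing path (the branch restored above)
trainer = Trainer(
    gpus=2,
    precision=16,
    plugins=DeepSpeedPlugin(stage=3, save_full_weights=True),
)

# after (or during) fit(), checkpoints go through the code path shown above:
# trainer.save_checkpoint("model.ckpt")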

pytorch_lightning/profiler/pytorch.py

Lines changed: 1 addition & 4 deletions
@@ -286,19 +286,16 @@ def __init__(
         """
         super().__init__(dirpath=dirpath, filename=filename, output_filename=output_filename)

-        record_functions = self.__deprecation_check(profiled_functions, record_functions)
-
         self._group_by_input_shapes = group_by_input_shapes and profiler_kwargs.get("record_shapes", False)
         self._emit_nvtx = emit_nvtx
         self._export_to_chrome = export_to_chrome
         self._row_limit = row_limit
         self._sort_by_key = sort_by_key or f"{'cuda' if profiler_kwargs.get('use_cuda', False) else 'cpu'}_time_total"
-        self._user_record_functions = record_functions
+        self._user_record_functions = set(record_functions or set())
         self._record_functions_start = self._user_record_functions | self.START_RECORD_FUNCTIONS
         self._record_functions = self._user_record_functions | self.RECORD_FUNCTIONS
         self._record_module_names = record_module_names
         self._profiler_kwargs = profiler_kwargs
-
         self.profiler: Optional[_PROFILER] = None
         self.function_events: Optional["EventList"] = None
         self._lightning_module: Optional["LightningModule"] = None  # set by ProfilerConnector
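The `set(record_functions or set())` normalization above means `record_functions` can be omitted entirely or passed as a set of names. A hedged usage sketch (the recorded function name is a placeholder):

from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler

# record_functions may be left as None; when given, the names are merged into the
# profiler's built-in START_RECORD_FUNCTIONS / RECORD_FUNCTIONS sets
profiler = PyTorchProfiler(record_functions={"my_custom_block"})  # placeholder name

trainer = Trainer(profiler=profiler)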
