Releases: Lightning-AI/pytorch-lightning
Standard weekly patch release
[1.4.6] - 2021-09-10
- Fixed an issue with export to ONNX format when a model has multiple inputs (#8800) (see the sketch below)
- Removed deprecation warnings being called for `on_{task}_dataloader` (#9279)
- Fixed save/load/resume from checkpoint for DeepSpeed Plugin (#8397, #8644, #8627)
- Fixed `EarlyStopping` running on train epoch end when `check_val_every_n_epoch > 1` is set (#9156)
- Fixed an issue with logger outputs not being finalized correctly after prediction runs (#8333)
- Fixed the Apex and DeepSpeed plugin closure running after the `on_before_optimizer_step` hook (#9288)
- Fixed the Native AMP plugin closure not running with manual optimization (#9288)
- Fixed bug where data-loading functions were not getting the correct running stage passed (#8858)
- Fixed intra-epoch evaluation outputs staying in memory when the respective `*_epoch_end` hook wasn't overridden (#9261)
- Fixed error handling in DDP process reconciliation when `_sync_dir` was not initialized (#9267)
- Fixed PyTorch Profiler not enabled for manual optimization (#9316)
- Fixed inspection of other args when a container is specified in `save_hyperparameters` (#9125)
- Fixed signature of `Timer.on_train_epoch_end` and `StochasticWeightAveraging.on_train_epoch_end` to prevent unwanted deprecation warnings (#9347)
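For context on the multi-input ONNX export fix (#8800), a minimal sketch; the `TwoInputModel` below is a hypothetical example, not code from the release:

```python
import torch
import pytorch_lightning as pl

# Hypothetical two-input module used only to illustrate multi-input export.
class TwoInputModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(8, 2)

    def forward(self, x, y):
        return self.net(x) + self.net(y)

model = TwoInputModel()
# For models with multiple inputs, pass a tuple as the input sample.
model.to_onnx("two_input_model.onnx", input_sample=(torch.randn(1, 8), torch.randn(1, 8)))
```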
Contributors
@ananthsub @awaelchli @Borda @four4fish @justusschock @kaushikb11 @s-rog @SeanNaren @tangbinh @tchaton @xerus
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.5] - 2021-08-31
- Fixed reduction using `self.log(sync_dist=True, reduce_fx={mean,max})` (#9142) (see the sketch below)
- Fixed not setting a default value for `max_epochs` if `max_time` was specified on the `Trainer` constructor (#9072)
- Fixed the `CometLogger` so it no longer modifies the metrics in place; it now creates a copy of the metrics before performing any operations (#9150)
- Fixed DDP "CUDA error: initialization error" due to a `copy` instead of `deepcopy` on `ResultCollection` (#9239)
Contributors
@ananthsub @bamblebam @carmocca @daniellepintz @ethanwharris @kaushikb11 @sohamtiwari3120 @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.4] - 2021-08-24
- Fixed a bug in the binary search mode of auto batch size scaling where an exception was raised if the first trainer run resulted in OOM (#8954) (see the sketch below)
- Fixed a bug that caused logging with `log_gpu_memory='min_max'` to not work (#9013)
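For reference, a minimal sketch of the binary-search batch-size finder touched by #8954; `model` is assumed to be a `LightningModule` whose dataloaders read `self.batch_size` (or `self.hparams.batch_size`):

```python
from pytorch_lightning import Trainer

trainer = Trainer(auto_scale_batch_size="binsearch", max_epochs=1)
# After the fix, an OOM on the very first run triggers a retry with a smaller
# batch size instead of raising.
trainer.tune(model)
```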
Contributors
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.3] - 2021-08-17
- Fixed plateau scheduler stepping on incomplete epoch (#8861)
- Fixed infinite loop with `CycleIterator` and multiple loaders (#8889)
- Fixed `StochasticWeightAveraging` with a list of learning rates not applying them to each param group (#8747) (see the sketch below)
- Restored original loaders if replaced by entrypoint (#8885)
- Fixed lost reference to `_Metadata` object in `ResultMetricCollection` (#8932)
- Ensured the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` (#8939)
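A minimal sketch of the per-parameter-group SWA learning rates fixed in #8747; the two values are placeholders for an optimizer that defines two parameter groups (e.g. backbone and head):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import StochasticWeightAveraging

# One SWA learning rate per optimizer parameter group.
swa = StochasticWeightAveraging(swa_lrs=[1e-2, 1e-3])
trainer = Trainer(callbacks=[swa], max_epochs=10)
```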
Contributors
@awaelchli @carmocca @justusschock @tchaton @yifuwang
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.2] - 2021-08-10
- Fixed recursive call for `apply_to_collection(include_none=False)` (#8719)
- Fixed truncated backprop through time enablement when set as a property on the `LightningModule` and not the `Trainer` (#8804) (see the sketch below)
- Fixed comments and exception message for `metrics_to_scalars` (#8782)
- Fixed typo in `LightningLoggerBase.after_save_checkpoint` docstring (#8737)
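A minimal sketch of enabling truncated backprop through time on the module itself, the path fixed by #8804; the layer sizes and loss are placeholders:

```python
import torch
from pytorch_lightning import LightningModule

class SeqModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.rnn = torch.nn.LSTM(10, 20, batch_first=True)
        # Set on the LightningModule (not the Trainer): split each batch into
        # chunks of 5 time steps.
        self.truncated_bptt_steps = 5

    def training_step(self, batch, batch_idx, hiddens):
        x, _ = batch
        out, hiddens = self.rnn(x, hiddens)
        loss = out.mean()  # placeholder loss for the sketch
        return {"loss": loss, "hiddens": hiddens}
```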
Contributors
@Aiden-Jeon @ananthsub @awaelchli @edward-io
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.1] - 2021-08-03
- Fixed `trainer.fit_loop.split_idx` always returning `None` (#8601)
- Fixed references for `ResultCollection.extra` (#8622)
- Fixed reference issues during epoch end result collection (#8621)
- Fixed horovod auto-detection when horovod is not installed and the launcher is `mpirun` (#8610)
- Fixed an issue with `training_step` outputs not getting collected correctly for `training_epoch_end` (#8613) (see the sketch below)
- Fixed distributed types support for CPUs (#8667)
- Fixed a deadlock issue with DDP and torchelastic (#8655)
- Fixed `accelerator=ddp` choice for CPU (#8645)
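A minimal sketch of the `training_step` to `training_epoch_end` output flow covered by #8613; the layer and metric names are placeholders:

```python
import torch
from pytorch_lightning import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).mean()  # placeholder loss
        return {"loss": loss, "batch_idx": batch_idx}

    def training_epoch_end(self, outputs):
        # `outputs` is the list of dicts returned by every training_step call.
        epoch_loss = torch.stack([out["loss"] for out in outputs]).mean()
        self.log("train/epoch_loss", epoch_loss)
```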
Contributors
@awaelchli, @Borda, @carmocca, @kaushikb11, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
TPU Pod Training, IPU Accelerator, DeepSpeed Infinity, Fully Sharded Data Parallel
Today we are excited to announce Lightning 1.4, introducing support for TPU pods, XLA profiling, IPUs, and new plugins to reach 10+ billion parameters, including DeepSpeed Infinity, Fully Sharded Data Parallel, and more!
https://devblog.pytorchlightning.ai/announcing-lightning-1-4-8cd20482aee9
[1.4.0] - 2021-07-27
Added
- Added `extract_batch_size` utility and corresponding tests to extract batch dimension from multiple batch types (#8357)
- Added support for named parameter groups in `LearningRateMonitor` (#7987)
- Added `dataclass` support for `pytorch_lightning.utilities.apply_to_collection` (#7935)
- Added support to `LightningModule.to_torchscript` for saving to custom filesystems with `fsspec` (#7617)
- Added `KubeflowEnvironment` for use with the `PyTorchJob` operator in Kubeflow
- Added LightningCLI support for config files on object stores (#7521)
- Added `ModelPruning(prune_on_train_epoch_end=True|False)` to choose when to apply pruning (#7704)
- Added support for checkpointing based on a provided time interval during training (#7515)
- Progress tracking
- Added support for passing a `LightningDataModule` positionally as the second argument to `trainer.{validate,test,predict}` (#7431)
- Added argument `trainer.predict(ckpt_path)` (#7430)
- Added `clip_grad_by_value` support for TPUs (#7025)
- Added support for passing any class to `is_overridden` (#7918)
- Added `sub_dir` parameter to `TensorBoardLogger` (#6195)
- Added correct `dataloader_idx` to batch transfer hooks (#6241)
- Added `include_none=bool` argument to `apply_to_collection` (#7769)
- Added `apply_to_collections` to apply a function to two zipped collections (#7769)
- Added `ddp_fully_sharded` support (#7487)
- Added `should_rank_save_checkpoint` property to Training Plugins (#7684)
- Added `log_grad_norm` hook to `LightningModule` to customize the logging of gradient norms (#7873)
- Added `save_config_filename` init argument to `LightningCLI` to ease resolving name conflicts (#7741)
- Added `save_config_overwrite` init argument to `LightningCLI` to ease overwriting existing config files (#8059)
- Added reset dataloader hooks to Training Plugins and Accelerators (#7861)
- Added trainer stage hooks for Training Plugins and Accelerators (#7864)
- Added the `on_before_optimizer_step` hook (#8048)
- Added IPU Accelerator (#7867)
- Fault-tolerant training
  - Added `{,load_}state_dict` to `ResultCollection` (#7948)
  - Added `{,load_}state_dict` to `Loops` (#8197)
  - Set `Loop.restarting=False` at the end of the first iteration (#8362)
  - Save the loops state with the checkpoint (opt-in) (#8362)
  - Save a checkpoint to restore the state on exception (opt-in) (#8362)
  - Added `state_dict` and `load_state_dict` utilities for `CombinedLoader` + utilities for dataloader (#8364)
- Added `rank_zero_only` to `LightningModule.log` function (#7966)
- Added `metric_attribute` to `LightningModule.log` function (#7966)
- Added a warning if `Trainer(log_every_n_steps)` is a value too high for the training dataloader (#7734)
- Added LightningCLI support for argument links applied on instantiation (#7895)
- Added LightningCLI support for configurable callbacks that should always be present (#7964)
- Added DeepSpeed Infinity support and updated to DeepSpeed 0.4.0 (#7234)
- Added support for `torch.nn.UninitializedParameter` in `ModelSummary` (#7642)
- Added support for `LightningModule.save_hyperparameters` when `LightningModule` is a dataclass (#7992)
- Added support for overriding `optimizer_zero_grad` and `optimizer_step` when using accumulate_grad_batches (#7980)
- Added `logger` boolean flag to `save_hyperparameters` (#7960)
- Added support for calling scripts using the module syntax (`python -m package.script`) (#8073)
- Added support for optimizers and learning rate schedulers to `LightningCLI` (#8093)
- Added XLA Profiler (#8014)
- Added `PrecisionPlugin.{pre,post}_backward` (#8328)
- Added `on_load_checkpoint` and `on_save_checkpoint` hooks to the `PrecisionPlugin` base class (#7831)
- Added `max_depth` parameter in `ModelSummary` (#8062)
- Added `XLAStatsMonitor` callback (#8235)
- Added `restore` function and `restarting` attribute to base `Loop` (#8247)
- Added `FastForwardSampler` and `CaptureIterableDataset` (#8307)
- Added support for `save_hyperparameters` in `LightningDataModule` (#3792)
- Added the `ModelCheckpoint(save_on_train_epoch_end)` to choose when to run the saving logic (#8389)
- Added `LSFEnvironment` for distributed training with the LSF resource manager `jsrun` (#5102)
- Added support for `accelerator='cpu'|'gpu'|'tpu'|'ipu'|'auto'` (#7808) (see the sketch after this list)
- Added `tpu_spawn_debug` to plugin registry (#7933)
- Enabled traditional/manual launching of DDP processes through `LOCAL_RANK` and `NODE_RANK` environment variable assignments (#7480)
- Added `quantize_on_fit_end` argument to `QuantizationAwareTraining` (#8464)
- Added experimental support for loop specialization (#8226)
- Added support for `devices` flag to Trainer (#8440)
- Added private `prevent_trainer_and_dataloaders_deepcopy` context manager on the `LightningModule` (#8472)
- Added support for providing callables to the Lightning CLI instead of types (#8400)
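To make a few of the additions above concrete, a minimal sketch; `model` and `dm` are assumed to be a `LightningModule` and a `LightningDataModule` defined elsewhere:

```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="auto", devices=1, max_epochs=3)  # accelerator='auto' (#7808), devices flag (#8440)
trainer.fit(model, dm)
# Datamodule passed positionally (#7431) and the new ckpt_path argument (#7430).
predictions = trainer.predict(model, dm, ckpt_path="best")
```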
Changed
- Decoupled device parsing logic from Accelerator connector to Trainer (#8180)
- Changed the `Trainer`'s `checkpoint_callback` argument to allow only boolean values (#7539)
- Log epoch metrics before the `on_evaluation_end` hook (#7272)
- Explicitly disallow calling `self.log(on_epoch=False)` during epoch-only or single-call hooks (#7874)
- Changed these `Trainer` methods to be protected: `call_setup_hook`, `call_configure_sharded_model`, `pre_dispatch`, `dispatch`, `post_dispatch`, `call_teardown_hook`, `run_train`, `run_sanity_check`, `run_evaluate`, `run_evaluation`, `run_predict`, `track_output_for_epoch_end`
- Changed `metrics_to_scalars` to work with any collection or value (#7888)
- Changed `clip_grad_norm` to use `torch.nn.utils.clip_grad_norm_` (#7025)
- Validation is now always run inside the training epoch scope (#7357)
- `ModelCheckpoint` now runs at the end of the training epoch by default (#8389)
- `EarlyStopping` now runs at the end of the training epoch by default (#8286)
- Refactored Loops
  - Moved attributes `global_step`, `current_epoch`, `max/min_steps`, `max/min_epochs`, `batch_idx`, and `total_batch_idx` to TrainLoop (#7437)
  - Refactored result handling in training loop (#7506)
  - Moved attributes `hiddens` and `split_idx` to TrainLoop (#7507)
  - Refactored the logic around manual and automatic optimization inside the optimizer loop (#7526)
  - Simplified "should run validation" logic (#7682)
  - Simplified logic for updating the learning rate for schedulers (#7682)
  - Removed the `on_epoch` guard from the "should stop" validation check (#7701)
  - Refactored internal loop interface; added new classes `FitLoop`, `TrainingEpochLoop`, `TrainingBatchLoop` (#7871, #8077)
  - Removed `pytorch_lightning/trainer/training_loop.py` (#7985)
  - Refactored evaluation loop interface; added new classes `DataLoaderLoop`, `EvaluationLoop`, `EvaluationEpochLoop` (#7990, #8077)
  - Removed `pytorch_lightning/trainer/evaluation_loop.py` (#8056)
  - Restricted public access to several internal functions (#8024)
  - Refactored trainer `_run_*` functions and separate evaluation loops (#8065)
  - Refactored prediction loop interface; added new classes `PredictionLoop`, `PredictionEpochLoop` (#7700, #8077)
  - Removed `pytorch_lightning/trainer/predict_loop.py` (#8094)
  - Moved result teardown to the loops (#8245)
  - Improve `Loop` API to better handle children `state_dict` and `progress` (#8334)
- Refactored logging
  - Renamed and moved `core/step_result.py` to `trainer/connectors/logger_connector/result.py` (#7736)
  - Dramatically simplify the `LoggerConnector` (#7882)
  - `trainer.{logged,progress_bar,callback}_metrics` are now updated on-demand (#7882)
  - Completely overhaul the `Result` object in favor of `ResultMetric` (#7882)
  - Improve epoch-level reduction time and overall memory usage (#7882)
  - Allow passing `self.log(batch_size=...)` (#7891)
  - Each of the training loops now keeps its own results collection (#7891)
  - Remove `EpochResultStore` and `HookResultStore` in favor of `ResultCollection` (#7909)
  - Remove `MetricsHolder` (#7909)
- Moved `ignore_scalar_return_in_dp` warning suppression to the DataParallelPlugin class (#7421)
- Changed the behaviour when logging evaluation step metrics to no longer append `/epoch_*` to the metric name (#7351)
- Raised `ValueError` when a `None` value is `self.log`-ed (#7771)
- Changed `resolve_training_type_plugins` to allow setting `num_nodes` and `sync_batchnorm` from `Trainer` setting (#7026)
- Default `seed_everything(workers=True)` in the `LightningCLI` (#7504)
- Changed `model.state_dict()` in `CheckpointConnector` to allow `training_type_plugin` to customize the model's `state_dict()` (#7474)
- `MLFlowLogger` now uses the env variable `MLFLOW_TRACKING_URI` as default tracking URI (#7457)
- Changed `Trainer` arg and functionality from `reload_dataloaders_every_epoch` to `reload_dataloaders_every_n_epochs` (#5043) (see the sketch after this section)
- Changed `WandbLogger(log_model={True/'all'})` to log models as artifacts (#6231)
- `MLFlowLogger` now accepts `run_name` as a constructor argument (#7622)
- Changed `teardown()` in `Accelerator` to allow `training_type_plugin` to customize `teardown` logic (#7579)
- `Trainer.fit` now raises an error when using manual optimization with unsupp...
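Two of the logging and configuration changes above, sketched with placeholder names (the `compute_loss` helper is hypothetical):

```python
from pytorch_lightning import LightningModule, Trainer

class LitModel(LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        # Explicit batch_size for correct epoch-level weighting (#7891).
        self.log("val_loss", loss, batch_size=len(batch[0]))

# reload_dataloaders_every_epoch was replaced by reload_dataloaders_every_n_epochs (#5043).
trainer = Trainer(reload_dataloaders_every_n_epochs=2)
```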
Standard weekly patch release
[1.3.8] - 2021-07-01
Fixed
- Fixed a sync deadlock when checkpointing a `LightningModule` that uses a torchmetrics 0.4 `Metric` (#8218)
- Fixed compatibility with TorchMetrics v0.4 (#8206)
- Added torchelastic check when sanitizing GPUs (#8095)
- Fixed a DDP info message that was never shown (#8111)
- Fixed metrics deprecation message at module import level (#8163)
- Fixed a bug where an infinite recursion would be triggered when using the `BaseFinetuning` callback on a model that contains a `ModuleDict` (#8170) (see the sketch below)
- Added a mechanism to detect a deadlock for DDP when only one process triggers an `Exception`; the other processes are killed when this happens (#8167)
- Fixed NCCL error when selecting non-consecutive device ids (#8165)
- Fixed SWA to also work with `IterableDataset` (#8172)
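A minimal sketch of a `BaseFinetuning` callback of the kind affected by #8170; the `backbone` attribute and the unfreeze epoch are placeholders:

```python
from pytorch_lightning.callbacks import BaseFinetuning

class MilestoneFinetuning(BaseFinetuning):
    def __init__(self, unfreeze_at_epoch=5):
        super().__init__()
        self.unfreeze_at_epoch = unfreeze_at_epoch

    def freeze_before_training(self, pl_module):
        # Freezing also works for models that contain a ModuleDict after #8170.
        self.freeze(pl_module.backbone)

    def finetune_function(self, pl_module, current_epoch, optimizer, optimizer_idx):
        if current_epoch == self.unfreeze_at_epoch:
            self.unfreeze_and_add_param_group(modules=pl_module.backbone, optimizer=optimizer)
```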
Contributors
@GabrielePicco @SeanNaren @ethanwharris @carmocca @tchaton @justusschock
Hotfix Patch Release
[1.3.7post0] - 2021-06-23
Fixed
- Fixed backward compatibility of moved functions `rank_zero_warn` and `rank_zero_deprecation` (#8085)
Contributors
Standard weekly patch release
[1.3.7] - 2021-06-22
Fixed
- Fixed a bug where skipping an optimizer while using AMP caused an AMP assertion error (#7975)
- Fixed deprecation messages not showing due to incorrect stacklevel (#8002, #8005)
- Fixed setting a `DistributedSampler` when using a distributed plugin in a custom accelerator (#7814)
- Improved `PyTorchProfiler` chrome trace names (#8009) (see the sketch below)
- Fixed moving the best score to device in `EarlyStopping` callback for TPU devices (#7959)
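A minimal sketch of enabling the profiler whose chrome trace names #8009 improved; the directory and file names are placeholders, and `model` is assumed to be a `LightningModule` defined elsewhere:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler

profiler = PyTorchProfiler(dirpath="profiling", filename="perf_logs")
trainer = Trainer(profiler=profiler, max_epochs=1)
# trainer.fit(model)
```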