Releases: Lightning-AI/pytorch-lightning
Standard weekly patch release
[1.4.6] - 2021-09-10
- Fixed an issue with export to ONNX format when a model has multiple inputs (#8800) (see the sketch below)
- Removed deprecation warnings being called for `on_{task}_dataloader` (#9279)
- Fixed save/load/resume from checkpoint for DeepSpeed Plugin (#8397, #8644, #8627)
- Fixed `EarlyStopping` running on train epoch end when `check_val_every_n_epoch > 1` is set (#9156)
- Fixed an issue with logger outputs not being finalized correctly after prediction runs (#8333)
- Fixed the Apex and DeepSpeed plugin closure running after the `on_before_optimizer_step` hook (#9288)
- Fixed the Native AMP plugin closure not running with manual optimization (#9288)
- Fixed bug where data-loading functions were not getting the correct running stage passed (#8858)
- Fixed intra-epoch evaluation outputs staying in memory when the respective `*_epoch_end` hook wasn't overridden (#9261)
- Fixed error handling in DDP process reconciliation when `_sync_dir` was not initialized (#9267)
- Fixed PyTorch Profiler not enabled for manual optimization (#9316)
- Fixed inspection of other args when a container is specified in `save_hyperparameters` (#9125)
- Fixed signature of `Timer.on_train_epoch_end` and `StochasticWeightAveraging.on_train_epoch_end` to prevent unwanted deprecation warnings (#9347)
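For context on the multi-input ONNX export fix (#8800), a minimal sketch; the `TwoInputModel` below is a hypothetical example, not code from the release:

```python
import torch
import pytorch_lightning as pl

# Hypothetical two-input module used only to illustrate multi-input export.
class TwoInputModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(8, 2)

    def forward(self, x, y):
        return self.net(x) + self.net(y)

model = TwoInputModel()
# For models with multiple inputs, pass a tuple as the input sample.
model.to_onnx("two_input_model.onnx", input_sample=(torch.randn(1, 8), torch.randn(1, 8)))
```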
Contributors
@ananthsub @awaelchli @Borda @four4fish @justusschock @kaushikb11 @s-rog @SeanNaren @tangbinh @tchaton @xerus
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.5] - 2021-08-31
- Fixed reduction using `self.log(sync_dist=True, reduce_fx={mean,max})` (#9142) (see the sketch below)
- Fixed not setting a default value for `max_epochs` if `max_time` was specified on the `Trainer` constructor (#9072)
- Fixed the `CometLogger` so it no longer modifies the metrics in place; it now creates a copy of the metrics before performing any operations (#9150)
- Fixed DDP "CUDA error: initialization error" due to a `copy` instead of `deepcopy` on `ResultCollection` (#9239)
Contributors
@ananthsub @bamblebam @carmocca @daniellepintz @ethanwharris @kaushikb11 @sohamtiwari3120 @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.4] - 2021-08-24
- Fixed a bug in the binary search mode of auto batch size scaling where an exception was raised if the first trainer run resulted in OOM (#8954) (see the sketch below)
- Fixed a bug that caused logging with `log_gpu_memory='min_max'` to not work (#9013)
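For reference, a minimal sketch of the binary-search batch-size finder touched by #8954; `model` is assumed to be a `LightningModule` whose dataloaders read `self.batch_size` (or `self.hparams.batch_size`):

```python
from pytorch_lightning import Trainer

trainer = Trainer(auto_scale_batch_size="binsearch", max_epochs=1)
# After the fix, an OOM on the very first run triggers a retry with a smaller
# batch size instead of raising.
trainer.tune(model)
```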
Contributors
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.3] - 2021-08-17
- Fixed plateau scheduler stepping on incomplete epoch (#8861)
- Fixed infinite loop with `CycleIterator` and multiple loaders (#8889)
- Fixed `StochasticWeightAveraging` with a list of learning rates not applying them to each param group (#8747) (see the sketch below)
- Restored original loaders if replaced by entrypoint (#8885)
- Fixed lost reference to `_Metadata` object in `ResultMetricCollection` (#8932)
- Ensured the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` (#8939)
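A minimal sketch of the per-parameter-group SWA learning rates fixed in #8747; the two values are placeholders for an optimizer that defines two parameter groups (e.g. backbone and head):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import StochasticWeightAveraging

# One SWA learning rate per optimizer parameter group.
swa = StochasticWeightAveraging(swa_lrs=[1e-2, 1e-3])
trainer = Trainer(callbacks=[swa], max_epochs=10)
```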
Contributors
@awaelchli @carmocca @justusschock @tchaton @yifuwang
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.2] - 2021-08-10
- Fixed recursive call for `apply_to_collection(include_none=False)` (#8719)
- Fixed truncated backprop through time enablement when set as a property on the `LightningModule` and not the `Trainer` (#8804) (see the sketch below)
- Fixed comments and exception message for `metrics_to_scalars` (#8782)
- Fixed typo in `LightningLoggerBase.after_save_checkpoint` docstring (#8737)
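A minimal sketch of enabling truncated backprop through time on the module itself, the path fixed by #8804; the layer sizes and loss are placeholders:

```python
import torch
from pytorch_lightning import LightningModule

class SeqModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.rnn = torch.nn.LSTM(10, 20, batch_first=True)
        # Set on the LightningModule (not the Trainer): split each batch into
        # chunks of 5 time steps.
        self.truncated_bptt_steps = 5

    def training_step(self, batch, batch_idx, hiddens):
        x, _ = batch
        out, hiddens = self.rnn(x, hiddens)
        loss = out.mean()  # placeholder loss for the sketch
        return {"loss": loss, "hiddens": hiddens}
```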
Contributors
@Aiden-Jeon @ananthsub @awaelchli @edward-io
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.4.1] - 2021-08-03
- Fixed `trainer.fit_loop.split_idx` always returning `None` (#8601)
- Fixed references for `ResultCollection.extra` (#8622)
- Fixed reference issues during epoch end result collection (#8621)
- Fixed horovod auto-detection when horovod is not installed and the launcher is `mpirun` (#8610)
- Fixed an issue with `training_step` outputs not getting collected correctly for `training_epoch_end` (#8613) (see the sketch below)
- Fixed distributed types support for CPUs (#8667)
- Fixed a deadlock issue with DDP and torchelastic (#8655)
- Fixed `accelerator=ddp` choice for CPU (#8645)
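A minimal sketch of the `training_step` to `training_epoch_end` output flow covered by #8613; the layer and metric names are placeholders:

```python
import torch
from pytorch_lightning import LightningModule

class MyModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).mean()  # placeholder loss
        return {"loss": loss, "batch_idx": batch_idx}

    def training_epoch_end(self, outputs):
        # `outputs` is the list of dicts returned by every training_step call.
        epoch_loss = torch.stack([out["loss"] for out in outputs]).mean()
        self.log("train/epoch_loss", epoch_loss)
```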
Contributors
@awaelchli, @Borda, @carmocca, @kaushikb11, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
TPU Pod Training, IPU Accelerator, DeepSpeed Infinity, Fully Sharded Data Parallel
Today we are excited to announce Lightning 1.4, introducing support for TPU pods, XLA profiling, IPUs, and new plugins to reach 10+ billion parameters, including DeepSpeed Infinity, Fully Sharded Data Parallel, and more!
https://devblog.pytorchlightning.ai/announcing-lightning-1-4-8cd20482aee9
[1.4.0] - 2021-07-27
Added
- Added `extract_batch_size` utility and corresponding tests to extract batch dimension from multiple batch types (#8357)
- Added support for named parameter groups in `LearningRateMonitor` (#7987)
- Added `dataclass` support for `pytorch_lightning.utilities.apply_to_collection` (#7935)
- Added support to `LightningModule.to_torchscript` for saving to custom filesystems with `fsspec` (#7617)
- Added `KubeflowEnvironment` for use with the `PyTorchJob` operator in Kubeflow
- Added LightningCLI support for config files on object stores (#7521)
- Added `ModelPruning(prune_on_train_epoch_end=True|False)` to choose when to apply pruning (#7704)
- Added support for checkpointing based on a provided time interval during training (#7515)
- Progress tracking
- Added support for passing a `LightningDataModule` positionally as the second argument to `trainer.{validate,test,predict}` (#7431)
- Added argument `trainer.predict(ckpt_path)` (#7430)
- Added `clip_grad_by_value` support for TPUs (#7025)
- Added support for passing any class to `is_overridden` (#7918)
- Added `sub_dir` parameter to `TensorBoardLogger` (#6195)
- Added correct `dataloader_idx` to batch transfer hooks (#6241)
- Added `include_none=bool` argument to `apply_to_collection` (#7769)
- Added `apply_to_collections` to apply a function to two zipped collections (#7769)
- Added `ddp_fully_sharded` support (#7487)
- Added `should_rank_save_checkpoint` property to Training Plugins (#7684)
- Added `log_grad_norm` hook to `LightningModule` to customize the logging of gradient norms (#7873)
- Added `save_config_filename` init argument to `LightningCLI` to ease resolving name conflicts (#7741)
- Added `save_config_overwrite` init argument to `LightningCLI` to ease overwriting existing config files (#8059)
- Added reset dataloader hooks to Training Plugins and Accelerators (#7861)
- Added trainer stage hooks for Training Plugins and Accelerators (#7864)
- Added the `on_before_optimizer_step` hook (#8048)
- Added IPU Accelerator (#7867)
- Fault-tolerant training
  - Added `{,load_}state_dict` to `ResultCollection` (#7948)
  - Added `{,load_}state_dict` to `Loops` (#8197)
  - Set `Loop.restarting=False` at the end of the first iteration (#8362)
  - Save the loops state with the checkpoint (opt-in) (#8362)
  - Save a checkpoint to restore the state on exception (opt-in) (#8362)
  - Added `state_dict` and `load_state_dict` utilities for `CombinedLoader` + utilities for dataloader (#8364)
- Added `rank_zero_only` to `LightningModule.log` function (#7966)
- Added `metric_attribute` to `LightningModule.log` function (#7966)
- Added a warning if `Trainer(log_every_n_steps)` is a value too high for the training dataloader (#7734)
- Added LightningCLI support for argument links applied on instantiation (#7895)
- Added LightningCLI support for configurable callbacks that should always be present (#7964)
- Added DeepSpeed Infinity support and updated to DeepSpeed 0.4.0 (#7234)
- Added support for `torch.nn.UninitializedParameter` in `ModelSummary` (#7642)
- Added support for `LightningModule.save_hyperparameters` when `LightningModule` is a dataclass (#7992)
- Added support for overriding `optimizer_zero_grad` and `optimizer_step` when using accumulate_grad_batches (#7980)
- Added `logger` boolean flag to `save_hyperparameters` (#7960)
- Added support for calling scripts using the module syntax (`python -m package.script`) (#8073)
- Added support for optimizers and learning rate schedulers to `LightningCLI` (#8093)
- Added XLA Profiler (#8014)
- Added `PrecisionPlugin.{pre,post}_backward` (#8328)
- Added `on_load_checkpoint` and `on_save_checkpoint` hooks to the `PrecisionPlugin` base class (#7831)
- Added `max_depth` parameter in `ModelSummary` (#8062)
- Added `XLAStatsMonitor` callback (#8235)
- Added `restore` function and `restarting` attribute to base `Loop` (#8247)
- Added `FastForwardSampler` and `CaptureIterableDataset` (#8307)
- Added support for `save_hyperparameters` in `LightningDataModule` (#3792)
- Added the `ModelCheckpoint(save_on_train_epoch_end)` to choose when to run the saving logic (#8389)
- Added `LSFEnvironment` for distributed training with the LSF resource manager `jsrun` (#5102)
- Added support for `accelerator='cpu'|'gpu'|'tpu'|'ipu'|'auto'` (#7808) (see the sketch after this list)
- Added `tpu_spawn_debug` to plugin registry (#7933)
- Enabled traditional/manual launching of DDP processes through `LOCAL_RANK` and `NODE_RANK` environment variable assignments (#7480)
- Added `quantize_on_fit_end` argument to `QuantizationAwareTraining` (#8464)
- Added experimental support for loop specialization (#8226)
- Added support for `devices` flag to Trainer (#8440)
- Added private `prevent_trainer_and_dataloaders_deepcopy` context manager on the `LightningModule` (#8472)
- Added support for providing callables to the Lightning CLI instead of types (#8400)
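To make a few of the additions above concrete, a minimal sketch; `model` and `dm` are assumed to be a `LightningModule` and a `LightningDataModule` defined elsewhere:

```python
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="auto", devices=1, max_epochs=3)  # accelerator='auto' (#7808), devices flag (#8440)
trainer.fit(model, dm)
# Datamodule passed positionally (#7431) and the new ckpt_path argument (#7430).
predictions = trainer.predict(model, dm, ckpt_path="best")
```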
Changed
- Decoupled device parsing logic from Accelerator connector to Trainer (#8180)
- Changed the `Trainer`'s `checkpoint_callback` argument to allow only boolean values (#7539)
- Log epoch metrics before the `on_evaluation_end` hook (#7272)
- Explicitly disallow calling `self.log(on_epoch=False)` during epoch-only or single-call hooks (#7874)
- Changed these `Trainer` methods to be protected: `call_setup_hook`, `call_configure_sharded_model`, `pre_dispatch`, `dispatch`, `post_dispatch`, `call_teardown_hook`, `run_train`, `run_sanity_check`, `run_evaluate`, `run_evaluation`, `run_predict`, `track_output_for_epoch_end`
- Changed `metrics_to_scalars` to work with any collection or value (#7888)
- Changed `clip_grad_norm` to use `torch.nn.utils.clip_grad_norm_` (#7025)
- Validation is now always run inside the training epoch scope (#7357)
- `ModelCheckpoint` now runs at the end of the training epoch by default (#8389)
- `EarlyStopping` now runs at the end of the training epoch by default (#8286)
- Refactored Loops
  - Moved attributes `global_step`, `current_epoch`, `max/min_steps`, `max/min_epochs`, `batch_idx`, and `total_batch_idx` to TrainLoop (#7437)
  - Refactored result handling in training loop (#7506)
  - Moved attributes `hiddens` and `split_idx` to TrainLoop (#7507)
  - Refactored the logic around manual and automatic optimization inside the optimizer loop (#7526)
  - Simplified "should run validation" logic (#7682)
  - Simplified logic for updating the learning rate for schedulers (#7682)
  - Removed the `on_epoch` guard from the "should stop" validation check (#7701)
  - Refactored internal loop interface; added new classes `FitLoop`, `TrainingEpochLoop`, `TrainingBatchLoop` (#7871, #8077)
  - Removed `pytorch_lightning/trainer/training_loop.py` (#7985)
  - Refactored evaluation loop interface; added new classes `DataLoaderLoop`, `EvaluationLoop`, `EvaluationEpochLoop` (#7990, #8077)
  - Removed `pytorch_lightning/trainer/evaluation_loop.py` (#8056)
  - Restricted public access to several internal functions (#8024)
  - Refactored trainer `_run_*` functions and separate evaluation loops (#8065)
  - Refactored prediction loop interface; added new classes `PredictionLoop`, `PredictionEpochLoop` (#7700, #8077)
  - Removed `pytorch_lightning/trainer/predict_loop.py` (#8094)
  - Moved result teardown to the loops (#8245)
  - Improve `Loop` API to better handle children `state_dict` and `progress` (#8334)
- Refactored logging
  - Renamed and moved `core/step_result.py` to `trainer/connectors/logger_connector/result.py` (#7736)
  - Dramatically simplify the `LoggerConnector` (#7882)
  - `trainer.{logged,progress_bar,callback}_metrics` are now updated on-demand (#7882)
  - Completely overhaul the `Result` object in favor of `ResultMetric` (#7882)
  - Improve epoch-level reduction time and overall memory usage (#7882)
  - Allow passing `self.log(batch_size=...)` (#7891)
  - Each of the training loops now keeps its own results collection (#7891)
  - Remove `EpochResultStore` and `HookResultStore` in favor of `ResultCollection` (#7909)
  - Remove `MetricsHolder` (#7909)
- Moved `ignore_scalar_return_in_dp` warning suppression to the DataParallelPlugin class (#7421)
- Changed the behaviour when logging evaluation step metrics to no longer append `/epoch_*` to the metric name (#7351)
- Raised `ValueError` when a `None` value is `self.log`-ed (#7771)
- Changed `resolve_training_type_plugins` to allow setting `num_nodes` and `sync_batchnorm` from `Trainer` setting (#7026)
- Default `seed_everything(workers=True)` in the `LightningCLI` (#7504)
- Changed `model.state_dict()` in `CheckpointConnector` to allow `training_type_plugin` to customize the model's `state_dict()` (#7474)
- `MLFlowLogger` now uses the env variable `MLFLOW_TRACKING_URI` as default tracking URI (#7457)
- Changed `Trainer` arg and functionality from `reload_dataloaders_every_epoch` to `reload_dataloaders_every_n_epochs` (#5043) (see the sketch after this section)
- Changed `WandbLogger(log_model={True/'all'})` to log models as artifacts (#6231)
- `MLFlowLogger` now accepts `run_name` as a constructor argument (#7622)
- Changed `teardown()` in `Accelerator` to allow `training_type_plugin` to customize `teardown` logic (#7579)
- `Trainer.fit` now raises an error when using manual optimization with unsupp...
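Two of the logging and configuration changes above, sketched with placeholder names (the `compute_loss` helper is hypothetical):

```python
from pytorch_lightning import LightningModule, Trainer

class LitModel(LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        # Explicit batch_size for correct epoch-level weighting (#7891).
        self.log("val_loss", loss, batch_size=len(batch[0]))

# reload_dataloaders_every_epoch was replaced by reload_dataloaders_every_n_epochs (#5043).
trainer = Trainer(reload_dataloaders_every_n_epochs=2)
```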
Standard weekly patch release
[1.3.8] - 2021-07-01
Fixed
- Fixed a sync deadlock when checkpointing a `LightningModule` that uses a torchmetrics 0.4 `Metric` (#8218)
- Fixed compatibility with TorchMetrics v0.4 (#8206)
- Added torchelastic check when sanitizing GPUs (#8095)
- Fixed a DDP info message that was never shown (#8111)
- Fixed metrics deprecation message at module import level (#8163)
- Fixed a bug where an infinite recursion would be triggered when using the `BaseFinetuning` callback on a model that contains a `ModuleDict` (#8170) (see the sketch below)
- Added a mechanism to detect a deadlock for DDP when only one process triggers an `Exception`; the other processes are killed when this happens (#8167)
- Fixed NCCL error when selecting non-consecutive device ids (#8165)
- Fixed SWA to also work with `IterableDataset` (#8172)
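A minimal sketch of a `BaseFinetuning` callback of the kind affected by #8170; the `backbone` attribute and the unfreeze epoch are placeholders:

```python
from pytorch_lightning.callbacks import BaseFinetuning

class MilestoneFinetuning(BaseFinetuning):
    def __init__(self, unfreeze_at_epoch=5):
        super().__init__()
        self.unfreeze_at_epoch = unfreeze_at_epoch

    def freeze_before_training(self, pl_module):
        # Freezing also works for models that contain a ModuleDict after #8170.
        self.freeze(pl_module.backbone)

    def finetune_function(self, pl_module, current_epoch, optimizer, optimizer_idx):
        if current_epoch == self.unfreeze_at_epoch:
            self.unfreeze_and_add_param_group(modules=pl_module.backbone, optimizer=optimizer)
```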
Contributors
@GabrielePicco @SeanNaren @ethanwharris @carmocca @tchaton @justusschock
Hotfix Patch Release
[1.3.7post0] - 2021-06-23
Fixed
- Fixed backward compatibility of moved functions `rank_zero_warn` and `rank_zero_deprecation` (#8085)
Contributors
Standard weekly patch release
[1.3.7] - 2021-06-22
Fixed
- Fixed a bug where skipping an optimizer while using AMP caused an AMP assertion error (#7975)
- Fixed deprecation messages not showing due to incorrect stacklevel (#8002, #8005)
- Fixed setting a `DistributedSampler` when using a distributed plugin in a custom accelerator (#7814)
- Improved `PyTorchProfiler` chrome trace names (#8009) (see the sketch below)
- Fixed moving the best score to device in `EarlyStopping` callback for TPU devices (#7959)
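A minimal sketch of enabling the profiler whose chrome trace names #8009 improved; the directory and file names are placeholders, and `model` is assumed to be a `LightningModule` defined elsewhere:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import PyTorchProfiler

profiler = PyTorchProfiler(dirpath="profiling", filename="perf_logs")
trainer = Trainer(profiler=profiler, max_epochs=1)
# trainer.fit(model)
```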