Releases: Lightning-AI/pytorch-lightning
Standard weekly patch release
[1.3.6] - 2021-06-15
Fixed
- Fixed logs overwriting issue for remote filesystems (#7889)
- Fixed `DataModule.prepare_data` only being callable on the global rank 0 process (#7945)
- Fixed setting `worker_init_fn` to seed dataloaders correctly when using DDP (#7942) (see the sketch after this list)
- Fixed `BaseFinetuning` callback to properly handle parent modules with parameters (#7931)
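For reference, here is a minimal sketch (not from the release notes) of the 1.3 seeding helper that the `worker_init_fn` fix above relates to; it assumes a machine with 2 GPUs:

```python
import pytorch_lightning as pl

# Seeds Python, NumPy and PyTorch, and also derives per-worker seeds through
# DataLoader.worker_init_fn so DDP processes don't see identically-augmented batches.
pl.seed_everything(42, workers=True)

# Assumes a 2-GPU machine; "ddp" is the distributed backend the fix targets.
trainer = pl.Trainer(accelerator="ddp", gpus=2, deterministic=True)
```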
Contributors
@awaelchli @Borda @kaushikb11 @Queuecumber @SeanNaren @senarvi @speediedan
Standard weekly patch release
[1.3.5] - 2021-06-08
Added
- Added warning to Training Step output (#7779)
Fixed
- Fixed `LearningRateMonitor` + `BackboneFinetuning` (#7835) (see the sketch after this list)
- Minor improvements to `apply_to_collection` and type signature of `log_dict` (#7851)
- Fixed docker versions (#7834)
- Fixed sharded training check for fp16 precision (#7825)
- Fixed support for torch Module type hints in LightningCLI (#7807)
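The `LearningRateMonitor` + `BackboneFinetuning` fix concerns using the two callbacks together. A minimal sketch of that combination (not part of the release notes; it assumes the LightningModule exposes a `self.backbone` submodule, as `BackboneFinetuning` requires):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import BackboneFinetuning, LearningRateMonitor

callbacks = [
    BackboneFinetuning(unfreeze_backbone_at_epoch=10),  # gradually unfreezes model.backbone
    LearningRateMonitor(logging_interval="step"),       # logs the lr of every parameter group
]
trainer = pl.Trainer(callbacks=callbacks, max_epochs=20)
```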
Changed
- Moved `training_output` validation to after `train_step_end` (#7868)
Contributors
@Borda, @justusschock, @kandluis, @mauvilsa, @shuyingsunshine21, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.3.3] - 2021-05-26
Changed
- Changed the call to `untoggle_optimizer(opt_idx)` to happen outside of the closure function (#7563)
Fixed
- Fixed `ProgressBar` pickling after calling `trainer.predict` (#7608)
- Fixed broadcasting in multi-node, multi-GPU DDP using torch 1.7 (#7592)
- Fixed dataloaders not being reset when tuning the model (#7566)
- Fixed print errors in `ProgressBar` when `trainer.fit` is not called (#7674)
- Fixed global step update when the epoch is skipped (#7677)
- Fixed training loop total batch counter when accumulate grad batches was enabled (#7692) (see the sketch after this list)
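For context, a minimal sketch (not from the release notes) of the gradient-accumulation setting whose batch counting was fixed in #7692:

```python
import pytorch_lightning as pl

# The optimizer steps once every 4 batches; the training-loop batch counters
# for this configuration are what #7692 corrects.
trainer = pl.Trainer(accumulate_grad_batches=4, max_epochs=1)
```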
Contributors
@carmocca @kaushikb11 @ryanking13 @Lucklyric @ajtritt @yifuwang
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.3.2] - 2021-05-18
Changed
- `DataModule`s now avoid duplicate `{setup,teardown,prepare_data}` calls for the same stage (#7238)
Fixed
- Fixed parsing of multiple training dataloaders (#7433)
- Fixed recursive passing of `wrong_type` keyword argument in `pytorch_lightning.utilities.apply_to_collection` (#7433)
- Fixed setting correct `DistribType` for `ddp_cpu` (spawn) backend (#7492)
- Fixed incorrect number of calls to the LR scheduler when `check_val_every_n_epoch > 1` (#7032) (see the sketch after this list)
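For context, a minimal sketch (not from the release notes) of the configuration affected by #7032, i.e. validating only every other epoch while stepping an epoch-wise LR scheduler:

```python
import pytorch_lightning as pl

# Validation runs every 2nd epoch; with the fix, an LR scheduler configured
# with {"interval": "epoch"} in configure_optimizers still steps once per
# training epoch rather than once per validation run.
trainer = pl.Trainer(check_val_every_n_epoch=2, max_epochs=10)
```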
Contributors
@alanhdu @carmocca @justusschock @tkng
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.3.1] - 2021-05-11
Fixed
- Fixed DeepSpeed with IterableDatasets (#7362)
- Fixed `Trainer.current_epoch` not getting restored after tuning (#7434)
- Fixed local rank displayed in console log (#7395)
Contributors
@akihironitta @awaelchli @leezu
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Lightning CLI, PyTorch Profiler, Improved Early Stopping
Today we are excited to announce Lightning 1.3, containing highly anticipated new features including a new Lightning CLI, improved TPU support, integrations such as PyTorch profiler, new early stopping strategies, predict and validate trainer routines, and more.
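Two of the headline additions, the `validate` and `predict` trainer routines, look roughly like this (a minimal sketch, not part of the official notes; `model`, `val_loader` and `pred_loader` are placeholders):

```python
import pytorch_lightning as pl

trainer = pl.Trainer()

# New in 1.3: run one evaluation epoch over the validation set (#4948)
trainer.validate(model, val_dataloaders=val_loader)

# New in 1.3: run inference and optionally collect the outputs (#7215)
predictions = trainer.predict(model, dataloaders=pred_loader, return_predictions=True)
```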
[1.3.0] - 2021-05-06
Added
- Added support for the `EarlyStopping` callback to run at the end of the training epoch (#6944)
- Added synchronization points before and after `setup` hooks are run (#7202)
- Added a `teardown` hook to `ClusterEnvironment` (#6942)
- Added utils for metrics to scalar conversions (#7180)
- Added utils for NaN/Inf detection for gradients and parameters (#6834)
- Added more explicit exception message when trying to execute `trainer.test()` or `trainer.validate()` with `fast_dev_run=True` (#6667)
- Added `LightningCLI` class to provide simple reproducibility with minimum boilerplate training CLI (#4492, #6862, #7156, #7299) (see the sketch after this list)
- Added `gradient_clip_algorithm` argument to Trainer for gradient clipping by value (#6123)
- Added a way to print to terminal without breaking up the progress bar (#5470)
- Added support to checkpoint after training steps in `ModelCheckpoint` callback (#6146)
- Added `TrainerStatus.{INITIALIZING,RUNNING,FINISHED,INTERRUPTED}` (#7173)
- Added `Trainer.validate()` method to perform one evaluation epoch over the validation set (#4948)
- Added `LightningEnvironment` for Lightning-specific DDP (#5915)
- Added `teardown()` hook to `LightningDataModule` (#4673)
- Added `auto_insert_metric_name` parameter to `ModelCheckpoint` (#6277)
- Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders (#6274)
- Added `teardown` method to `BaseProfiler` to enable subclasses defining post-profiling steps outside of `__del__` (#6370)
- Added `setup` method to `BaseProfiler` to enable subclasses defining pre-profiling steps for every process (#6633)
- Added no return warning to predict (#6139)
- Added `Trainer.predict` config validation (#6543)
- Added `AbstractProfiler` interface (#6621)
- Added support for including module names for forward in the autograd trace of `PyTorchProfiler` (#6349)
- Added support for the PyTorch 1.8.1 autograd profiler (#6618)
- Added `outputs` parameter to callback's `on_validation_epoch_end` & `on_test_epoch_end` hooks (#6120)
- Added `configure_sharded_model` hook (#6679)
- Added support for `precision=64`, enabling training with double precision (#6595)
- Added support for DDP communication hooks (#6736)
- Added `artifact_location` argument to `MLFlowLogger` which will be passed to the `MlflowClient.create_experiment` call (#6677)
- Added `model` parameter to precision plugins' `clip_gradients` signature (#6764, #7231)
- Added `is_last_batch` attribute to `Trainer` (#6825)
- Added `LightningModule.lr_schedulers()` for manual optimization (#6567)
- Added `MpModelWrapper` in TPU Spawn (#7045)
- Added `max_time` Trainer argument to limit training time (#6823)
- Added `on_predict_{batch,epoch}_{start,end}` hooks (#7141)
- Added new `EarlyStopping` parameters `stopping_threshold` and `divergence_threshold` (#6868)
- Added `debug` flag to TPU Training Plugins (`PT_XLA_DEBUG`) (#7219)
- Added new `UnrepeatedDistributedSampler` and `IndexBatchSamplerWrapper` for tracking distributed predictions (#7215)
- Added `trainer.predict(return_predictions=None|False|True)` (#7215)
- Added `BasePredictionWriter` callback to implement prediction saving (#7127)
- Added `trainer.tune(scale_batch_size_kwargs, lr_find_kwargs)` arguments to configure the tuning algorithms (#7258)
- Added `tpu_distributed` check for TPU Spawn barrier (#7241)
- Added device updates to TPU Spawn for Pod training (#7243)
- Added warning when missing `Callback` and using `resume_from_checkpoint` (#7254)
- DeepSpeed single file saving (#6900)
- Added Training type Plugins Registry (#6982, #7063, #7214, #7224)
- Add `ignore` param to `save_hyperparameters` (#6056)
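A minimal sketch of the new `LightningCLI` (not part of the official notes; `MyModel` and `MyDataModule` are placeholder user classes, and the import path is the one used in the 1.3 series):

```python
# cli.py
from pytorch_lightning.utilities.cli import LightningCLI

from my_project import MyDataModule, MyModel  # placeholder user code

# Exposes model, datamodule and trainer arguments on the command line, e.g.
#   python cli.py --trainer.max_epochs=10 --model.learning_rate=1e-3
cli = LightningCLI(MyModel, MyDataModule)
```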
Changed
- Changed `LightningModule.truncated_bptt_steps` to be a property (#7323)
- Changed the `EarlyStopping` callback to no longer run `EarlyStopping.on_validation_end` by default if only training is run; set `check_on_train_epoch_end` to run the callback at the end of the train epoch instead of at the end of the validation epoch (#7069) (see the sketch after this list)
- Renamed `pytorch_lightning.callbacks.swa` to `pytorch_lightning.callbacks.stochastic_weight_avg` (#6259)
- Refactored `RunningStage` and `TrainerState` usage (#4945, #7173)
  - Added `RunningStage.SANITY_CHECKING`
  - Added `TrainerFn.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}`
  - Changed `trainer.evaluating` to return `True` if validating or testing
- Changed `setup()` and `teardown()` stage argument to take any of `{fit,validate,test,predict}` (#6386)
- Changed profilers to save separate report files per state and rank (#6621)
- The trainer no longer tries to save a checkpoint on exception or run callback's `on_train_end` functions (#6864)
- Changed `PyTorchProfiler` to use `torch.autograd.profiler.record_function` to record functions (#6349)
- Disabled `lr_scheduler.step()` in manual optimization (#6825)
- Changed warnings and recommendations for dataloaders in `ddp_spawn` (#6762)
- `pl.seed_everything` will now also set the seed on the `DistributedSampler` (#7024)
- Changed default setting for communication of multi-node training using `DDPShardedPlugin` (#6937)
- `trainer.tune()` now returns the tuning result (#7258)
- `LightningModule.from_datasets()` now accepts `IterableDataset` instances as training datasets (#7503)
- Changed `resume_from_checkpoint` warning to an error when the checkpoint file does not exist (#7075)
- Automatically set `sync_batchnorm` for `training_type_plugin` (#6536)
- Allowed training type plugin to delay optimizer creation (#6331)
- Removed ModelSummary validation from train loop on_trainer_init (#6610)
- Moved `save_function` to accelerator (#6689)
- Updated DeepSpeed ZeRO (#6546, #6752, #6142, #6321)
- Improved verbose logging for `EarlyStopping` callback (#6811)
- Run ddp_spawn dataloader checks on Windows (#6930)
- Updated mlflow with using `resolve_tags` (#6746)
- Moved `save_hyperparameters` to its own function (#7119)
- Replaced `_DataModuleWrapper` with `__new__` (#7289)
- Reset `current_fx` properties on lightning module in teardown (#7247)
- Auto-set `DataLoader.worker_init_fn` with `seed_everything` (#6960)
- Remove `model.trainer` call inside of dataloading mixin (#7317)
- Split profilers module (#6261)
- Ensure accelerator is valid if running interactively (#5970)
- Disabled batch transfer in DP mode (#6098)
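A minimal sketch (not from the release notes) of the reworked early-stopping knobs mentioned above; `"train_loss"` is a placeholder metric name the model is assumed to log:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="train_loss",
    stopping_threshold=0.05,        # stop once the metric is good enough (#6868)
    divergence_threshold=10.0,      # stop if the metric blows up (#6868)
    check_on_train_epoch_end=True,  # evaluate at the end of each training epoch (#7069)
)
trainer = pl.Trainer(callbacks=[early_stop], max_epochs=50)
```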
Deprecated
- Deprecated `outputs` in both `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks (#7339)
- Deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` (#7323)
- Deprecated `LightningModule.grad_norm` in favor of `pytorch_lightning.utilities.grads.grad_norm` (#7292)
- Deprecated the `save_function` property from the `ModelCheckpoint` callback (#7201)
- Deprecated `LightningModule.write_predictions` and `LightningModule.write_predictions_dict` (#7066)
- Deprecated `TrainerLoggingMixin` in favor of a separate utilities module for metric handling (#7180)
- Deprecated `TrainerTrainingTricksMixin` in favor of a separate utilities module for NaN/Inf detection for gradients and parameters (#6834)
- `period` has been deprecated in favor of `every_n_val_epochs` in the `ModelCheckpoint` callback (#6146) (see the sketch after this list)
- Deprecated `trainer.running_sanity_check` in favor of `trainer.sanity_checking` (#4945)
- Deprecated `Profiler(output_filename)` in favor of `dirpath` and `filename` (#6621)
- Deprecated `PytorchProfiler(profiled_functions)` in favor of `record_functions` (#6349)
- Deprecated `@auto_move_data` in favor of `trainer.predict` (#6993)
- Deprecated `Callback.on_load_checkpoint(checkpoint)` in favor of `Callback.on_load_checkpoint(trainer, pl_module, checkpoint)` (#7253)
- Deprecated metrics in favor of `torchmetrics` (#6505, #6530, #6540, #6547, #6515, #6572, #6573, #6584, #6636, #6637, #6649, #6659, #7131)
- Deprecated the `LightningModule.datamodule` getter and setter methods; access them through `Trainer.datamodule` instead (#7168)
- Deprecated the use of `Trainer(gpus="i")` (string) for selecting the i-th GPU; from v1.5 this will set the number of GPUs instead of the index (#6388)
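As a migration sketch for the `period` deprecation above (not from the release notes):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Before (deprecated in 1.3):
# checkpoint_cb = ModelCheckpoint(monitor="val_loss", period=2)

# After (#6146):
checkpoint_cb = ModelCheckpoint(monitor="val_loss", every_n_val_epochs=2)
```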
Removed
- Removed the `exp_save_path` property from the `LightningModule` (#7266)
- Removed training loop explicitly calling `EarlyStopping.on_validation_end` if no validation is run (#7069)
- Removed `automatic_optimization` as a property from the training loop in favor of `LightningModule.automatic_optimization` (#7130)
- Removed evaluation loop legacy returns for `*_epoch_end` hooks (#6973)
- Removed support for passing a bool value to `profiler` argument of Trainer (#6164)
- Removed no return warning from val/test step (#6139)
- Removed passing a `ModelCheckpoint` instance to `Trainer(checkpoint_callback)` (#6166)
- Removed deprecated Trainer argument `enable_pl_optimizer` and `automatic_optimization` (#6163)
- Removed deprecated metrics (#6161)
  - from `pytorch_lightning.metrics.functional.classification` removed `to_onehot`, `to_categorical`, `get_num_classes`, `roc`, `multiclass_roc`, `average_precision`, `precision_recall_curve`, `multiclass_precision_recall_curve`
  - from `pytorch_lightning.metrics.functional.reduction` removed `reduce`, `class_reduce`
- Removed deprecated `ModelCheckpoint` arguments `prefix`, `mode="auto"` (#6162)
- Removed `mode='auto'` from `EarlyStopping` (#6167)
- Removed `epoch` and `step` argume...
Quick patch release
Fixes the missing `packaging` package in dependencies, which affected installation only onto a very bare system.
Standard weekly patch release
[1.2.8] - 2021-04-14
Added
- Added TPUSpawn + IterableDataset error message (#6875)
Fixed
- Fixed process rank not being available right away after `Trainer` instantiation (#6941)
- Fixed `sync_dist` for TPUs (#6950)
- Fixed `AttributeError` for `require_backward_grad_sync` when running manual optimization with sharded plugin (#6915)
- Fixed `--gpus` default for parser returned by `Trainer.add_argparse_args` (#6898)
- Fixed TPU Spawn all gather (#6896)
- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met (#6705)
- Fixed csv extension check (#6436)
- Fixed checkpoint issue when using Horovod distributed backend (#6958)
- Fixed tensorboard exception raising (#6901)
- Fixed setting the eval/train flag correctly on accelerator model (#6983)
- Fixed DDP_SPAWN compatibility with bug_report_model.py (#6892)
- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters (#6879)
- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic
Contributors
@ananthsub @awaelchli @ethanwharris @justusschock @kandluis @kaushikb11 @liob @SeanNaren @skmatz
If we forgot someone due to not matching commit email with GitHub account, let us know :]