Releases: Lightning-AI/pytorch-lightning

PyTorch Lightning 1.6.5: Standard patch release

13 Jul 00:26
ff53616

[1.6.5] - 2022-07-13

Fixed

  • Fixed estimated_stepping_batches requiring distributed comms in configure_optimizers for the DeepSpeedStrategy (#13350)
  • Fixed bug with Python version check that prevented use with development versions of Python (#13420)
  • The loops now call .set_epoch() also on batch samplers if the dataloader has one wrapped in a distributed sampler (#13396)
  • Fixed the restoration of log step during restart (#13467)

Contributors

@adamjstewart @akihironitta @awaelchli @Borda @martinosorb @rohitgr7 @SeanNaren

PyTorch Lightning 1.6.4: Standard patch release

01 Jun 14:32
74b1317

[1.6.4] - 2022-06-01

Added

  • Exposed all DDP parameters through the HPU parallel strategy (#13067)

Changed

  • Keep torch.backends.cudnn.benchmark=False by default (unlike in v1.6.{0-4}) after reports of speed and memory problems that depend on the data used. Please consider tuning Trainer(benchmark) manually; see the example after this list. (#13154)
  • Prevent modification of torch.backends.cudnn.benchmark when Trainer(benchmark=...) is not set (#13154)
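
For reference, the flag can still be turned back on explicitly. A minimal sketch, assuming cudnn benchmarking is known to help for your data (the value shown is illustrative, not a recommendation):

import pytorch_lightning as pl

# Opt back into cudnn benchmarking explicitly when input shapes are static
# and benchmarking has been verified to help on your data.
trainer = pl.Trainer(benchmark=True)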

Fixed

  • Fixed an issue causing zero-division error for empty dataloaders (#12885)
  • Fixed mismatching default values for the types of some arguments in the DeepSpeed and Fully-Sharded strategies which made the CLI unable to use them (#12989)
  • Avoid redundant callback restore warning while tuning (#13026)
  • Fixed Trainer(precision=64) during evaluation which now uses the wrapped precision module (#12983)
  • Fixed an issue to use wrapped LightningModule for evaluation during trainer.fit for BaguaStrategy (#12983)
  • Fixed an issue wrt unnecessary usage of habana mixed precision package for fp32 types (#13028)
  • Fixed the number of references of LightningModule so it can be deleted (#12897)
  • Fixed materialize_module setting a module's child recursively (#12870)
  • Fixed issue where the CLI could not pass a Profiler to the Trainer (#13084)
  • Fixed torchelastic detection with non-distributed installations (#13142)
  • Fixed logging's step values when multiple dataloaders are used during evaluation (#12184)
  • Fixed epoch logging on train epoch end (#13025)
  • Fixed DDPStrategy and DDPSpawnStrategy to initialize optimizers only after moving the module to the device (#11952)

Contributors

@akihironitta @ananthsub @ar90n @awaelchli @Borda @carmocca @dependabot @jerome-habana @mads-oestergaard @otaj @rohitgr7

PyTorch Lightning 1.6.3: Standard patch release

03 May 20:36

[1.6.3] - 2022-05-03

Fixed

  • Use only a single instance of rich.console.Console throughout codebase (#12886)
  • Fixed an issue to ensure all the checkpoint states are saved in a common filepath with DeepSpeedStrategy (#12887)
  • Fixed trainer.logger deprecation message (#12671)
  • Fixed an issue where sharded grad scaler is passed in when using BF16 with the ShardedStrategy (#12915)
  • Fixed an issue wrt recursive invocation of DDP configuration in hpu parallel plugin (#12912)
  • Fixed printing of ragged dictionaries in Trainer.validate and Trainer.test (#12857)
  • Fixed threading support for legacy loading of checkpoints (#12814)
  • Fixed pickling of KFoldLoop (#12441)
  • Stopped optimizer_zero_grad from being called after IPU execution (#12913)
  • Fixed fuse_modules to be qat-aware for torch>=1.11 (#12891)
  • Enforced eval shuffle warning only for default samplers in DataLoader (#12653)
  • Enable mixed precision in DDPFullyShardedStrategy when precision=16 (#12965)
  • Fixed TQDMProgressBar reset and update to show correct time estimation (#12889)
  • Fixed fit loop restart logic to enable resume using the checkpoint (#12821)

Contributors

@akihironitta @carmocca @hmellor @jerome-habana @kaushikb11 @krshrimali @mauvilsa @niberger @ORippler @otaj @rohitgr7 @SeanNaren

PyTorch Lightning 1.6.2: Standard patch release

27 Apr 17:04

[1.6.2] - 2022-04-27

Fixed

  • Fixed ImportError when torch.distributed is not available. (#12794)
  • When using custom DataLoaders in LightningDataModule, multiple inheritance is resolved properly (#12716)
  • Fixed encoding issues on terminals that do not support unicode characters (#12828)
  • Fixed support for ModelCheckpoint monitors with dots (#12783)

Contributors

@akihironitta @alvitawa @awaelchli @Borda @carmocca @code-review-doctor @ethanfurman @HenryLau0220 @krshrimali @otaj

PyTorch Lightning 1.6.1: Standard weekly patch release

13 Apr 18:30

[1.6.1] - 2022-04-13

Changed

  • Support strategy argument being case insensitive (#12528)

Fixed

  • Run main progress bar updates independent of val progress bar updates in TQDMProgressBar (#12563)
  • Avoid calling average_parameters multiple times per optimizer step (#12452)
  • Properly pass some Logger's parent's arguments to super().__init__() (#12609)
  • Fixed an issue where incorrect type warnings appear when the overridden LightningLite.run method accepts user-defined arguments (#12629)
  • Fixed rank_zero_only decorator in LSF environments (#12587)
  • Don't raise a warning when nn.Module is not saved under hparams (#12669)
  • Raise MisconfigurationException when the accelerator is available but the user passes invalid ([]/0/"0") values to the devices flag (#12708)
  • Support auto_select_gpus with the accelerator and devices API (#12608)

Contributors

@akihironitta @awaelchli @Borda @carmocca @kaushikb11 @krshrimali @mauvilsa @otaj @pre-commit-ci @rohitgr7 @semaphore-egg @tkonopka @wayi1

If we forgot someone due to not matching the commit email with the GitHub account, let us know :]

PyTorch Lightning 1.6: Support Intel's Habana Accelerator, New efficient DDP strategy (Bagua), Manual Fault-tolerance, Stability and Reliability.

29 Mar 19:35
44e3edb

The core team is excited to announce the PyTorch Lightning 1.6 release ⚡

Highlights

PyTorch Lightning 1.6 is the work of 99 contributors who have worked on features, bug-fixes, and documentation for a total of over 750 commits since 1.5. This is our most active release yet. Here are some highlights:

Introducing Intel's Habana Accelerator

Lightning 1.6 now supports the Habana® framework, which includes Gaudi® AI training processors. Their heterogeneous architecture includes a cluster of fully programmable Tensor Processing Cores (TPC) and a configurable Matrix Math engine, along with the associated development tools and libraries.

You can leverage the Habana hardware to accelerate your Deep Learning training workloads simply by passing:

trainer = pl.Trainer(accelerator="hpu")

# single Gaudi training
trainer = pl.Trainer(accelerator="hpu", devices=1)

# distributed training with 8 Gaudi devices
trainer = pl.Trainer(accelerator="hpu", devices=8)

The Bagua Strategy

The Bagua Strategy is a deep learning acceleration framework that supports multiple advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:

trainer = pl.Trainer(strategy="bagua")

# or, to choose a specific algorithm
from pytorch_lightning.strategies import BaguaStrategy

trainer = pl.Trainer(strategy=BaguaStrategy(algorithm="gradient_allreduce"))  # "gradient_allreduce" is the default

Towards stable Accelerator, Strategy, and Plugin APIs

The Accelerator, Strategy, and Plugin APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience.

In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of Accelerator and Strategy (TrainingTypePlugin) as well as certain Plugins. In particular, we want to highlight the following changes:

  • All TrainingTypePlugins have been renamed to Strategy (#11120). Strategy is a more appropriate name because it encompasses more than simply training communication. This change is now aligned with the changes we implemented in 1.5, which introduced the new strategy and devices flags to the Trainer.

    # Before
    from pytorch_lightning.plugins import DDPPlugin
    
    # New
    from pytorch_lightning.strategies import DDPStrategy
  • The Accelerator and PrecisionPlugin have moved into Strategy. All strategies now take optional accelerator and precision_plugin parameters (#11022, #10570).

  • Custom Accelerator implementations must now implement two new abstract methods: is_available() (#11797) and auto_device_count() (#10222). The latter determines how many devices get used by default when specifying Trainer(accelerator=..., devices="auto"); a minimal sketch follows this list.

  • We redesigned the process creation for spawn-based strategies such as DDPSpawnStrategy and TPUSpawnStrategy (#10896). All spawn-based strategies now spawn processes immediately upon calling Trainer.{fit,validate,test,predict}, which means the hooks/callbacks prepare_data, setup, configure_sharded_model and teardown all run under an initialized process group. These changes align the spawn-based strategies with their non-spawn counterparts (such as DDPStrategy).
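
As an illustration of the two new hooks, here is a minimal sketch of a custom accelerator. It shows only the hooks named above; a working accelerator must also implement the remaining abstract methods of the Accelerator interface, and the values returned here are purely illustrative:

from pytorch_lightning.accelerators import Accelerator


class MyAccelerator(Accelerator):
    @staticmethod
    def is_available() -> bool:
        # report whether the backing hardware/runtime can be used in this environment
        return True

    @staticmethod
    def auto_device_count() -> int:
        # how many devices Trainer(accelerator=..., devices="auto") should select by default
        return 1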

We've also exposed the process group backend for use. For example, you can now easily enable fairring like this:

# Explicitly specify the process group backend if you choose to
ddp = pl.strategies.DDPStrategy(process_group_backend="fairring")
trainer = Trainer(strategy=ddp, accelerator="gpu", devices=8)

In a similar fashion, if you have torch>=1.11 installed, you can enable DDP static graph to apply special runtime optimizations:

trainer = Trainer(devices=4, strategy=DDPStrategy(static_graph=True))

LightningCLI improvements

In the previous release, we added shorthand notation support for registered components. In this release, we added a flag to automatically register all available components:

from pytorch_lightning.utilities.cli import LightningCLI

LightningCLI(auto_registry=True)

We have also added support for the ReduceLROnPlateau scheduler with shorthand notation:

$ python script.py fit --optimizer=Adam --lr_scheduler=ReduceLROnPlateau --lr_scheduler.monitor=metric_to_track

If you need to customize the learning rate scheduler configuration, you can do so by overriding:

class MyLightningCLI(LightningCLI):
    @staticmethod
    def configure_optimizers(lightning_module, optimizer, lr_scheduler=None):
        return {"optimizer": optimizer, "lr_scheduler": {"scheduler": lr_scheduler, ...}}

Finally, loggers are also now configurable with shorthand:

$ python script.py fit --trainer.logger=WandbLogger --trainer.logger.name="my_lightning_run"

Control SLURM's re-queueing

We've added the ability to turn the automatic resubmission on or off when a job gets interrupted by the SLURM controller (via signal handling). Users who prefer to let their code handle the resubmission (for example, when submitit is used) can now pass:

from pytorch_lightning.plugins.environments import SLURMEnvironment

trainer = pl.Trainer(plugins=SLURMEnvironment(auto_requeue=False))

Fault-tolerance improvements

Fault-tolerant training under manual optimization now tracks optimization progress. We also changed the graceful exit signal from SIGUSR1 to SIGTERM for better support inside cloud instances.
An additional feature we're excited to announce is support for consecutive trainer.fit() calls.

trainer = pl.Trainer(max_epochs=2)
trainer.fit(model)

# now, run 2 more epochs
trainer.fit_loop.max_epochs = 4
trainer.fit(model)

Loop customization improvements

The Loop's state is now included as part of the checkpoints saved by the library. This enables finer restoration of custom loops.
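
For example, a custom loop can contribute its own entries to that state through the loop checkpoint hooks. A minimal sketch, assuming the on_save_checkpoint/on_load_checkpoint hooks of the Loop API and a purely illustrative my_counter attribute:

class MyStatefulLoop(pl.loops.TrainingEpochLoop):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.my_counter = 0  # illustrative custom state

    def on_save_checkpoint(self):
        # everything returned here is stored inside the Trainer checkpoint
        state = super().on_save_checkpoint()
        state["my_counter"] = self.my_counter
        return state

    def on_load_checkpoint(self, state_dict):
        # restore the custom state when resuming from a checkpoint
        super().on_load_checkpoint(state_dict)
        self.my_counter = state_dict.get("my_counter", 0)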

We've also made it easier to replace Lightning's loops with your own. For example:

class MyCustomLoop(pl.loops.TrainingEpochLoop):
    ...

trainer = pl.Trainer(...)
trainer.fit_loop.replace(epoch_loop=MyCustomLoop)
# Trainer runs the fit loop with your new epoch loop!
trainer.fit(model)

Data-Loading improvements

In previous versions, Lightning required that the DataLoader instance set its input arguments as instance attributes. This meant that custom DataLoaders also had this hidden requirement. In this release, we do this automatically for the user, easing the passing of custom loaders:

class MyDataLoader(torch.utils.data.DataLoader):
    def __init__(self, a=123, *args, **kwargs):
-       # this was required before
-       self.a = a
        super().__init__(*args, **kwargs)

trainer.fit(model, train_dataloaders=MyDataLoader())

As of this release, Lightning no longer pre-fetches 1 extra batch if it doesn't need to. Previously, doing so would conflict with the internal pre-fetching done by optimized data loaders such as FFCV's. You can now define your own pre-fetching value like this:

class MyCustomLoop(pl.loops.FitLoop):
    @property
    def prefetch_batches(self):
        return 7  # lucky number 7

trainer = pl.Trainer(...)
trainer.fit_loop = MyCustomLoop(min_epochs=trainer.min_epochs, max_epochs=trainer.max_epochs)

New Hooks

LightningModule.lr_scheduler_step

Lightning now allows the use of custom learning rate schedulers that aren't natively available in PyTorch. A great example of this is Timm Schedulers.

When using custom learning rate schedulers that rely on an API other than PyTorch's, you can now override LightningModule.lr_scheduler_step with your desired logic.

from timm.scheduler import TanhLRScheduler


class MyLightningModule(pl.LightningModule):
    def configure_optimizers(self):...
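
The snippet above is truncated on the release page. A minimal sketch of how such an override might look, assuming timm's TanhLRScheduler, an Adam optimizer, and illustrative hyperparameters (the lr_scheduler_step signature shown is the one introduced in 1.6):

import torch
from timm.scheduler import TanhLRScheduler


class MyLightningModule(pl.LightningModule):
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = TanhLRScheduler(optimizer, t_initial=50)  # illustrative schedule length
        return [optimizer], [{"scheduler": scheduler, "interval": "epoch"}]

    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        # timm schedulers expect the epoch index when stepping
        scheduler.step(epoch=self.current_epoch)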

Standard weekly patch release

09 Feb 20:42

[1.5.10] - 2022-02-08

Fixed

  • Fixed an issue to avoid running the validation loop on restart (#11552)
  • The Rich progress bar now correctly shows the on_epoch logged values on train epoch end (#11689)
  • Fixed an issue to make the step argument in WandbLogger.log_image work (#11716)
  • Fixed restore_optimizers for mapping states (#11757)
  • With DPStrategy, the batch is not explicitly moved to the device (#11780)
  • Fixed an issue where the validation progress bar would disappear after trainer.validate() (#11700)
  • Fixed supporting remote filesystems with Trainer.weights_save_path for fault-tolerant training (#11776)
  • Fixed check for available modules (#11526)
  • Fixed bug where the path for "last" checkpoints was not getting saved correctly which caused newer runs to not remove the previous "last" checkpoint (#11481)
  • Fixed bug where the path for best checkpoints was not getting saved correctly when no metric was monitored which caused newer runs to not use the best checkpoint (#11481)

Contributors

@ananthsub @Borda @circlecrystal @NathanGodey @nithinraok @rohitgr7

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

20 Jan 19:48

[1.5.9] - 2022-01-20

Fixed

  • Pinned sphinx-autodoc-typehints with <v1.15 (#11400)
  • Skipped testing with PyTorch 1.7 and Python 3.9 on Ubuntu (#11217)
  • Fixed type promotion when tensors of higher category than float are logged (#11401)
  • Fixed the format of the configuration saved automatically by the CLI's SaveConfigCallback (#11532)

Changed

  • Changed LSFEnvironment to use LSB_DJOB_RANKFILE environment variable instead of LSB_HOSTS for determining node rank and main address (#10825)
  • Disabled sampler replacement when using IterableDataset (#11507)

Contributors

@ajtritt @akihironitta @carmocca @rohitgr7

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

05 Jan 15:23

[1.5.8] - 2022-01-05

Fixed

  • Fixed LightningCLI race condition while saving the config (#11199)
  • Fixed the default value used with log(reduce_fx=min|max) (#11310)
  • Fixed data fetcher selection (#11294)
  • Fixed a race condition that could result in incorrect (zero) values being observed in prediction writer callbacks (#11288)
  • Fixed dataloaders not getting reloaded the correct amount of times when setting reload_dataloaders_every_n_epochs and check_val_every_n_epoch (#10948)

Contributors

@adamviola @akihironitta @awaelchli @Borda @carmocca @edpizzi

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Standard weekly patch release

21 Dec 18:33

[1.5.7] - 2021-12-21

Fixed

  • Fixed NeptuneLogger when using DDP (#11030)
  • Fixed a bug to disable logging hyperparameters in logger if there are no hparams (#11105)
  • Avoid the deprecated onnx.export(example_outputs=...) in torch 1.10 (#11116)
  • Fixed an issue when torch-scripting a LightningModule after training with Trainer(sync_batchnorm=True) (#11078)
  • Fixed an AttributeError occurring when using a CombinedLoader (multiple dataloaders) for prediction (#11111)
  • Fixed bug where Trainer(track_grad_norm=..., logger=False) would fail (#11114)
  • Fixed an incorrect warning being produced by the model summary when using bf16 precision on CPU (#11161)

Changed

  • DeepSpeed does not require lightning module zero 3 partitioning (#10655)
  • The ModelCheckpoint callback now saves and restores attributes best_k_models, kth_best_model_path, kth_value, and last_model_path (#10995)

Contributors

@awaelchli @borchero @carmocca @guyang3532 @kaushikb11 @ORippler @Raalsky @rohitgr7 @SeanNaren

If we forgot someone due to not matching commit email with GitHub account, let us know :]