Releases: Lightning-AI/pytorch-lightning

PyTorch Lightning 1.7.3: Standard patch release

[1.7.3] - 2022-08-25

Fixed

  • Fixed an assertion error when using a ReduceOnPlateau scheduler with the Horovod strategy (#14215)
  • Fixed an AttributeError when accessing LightningModule.logger and the Trainer has multiple loggers (#14234)
  • Fixed incorrect number padding in the RichProgressBar (#14296)
  • Added back support for logging in the configure_gradient_clipping hook after unintended removal in v1.7.2 (#14298)
  • Fixed an issue so that the sanity check no longer impacts reload_dataloaders_every_n_epochs for validation (#13964)

Contributors

@awaelchli @Borda @carmocca @dependabot @kaushikb11 @otaj @rohitgr7

Dependency hotfix

[0.5.7] - 2022-08-22

Changed

  • Release LAI docs as stable (#14250)
  • Compatibility for Python 3.10

Fixed

  • Pinned starsessions to 1.x (#14333)
  • Parsed local package versions (#13933)

Contributors

@Borda, @hhsecond, @manskx

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

[0.5.6] - 2022-08-16

Fixed

  • Resolved a bug where the install command was not installing the latest version of an app/component by default (#14181)

Contributors

@manskx

If we forgot someone due to not matching commit email with GitHub account, let us know :]

PyTorch Lightning 1.7.2: Standard patch release

[1.7.2] - 2022-08-17

Added

  • Added FullyShardedNativeNativeMixedPrecisionPlugin to handle precision for DDPFullyShardedNativeStrategy (#14092)
  • Added profiling to these hooks: on_before_batch_transfer, transfer_batch_to_device, on_after_batch_transfer, configure_gradient_clipping, clip_gradients (#14069)

Changed

  • Updated compatibility for LightningLite to run with the latest DeepSpeed 0.7.0 (#13967)
  • Raised a MisconfigurationException if batch transfer hooks are overridden with IPUAccelerator (#13961)
  • The default project name in WandbLogger is now "lightning_logs" (#14145)
  • The WandbLogger.name property no longer returns the name of the experiment, and instead returns the project's name (#14145); see the sketch after this list
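
A minimal sketch of how to keep explicit control over the run and project names after this change; the wandb package must be installed, and the project/run names below are hypothetical:

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# Passing `project` and `name` explicitly avoids relying on the changed defaults
# ("lightning_logs" project, randomly generated run name).
wandb_logger = WandbLogger(project="my_project", name="my_run")  # hypothetical names
trainer = Trainer(logger=wandb_logger)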

Fixed

  • Fixed a bug that caused spurious AttributeError when multiple DataLoader classes are imported (#14117)
  • Fixed epoch-end logging results not being reset after the end of the epoch (#14061)
  • Fixed saving hyperparameters in a composition where the parent class is not a LightningModule or LightningDataModule (#14151)
  • Fixed the device placement when LightningModule.cuda() gets called without specifying a device index and the current cuda device was not 0 (#14128)
  • Avoided false positive warning about using sync_dist when using torchmetrics (#14143)
  • Avoid metadata.entry_points deprecation warning on Python 3.10 (#14052)
  • Avoid raising the sampler warning if num_replicas=1 (#14097)
  • Fixed resuming from a checkpoint when using Stochastic Weight Averaging (SWA) (#9938); see the sketch after this list
  • Avoided requiring the FairScale package to use precision with the fsdp native strategy (#14092)
  • Fixed an issue in which the default name for a run in WandbLogger would be set to the project name instead of a randomly generated string (#14145)
  • Fixed not preserving set attributes on DataLoader and BatchSampler when instantiated inside *_dataloader hooks (#14212)
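
A hedged sketch of the scenario the SWA fix addresses; the checkpoint path is a placeholder and `model` stands for any LightningModule:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import StochasticWeightAveraging

# Resuming a run that uses the SWA callback; this combination is what #9938 fixes.
trainer = Trainer(callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)], max_epochs=20)
# trainer.fit(model, ckpt_path="checkpoints/last.ckpt")  # hypothetical path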

Contributors

@adamreeve @akihironitta @awaelchli @Borda @carmocca @dependabot @otaj @rohitgr7

PyTorch Lightning 1.7.1: Standard patch release

[1.7.1] - 2022-08-09

Fixed

  • Cast only floating point tensors to fp16 with IPUs (#13983)
  • Cast tensors to fp16 before moving them to the device with DeepSpeedStrategy (#14000)
  • Fixed the NeptuneLogger dependency being unrecognized (#13988)
  • Fixed an issue where users would be warned about unset max_epochs even when fast_dev_run was set (#13262)
  • Fixed MPS device being unrecognized (#13992)
  • Fixed incorrect precision="mixed" being used with DeepSpeedStrategy and IPUStrategy (#14041)
  • Fixed dtype inference during gradient norm computation (#14051)
  • Fixed a bug that caused ddp_find_unused_parameters to be set to False instead of the intended default of True (#14095)

Contributors

@adamjstewart @akihironitta @awaelchli @Birch-san @carmocca @clementpoiret @dependabot @rohitgr7

Weekly bugfix release

[0.5.5] - 2022-08-09

Deprecated

  • Deprecated the sheety API (#14004)

Fixed

  • Resolved a bug where the work statuses would grow quickly and be duplicated (#13970)
  • Resolved a bug about a race condition when sending the work state through the caller_queue (#14074)
  • Fixed starting a Lightning App on the cloud when the repo name begins with "Lightning" (#14025)

Contributors

@manskx, @rlizzo, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

PyTorch Lightning 1.7: Apple Silicon support, Native FSDP, Collaborative training, and multi-GPU support with Jupyter notebooks

02 Aug 16:21

The core team is excited to announce the release of PyTorch Lightning 1.7 ⚡

PyTorch Lightning 1.7 is the culmination of work from 106 contributors who have worked on features, bug-fixes, and documentation for a total of over 492 commits since 1.6.0.

Highlights

Apple Silicon Support

For those using PyTorch 1.12 on M1 or M2 Apple machines, we have created the MPSAccelerator. MPSAccelerator enables accelerated GPU training on Apple’s Metal Performance Shaders (MPS) as a backend process.


NOTE

Support for this accelerator is currently marked as experimental in PyTorch. Because many operators are still missing, you may run into a few rough edges.


# Selects the accelerator
trainer = pl.Trainer(accelerator="mps")

# Equivalent to
from pytorch_lightning.accelerators import MPSAccelerator
trainer = pl.Trainer(accelerator=MPSAccelerator())

# Defaults to "mps" when run on M1 or M2 Apple machines
# to avoid code changes when switching computers
trainer = pl.Trainer(accelerator="gpu")

Native Fully Sharded Data Parallel Strategy

PyTorch 1.12 also added native support for Fully Sharded Data Parallel (FSDP). Previously, PyTorch Lightning enabled this by using the fairscale project. You can now choose between both options.


NOTE

Support for this strategy is marked as beta in PyTorch.


# Native PyTorch implementation
trainer = pl.Trainer(strategy="fsdp_native")

# Equivalent to
from pytorch_lightning.strategies import DDPFullyShardedNativeStrategy
trainer = pl.Trainer(strategy=DDPFullyShardedNativeStrategy())

# For reference, FairScale's implementation can be used with
trainer = pl.Trainer(strategy="fsdp")

A Collaborative Training strategy using Hivemind

Collaborative Training removes the need for top-tier multi-GPU servers by allowing you to train across unreliable machines, such as local ones or even preemptible cloud compute, across the Internet.

Under the hood, we use Hivemind. This provides decentralized training across the Internet.

from pytorch_lightning.strategies import HivemindStrategy

trainer = pl.Trainer(
    strategy=HivemindStrategy(target_batch_size=8192), 
    accelerator="gpu", 
    devices=1
)

For more information, check out the docs.

Distributed support in Jupyter Notebooks

So far, the only multi-GPU strategy supported in Jupyter notebooks (such as Grid.ai, Google Colab, and Kaggle) has been the Data-Parallel (DP) strategy (strategy="dp"). DP, however, has several limitations that often obstruct users' workflows: it can be slow, it's incompatible with TorchMetrics, it doesn't persist state changes on replicas, and it's difficult to use with non-primitive input and output structures.

In this release, we've added support for Distributed Data Parallel in Jupyter notebooks using the fork mechanism to address these shortcomings. This is only available on macOS and Linux (sorry, Windows!).


NOTE

This feature is experimental.


This is how you use multi-device in notebooks now:

# Train on 2 GPUs in a Jupyter notebook
trainer = pl.Trainer(accelerator="gpu", devices=2)

# Can be set explicitly
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_notebook")

# Can also be used in non-interactive environments
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_fork")

By default, the Trainer detects the interactive environment and selects the right strategy for you. Learn more in the full documentation.

Versioning of "last" checkpoints

If a run is configured to save to the same directory as a previous run and ModelCheckpoint(save_last=True) is enabled, the "last" checkpoint is now versioned with a simple -v1 suffix to avoid overwriting the existing "last" checkpoint. This mimics the behaviour for checkpoints that monitor a metric.
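
For reference, a minimal sketch of the configuration this applies to (the directory below is hypothetical); a second run pointed at the same directory saves a versioned last-v1.ckpt instead of overwriting last.ckpt:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# Both runs write to the same directory; the second run's "last" checkpoint
# is versioned rather than overwriting the existing one.
checkpoint_cb = ModelCheckpoint(dirpath="checkpoints/", save_last=True)  # hypothetical path
trainer = Trainer(callbacks=[checkpoint_cb])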

Automatically reload the "last" checkpoint

In certain scenarios, like when running in a cloud spot instance with fault-tolerant training enabled, it is useful to load the latest available checkpoint. It is now possible to pass the string ckpt_path="last" in order to load the latest available checkpoint from the set of existing checkpoints.

trainer = Trainer(...)
trainer.fit(..., ckpt_path="last")

Validation every N batches across epochs

In some cases, for example iteration-based training, it is useful to run validation after every N training batches without being limited by the epoch boundary. Now, you can enable validation based on the total number of training batches.

trainer = Trainer(..., val_check_interval=N, check_val_every_n_epoch=None)
trainer.fit(...)

For example, given 5 epochs of 10 batches, setting N=25 would run validation in the 3rd and 5th epoch.

CPU stats monitoring

PyTorch Lightning provides the DeviceStatsMonitor callback to monitor the stats of the hardware currently used. However, users often also want to monitor the stats of other hardware. In this release, we have added an option to additionally monitor CPU stats:

from pytorch_lightning.callbacks import DeviceStatsMonitor

# Log both CPU stats and GPU stats
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=True), accelerator="gpu")

# Log just the GPU stats
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=False), accelerator="gpu")

# Equivalent to `DeviceStatsMonitor()`
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=True), accelerator="cpu")

The CPU stats are gathered using the psutil package.
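
As a rough illustration (not necessarily the exact metric names Lightning logs), this is the kind of CPU statistics psutil exposes:

import psutil

# Approximate CPU metrics similar to what DeviceStatsMonitor(cpu_stats=True) can log.
cpu_stats = {
    "cpu_percent": psutil.cpu_percent(),                # overall CPU utilization (%)
    "cpu_vm_percent": psutil.virtual_memory().percent,  # RAM usage (%)
    "cpu_swap_percent": psutil.swap_memory().percent,   # swap usage (%)
}
print(cpu_stats)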

Automatic distributed samplers

It is now possible to use custom samplers in a distributed environment without the need to set replace_sampler_ddp=False and wrap your sampler manually with a DistributedSampler.
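
A hedged sketch of what this enables, using a toy dataset and a RandomSampler as a stand-in for a custom sampler:

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

dataset = TensorDataset(torch.randn(64, 4), torch.randint(0, 2, (64,)))

# A custom sampler passed directly to the DataLoader; with the default
# replace_sampler_ddp=True, Lightning takes care of the distributed wrapping.
train_loader = DataLoader(dataset, sampler=RandomSampler(dataset), batch_size=8)

trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
# trainer.fit(model, train_dataloaders=train_loader)  # `model` is any LightningModule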

Inference mode support

PyTorch 1.9 introduced torch.inference_mode, which is a faster alternative to torch.no_grad. Lightning will now use inference_mode wherever possible during evaluation.
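
For context, a minimal sketch of the underlying PyTorch API (plain PyTorch, not Lightning-specific code):

import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(8, 4)

# torch.inference_mode() behaves like torch.no_grad() but also skips autograd
# bookkeeping such as view tracking and version counting, making evaluation faster.
with torch.inference_mode():
    y = model(x)

print(y.requires_grad)  # False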

Support for warn-level determinism

In PyTorch 1.11, operations that do not have a deterministic implementation can be set to throw a warning instead of an error when run in deterministic mode. This is now supported by our Trainer:

trainer = pl.Trainer(deterministic="warn")

LightningCLI improvements

After the latest updates to jsonargparse, the library supporting the LightningCLI, there's now complete support for shorthand notation. This includes automatic shorthand notation for all arguments, not just the ones that are part of the registries, plus support inside configuration files.

+ # pytorch_lightning==1.7.0
  trainer:
  callbacks:
-   - class_path: pytorch_lightning.callbacks.EarlyStopping
+   - class_path: EarlyStopping
      init_args:
        monitor: "loss"

A header with the version that generated the config is now included.

All subclasses for a given base class can be specified by name, so there's no need to explicitly register them. The only requirement is that the module where the subclass is defined is imported prior to parsing.

from pytorch_lightning.cli import LightningCLI
import my_code.models
import my_code.optimizers

cli = LightningCLI()
# Now use any of the classes:
# python trainer.py fit --model=Model1 --optimizer=CustomOptimizer

The new version renders the registries and the auto_registry flag, introduced in 1.6.0, unnecessary, so we have deprecated them.

Support was also added for list appending; for example, to add a callback to an existing list that might be already configured:

$ python trainer.py fit \
-   --trainer.callbacks=EarlyStopping \
+   --trainer.callbacks+=EarlyStopping \
    --trainer.callbacks.patience=5 \
-   --trainer.callbacks=LearningRateMonitor \
+   --trainer.callbacks+=LearningRateMonitor \
    --trainer.callbacks.logging_interval=epoch

Callback registration through entry points

Entry Points are an advanced feature in Python's setuptools that allow packages to expose metadata to other packages. In Lightning, we ...


Built-in templates

[0.5.4] - 2022-08-01

Changed

  • Wrapped imports for traceability (#13924)
  • Set version as today (#13906)

Fixed

  • Included app templates in the lightning and app packages (#13731)
  • Added UI for installing it all (#13732)
  • Fixed the meta package build flow (#13926)

Contributors

@Borda, @manskx

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor bug-fix release

[0.5.3] - 2022-07-25

Changed

  • Pruned duplicated requirements (#13739)

Fixed

  • Used the correct Python version in the lightning component template (#13790)

Lightning App 0.5.2

[0.5.2] - 2022-07-18

Added

  • Update the Lightning App docs (#13537)

Changed

  • Added LIGHTNING_ prefix to Platform AWS credentials (#13703)