The core team is excited to announce the release of PyTorch Lightning 1.7 ⚡
PyTorch Lightning 1.7 is the culmination of work from 106 contributors who have worked on features, bug-fixes, and documentation for a total of over 492 commits since 1.6.0.
Highlights
Apple Silicon Support
For those using PyTorch 1.12 on M1 or M2 Apple machines, we have created the MPSAccelerator. MPSAccelerator enables accelerated GPU training using Apple's Metal Performance Shaders (MPS) as a backend.
NOTE
Support for this accelerator is currently marked as experimental in PyTorch. Because many operators are still missing, you may run into a few rough edges.
```python
# Selects the accelerator
trainer = pl.Trainer(accelerator="mps")

# Equivalent to
from pytorch_lightning.accelerators import MPSAccelerator

trainer = pl.Trainer(accelerator=MPSAccelerator())

# Defaults to "mps" when run on M1 or M2 Apple machines
# to avoid code changes when switching computers
trainer = pl.Trainer(accelerator="gpu")
```
Native Fully Sharded Data Parallel Strategy
PyTorch 1.12 also added native support for Fully Sharded Data Parallel (FSDP). Previously, PyTorch Lightning enabled this through the fairscale project. You can now choose between the two options.
NOTE
Support for this strategy is marked as beta in PyTorch.
```python
# Native PyTorch implementation
trainer = pl.Trainer(strategy="fsdp_native")

# Equivalent to
from pytorch_lightning.strategies import DDPFullyShardedNativeStrategy

trainer = pl.Trainer(strategy=DDPFullyShardedNativeStrategy())

# For reference, FairScale's implementation can be used with
trainer = pl.Trainer(strategy="fsdp")
```
A Collaborative Training strategy using Hivemind
Collaborative Training removes the need for top-tier multi-GPU servers by allowing you to train across unreliable machines, such as local ones or even preemptible cloud compute, across the Internet.
Under the hood, we use Hivemind, which provides decentralized training across the Internet.
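As a quick illustration, here is a minimal sketch (assuming the hivemind package is installed; the target_batch_size value here is illustrative, not a recommendation):

```python
from pytorch_lightning.strategies import HivemindStrategy

# Peers across the Internet contribute gradients until the swarm
# collectively reaches the target batch size, then an optimizer step runs
trainer = pl.Trainer(
    strategy=HivemindStrategy(target_batch_size=8192),
    accelerator="gpu",
    devices=1,
)
```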
For more information, check out the docs.
Distributed support in Jupyter Notebooks
So far, the only multi-GPU strategy supported in Jupyter notebooks (including Grid.ai, Google Colab, and Kaggle, for example) has been the Data-Parallel (DP) strategy (strategy="dp"). DP, however, has several limitations that often obstruct users' workflows. It can be slow, it's incompatible with TorchMetrics, it doesn't persist state changes on replicas, and it's difficult to use with non-primitive input and output structures.
In this release, we've added support for Distributed Data Parallel in Jupyter notebooks using the fork mechanism to address these shortcomings. This is only available for macOS and Linux (sorry Windows!).
NOTE
This feature is experimental.
This is how you use multi-device in notebooks now:
```python
# Train on 2 GPUs in a Jupyter notebook
trainer = pl.Trainer(accelerator="gpu", devices=2)

# Can be set explicitly
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_notebook")

# Can also be used in non-interactive environments
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_fork")
```
By default, the Trainer detects the interactive environment and selects the right strategy for you. Learn more in the full documentation.
Versioning of "last" checkpoints
If a run is configured to save to the same directory as a previous run and ModelCheckpoint(save_last=True) is enabled, the "last" checkpoint is now versioned with a simple -v1 suffix to avoid overwriting the existing "last" checkpoint. This mimics the behaviour for checkpoints that monitor a metric.
Automatically reload the "last" checkpoint
In certain scenarios, like when running in a cloud spot instance with fault-tolerant training enabled, it is useful to load the latest available checkpoint. It is now possible to pass the string ckpt_path="last" in order to load the latest available checkpoint from the set of existing checkpoints.
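For example (a minimal sketch; model and datamodule stand in for your own objects):

```python
# Resumes from the most recent "last" checkpoint in the directory,
# e.g. last.ckpt or a versioned sibling such as last-v1.ckpt
trainer.fit(model, datamodule=datamodule, ckpt_path="last")
```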
Validation every N batches across epochs
In some cases, such as iteration-based training, it is useful to run validation after every N training batches without being limited by the epoch boundary. Now, you can enable validation based on the total number of training batches.
For example, given 5 epochs of 10 batches, setting N=25 would run validation in the 3rd and 5th epoch.
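A sketch of the configuration behind this example, using the new support for val_check_interval values larger than one epoch:

```python
# Run validation every 25 training batches, crossing epoch boundaries.
# check_val_every_n_epoch=None disables the per-epoch validation cadence.
trainer = pl.Trainer(
    max_epochs=5,
    limit_train_batches=10,
    val_check_interval=25,
    check_val_every_n_epoch=None,
)
```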
CPU stats monitoring
PyTorch Lightning provides the DeviceStatsMonitor callback to monitor the stats of the hardware currently used. However, users often also want to monitor the stats of the machine's CPU. In this release, we have added an option to additionally monitor CPU stats:
```python
from pytorch_lightning.callbacks import DeviceStatsMonitor

# Log both CPU stats and GPU stats
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=True), accelerator="gpu")

# Log just the GPU stats
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=False), accelerator="gpu")

# Equivalent to `DeviceStatsMonitor()`
trainer = pl.Trainer(callbacks=DeviceStatsMonitor(cpu_stats=True), accelerator="cpu")
```
The CPU stats are gathered using the psutil package.
Automatic distributed samplers
It is now possible to use custom samplers in a distributed environment without having to set replace_sampler_ddp=False and wrap your sampler manually with DistributedSampler.
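For example (a sketch; MyCustomSampler, dataset, and model are placeholders for your own objects):

```python
from torch.utils.data import DataLoader

# Lightning now wraps the custom sampler with its DistributedSamplerWrapper
# automatically when running in a distributed environment
dataloader = DataLoader(dataset, sampler=MyCustomSampler(dataset))
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
trainer.fit(model, dataloader)
```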
Inference mode support
PyTorch 1.9 introduced torch.inference_mode, a faster alternative to torch.no_grad. Lightning now uses inference_mode wherever possible during evaluation.
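Roughly, this is what Lightning now does for you during evaluation (model and batch are placeholders):

```python
import torch

# inference_mode skips autograd tracking and version-counter bookkeeping
# entirely, making it faster than no_grad for pure inference
with torch.inference_mode():
    predictions = model(batch)
```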
Support for warn-level determinism
In PyTorch 1.11, operations that do not have a deterministic implementation can be configured to throw a warning instead of an error when run in deterministic mode. This is now supported by our Trainer:
```python
trainer = pl.Trainer(deterministic="warn")
```
LightningCLI improvements
After the latest updates to jsonargparse, the library powering the LightningCLI, shorthand notation is now fully supported. This includes automatic shorthand notation for all arguments, not just the ones that are part of the registries, plus support inside configuration files.
A header with the version that generated the config is now included.
All subclasses for a given base class can be specified by name, so there's no need to explicitly register them. The only requirement is that the module where the subclass is defined is imported prior to parsing.
```python
from pytorch_lightning.cli import LightningCLI

import my_code.models
import my_code.optimizers

cli = LightningCLI()

# Now use any of the classes:
# python trainer.py fit --model=Model1 --optimizer=CustomOptimizer
```
The new version renders the registries and the auto_registry flag, introduced in 1.6.0, unnecessary, so we have deprecated them.
Support was also added for list appending; for example, to add a callback to an existing list that might be already configured:
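A sketch of the notation (the callback and its option are illustrative, not a prescribed configuration):

```python
# Append a callback to the existing trainer.callbacks list from the CLI:
# python trainer.py fit --trainer.callbacks+=EarlyStopping --trainer.callbacks.monitor=val_loss
```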
Callback registration through entry points
Entry Points are an advanced feature in Python's setuptools that allow packages to expose metadata to other packages. In Lightning, we allow an arbitrary package to include callbacks that the Lightning Trainer can automatically use when installed, without you having to manually add them to the Trainer. This is useful in production environments where it is common to provide specialized monitoring and logging callbacks globally for every application.
A setup.py file for a callbacks plugin package could look something like this:
```python
from setuptools import setup

setup(
    name="my-package",
    version="0.0.1",
    entry_points={
        # Lightning will look for this key here in the environment:
        "pytorch_lightning.callbacks_factory": [
            "monitor_callbacks=factories:my_custom_callbacks_factory"
        ]
    },
)
```
Read more about callback entry points in our docs.
Rank-zero only EarlyStopping messages
Our EarlyStopping callback implementation, by default, logs the stopping messages on every rank when it's run in a distributed environment. This was done in case the monitored values were not synchronized. However, some users found this verbose. To avoid this, you can now set a flag:
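The flag is log_rank_zero_only, added to EarlyStopping in this release; for example:

```python
from pytorch_lightning.callbacks import EarlyStopping

# Log the stopping messages only on rank 0
early_stopping = EarlyStopping(monitor="val_loss", log_rank_zero_only=True)
trainer = pl.Trainer(callbacks=[early_stopping])
```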
A base Checkpoint class for extra customization
If you want to customize the ModelCheckpoint callback without all the extra functionality this class provides, this release provides an empty Checkpoint class for easier inheritance. In all internal code, the check is made against the Checkpoint class to ensure everything works properly for custom classes.
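For example, a minimal sketch of a custom checkpointing callback built on the new base class (assuming Checkpoint is imported from pytorch_lightning.callbacks):

```python
from pytorch_lightning.callbacks import Checkpoint

class MyCheckpoint(Checkpoint):
    # Checkpoint is an empty base class, so implement only what you need
    def on_validation_end(self, trainer, pl_module):
        trainer.save_checkpoint("custom.ckpt")
```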
Validation now runs in overfitting mode
Setting overfit_batches=N now enables validation and runs N validation batches during trainer.fit.
```python
# Uses 1% of each train & val set
trainer = Trainer(overfit_batches=0.01)

# Uses 10 batches for each train & val set
trainer = Trainer(overfit_batches=10)
```
Device Stats Monitoring support for HPUs
The DeviceStatsMonitor callback can now be used to automatically monitor and log device stats during training with Habana devices.
New Hooks
LightningDataModule.load_from_checkpoint
Hyper-parameters from a LightningDataModule are now saved to checkpoints and reloaded when training is resumed. And just like you use LightningModule.load_from_checkpoint to load a model from a checkpoint filepath, you can now load a LightningDataModule the same way.
```python
# Load weights without mapping ...
datamodule = MyLightningDataModule.load_from_checkpoint('path/to/checkpoint.ckpt')

# Or load weights and hyperparameters from separate files.
datamodule = MyLightningDataModule.load_from_checkpoint(
    'path/to/checkpoint.ckpt',
    hparams_file='/path/to/hparams_file.yaml'
)

# Override some of the params with new values
datamodule = MyLightningDataModule.load_from_checkpoint(
    'path/to/checkpoint.ckpt',
    batch_size=32,
    num_workers=10,
)
```
Experimental Features
ServableModule and its Servable Module Validator Callback
When serving models in production, it is generally good practice to ensure that the model can be served and optimized before starting training, to avoid wasting money.
To do so, you can import a ServableModule (an nn.Module) and add it as an extra base class to your base model as follows:
To make your model servable, you would need to implement three hooks:
- configure_payload: describes the format of the payload (data sent to the server).
- configure_serialization: describes the functions used to convert the payload to tensors (de-serialization) and tensors to payload (serialization).
- serve_step: the method used to transform the input tensors to a dictionary of prediction tensors.
Finally, add the ServableModuleValidator callback to the Trainer to validate that the model is servable on_train_start. This uses a FastAPI server.
Have a look at the full example here.
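A minimal sketch of what this could look like (the payload contents and serializers below are illustrative; LitModel stands in for your own LightningModule):

```python
import torch
from pytorch_lightning.serve import ServableModule, ServableModuleValidator

class ProductionReadyModel(LitModel, ServableModule):
    def configure_payload(self):
        # An example payload sent to the server
        return {"body": {"x": [0.0, 1.0]}}

    def configure_serialization(self):
        deserializers = {"x": lambda x: torch.tensor(x)}
        serializers = {"output": lambda x: x.tolist()}
        return deserializers, serializers

    def serve_step(self, x: torch.Tensor) -> dict:
        return {"output": self(x)}

# Validates on_train_start that the model can actually be served
trainer = pl.Trainer(callbacks=[ServableModuleValidator()])
```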
Asynchronous Checkpointing
You can now save checkpoints asynchronously using the AsyncCheckpointIO plugin without blocking your training process. To enable this, you can pass an AsyncCheckpointIO plugin to the Trainer.
Have a look at the full example here.
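A minimal sketch (assuming AsyncCheckpointIO defaults to wrapping the standard checkpoint IO):

```python
from pytorch_lightning.plugins.io import AsyncCheckpointIO

# Checkpoint writes are offloaded to a background thread
trainer = pl.Trainer(plugins=[AsyncCheckpointIO()])
```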
Backward Incompatible Changes
This section outlines notable changes that are not backward compatible with previous versions. The full list of changes and removals can be found in the CHANGELOG below.
Removed support for the DDP2 strategy
The DDP2 strategy, previously known as the DDP2 plugin, has been part of Lightning since its inception. Due to both the technical challenges of maintaining the plugin after PyTorch removed multi-device support from DistributedDataParallel, and a general lack of interest, we have decided to retire the strategy entirely.
Do not force metric synchronization on epoch end
In previous versions, metrics logged inside epoch-end hooks were forcefully synced. This made the sync_dist flag irrelevant and caused communication overhead that might be undesired. In this release, we've removed this behaviour and instead warn the user that synchronization might be desired.
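If you do want synchronized values, opt in explicitly. A sketch:

```python
import torch

class MyModel(pl.LightningModule):
    def training_epoch_end(self, outputs):
        loss = torch.stack([out["loss"] for out in outputs]).mean()
        # Request the cross-rank reduction explicitly; it is no longer forced
        self.log("train_loss_epoch", loss, sync_dist=True)
```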
Deprecations
| Deprecated | Alternative |
|---|---|
| pytorch_lightning.loggers.base.LightningLoggerBase | pytorch_lightning.loggers.logger.Logger |
| pytorch_lightning.callbacks.base.Callback | pytorch_lightning.callbacks.callback.Callback |
| pytorch_lightning.core.lightning.LightningModule | pytorch_lightning.core.module.LightningModule |
| pytorch_lightning.loops.base.Loop | pytorch_lightning.loops.loop.Loop |
| pytorch_lightning.profiler | pytorch_lightning.profilers |
| Trainer(num_processes=..., gpus=..., tpu_cores=..., ipus=...) | Trainer(accelerator=..., devices=...) |
| LightningCLI(seed_everything_default=None) | LightningCLI(seed_everything_default=False) |
| Trainer.reset_train_val_dataloaders() | Trainer.reset_{train,val}_dataloader |
| pytorch_lightning.utilities.cli module | pytorch_lightning.cli |
| pytorch_lightning.utilities.cli.{OPTIMIZER,LR_SCHEDULER,MODEL,DATAMODULE,CALLBACK,LOGGER}_REGISTRY and LightningCLI(auto_registry=...) | No longer needed; subclasses are discovered when their module is imported |
| Trainer(strategy="ddp2") and class pytorch_lightning.strategies.DDP2Strategy | Removed |
CHANGELOG
Added
- Added ServableModule and its associated callback called ServableModuleValidator to ensure the model can be served (#13614)
- Converted validation loop config warnings to PossibleUserWarning (#13377)
- Added a flag named log_rank_zero_only to EarlyStopping to disable logging to non-zero rank processes (#13233)
- Added support for reloading the last checkpoint saved by passing ckpt_path="last" (#12816)
- Added LightningDataModule.load_from_checkpoint to support loading datamodules directly from checkpoint (#12550)
- Added a friendly error message when attempting to call Trainer.save_checkpoint() without a model attached (#12772)
- Added a friendly error message when attempting to use DeepSpeedStrategy on unsupported accelerators (#12699)
- Enabled torch.inference_mode for evaluation and prediction (#12715)
- Added support for setting val_check_interval to a value higher than the amount of training batches when check_val_every_n_epoch=None (#11993)
- Include the pytorch_lightning version as a header in the CLI config files (#12532)
- Added support for Callback registration through entry points (#12739)
- Added support for Trainer(deterministic="warn") to warn instead of fail when a non-deterministic operation is encountered (#12588)
- Added profiling to the loops' dataloader __next__ calls (#12124)
- Added CollaborativeStrategy (#12842)
- Renamed CollaborativeStrategy to HivemindStrategy (#13388)
- Removed unnecessary endpoint logic and renamed collaborative to hivemind (#13392)
- Added missing predict_dataset argument in LightningDataModule.from_datasets to create predict dataloaders (#12942)
- Added class name prefix to metrics logged by DeviceStatsMonitor (#12228)
- Automatically wrap custom samplers under a distributed environment by using DistributedSamplerWrapper (#12959)
- Added profiling of LightningDataModule hooks (#12971)
- Added a Checkpoint class to inherit from (#13024)
- Added CPU metric tracking to DeviceStatsMonitor (#11795)
- Added teardown() method to Accelerator (#11935)
- Added timeout argument to DDPStrategy and DDPSpawnStrategy (#13244, #13383)
- Added XLAEnvironment cluster environment plugin (#11330)
- Added logging messages to notify when FitLoop stopping conditions are met (#9749)
- Added support for calling unknown methods with DummyLogger (#13224)
- Added support for recursively setting the Trainer reference for ensembles of LightningModules (#13638)
- Added Apple Silicon support via MPSAccelerator (#13123)
Changed
accelerator="gpu"
now automatically selects an available GPU backend (CUDA and MPS currently) (#13642)extract_batch_size
(#12573)weights_save_path/name/version/checkpoints
toweights_save_path/checkpoints
(#12372)weights_save_path/name1_name2/version1_version2/checkpoints
toweights_save_path/checkpoints
(#12372)swa_lrs
argument inStochasticWeightAveraging
callback as required (#12556)LightningCLI
's shorthand notation changed to use jsonargparse native feature (#12614)LightningCLI
changed to use jsonargparse native support for list append (#13129)seed_everything_default
argument in theLightningCLI
to typeUnion[bool, int]
. If set toTrue
a seed is automatically generated for the parser argument--seed_everything
. (#12822, #13110)add_argparse_args
function. (#12504)limit_train_batches
(#12885)DataLoader
instantiated inside a*_dataloader
hook will not set the passed arguments as attributes anymore (#12981)WandbLogger
will now use the run name in the logs folder if it is provided, and otherwise the project name (#12604)sync_dist=True
on epoch end (13364)val_check_interval
(int) to consider total train batches processed instead of_batches_that_stepped
for validation check during training (#12832auto_device_count
,is_available
&get_device_name
methods based on the latest torch habana package (#13423)BatchSampler
when running on multiple IPUs (#13854)Deprecated
- Deprecated pytorch_lightning.accelerators.gpu.GPUAccelerator in favor of pytorch_lightning.accelerators.cuda.CUDAAccelerator (#13636)
- Deprecated pytorch_lightning.loggers.base.LightningLoggerBase in favor of pytorch_lightning.loggers.logger.Logger, and deprecated pytorch_lightning.loggers.base in favor of pytorch_lightning.loggers.logger (#120148)
- Deprecated pytorch_lightning.callbacks.base.Callback in favor of pytorch_lightning.callbacks.callback.Callback (#13031)
- Deprecated num_processes, gpus, tpu_cores, and ipus from the Trainer constructor in favor of the accelerator and devices arguments (#11040)
- Deprecated setting LightningCLI(seed_everything_default=None) in favor of False (#12804)
- Deprecated pytorch_lightning.core.lightning.LightningModule in favor of pytorch_lightning.core.module.LightningModule (#12740)
- Deprecated pytorch_lightning.loops.base.Loop in favor of pytorch_lightning.loops.loop.Loop (#13043)
- Deprecated Trainer.reset_train_val_dataloaders() in favor of Trainer.reset_{train,val}_dataloader (#12184)
- Deprecated LightningCLI's registries in favor of importing the respective package (#13221)
- Deprecated public utilities in pytorch_lightning.utilities.cli.LightningCLI in favor of equivalent copies in pytorch_lightning.cli.LightningCLI (#13767)
- Deprecated pytorch_lightning.profiler in favor of pytorch_lightning.profilers (#12308)
Removed
- Removed the deprecated IndexBatchSamplerWrapper.batch_indices (#13565)
- Removed the deprecated LightningModule.add_to_queue and LightningModule.get_from_queue methods (#13600)
- Removed the deprecated pytorch_lightning.core.decorators.parameter_validation from decorators (#13514)
- Removed the deprecated Logger.close method (#13149)
- Removed the deprecated weights_summary argument from the Trainer constructor (#13070)
- Removed the deprecated flush_logs_every_n_steps argument from the Trainer constructor (#13074)
- Removed the deprecated process_position argument from the Trainer constructor (#13071)
- Removed the deprecated checkpoint_callback argument from the Trainer constructor (#13027)
- Removed the deprecated on_{train,val,test,predict}_dataloader hooks from the LightningModule and LightningDataModule (#13033)
- Removed the deprecated TestTubeLogger (#12859)
- Removed the deprecated pytorch_lightning.core.memory.LayerSummary and pytorch_lightning.core.memory.ModelSummary (#12593)
- Removed the deprecated summarize method from the LightningModule (#12559)
- Removed the deprecated model_size property from the LightningModule class (#12641)
- Removed the deprecated stochastic_weight_avg argument from the Trainer constructor (#12535)
- Removed the deprecated progress_bar_refresh_rate argument from the Trainer constructor (#12514)
- Removed the deprecated prepare_data_per_node argument from the Trainer constructor (#12536)
- Removed the deprecated pytorch_lightning.core.memory.{get_gpu_memory_map,get_memory_profile} (#12659)
- Removed the deprecated terminate_on_nan argument from the Trainer constructor (#12553)
- Removed the deprecated XLAStatsMonitor callback (#12688)
- Removed the deprecated pytorch_lightning.callbacks.progress.progress (#12658)
- Removed the deprecated dim and size arguments from the LightningDataModule constructor (#12780)
- Removed the deprecated train_transforms argument from the LightningDataModule constructor (#12662)
- Removed the deprecated log_gpu_memory argument from the Trainer constructor (#12657)
- Removed the deprecated GPUStatsMonitor callback (#12554)
- Removed the deprecated val_transforms argument from the LightningDataModule constructor (#12763)
- Removed the deprecated test_transforms argument from the LightningDataModule constructor (#12773)
- Removed the deprecated Trainer(max_steps=None) (#13591)
- Removed the deprecated dataloader_idx argument from the on_train_batch_start/end hooks in Callback and LightningModule (#12769, #12977)
- Removed the deprecated get_progress_bar_dict property from LightningModule (#12839)
- Removed the deprecated Strategy.post_dispatch() hook (#13461)
- Removed the deprecated pytorch_lightning.callbacks.lr_monitor.LearningRateMonitor.lr_sch_names (#13353)
- Removed the deprecated Trainer.slurm_job_id in favor of SLURMEnvironment.job_id (#13459)
- Removed the deprecated DDP2Strategy (#12705)
- Removed the deprecated LightningDistributed (#13549)
- Removed the deprecated ClusterEnvironment properties master_address and master_port in favor of main_address and main_port (#13458)
- Removed the deprecated ClusterEnvironment methods KubeflowEnvironment.is_using_kubelfow(), LSFEnvironment.is_using_lsf() and TorchElasticEnvironment.is_using_torchelastic() in favor of the detect() method (#13458)
- Removed the deprecated Callback.on_keyboard_interrupt (#13438)
- Removed the deprecated LightningModule.on_post_move_to_device (#13548)
- Removed TPUSpawnStrategy.{tpu_local_core_rank,tpu_global_core_rank} attributes in favor of TPUSpawnStrategy.{local_rank,global_rank} (#11163)
- Removed SingleTPUStrategy.{tpu_local_core_rank,tpu_global_core_rank} attributes in favor of SingleTPUStrategy.{local_rank,global_rank} (#11163)
Fixed
- Fixed an issue with DataLoaders when instantiated in the *_dataloader hook (#12981)
- Fixed an issue with BatchSamplers when instantiated in the *_dataloader hook (#13640)
- LightningLite.setup() now properly supports pass-through when looking up attributes (#12597)
- Fixed LightningCLI signature parameter resolving for some lightning classes (#13283)
- Fixed pytorch_lightning.utilities.distributed.gather_all_tensors to handle tensors of different dimensions (#12630)
- Fixed Trainer.predict(return_predictions=False) to track prediction's batch_indices (#13629)
- Fixed an issue when using a custom CheckpointIO plugin with strategies (#13785)
- Fixed an issue when using val_check_interval=int together with check_val_every_n_epoch=None (#12832)
- Fixed an issue with the ReduceLROnPlateau scheduler if reduce_on_plateau is set by the user in the scheduler config (#13838)
- Used global_step while restoring logging step for old checkpoints (#13645)
- When training with precision=16 on IPU, the cast has been moved off the IPU onto the host, making the copies from host to IPU cheaper (#13880)
- Fixed the default amp_level for DeepSpeedPrecisionPlugin to O2 (#13897)
- Fixed TQDMProgressBar reset and update to show correct time estimation (2/2) (#13962)
Full commit list: 1.6.0...1.7.0
Contributors
Veteran
@akashkw @akihironitta @aniketmaurya @awaelchli @Benjamin-Etheredge @Borda @carmocca @catalys1 @daniellepintz @edenlightning @edward-io @EricWiener @fschlatt @ftorres16 @jerome-habana @justusschock @karthikrangasai @kaushikb11 @krishnakalyan3 @krshrimali @mauvilsa @nikvaessen @otaj @pre-commit-ci @puhuk @raoakarsha @rasbt @rohitgr7 @SeanNaren @s-rog @talregev @tchaton @tshu-w @twsl @weiji14 @williamFalcon @WrRan
New
@alvitawa @aminst @ankitaS11 @ar90n @Atharva-Phatak @bibhabasumohapatra @BongYang @code-review-doctor @CompRhys @Cyprien-Ricque @dependabot @digital-idiot @DN6 @donlapark @ekagra-ranjan @ethanfurman @gautierdag @georgestein @HallerPatrick @HenryLau0220 @hhsecond @himkt @HMellor @igorgad @inwaves @ishtos @JeroenDelcour @JiahaoYao @jiny419 @jinyoung-lim @JustinGoheen @jxmorris12 @Keiku @kingjuno @lsy643 @luca-medeiros @lukasugar @maciek-pioro @mads-oestergaard @manskx @martinosorb @MohammedAlkhrashi @MrShevan @myxik @naisofly @NathanielDamours @nayoungjun @niberger @nitinramvelraj @nninept @pbsds @Pragyanstha @PrajwalBorkar @Prometheos2 @rampartrange @rhjohnstone @rschireman @samz5320 @Schinkikami @semaphore-egg @shantam-8 @shenoynikhil @sisilmehta2000 @s-kumano @stanbiryukov @talregev @tanmoyio @tkonopka @vumichien @wangherr @yhl48 @YongWookHa
If we forgot somebody or you have a suggestion, find support here ⚡
Did you know?
Chuck Norris can unit-test entire applications with a single assert.