Commit ef6b20c

Merge branch 'master' into feature/manual_optimization_tensordict
2 parents 42418f8 + 25b1343

File tree

22 files changed: +134 −23 lines changed


.github/workflows/call-clear-cache.yml

Lines changed: 2 additions & 2 deletions

@@ -23,7 +23,7 @@ on:
 jobs:
   cron-clear:
     if: github.event_name == 'schedule' || github.event_name == 'pull_request'
-    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.14.3
+    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.15.0
     with:
       scripts-ref: v0.14.3
       dry-run: ${{ github.event_name == 'pull_request' }}
@@ -32,7 +32,7 @@ jobs:

   direct-clear:
     if: github.event_name == 'workflow_dispatch' || github.event_name == 'pull_request'
-    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.14.3
+    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.15.0
     with:
       scripts-ref: v0.14.3
       dry-run: ${{ github.event_name == 'pull_request' }}

.github/workflows/ci-schema.yml

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ on:

 jobs:
   check:
-    uses: Lightning-AI/utilities/.github/workflows/check-schema.yml@v0.14.3
+    uses: Lightning-AI/utilities/.github/workflows/check-schema.yml@v0.15.0
     with:
       # skip azure due to the wrong schema file by MSFT
       # https://github.com/Lightning-AI/lightning-flash/pull/1455#issuecomment-1244793607

docs/source-fabric/advanced/model_parallel/tp_fsdp.rst

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ The :doc:`Tensor Parallelism documentation <tp>` and a general understanding of

 .. raw:: html

-    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning">
        <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
     </a>

docs/source-pytorch/accelerators/accelerator_prepare.rst

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ Synchronize validation and test logging
 ***************************************

 When running in distributed mode, we have to ensure that the validation and test step logging calls are synchronized across processes.
-This is done by adding ``sync_dist=True`` to all ``self.log`` calls in the validation and test step.
+This is done by adding ``sync_dist=True`` to all ``self.log`` calls in the validation and test step. This will automatically average values across all processes.
 This ensures that each GPU worker has the same behaviour when tracking model checkpoints, which is important for later downstream tasks such as testing the best checkpoint across all workers.
 The ``sync_dist`` option can also be used in logging calls during the step methods, but be aware that this can lead to significant communication overhead and slow down your training.

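The change above documents that ``sync_dist=True`` averages the logged value across processes. A minimal stand-alone sketch of that averaging semantics (pure Python with hypothetical names; in Lightning the mean all-reduce happens inside ``self.log``):

```python
# Hypothetical illustration of what ``sync_dist=True`` does conceptually:
# each distributed worker logs a local metric value, and the values are
# mean-reduced so every worker records the same number.

def sync_dist_mean(per_worker_values):
    """Stand-in for the mean all-reduce performed when sync_dist=True."""
    return sum(per_worker_values) / len(per_worker_values)

# e.g. validation loss reported by 4 GPU workers
per_worker_loss = [0.42, 0.40, 0.44, 0.38]
synced = sync_dist_mean(per_worker_loss)
# every worker now logs the same averaged value, so checkpoint tracking
# behaves identically on all ranks
```

Because every rank sees the same reduced value, checkpoint callbacks that compare a monitored metric make the same decision everywhere, which is the behaviour the doc change calls out.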
docs/source-pytorch/advanced/model_parallel/tp.rst

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ This method is most effective for models with very large layers, significantly e

 .. raw:: html

-    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning">
        <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
     </a>

docs/source-pytorch/extensions/logging.rst

Lines changed: 1 addition & 1 deletion

@@ -137,7 +137,7 @@ The :meth:`~lightning.pytorch.core.LightningModule.log` method has a few options
 * ``logger``: Logs to the logger like ``Tensorboard``, or any other custom logger passed to the :class:`~lightning.pytorch.trainer.trainer.Trainer` (Default: ``True``).
 * ``reduce_fx``: Reduction function over step values for end of epoch. Uses :func:`torch.mean` by default and is not applied when a :class:`torchmetrics.Metric` is logged.
 * ``enable_graph``: If True, will not auto detach the graph.
-* ``sync_dist``: If True, reduces the metric across devices. Use with care as this may lead to a significant communication overhead.
+* ``sync_dist``: If True, averages the metric across devices. Use with care as this may lead to a significant communication overhead.
 * ``sync_dist_group``: The DDP group to sync across.
 * ``add_dataloader_idx``: If True, appends the index of the current dataloader to the name (when using multiple dataloaders). If False, user needs to give unique names for each dataloader to not mix the values.
 * ``batch_size``: Current batch size used for accumulating logs logged with ``on_epoch=True``. This will be directly inferred from the loaded batch, but for some data structures you might need to explicitly provide it.

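The ``reduce_fx`` option in the list above reduces step-level values at epoch end, with a mean as the default. A small pure-Python sketch of that default reduction (an illustrative stand-in with hypothetical names; Lightning applies it internally to values logged with ``on_epoch=True``):

```python
# Hypothetical stand-in for the ``reduce_fx`` epoch-end reduction: values
# logged at each step are accumulated and reduced to one number, using a
# mean by default.

def reduce_epoch_values(step_values, reduce_fx=None):
    """Apply reduce_fx (default: mean) over the accumulated step values."""
    if reduce_fx is None:
        return sum(step_values) / len(step_values)  # default mean reduction
    return reduce_fx(step_values)

# metric logged at three steps with on_epoch=True -> mean of the steps
epoch_metric = reduce_epoch_values([0.9, 0.7, 0.8])
# a custom reduce_fx, e.g. max, replaces the default mean
epoch_max = reduce_epoch_values([0.9, 0.7, 0.8], reduce_fx=max)
```

As the option list notes, this reduction is skipped when a ``torchmetrics.Metric`` is logged, since the metric object defines its own aggregation.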
requirements/ci.txt

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 setuptools <80.9.1
 wheel <0.46.0
-awscli >=1.30.0, <1.42.0
+awscli >=1.30.0, <1.43.0
 twine ==6.1.0
 importlib-metadata <9.0.0
 wget

requirements/docs.txt

Lines changed: 2 additions & 2 deletions

@@ -3,7 +3,7 @@ myst-parser >=0.18.1, <4.0.0
 nbsphinx >=0.8.5, <=0.9.7
 nbconvert >7.14, <7.17
 pandoc >=1.0, <=2.4
-docutils>=0.18.1,<=0.19
+docutils>=0.18.1,<=0.22
 sphinxcontrib-fulltoc >=1.0, <=1.2.0
 sphinxcontrib-mockautodoc
 sphinx-autobuild
@@ -17,7 +17,7 @@ sphinx-rtd-dark-mode
 sphinxcontrib-video ==0.4.1
 jinja2 <3.2.0

-lightning-utilities >=0.11.1, <0.15.0
+lightning-utilities >=0.11.1, <0.16.0

 # installed from S3 location and fetched in advance
 lai-sphinx-theme

requirements/fabric/base.txt

Lines changed: 1 addition & 1 deletion

@@ -5,4 +5,4 @@ torch >=2.1.0, <2.8.0
 fsspec[http] >=2022.5.0, <2025.8.0
 packaging >=20.0, <=25.0
 typing-extensions >=4.5.0, <4.15.0
-lightning-utilities >=0.10.0, <0.15.0
+lightning-utilities >=0.10.0, <0.16.0

requirements/fabric/examples.txt

Lines changed: 1 addition & 1 deletion

@@ -2,4 +2,4 @@
 # in case you want to preserve/enforce restrictions on the latest compatible version, add "strict" as an in-line comment

 torchvision >=0.16.0, <0.23.0
-torchmetrics >=0.10.0, <1.8.0
+torchmetrics >=0.10.0, <1.9.0

0 commit comments