Commit 452557b

Merge branch 'master' into comet-flush-experiment-after-checkpoint
2 parents: c884d6b + 791753b

44 files changed (+311, -65 lines)

.github/CONTRIBUTING.md

Lines changed: 45 additions & 1 deletion

@@ -3,7 +3,7 @@
 Welcome to the PyTorch Lightning community! We're building the most advanced research platform on the planet to implement the latest, best practices
 and integrations that the amazing PyTorch team and other research organization rolls out!

-If you are new to open source, check out [this blog to get started with your first Open Source contribution](https://devblog.pytorchlightning.ai/quick-contribution-guide-86d977171b3a).
+If you are new to open source, check out [this blog to get started with your first Open Source contribution](https://medium.com/pytorch-lightning/quick-contribution-guide-86d977171b3a).

 ## Main Core Value: One less thing to remember

@@ -109,6 +109,50 @@ ______________________________________________________________________

 ## Guidelines

+### Development environment
+
+To set up a local development environment, we recommend using `uv`, which can be installed following their [instructions](https://docs.astral.sh/uv/getting-started/installation/).
+
+Once `uv` has been installed, begin by cloning the forked repository:
+
+```bash
+git clone https://github.com/{YOUR_GITHUB_USERNAME}/pytorch-lightning.git
+cd pytorch-lightning
+```
+
+> If you're using [Lightning Studio](https://lightning.ai) or already have your `uv venv` activated, you can quickly set up the project by running:
+
+```bash
+make setup
+```
+
+This will:
+
+- Install all required dependencies.
+- Perform an editable install of the `pytorch-lightning` project.
+- Install and configure `pre-commit`.
+
+#### Manual Setup (Optional)
+
+If you prefer more fine-grained control over the dependencies, you can set up the environment manually:
+
+```bash
+uv venv
+# uv venv --python 3.11 # use this instead if you need a specific python version
+
+source .venv/bin/activate # command may differ based on your shell
+uv pip install ".[dev, examples]"
+```
+
+Once the dependencies have been installed, install pre-commit and set up the git hook scripts:
+
+```bash
+uv pip install pre-commit
+pre-commit install
+```
+
+If you would like more information regarding the uv commands, please refer to uv's documentation for more information on their [pip interface](https://docs.astral.sh/uv/pip/).
+
 ### Developments scripts

 To build the documentation locally, simply execute the following commands from project root (only for Unix):
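As a quick sanity check for the environment this new section sets up (a hedged sketch, not part of this commit), the editable install can be verified from Python. This assumes the default `lightning` package layout; with `PACKAGE_NAME=pytorch` the import would be `pytorch_lightning` instead:

```python
# Sanity check (illustrative, not part of this commit): confirm the editable
# install is importable and resolves to the local checkout.
import lightning

print(lightning.__version__)  # dev version string from your clone
print(lightning.__file__)     # should point inside your pytorch-lightning checkout
```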

.github/markdown-links-config.json

Lines changed: 5 additions & 1 deletion

@@ -22,5 +22,9 @@
         "Accept-Encoding": "zstd, br, gzip, deflate"
       }
     }
-  ]
+  ],
+  "timeout": "20s",
+  "retryOn429": true,
+  "retryCount": 5,
+  "fallbackRetryDelay": "20s"
 }

.github/workflows/call-clear-cache.yml

Lines changed: 2 additions & 2 deletions

@@ -23,7 +23,7 @@ on:
 jobs:
   cron-clear:
     if: github.event_name == 'schedule' || github.event_name == 'pull_request'
-    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.14.3
+    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.15.0
     with:
       scripts-ref: v0.14.3
       dry-run: ${{ github.event_name == 'pull_request' }}

@@ -32,7 +32,7 @@ jobs:

   direct-clear:
     if: github.event_name == 'workflow_dispatch' || github.event_name == 'pull_request'
-    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.14.3
+    uses: Lightning-AI/utilities/.github/workflows/cleanup-caches.yml@v0.15.0
     with:
       scripts-ref: v0.14.3
       dry-run: ${{ github.event_name == 'pull_request' }}

.github/workflows/ci-schema.yml

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ on:

 jobs:
   check:
-    uses: Lightning-AI/utilities/.github/workflows/check-schema.yml@v0.14.3
+    uses: Lightning-AI/utilities/.github/workflows/check-schema.yml@v0.15.0
     with:
       # skip azure due to the wrong schema file by MSFT
       # https://github.com/Lightning-AI/lightning-flash/pull/1455#issuecomment-1244793607

Makefile

Lines changed: 18 additions & 1 deletion

@@ -1,4 +1,4 @@
-.PHONY: test clean docs
+.PHONY: test clean docs setup

 # to imitate SLURM set only single node
 export SLURM_LOCALID=0

@@ -7,6 +7,23 @@ export SPHINX_MOCK_REQUIREMENTS=1
 # install only Lightning Trainer packages
 export PACKAGE_NAME=pytorch

+setup:
+	uv pip install -r requirements.txt \
+		-r requirements/pytorch/base.txt \
+		-r requirements/pytorch/test.txt \
+		-r requirements/pytorch/extra.txt \
+		-r requirements/pytorch/strategies.txt \
+		-r requirements/fabric/base.txt \
+		-r requirements/fabric/test.txt \
+		-r requirements/fabric/strategies.txt \
+		-r requirements/typing.txt \
+		-e ".[all]" \
+		pre-commit
+	pre-commit install
+	@echo "-----------------------------"
+	@echo "✅ Environment setup complete. Ready to Contribute ⚡️!"
+
 clean:
 	# clean all temp runs
 	rm -rf $(shell find . -name "mlruns")

README.md

Lines changed: 6 additions & 0 deletions

@@ -55,6 +55,12 @@ ______________________________________________________________________

+# Why PyTorch Lightning?
+
+Training models in plain PyTorch is tedious and error-prone - you have to manually handle things like backprop, mixed precision, multi-GPU, and distributed training, often rewriting code for every new project. PyTorch Lightning organizes PyTorch code to automate those complexities so you can focus on your model and data, while keeping full control and scaling from CPU to multi-node without changing your core code. But if you want control of those things, you can still opt into more DIY.
+
+Fun analogy: If PyTorch is Javascript, PyTorch Lightning is ReactJS or NextJS.
+
 # Lightning has 2 core packages

 [PyTorch Lightning: Train and deploy PyTorch at scale](#why-pytorch-lightning).
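To make the new README claim concrete, here is a minimal sketch (illustrative only, not part of this commit) of how Lightning "organizes" PyTorch code: the loop mechanics live in the `Trainer`, while the model and data stay yours. The toy network and random dataset are assumptions for the example:

```python
# Illustrative sketch (not from this commit): a plain PyTorch model wrapped in
# a LightningModule. Backprop, device placement, precision, and multi-GPU are
# handled by the Trainer instead of hand-written loop code.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as pl


class ToyClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)
        return loss  # the Trainer runs backward() and the optimizer step

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    data = DataLoader(
        TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,))),
        batch_size=32,
    )
    # Scaling up (e.g. devices=8, precision="16-mixed") is a Trainer flag,
    # not a rewrite of the training loop.
    pl.Trainer(max_epochs=1, accelerator="auto", devices="auto").fit(ToyClassifier(), data)
```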

_notebooks (diff not shown)

docs/source-fabric/advanced/model_parallel/tp_fsdp.rst

Lines changed: 1 addition & 1 deletion

@@ -9,7 +9,7 @@ The :doc:`Tensor Parallelism documentation <tp>` and a general understanding of

 .. raw:: html

-    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-lightning-fabric">
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning">
       <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
     </a>
docs/source-pytorch/accelerators/accelerator_prepare.rst

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ Synchronize validation and test logging
 ***************************************

 When running in distributed mode, we have to ensure that the validation and test step logging calls are synchronized across processes.
-This is done by adding ``sync_dist=True`` to all ``self.log`` calls in the validation and test step.
+This is done by adding ``sync_dist=True`` to all ``self.log`` calls in the validation and test step. This will automatically average values across all processes.
 This ensures that each GPU worker has the same behaviour when tracking model checkpoints, which is important for later downstream tasks such as testing the best checkpoint across all workers.
 The ``sync_dist`` option can also be used in logging calls during the step methods, but be aware that this can lead to significant communication overhead and slow down your training.
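The sentence added here documents the averaging behaviour; below is a minimal sketch of the pattern. `sync_dist=True` is the documented option, while the toy module and metric name are assumptions for illustration:

```python
# Sketch of the documented pattern: sync_dist=True averages the logged value
# across all distributed processes, so every GPU worker sees the same
# val_loss when tracking the best checkpoint.
import torch.nn.functional as F
from torch import nn
import lightning.pytorch as pl


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x), y)
        self.log("val_loss", loss, sync_dist=True)  # reduced (mean) across processes
```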

docs/source-pytorch/advanced/model_parallel/tp.rst

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ This method is most effective for models with very large layers, significantly e

 .. raw:: html

-    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/tensor-parallelism-supercharging-large-model-training-with-pytorch-lightning">
+    <a target="_blank" href="https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning">
       <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio" style="width: auto; max-width: none;"/>
     </a>
