Working branch #20402

pmatczak2 · 2024-11-06T19:18:53Z

Enhanced Documentation:
Added detailed comments throughout the `_FitLoop` class to explain the purpose and behavior of its key methods, particularly `setup_data()`, `on_run_start()`, and `on_advance_start()`.
Clarified Logic for Data Loading:
Explained why `setup_data()` is called more than once: the repeated calls keep the data loaders fresh for each epoch, and the comments state the conditions under which the loaders are reloaded.
Improved Readability:
Added comments that give context for the flow of the code and the design decisions behind it, so that future developers can follow the implementation more easily.
Specific Method Highlights:
* `setup_data()`: Documented its purpose in managing training data loaders and handling overfitting scenarios.
* `on_run_start()`: Clarified its role in setting up validation data loaders and invoking relevant hooks.
* `on_advance_start()`: Explained the necessity of calling `setup_data()` to prepare for the current epoch.
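
The repeated-call pattern described above can be illustrated with a minimal sketch. This is a toy model, not Lightning's actual `_FitLoop`: the class `ToyFitLoop`, the `make_loader` factory, the `reload_every_n_epochs` field, and the counters are all hypothetical names introduced for illustration. It assumes that `setup_data()` is cheap to call repeatedly because it returns early unless a reload condition is met.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyFitLoop:
    """Toy stand-in for a fit loop (hypothetical, not Lightning's _FitLoop)."""

    make_loader: Callable[[], List[int]]  # hypothetical dataloader factory
    max_epochs: int = 3
    reload_every_n_epochs: int = 0  # 0 means: build once, never reload
    current_epoch: int = 0
    loader: Optional[List[int]] = None
    last_reload_epoch: int = -1
    setup_calls: int = 0  # how often setup_data() was invoked
    reloads: int = 0  # how often the loader was actually (re)built

    def _should_reload(self) -> bool:
        # Reload only when enough epochs have passed since the last build.
        n = self.reload_every_n_epochs
        return n > 0 and self.current_epoch - self.last_reload_epoch >= n

    def setup_data(self) -> None:
        self.setup_calls += 1
        # Early return: the loader exists and no reload is due.
        if self.loader is not None and not self._should_reload():
            return
        self.loader = self.make_loader()
        self.last_reload_epoch = self.current_epoch
        self.reloads += 1

    def on_advance_start(self) -> None:
        # Called at the start of every epoch; usually a no-op.
        self.setup_data()

    def run(self) -> None:
        self.setup_data()  # initial build before the first epoch
        while self.current_epoch < self.max_epochs:
            self.on_advance_start()
            for _batch in self.loader:
                pass  # a real loop would run the training step here
            self.current_epoch += 1


once = ToyFitLoop(make_loader=lambda: [1, 2, 3], max_epochs=4)
once.run()
print(once.setup_calls, once.reloads)

every = ToyFitLoop(make_loader=lambda: [1, 2, 3], max_epochs=4, reload_every_n_epochs=1)
every.run()
print(every.setup_calls, every.reloads)
```

In both runs `setup_data()` is invoked once up front and once per epoch, but the loader is rebuilt only when the reload condition holds, which is why calling it from several places is safe in this sketch.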

📚 Documentation preview 📚: https://pytorch-lightning--20402.org.readthedocs.build/en/20402/

…oints (Lightning-AI#19870) * memory-optimized loading of full checkpoints into dist model * simplify * handle buffers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * handle strict loading, buffers, and add test * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chlog --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…ghtning-AI#19872) * Load optimizer state * move to utility * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* ModelParallelStrategy for Lightning Trainer * mypy * import fix * fix torchscript errors * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix docs issue * fix test execution * Update src/lightning/pytorch/strategies/model_parallel.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Luca Antiga <[email protected]>

* upggrade requiremnets.txt * update fabric bitsandbytes linear quantization for bnb 0.44.1 * add quant_storage param * exclude macos from bnb upgrade * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

lantiga · 2024-11-12T14:54:37Z

Thank you @pmatczak2, your PR targets the main branch, but it should target master. However, when I retarget it, it seems to have no commits.

I'll close it for the time being:

* can you re-open it against master
* make sure it contains the correct commits
* change the PR title to reflect its content

thanks!

awaelchli and others added 30 commits April 3, 2024 17:53

Skip test with compile error on torch=2.2.2 on Windows (Lightning-AI#…

8947d13

…19734)

Add synchronous parameter to MLflowLogger (Lightning-AI#19639)

ce88483

Co-authored-by: Alexander Jipa <[email protected]>

Support pathlib.Path file paths when saving ONNX models (Lightning-AI…

76b691d

…#19727) Co-authored-by: dominicgkerr <[email protected]>

Skip tests that cause CLI argparse errors on Python 3.11.9 (Lightning…

316cc71

…-AI#19756)

Fix initialized weights resetting in Fabric.setup() when using FSDP (…

dcb91d5

…Lightning-AI#19755)

[pre-commit.ci] pre-commit suggestions (Lightning-AI#19723)

3f97e16

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]>

ci/lint: simlify prettier (Lightning-AI#19742)

f642d68

Adding test for legacy checkpoint created with 2.2.2 (Lightning-AI#19760

67b270b

)

Sanitize hparams that can't be json-serialized in `WandbLogger.log_hy…

ce90b38

…perparameters()` (Lightning-AI#19769)

Use step interval in estimated_stepping_batches docs example (Lig…

58ad56a

…htning-AI#19774)

Remove the requirement for FSDPStrategy subclasses to only support GPU (

c235f20

Lightning-AI#19781)

Update Lightning Cloud to 0.5.67 (Lightning-AI#19795)

a2b3ddd

Update changelog after 2.2.2 release (Lightning-AI#19770)

b9680a3

Remove support for PyTorch 1.13 (Lightning-AI#19706)

5e0e02b

Avoid interactions through test artifacts (Lightning-AI#19821)

2913633

Add PyTorch 2.3 to CI matrix (Lightning-AI#19708)

49ed2b1

Fix TensorBoardLogger test on Windows (Lightning-AI#19824)

d194976

Make sure the HTTP client for queues retries for POST and 5xx

8103bd7

Fix formatting

4219f30

xfail tests for deprecated functionality

d623708

bump lightning cloud

0f12271

(1/n) Support 2D Parallelism (Lightning-AI#19846)

0c8a193

Add function to explicitly mark forward methods in Fabric (Lightning-…

e030727

…AI#19690) Co-authored-by: Sebastian Raschka <[email protected]>

Reduce queue fetching (Lightning-AI#19856)

8453e31

* update * update

Update Lightning Cloud 0.5.69 (Lightning-AI#19857)

90d04b5

(2/n) Support 2D Parallelism - Distributed Checkpoints (Lightning-AI#…

9455871

…19852) * distributed checkpoints * use decorator * refactor if-strict * update example * filter non-persistent buffers (todo, add test) * simplify checkpoint loading for model

(6/n) Support 2D Parallelism - Trainer example (Lightning-AI#19879)

c8059d7

* Add 2D parallel example * replace with torchtitan code

tshu-w and others added 12 commits September 30, 2024 18:08

Make RichProgressBar visible for both light and dark background (Ligh…

474bdd0

…tning-AI#20260)

docs: add note for TQDMProgressBar (Lightning-AI#20198)

66508ff

* Add documentation note for TQDMProgressBar --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Decoupled checkpoint artifact path from model artifact path (Lightnin…

8ad3e29

…g-AI#20325) Co-authored-by: peter.mcelroy <[email protected]>

fix(lint): emergency bump docformatter (Lightning-AI#20352)

af19dda

docs: fix removed ref to deepspeed.initialize (Lightning-AI#20353)

0e1e14f

* docs: fix removed ref to `deepspeed.initialize` * fix links

docs: fix pytorch version typo in upgrade/from_2_0 (Lightning-AI#20333)

6f86497

build(deps): bump Lightning-AI/utilities from 0.11.7 to 0.11.8 (Light…

8c5fc89

…ning-AI#20354) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]>

docs: update ref to latest tutorials (Lightning-AI#20322)

2110a39

Co-authored-by: Borda <[email protected]> Co-authored-by: Jirka Borovec <[email protected]>

Update version.info to 2.5.0.dev (Lightning-AI#20316)

06a8d5b

Co-authored-by: Jirka Borovec <[email protected]>

docs: update ref to latest tutorials (Lightning-AI#20387)

897b2af

update tutorials to `b83fde09` Co-authored-by: Borda <[email protected]>

ci: bump deprecated mac 12 to 13 (Lightning-AI#20393)

3627c5b

pmatczak2 requested review from Borda, ethanwharris, justusschock, lantiga, tchaton and williamFalcon as code owners November 6, 2024 19:18

github-actions bot added the docs (Documentation related), ci (Continuous Integration), fabric (lightning.fabric.Fabric), pl (Generic label for PyTorch Lightning package), dependencies (Pull requests that update a dependency file), dockers, package, store, app, and data labels Nov 6, 2024

lantiga closed this Nov 12, 2024

43 participants
