MLP tutorials update #209
Merged
Changes from 3 commits
20 commits
cc50a68 Spelling corrections to MLP tutorials (lukasgd)
01fe8e5 Updated MLP tutorials (lukasgd)
d37d03c Merge branch 'main' into mlp-tutorials-update (lukasgd)
a10be36 Update docs/access/jupyterlab.md (lukasgd)
6017a7e Apply suggestions from code review (lukasgd)
e96ee0f Update docs/access/jupyterlab.md (lukasgd)
122c3ff Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
d0476fd Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
fb22a00 Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
758d019 Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
0e0285f Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
4a74f96 Update docs/guides/mlp_tutorials/llm-nanotron-training.md (lukasgd)
fb1629f Update docs/guides/mlp_tutorials/llm-nanotron-training.md (lukasgd)
691b11f Update docs/guides/mlp_tutorials/llm-nanotron-training.md (lukasgd)
80f2c19 Update docs/guides/mlp_tutorials/index.md (lukasgd)
b66b0eb Using console instead of bash with hostnames in the shell prompt and … (lukasgd)
e0644a3 Merge branch 'main' into mlp-tutorials-update (lukasgd)
404a203 Integrating @Madeeks comment (lukasgd)
6b56fb4 Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
2b7f549 Update docs/guides/mlp_tutorials/llm-inference.md (lukasgd)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,11 +1,10 @@ |
| | | [](){#ref-guides-mlp-tutorials} |
| | | - # MLP Tutorials |
| | | + # Machine Learning Platform Tutorials |
| | | |
| | | - These tutorials solve simple MLP tasks using the [Container Engine][ref-container-engine] on the ML Platform. |
| | | |
| | | - 1. [LLM Inference][ref-mlp-llm-inference-tutorial] |
| | | - 2. [LLM Fine-tuning][ref-mlp-llm-finetuning-tutorial] |
| | | - 3. [Nanotron Training][ref-mlp-llm-nanotron-tutorial] |
| | | + These tutorials gradually introduce key concepts of the Machine Learning Platform. A particular focus is on the [Container Engine][ref-container-engine] for managing the runtime environment. |
| | | |
| | | + In a [first tutorial][ref-mlp-llm-inference-tutorial], you will learn how to run an inference with an LLM on a single node using a container from the NVIDIA GPU Cloud (NGC). Concepts such as container environment description, layering a thin virtual environment on top of the container image and job launching and monitoring will be introduced. |
| | | |
| | | + Building on the first tutorial, in the [second tutorial][ref-mlp-llm-fine-tuning-tutorial] you will learn how to train (fine-tune) an LLM on multiple GPUs on a single node. For this purpose, you will use HuggingFace's `accelerate` and see best practices for dataset management. |
| | | |
| | | + In the [third tutorial][ref-mlp-llm-nanotron-tutorial], you will apply the techniques from the previous tutorials to enable distributed (pre-)training of a model `nanotron` on multiple nodes. In particular, this tutorial makes use of model-parallelism and introduces the usage of `torchrun` to manage jobs on individual nodes. |

lukasgd marked the review conversations on the first, second, and third tutorial paragraphs as resolved; all three are outdated.
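
The diff above changes one reference label: the removed list links to `ref-mlp-llm-finetuning-tutorial`, while the added paragraph links to `ref-mlp-llm-fine-tuning-tutorial`. Whether the new label resolves depends on the anchor defined in the fine-tuning page, which is not shown here. As a minimal sketch of how a reviewer might spot such label changes, the following compares the reference-style link labels in the removed and added text. The helper `reference_labels` and the variables `old_text` and `new_text` are hypothetical names for illustration, not part of the repository, and the added lines are abbreviated to their link portions.

```python
import re

# Matches reference-style Markdown links such as [text][label]
# and captures the label.
REF_LINK = re.compile(r"\[[^\]]+\]\[([^\]]+)\]")


def reference_labels(markdown: str) -> set[str]:
    """Return the set of reference labels used in a Markdown string."""
    return set(REF_LINK.findall(markdown))


# Removed lines, copied from the diff.
old_text = """
These tutorials solve simple MLP tasks using the [Container Engine][ref-container-engine] on the ML Platform.
1. [LLM Inference][ref-mlp-llm-inference-tutorial]
2. [LLM Fine-tuning][ref-mlp-llm-finetuning-tutorial]
3. [Nanotron Training][ref-mlp-llm-nanotron-tutorial]
"""

# Added lines, shortened here to the parts that contain links.
new_text = """
A particular focus is on the [Container Engine][ref-container-engine]
In a [first tutorial][ref-mlp-llm-inference-tutorial],
in the [second tutorial][ref-mlp-llm-fine-tuning-tutorial]
In the [third tutorial][ref-mlp-llm-nanotron-tutorial],
"""

old_labels = reference_labels(old_text)
new_labels = reference_labels(new_text)

# Labels that appear on only one side of the diff deserve a second look.
removed = old_labels - new_labels
added = new_labels - old_labels

print("labels only in removed lines:", sorted(removed))
print("labels only in added lines:", sorted(added))
```

A label that shows up on only one side is not necessarily an error; it only marks a place where the target anchor should be checked.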