
Commit d2a4904

update README to include new features and remove outdated msg

ghstack-source-id: 99bca8c
Pull Request resolved: #574

1 parent 2a25f4d commit d2a4904

File tree: 1 file changed

README.md

Lines changed: 15 additions & 19 deletions
@@ -3,7 +3,7 @@
# torchtitan

-`torchtitan` is currently in a pre-release state and under extensive development.
+`torchtitan` is currently in a pre-release state and under extensive development. Currently we showcase pre-training **Llama 3.1**, **Llama 3**, and **Llama 2** LLMs of various sizes from scratch. To use the latest features of `torchtitan`, we recommend the latest PyTorch nightly.

`torchtitan` is a proof-of-concept for large-scale LLM training using native PyTorch. It is (and will continue to be) a repo to showcase PyTorch's latest distributed training features in a clean, minimal codebase. torchtitan is complementary to and not a replacement for any of the great large-scale LLM training codebases such as Megatron, MegaBlocks, LLM Foundry, DeepSpeed, etc. Instead, we hope that the features showcased in torchtitan will be adopted by these codebases quickly. torchtitan is unlikely to ever grow a large community around it.

@@ -26,34 +26,30 @@ You may want to see how the model is defined or how parallelism techniques are a
* [torchtitan/parallelisms/pipeline_llama.py](torchtitan/parallelisms/pipeline_llama.py) - helpers for applying Pipeline Parallel to the model
* [torchtitan/checkpoint.py](torchtitan/checkpoint.py) - utils for saving/loading distributed checkpoints
* [torchtitan/float8.py](torchtitan/float8.py) - utils for applying Float8 techniques
-* [torchtitan/models/llama/model.py](torchtitan/models/llama/model.py) - the Llama model definition (shared for Llama2 and Llama3 variants)
-
-## Pre-Release Updates:
-#### (4/25/2024): `torchtitan` is now public but in a pre-release state and under development.
-Currently we showcase pre-training **Llama 3 and Llama 2** LLMs of various sizes from scratch. `torchtitan` is tested and verified with the PyTorch nightly version `torch-2.4.0.dev20240412`. (We recommend latest PyTorch nightly).
+* [torchtitan/models/llama/model.py](torchtitan/models/llama/model.py) - the Llama model definition (shared for Llama 2 and Llama 3 variants)

### Key features available

-1. [FSDP2 with per param sharding](docs/fsdp.md)
-2. [Tensor Parallel](https://pytorch.org/docs/stable/distributed.tensor.parallel.html) (including async TP)
+1. [FSDP2](docs/fsdp.md) with per param sharding
+2. [Tensor Parallel](https://pytorch.org/docs/stable/distributed.tensor.parallel.html) (including [async TP](https://discuss.pytorch.org/t/distributed-w-torchtitan-introducing-async-tensor-parallelism-in-pytorch/209487))
3. Selective layer and operator activation checkpointing
4. Distributed checkpointing (including async checkpointing)
5. Checkpointable data-loading, with the C4 dataset pre-configured (144M entries)
-6. Loss, GPU memory, tokens-per-second, and MFU displayed and logged via TensorBoard
-7. Learning rate scheduler, meta-init, optional Fused RMSNorm into [`torchtune`](https://github.com/pytorch/torchtune) for fine tuning
-8. [Float8 support](docs/float8.md)
+6. Loss, GPU memory, tokens-per-second, and MFU displayed and logged via [TensorBoard](#tensorboard)
+7. Learning rate scheduler, meta-init, optional Fused RMSNorm
+8. [Float8](https://discuss.pytorch.org/t/distributed-w-torchtitan-enabling-float8-all-gather-in-fsdp2/209323) support ([how-to](docs/float8.md))
9. `torch.compile` support
-10. All options easily configured via [toml files](train_configs/)
-11. [Interoperable checkpoints](docs/checkpoint.md) which can be loaded directly
+10. DDP and HSDP
+11. All options easily configured via [toml files](train_configs/)
+12. [Interoperable checkpoints](docs/checkpoint.md) which can be loaded directly into [`torchtune`](https://github.com/pytorch/torchtune) for fine-tuning

-We report our [Performance](docs/performance.md) verified on 64 A100 GPUs
+We report our [Performance](docs/performance.md) verified on 64/128 GPUs.


### Coming soon

-1. Context Parallel
-2. Pipeline Parallel (and 3D parallelism)
-3. HSDP
+- Pipeline Parallel (and 3D parallelism)
+- Context Parallel

## Installation
@@ -74,10 +70,10 @@ Once you have confirmed access, you can run the following command to download th
```bash
# Get your HF token from https://huggingface.co/settings/tokens

-# llama3 or 3.1 tokenizer.model
+# Llama 3 or 3.1 tokenizer.model
python torchtitan/datasets/download_tokenizer.py --repo_id meta-llama/Meta-Llama-3-8B --tokenizer_path "original" --hf_token=...

-# llama2 tokenizer.model
+# Llama 2 tokenizer.model
python torchtitan/datasets/download_tokenizer.py --repo_id meta-llama/Llama-2-13b-hf --hf_token=...
```
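
For context (not part of this diff): once a tokenizer has been downloaded, a training run is driven by one of the toml files referenced in the feature list above. A minimal launch sketch, assuming the repo's `run_llama_train.sh` wrapper and the `train_configs/` directory — both are assumptions about the repo layout, not shown in this commit:

```bash
# Hypothetical single-node launch: select a config via the CONFIG_FILE env var.
# Script name and config path are assumed from the repo layout, not from this diff.
CONFIG_FILE="./train_configs/llama3_8b.toml" ./run_llama_train.sh
```

The options listed under "Key features available" (activation checkpointing, Float8, `torch.compile`, parallelism settings, etc.) are configured from that toml file.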
