Commit 5be3d4a

Add README for quickstart + update to codellama url (meta-llama#578)
2 parents f01bbe2 + 02a0386

File tree

5 files changed: +39 -3 lines changed

recipes/quickstart/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
## Llama-Recipes Quickstart

If you are new to developing with Meta Llama models, this is where you should start. This folder contains introductory-level notebooks across different techniques relating to Meta Llama.

* The [Running Llama 3 Anywhere](./Running_Llama3_Anywhere/) notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
* The [Prompt Engineering with Llama 3](./Prompt_Engineering_with_Llama_3.ipynb) notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
* The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [vLLM](../3p_integration/vllm/) and [TGI](../3p_integration/tgi/) for hosting Llama on open-source model servers.
* The [RAG](./RAG/) folder contains a simple Retrieval-Augmented Generation application using Llama 3.
* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama 3 on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../../src/llama_recipes/finetuning.py), which supports these features (a typical launch command is sketched after the table below):

| Feature | |
| ---------------------------------------------- | - |
| HF support for finetuning | ✅ |
| Deferred initialization (meta init) | ✅ |
| HF support for inference | ✅ |
| Low CPU mode for multi GPU | ✅ |
| Mixed precision | ✅ |
| Single node quantization | ✅ |
| Flash attention | ✅ |
| PEFT | ✅ |
| Activation checkpointing FSDP | ✅ |
| Hybrid Sharded Data Parallel (HSDP) | ✅ |
| Dataset packing & padding | ✅ |
| BF16 optimizer (pure BF16) | ✅ |
| Profiling & MFU tracking | ✅ |
| Gradient accumulation | ✅ |
| CPU offloading | ✅ |
| FSDP checkpoint conversion to HF for inference | ✅ |
| W&B experiment tracker | ✅ |
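As a hedged illustration (not part of this commit), a single-GPU LoRA finetuning launch with this script typically looks like the sketch below; the flags mirror the llama-recipes finetuning docs, and the model and output paths are placeholders:

```bash
# Hypothetical single-GPU LoRA finetuning launch; flags assumed from the
# llama-recipes finetuning docs -- verify against your installed version.
python -m llama_recipes.finetuning \
  --use_peft --peft_method lora --quantization \
  --model_name /path/to/Meta-Llama-3-8B \
  --output_dir /path/to/save/peft/model
```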
recipes/quickstart/inference/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
## Quickstart > Inference

This folder contains scripts to get you started with inference on Meta Llama models.

* [code_llama](./code_llama/) contains scripts for tasks relating to code generation using CodeLlama.
* [local_inference](./local_inference/) contains scripts to do memory-efficient inference on servers and local machines (a typical invocation is sketched below).
* [mobile_inference](./mobile_inference/) has scripts using MLC to serve Llama on Android (h/t to OctoAI for the contribution!)
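For rough orientation (an assumption, not part of this diff), a memory-efficient local run with the local_inference scripts looks something like this; `inference.py` and its `--model_name`/`--peft_model` options appear in the README diffed below, while `--prompt_file` and `--quantization` are assumed flags to verify locally:

```bash
# Hypothetical local inference invocation; --prompt_file and --quantization
# are assumptions -- check the local_inference README in your checkout.
python inference.py \
  --model_name /path/to/base/or/merged/model \
  --peft_model /path/to/peft/checkpoint \
  --prompt_file test_prompt.txt \
  --quantization
```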

recipes/quickstart/inference/code_llama/README.md

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ Code llama was recently released with three flavors, base-model that support mul

 Find the scripts to run Code Llama, where there are two examples of running code completion and infilling.

-**Note** Please find the right model on HF side [here](https://huggingface.co/codellama).
+**Note** Please find the right model on HF [here](https://huggingface.co/models?search=meta-llama%20codellama).

 Make sure to install Transformers from source for now

@@ -36,4 +36,4 @@ To run the 70B Instruct model example run the following (you'll need to enter th

 python code_instruct_example.py --model_name codellama/CodeLlama-70b-Instruct-hf --temperature 0.2 --top_p 0.9

 ```
-You can learn more about the chat prompt template [on HF](https://huggingface.co/codellama/CodeLlama-70b-Instruct-hf#chat-prompt) and [original Code Llama repository](https://github.com/facebookresearch/codellama/blob/main/README.md#fine-tuned-instruction-models). HF tokenizer has already taken care of the chat template as shown in this example.
+You can learn more about the chat prompt template [on HF](https://huggingface.co/meta-llama/CodeLlama-70b-Instruct-hf#chat-prompt) and [original Code Llama repository](https://github.com/meta-llama/codellama/blob/main/README.md#fine-tuned-instruction-models). HF tokenizer has already taken care of the chat template as shown in this example.

recipes/quickstart/inference/local_inference/README.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai

 ```

-## Loading back FSDP checkpoints
+## Inference with FSDP checkpoints

 In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
 **To convert the checkpoint use the following command**:
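The converter invocation itself falls outside this hunk; as a hedged sketch based on the llama-recipes local_inference docs, it looks roughly like the following, with all paths as placeholders:

```bash
# Hypothetical FSDP-sharded -> HuggingFace checkpoint conversion; the module
# path and flag names are assumptions -- verify against your checkout.
python -m llama_recipes.inference.checkpoint_converter_fsdp_hf \
  --fsdp_checkpoint_path /path/to/fsdp/sharded/checkpoints \
  --consolidated_model_path /path/to/save/hf/checkpoints \
  --HF_model_path_or_name meta-llama/Meta-Llama-3-8B
```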
