|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 | | - "## Running Llama2 on Google Colab using Hugging Face transformers library\n",
8 | | - "This notebook goes over how you can set up and run Llama2 using Hugging Face transformers library\n",
| 7 | + "## Running Meta Llama 3 on Google Colab using Hugging Face transformers library\n",
| 8 | + "This notebook goes over how you can set up and run Llama 3 using Hugging Face transformers library\n",
9 | 9 | "<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/Running_Llama2_Anywhere/Running_Llama_on_HF_transformers.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
|
10 | 10 | ]
|
11 | 11 | },
|
|
14 | 14 | "metadata": {},
|
15 | 15 | "source": [
|
16 | 16 | "### Steps at a glance:\n",
|
17 | | - "This demo showcases how to run the example with already converted Llama 2 weights on [Hugging Face](https://huggingface.co/meta-llama). Please Note: To use the downloads on Hugging Face, you must first request a download as shown in the steps below making sure that you are using the same email address as your Hugging Face account.\n",
| 17 | + "This demo showcases how to run the example with already converted Llama 3 weights on [Hugging Face](https://huggingface.co/meta-llama). Please Note: To use the downloads on Hugging Face, you must first request a download as shown in the steps below making sure that you are using the same email address as your Hugging Face account.\n",
18 | 18 | "\n",
|
19 | 19 | "To use already converted weights, start here:\n",
|
20 | 20 | "1. Request download of model weights from the Llama website\n",
|
21 | | - "2. Prepare the script\n",
| 21 | + "2. Log in to Hugging Face from your terminal using the same email address as (1). Follow the instructions [here](https://huggingface.co/docs/huggingface_hub/en/quick-start). \n",
22 | 22 | "3. Run the example\n",
|
23 | 23 | "\n",
|
24 | 24 | "\n",
|
|
45 | 45 | "Request download of model weights from the Llama website\n",
|
46 | 46 | "Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
|
47 | 47 | "\n",
|
48 | | - "Fill the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time."
| 48 | + "Fill the required information, select the models “Meta Llama 3” and accept the terms & conditions. You will receive a URL in your email in a short time."
49 | 49 | ]
|
50 | 50 | },
|
51 | 51 | {
|
|
79 | 79 | },
|
80 | 80 | {
|
81 | 81 | "cell_type": "code",
|
82 | | - "execution_count": null,
| 82 | + "execution_count": 2,
83 | 83 | "metadata": {},
|
84 | 84 | "outputs": [],
|
85 | 85 | "source": [
|
|
92 | 92 | "cell_type": "markdown",
|
93 | 93 | "metadata": {},
|
94 | 94 | "source": [
|
95 | | - "Then, we will set the model variable to a specific model we’d like to use. In this demo, we will use the 7b chat model `meta-llama/Llama-2-7b-chat-hf`."
| 95 | + "Then, we will set the model variable to a specific model we’d like to use. In this demo, we will use the 8B instruct model `meta-llama/Meta-Llama-3-8B-Instruct`. Using Meta models from Hugging Face requires you to\n",
| 96 | + "\n",
| 97 | + "1. Accept the Terms of Service for Meta Llama 3 on the Meta [website](https://llama.meta.com/llama-downloads).\n",
| 98 | + "2. Use the same email address from Step (1) to log in to Hugging Face.\n",
| 99 | + "\n",
| 100 | + "Follow the instructions on this Hugging Face page to log in from your [terminal](https://huggingface.co/docs/huggingface_hub/en/quick-start). "
| 101 | + ]
| 102 | + },
| 103 | + {
| 104 | + "cell_type": "code",
| 105 | + "execution_count": null,
| 106 | + "metadata": {},
| 107 | + "outputs": [],
| 108 | + "source": [
| 109 | + "pip install --upgrade huggingface_hub"
96 | 110 | ]
|
97 | 111 | },
|
98 | 112 | {
|
|
101 | 115 | "metadata": {},
|
102 | 116 | "outputs": [],
|
103 | 117 | "source": [
|
104 | | - "model = \"meta-llama/Llama-2-7b-chat-hf\"\n",
| 118 | + "from huggingface_hub import login\n",
| 119 | + "login()"
| 120 | + ]
| 121 | + },
| 122 | + {
| 123 | + "cell_type": "code",
| 124 | + "execution_count": null,
| 125 | + "metadata": {},
| 126 | + "outputs": [],
| 127 | + "source": [
| 128 | + "model = \"meta-llama/Meta-Llama-3-8B-Instruct\"\n",
105 | 129 | "tokenizer = AutoTokenizer.from_pretrained(model)"
|
106 | 130 | ]
|
107 | 131 | },
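The interactive `login()` call added above prompts for a Hugging Face access token. If you prefer a non-interactive flow (for example when re-running the notebook), the token can be passed directly. A minimal sketch, assuming you created a read token at huggingface.co/settings/tokens and exported it in an environment variable named `HF_TOKEN` (the variable name here is only an illustration):

```python
import os
from huggingface_hub import login

# Read the access token from an environment variable instead of the interactive prompt.
# "HF_TOKEN" is an assumed name; any variable holding a token with read access works.
login(token=os.environ["HF_TOKEN"])
```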
|
|
174 | 198 | "Request download of model weights from the Llama website\n",
|
175 | 199 | "Before you can run the model locally, you will need to get the model weights. To get the model weights, visit the [Llama website](https://llama.meta.com/) and click on “download models”. \n",
|
176 | 200 | "\n",
|
177 | | - "Fill the required information, select the models “Llama 2 & Llama Chat” and accept the terms & conditions. You will receive a URL in your email in a short time.\n"
| 201 | + "Fill the required information, select the models \"Meta Llama 3\" and accept the terms & conditions. You will receive a URL in your email in a short time."
178 | 202 | ]
|
179 | 203 | },
|
180 | 204 | {
|
181 | 205 | "cell_type": "markdown",
|
182 | 206 | "metadata": {},
|
183 | 207 | "source": [
|
184 | 208 | "#### 2. Clone the llama repo and get the weights\n",
|
185 | | - "Git clone the [Llama repo](https://github.com/facebookresearch/llama.git). Enter the URL and get 7B-chat weights. This will download the tokenizer.model, and a directory llama-2-7b-chat with the weights in it.\n",
| 209 | + "Git clone the [Meta Llama 3 repo](https://github.com/meta-llama/llama3). Run the `download.sh` script and follow the instructions. This will download the model checkpoints and tokenizer.\n",
186 | 210 | "\n",
|
187 | | - "This example demonstrates a llama2 model with 7B-chat parameters, but the steps we follow would be similar for other llama models, as well as for other parameter models.\n",
188 | | - "\n"
| 211 | + "This example demonstrates the Meta Llama 3 8B-Instruct model, but the steps we follow would be similar for other Llama models and other model sizes."
189 | 212 | ]
|
190 | 213 | },
|
191 | 214 | {
|
192 | 215 | "cell_type": "markdown",
|
193 | 216 | "metadata": {},
|
194 | 217 | "source": [
|
195 | | - "#### 3. Convert the model weights\n",
196 | | - "\n",
197 | | - "* Create a link to the tokenizer:\n",
198 | | - "Run `ln -h ./tokenizer.model ./llama-2-7b-chat/tokenizer.model` \n",
199 | | - "\n",
| 218 | + "#### 3. Convert the model weights using the Hugging Face transformers library from source\n",
200 | 219 | "\n",
|
201 | | - "* Convert the model weights to run with Hugging Face:``TRANSFORM=`python -c \"import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')\"``\n",
202 | | - "\n",
203 | | - "* Then run: `pip install protobuf && python $TRANSFORM --input_dir ./llama-2-7b-chat --model_size 7B --output_dir ./llama-2-7b-chat-hf`\n"
| 220 | + "* `python3 -m venv hf-convertor`\n",
| 221 | + "* `source hf-convertor/bin/activate`\n",
| 222 | + "* `git clone https://github.com/huggingface/transformers.git`\n",
| 223 | + "* `cd transformers`\n",
| 224 | + "* `pip install -e .`\n",
| 225 | + "* `pip install torch tiktoken blobfile accelerate`\n",
| 226 | + "* `python3 src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir ${path_to_meta_downloaded_model} --output_dir ${path_to_save_converted_hf_model} --model_size 8B --llama_version 3`"
204 | 227 | ]
|
205 | 228 | },
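Once the conversion script finishes, it can be worth checking that the output directory is a loadable Hugging Face checkpoint before loading the full 8B weights. A minimal sketch, assuming a hypothetical output path `./llama-3-8b-instruct-hf` was passed as `--output_dir` above:

```python
from transformers import AutoConfig, AutoTokenizer

# Hypothetical path: whatever you passed to --output_dir during conversion.
converted_dir = "./llama-3-8b-instruct-hf"

# If these load without errors, the directory contains a usable config and tokenizer.
config = AutoConfig.from_pretrained(converted_dir)
tokenizer = AutoTokenizer.from_pretrained(converted_dir)
print(config.model_type, config.num_hidden_layers)
```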
|
206 | 229 | {
|
|
210 | 233 | "\n",
|
211 | 234 | "#### 4. Prepare the script\n",
|
212 | 235 | "Import the following necessary modules in your script: \n",
|
213 | | - "* `LlamaForCausalLM` is the Llama 2 model class\n",
214 | | - "* `LlamaTokenizer` prepares your prompt for the model to process\n",
215 | | - "* `pipeline` is an abstraction to generate model outputs\n",
216 | | - "* `torch` allows us to use PyTorch and specify the datatype we’d like to use."
| 236 | + "* `AutoModelForCausalLM` is used to load the Llama 3 model\n",
| 237 | + "* `AutoTokenizer` prepares your prompt for the model to process\n",
| 238 | + "* `pipeline` is an abstraction to generate model outputs"
217 | 239 | ]
|
218 | 240 | },
|
219 | 241 | {
|
|
224 | 246 | "source": [
|
225 | 247 | "import torch\n",
|
226 | 248 | "import transformers\n",
|
227 | | - "from transformers import LlamaForCausalLM, LlamaTokenizer\n",
228 | | - "\n",
229 | | - "\n",
230 | | - "model_dir = \"./llama-2-7b-chat-hf\"\n",
231 | | - "model = LlamaForCausalLM.from_pretrained(model_dir)\n",
| 249 | + "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
232 | 250 | "\n",
|
233 | | - "tokenizer = LlamaTokenizer.from_pretrained(model_dir)\n"
| 251 | + "model_dir = \"${path_the_converted_hf_model}\"\n",
| 252 | + "model = AutoModelForCausalLM.from_pretrained(\n",
| 253 | + " model_dir,\n",
| 254 | + " device_map=\"auto\",\n",
| 255 | + " )\n",
| 256 | + "tokenizer = AutoTokenizer.from_pretrained(model_dir)\n"
234 | 257 | ]
|
235 | 258 | },
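On a free Colab GPU the 8B model will not comfortably fit in full float32 precision, so it is common to request a reduced dtype at load time. A minimal sketch of the same call with half precision (the placeholder path is the one used in the cell above; `bfloat16` is an assumption — use `float16` on GPUs without bfloat16 support such as the Colab T4):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "${path_the_converted_hf_model}"  # same placeholder as the cell above

# Loading in bfloat16 roughly halves memory use compared with float32.
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
```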
|
236 | 259 | {
|
|
242 | 265 | },
|
243 | 266 | {
|
244 | 267 | "cell_type": "code",
|
245 | | - "execution_count": null,
| 268 | + "execution_count": 2,
246 | 269 | "metadata": {},
|
247 | 270 | "outputs": [],
|
248 | 271 | "source": [
|
|
272 | 295 | },
|
273 | 296 | {
|
274 | 297 | "cell_type": "code",
|
275 | | - "execution_count": null,
| 298 | + "execution_count": 3,
276 | 299 | "metadata": {},
|
277 | | - "outputs": [],
| 300 | + "outputs": [
| 301 | + {
| 302 | + "name": "stderr",
| 303 | + "output_type": "stream",
| 304 | + "text": [
| 305 | + "Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\n",
| 306 | + "Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.\n"
| 307 | + ]
| 308 | + }
| 309 | + ],
278 | 310 | "source": [
|
279 | 311 | "sequences = pipeline(\n",
|
280 | 312 | " 'I have tomatoes, basil and cheese at home. What can I cook for dinner?\\n',\n",
|
|
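The stderr text captured in the new cell output above is informational rather than an error. If you would rather not see the `pad_token_id` notice on every run, one option is to pass it explicitly as a generation argument. A minimal sketch, assuming `pipeline` and `tokenizer` are the objects created earlier in the notebook:

```python
sequences = pipeline(
    'I have tomatoes, basil and cheese at home. What can I cook for dinner?\n',
    pad_token_id=tokenizer.eos_token_id,  # silences the "Setting pad_token_id" notice
)
```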
296 | 328 | "name": "python3"
|
297 | 329 | },
|
298 | 330 | "language_info": {
|
| 331 | + "codemirror_mode": {
| 332 | + "name": "ipython",
| 333 | + "version": 3
| 334 | + },
| 335 | + "file_extension": ".py",
| 336 | + "mimetype": "text/x-python",
299 | 337 | "name": "python",
|
300 | | - "version": "3.8.3"
| 338 | + "nbconvert_exporter": "python",
| 339 | + "pygments_lexer": "ipython3",
| 340 | + "version": "3.8.10"
301 | 341 | }
|
302 | 342 | },
|
303 | 343 | "nbformat": 4,
|
|