---
title: How to compile Hugging Face models to run on Foundry Local
titleSuffix: Foundry Local
description: Learn how to compile and run Hugging Face models with Foundry Local.
manager: scottpolly
ms.service: azure-ai-foundry
ms.custom: build-2025
ms.author: samkemp
author: samuel100
---

# How to compile Hugging Face models to run on Foundry Local

Foundry Local runs ONNX models on your device with high performance. While the model catalog offers *out-of-the-box* precompiled options, you can use any model in the ONNX format.

To compile existing models in Safetensor or PyTorch format into the ONNX format, you can use [Olive](https://microsoft.github.io/Olive). Olive is a tool that optimizes models to the ONNX format, making them suitable for deployment in Foundry Local. It uses techniques like *quantization* and *graph optimization* to improve performance.
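As a rough, back-of-the-envelope illustration of why quantization matters on-device (the numbers below are illustrative, not from this article): the weights of a 1B-parameter model shrink from about 2 GB at 16-bit precision to about 0.5 GB at 4-bit.

```python
# Back-of-envelope weight memory for a 1B-parameter model at different
# precisions. Ignores activations, KV cache, and runtime overhead.
PARAMS = 1_000_000_000


def weight_gb(bits_per_param: int, params: int = PARAMS) -> float:
    """Model weight size in gigabytes (decimal GB)."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(bits):.1f} GB")  # 2.0, 1.0, 0.5 GB
```

This is only the weight footprint; actual memory use at inference time is higher, but the relative savings from lower precision hold.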

This guide shows you how to:

> [!div class="checklist"]
>
> - **Convert and optimize** models from Hugging Face to run in Foundry Local. You'll use the `Llama-3.2-1B-Instruct` model as an example, but you can use any generative AI model from Hugging Face.
> - **Run** your optimized models with Foundry Local.

## Prerequisites

Install Olive with the `auto-opt` package extra: `pip install olive-ai[auto-opt]`

> [!TIP]
> For best results, install Olive in a virtual environment using [venv](https://docs.python.org/3/library/venv.html) or [conda](https://www.anaconda.com/docs/getting-started/miniconda/main).

## Sign in to Hugging Face

You optimize the `Llama-3.2-1B-Instruct` model, which requires Hugging Face authentication:

### [Bash](#tab/Bash)

Sign in by running `huggingface-cli login`.

---

> [!NOTE]
> You must first [create a Hugging Face token](https://huggingface.co/docs/hub/security-tokens) and [request model access](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) before proceeding.

## Compile the model

### Step 1: Run the Olive auto-opt command

Use the Olive `auto-opt` command to download, convert, quantize, and optimize the model:
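The exact invocation from the original article is elided in this excerpt; a representative `auto-opt` command might look like the sketch below. Only `--model_name_or_path` is confirmed by this article (see the local-path tip that follows); the remaining flags are assumptions based on the Olive CLI and may differ from the article's own example.

```shell
# Sketch only: flag set is an assumption based on the Olive auto-opt CLI.
# Downloads the model from Hugging Face, converts it to ONNX, and quantizes it.
olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --output_path models/llama \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4
```

This step downloads the model weights, so it requires the Hugging Face sign-in from the previous section and can take several minutes.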
> If you have a local copy of the model, you can use a local path instead of the Hugging Face ID. For example, `--model_name_or_path models/llama-3.2-1B-Instruct`. Olive handles the conversion, optimization, and quantization automatically.

### Step 2: Rename the output model
Foundry Local requires a chat template JSON file called `inference_model.json` in the model directory.

To create the chat template file, you can use the `apply_chat_template` method from the Hugging Face library:

> [!NOTE]
> The following example uses the Python Hugging Face library to create a chat template. The Hugging Face library is a dependency for Olive, so if you're using the same Python virtual environment, you don't need to install it. If you're using a different environment, install the library with `pip install transformers`.
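The article's own `transformers` example is elided from this excerpt. The sketch below shows the general shape of writing an `inference_model.json` file; the `Name` and `PromptTemplate` field names and the template strings are assumptions for illustration, not the real Llama 3.2 chat template. With `transformers` installed, you would derive the prompt string from `tokenizer.apply_chat_template(...)` instead of hardcoding it.

```python
import json

# Assumed inference_model.json shape: "Name" and "PromptTemplate" are
# illustrative field names, and the template strings are stand-ins.
# In practice, build the "prompt" value with tokenizer.apply_chat_template.
chat_template = {
    "Name": "llama-3.2-1b-instruct",
    "PromptTemplate": {
        "system": "<|system|>{Content}<|end|>",
        "user": "<|user|>{Content}<|end|>",
        "assistant": "{Content}",
        "prompt": "<|user|>{Content}<|end|><|assistant|>",
    },
}

# Write the file into the (renamed) model directory from Step 2.
with open("inference_model.json", "w", encoding="utf-8") as f:
    json.dump(chat_template, f, indent=2)
```

Foundry Local reads this file at load time to format chat messages into the prompt the model expects, so the template must match the model's own chat format.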