
Commit 0ddb5b7

Merge pull request #13 from samuel100/fixing_merge_errors
Fixed suggestions.
2 parents: 4d1f25e + dbb2436

File tree: 3 files changed (+31, −32 lines)


articles/ai-foundry-local/concepts/foundry-local-architecture.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ Key benefits of Foundry Local include:
 
 The Foundry Local architecture consists of these main components:
 
-:::image type="content" source="../media/architecture/foundry-local-arch.png" alt-text="Diagram of Foundry Local Architecture":::
+:::image type="content" source="../media/architecture/foundry-local-arch.png" alt-text="Diagram of Foundry Local Architecture.":::
 
 ### Foundry Local service
 
```

articles/ai-foundry-local/how-to/how-to-compile-huggingface-models.md

Lines changed: 19 additions & 21 deletions
```diff
@@ -1,7 +1,7 @@
 ---
-title: How to compile Hugging Face models to run on Foundry Local
+title: How to compile HuggingFace models to run on Foundry Local
 titleSuffix: Foundry Local
-description: Learn how to compile and run Hugging Face models with Foundry Local.
+description: Learn how to compile and run HuggingFace models with Foundry Local.
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom: build-2025
```
```diff
@@ -11,17 +11,17 @@ ms.author: samkemp
 author: samuel100
 ---
 
-# How to compile Hugging Face models to run on Foundry Local
+# How to compile HuggingFace models to run on Foundry Local
 
-Foundry Local runs ONNX models on your device with high performance. While the model catalog offers *out-of-the-box* precompiled options, you can use any model in the ONNX format.
+Foundry Local runs ONNX models on your device with high performance. While the model catalog offers _out-of-the-box_ precompiled options, you can use any model in the ONNX format.
 
-To compile existing models in Safetensor or PyTorch format into the ONNX format, you can use [Olive](https://microsoft.github.io/Olive). Olive is a tool that optimizes models to ONNX format, making them suitable for deployment in Foundry Local. It uses techniques like *quantization* and *graph optimization* to improve performance.
+To compile existing models in Safetensor or PyTorch format into the ONNX format, you can use [Olive](https://microsoft.github.io/Olive). Olive is a tool that optimizes models to ONNX format, making them suitable for deployment in Foundry Local. It uses techniques like _quantization_ and _graph optimization_ to improve performance.
 
 This guide shows you how to:
 
 > [!div class="checklist"]
 >
-> - **Convert and optimize** models from Hugging Face to run in Foundry Local. You'll use the `Llama-3.2-1B-Instruct` model as an example, but you can use any generative AI model from Hugging Face.
+> - **Convert and optimize** models from HuggingFace to run in Foundry Local. You'll use the `Llama-3.2-1B-Instruct` model as an example, but you can use any generative AI model from HuggingFace.
 > - **Run** your optimized models with Foundry Local
 
 ## Prerequisites
```
```diff
@@ -49,9 +49,9 @@ pip install olive-ai[auto-opt]
 > [!TIP]
 > For best results, install Olive in a virtual environment using [venv](https://docs.python.org/3/library/venv.html) or [conda](https://www.anaconda.com/docs/getting-started/miniconda/main).
 
-## Sign in to Hugging Face
+## Sign in to HuggingFace
 
-You optimize the `Llama-3.2-1B-Instruct` model, which requires Hugging Face authentication:
+You optimize the `Llama-3.2-1B-Instruct` model, which requires HuggingFace authentication:
 
 ### [Bash](#tab/Bash)
 
```
6868
---
6969

7070
> [!NOTE]
71-
> You must first [create a Hugging Face token](https://huggingface.co/docs/hub/security-tokens) and [request model access](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) before proceeding.
71+
> You must first [create a HuggingFace token](https://huggingface.co/docs/hub/security-tokens) and [request model access](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) before proceeding.
7272
7373
## Compile the model
7474

7575
### Step 1: Run the Olive auto-opt command
7676

7777
Use the Olive `auto-opt` command to download, convert, quantize, and optimize the model:
7878

79-
8079
### [Bash](#tab/Bash)
8180

8281
```bash
```diff
@@ -112,18 +111,17 @@ olive auto-opt `
 
 The command uses the following parameters:
 
-| Parameter            | Description                                                                |
-| -------------------- | -------------------------------------------------------------------------- |
-| `model_name_or_path` | Model source: Hugging Face ID, local path, or Azure AI Model registry ID   |
-| `output_path`        | Where to save the optimized model                                          |
-| `device`             | Target hardware: `cpu`, `gpu`, or `npu`                                    |
+| Parameter            | Description                                                                       |
+| -------------------- | --------------------------------------------------------------------------------- |
+| `model_name_or_path` | Model source: HuggingFace ID, local path, or Azure AI Model registry ID           |
+| `output_path`        | Where to save the optimized model                                                 |
+| `device`             | Target hardware: `cpu`, `gpu`, or `npu`                                           |
 | `provider`           | Execution provider (for example, `CPUExecutionProvider`, `CUDAExecutionProvider`) |
-| `precision`          | Model precision: `fp16`, `fp32`, `int4`, or `int8`                         |
-| `use_ort_genai`      | Creates inference configuration files                                      |
-
+| `precision`          | Model precision: `fp16`, `fp32`, `int4`, or `int8`                                |
+| `use_ort_genai`      | Creates inference configuration files                                             |
 
 > [!TIP]
-> If you have a local copy of the model, you can use a local path instead of the Hugging Face ID. For example, `--model_name_or_path models/llama-3.2-1B-Instruct`. Olive handles the conversion, optimization, and quantization automatically.
+> If you have a local copy of the model, you can use a local path instead of the HuggingFace ID. For example, `--model_name_or_path models/llama-3.2-1B-Instruct`. Olive handles the conversion, optimization, and quantization automatically.
 
 ### Step 2: Rename the output model
 
```
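The hunk above revises the `auto-opt` parameter table. As a sketch of how the documented flags combine into a single invocation — the model ID and output path below are illustrative assumptions, not values taken from this commit:

```python
# Sketch: assemble an `olive auto-opt` command line from the parameters
# documented in the table above. The model ID and output path are
# illustrative placeholders, not values from the diff.
params = {
    "model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",  # HF ID, local path, or registry ID
    "output_path": "models/llama-3.2-1b",   # where the optimized model is written
    "device": "cpu",                        # cpu, gpu, or npu
    "provider": "CPUExecutionProvider",     # execution provider for the target device
    "precision": "int4",                    # fp16, fp32, int4, or int8
}
cmd = ["olive", "auto-opt"]
for name, value in params.items():
    cmd += [f"--{name}", value]
# Per the table, `use_ort_genai` is a flag that emits inference config files.
cmd.append("--use_ort_genai")
print(" ".join(cmd))
```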
````diff
@@ -161,10 +159,10 @@ Foundry Local requires a chat template JSON file called `inference_model.json` i
 }
 ```
 
-To create the chat template file, you can use the `apply_chat_template` method from the Hugging Face library:
+To create the chat template file, you can use the `apply_chat_template` method from the HuggingFace library:
 
 > [!NOTE]
-> The following example uses the Python Hugging Face library to create a chat template. The Hugging Face library is a dependency for Olive, so if you're using the same Python virtual environment you don't need to install. If you're using a different environment, install the library with `pip install transformers`.
+> The following example uses the Python HuggingFace library to create a chat template. The HuggingFace library is a dependency for Olive, so if you're using the same Python virtual environment you don't need to install. If you're using a different environment, install the library with `pip install transformers`.
 
 ```python
 # generate_inference_model.py
````
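The diff truncates the article's `generate_inference_model.py` listing, which derives the chat template with the tokenizer's `apply_chat_template` method. As a minimal, hypothetical sketch of writing such a file by hand — the key names, model name, and template strings below are placeholders, not the article's exact schema or the real Llama-3.2 template:

```python
import json

# Hypothetical sketch: hand-write an inference_model.json chat-template file.
# The article's script derives the template via apply_chat_template; the key
# names and template strings here are illustrative placeholders only.
chat_template = {
    "Name": "llama-3.2-1b-instruct",  # placeholder model name
    "PromptTemplate": {
        "assistant": "{Content}",
        "prompt": "<|user|>\n{Content}\n<|assistant|>\n",  # placeholder template
    },
}

with open("inference_model.json", "w", encoding="utf-8") as f:
    json.dump(chat_template, f, indent=4)
```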

articles/ai-foundry-local/tutorials/chat-application-with-open-web-ui.md

Lines changed: 11 additions & 10 deletions
````diff
@@ -33,6 +33,7 @@ Before you start this tutorial, you need:
 1. **Install Open Web UI** by following the instructions from the [Open Web UI GitHub repository](https://github.com/open-webui/open-webui).
 
 2. **Launch Open Web UI** with this command in your terminal:
+
    ```bash
    open-webui serve
    ```
````
```diff
@@ -41,18 +42,18 @@ Before you start this tutorial, you need:
 
 4. **Connect Open Web UI to Foundry Local**:
 
-   - Click **Settings** in the navigation menu
-   - Select **Connections**
-   - Click **Manage Direct Connections**
-   - Click the **+** icon to add a connection
-   - Enter `http://localhost:5272/v1` for the URL
-   - Type any value (like `test`) for the API Key, since it cannot be empty
-   - Save your connection
+   1. Click **Settings** in the navigation menu
+   2. Select **Connections**
+   3. Click **Manage Direct Connections**
+   4. Click the **+** icon to add a connection
+   5. Enter `http://localhost:5272/v1` for the URL
+   6. Type any value (like `test`) for the API Key, since it cannot be empty
+   7. Save your connection
 
 5. **Start chatting with your model**:
-   - Your loaded models will appear in the dropdown at the top
-   - Select any model from the list
-   - Type your message in the input box at the bottom
+   1. Your loaded models will appear in the dropdown at the top
+   2. Select any model from the list
+   3. Type your message in the input box at the bottom
 
 That's it! You're now chatting with an AI model running entirely on your local device.
 
```
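The tutorial steps above point Open Web UI at `http://localhost:5272/v1`. The same endpoint can be exercised directly; a sketch assuming the Foundry Local service is running on that port, that it exposes the conventional OpenAI-compatible chat-completions route under `/v1`, and that a model with the illustrative name below is loaded:

```python
import json
from urllib import request

# Sketch: call the OpenAI-compatible endpoint that Open Web UI connects to.
# Assumes Foundry Local is listening on localhost:5272; the model name is an
# illustrative placeholder for whichever model you have loaded.
payload = {
    "model": "llama-3.2-1b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = request.Request(
    "http://localhost:5272/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer test",  # any non-empty key, as in the tutorial
    },
)
# Sending requires the local service to be running, so guard the call:
try:
    with request.urlopen(req, timeout=10) as resp:
        reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
except OSError:
    print("Foundry Local is not reachable on localhost:5272")
```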