
Commit 43cf3ee

Merge pull request #4739 from samuel100/samuel100/foundry-local-sdk-updates
Samuel100/foundry local sdk updates
2 parents 495d9f5 + a04ff9e commit 43cf3ee

24 files changed: 1,030 additions and 682 deletions

articles/ai-foundry/foundry-local/concepts/foundry-local-architecture.md

Lines changed: 6 additions & 6 deletions
@@ -35,9 +35,9 @@ The Foundry Local architecture consists of these main components:
 
 ### Foundry Local service
 
-The Foundry Local Service is an OpenAI-compatible REST server that provides a standard interface for working with the inference engine and managing models. Developers use this API to send requests, run models, and get results programmatically.
+The Foundry Local Service includes an OpenAI-compatible REST server that provides a standard interface for working with the inference engine. It's also possible to manage models over REST. Developers use this API to send requests, run models, and get results programmatically.
 
-- **Endpoint**: `http://localhost:5272/v1`
+- **Endpoint**: The endpoint is *dynamically allocated* when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK, which automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
 - **Use Cases**:
 - Connect Foundry Local to your custom applications
 - Execute models through HTTP requests
@@ -48,7 +48,7 @@ The ONNX Runtime is a core component that executes AI models. It runs optimized
 
 **Features**:
 
-- Works with multiple hardware providers (NVIDIA, AMD, Intel) and device types (NPUs, CPUs, GPUs)
+- Works with multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and device types (NPUs, CPUs, GPUs)
 - Offers a consistent interface for running models across different hardware
 - Delivers best-in-class performance
 - Supports quantized models for faster inference
@@ -69,7 +69,7 @@ The model cache stores downloaded AI models locally on your device, which ensure
 
 #### Model lifecycle
 
-1. **Download**: Get models from the Azure AI Foundry model catalog and save them to your local disk.
+1. **Download**: Download models from the Azure AI Foundry model catalog and save them to your local disk.
 2. **Load**: Load models into the Foundry Local service memory for inference. Set a TTL (time-to-live) to control how long the model stays in memory (default: 10 minutes).
 3. **Run**: Execute model inference for your requests.
 4. **Unload**: Remove models from memory to free up resources when no longer needed.
@@ -114,7 +114,7 @@ Foundry Local supports integration with various SDKs, such as the OpenAI SDK, en
 - **Supported SDKs**: Python, JavaScript, C#, and more.
 
 > [!TIP]
-> To learn more about integrating with inferencing SDKs, read [Integrate Foundry Local with Inferencing SDKs](../how-to/integrate-with-inference-sdks.md).
+> To learn more about integrating with inferencing SDKs, read [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md).
 
 #### AI Toolkit for Visual Studio Code
 
@@ -128,5 +128,5 @@ The AI Toolkit for Visual Studio Code provides a user-friendly interface for dev
 ## Next Steps
 
 - [Get started with Foundry Local](../get-started.md)
-- [Integrate with Inference SDKs](../how-to/integrate-with-inference-sdks.md)
+- [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md)
 - [Foundry Local CLI Reference](../reference/reference-cli.md)
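For reference, here's a minimal sketch of resolving the dynamically allocated endpoint described above with the Python SDK introduced in this change. It assumes the `foundry-local-sdk` package is installed and uses an illustrative model alias; `foundry service status` from the CLI gives the same information.

```python
# Sketch: resolve the dynamically allocated Foundry Local endpoint via the SDK.
# Assumes: pip install foundry-local-sdk; "deepseek-r1-1.5b" is an illustrative alias.
from foundry_local import FoundryLocalManager

# Starts the Foundry Local service if it isn't running and loads the model.
manager = FoundryLocalManager("deepseek-r1-1.5b")

print(manager.endpoint)  # base URL that includes the dynamically assigned port
print(manager.api_key)   # placeholder key; not required for local usage
```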

articles/ai-foundry/foundry-local/get-started.md

Lines changed: 26 additions & 19 deletions
@@ -16,43 +16,50 @@ ms.custom: build-2025
 
 # Get started with Foundry Local
 
-This guide walks you through setting up Foundry Local to run AI models on your device. Follow these clear steps to install the tool, discover available models, and launch your first local AI model.
+This guide walks you through setting up Foundry Local to run AI models on your device.
 
 ## Prerequisites
 
 Your system must meet the following requirements to run Foundry Local:
 
-- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), macOS, or Linux (x64/ARM)
+- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), or macOS.
 - **Hardware**: Minimum 8GB RAM, 3GB free disk space. Recommended 16GB RAM, 15GB free disk space.
 - **Network**: Internet connection for initial model download (optional for offline use)
-- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), or Qualcomm Snapdragon X Elite, with 8GB or more of memory (RAM).
+- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), Qualcomm Snapdragon X Elite (8GB or more of memory), or Apple silicon.
 
 Also, ensure you have administrative privileges to install software on your device.
 
 ## Quickstart
 
 Get started with Foundry Local quickly:
 
-1. **Download** Foundry Local for your platform:
-- [Windows](https://aka.ms/foundry-local-windows)
-- [macOS](https://aka.ms/foundry-local-macos)
-- [Linux](https://aka.ms/foundry-local-linux)
-1. **Install** the package by following the on-screen prompts.
-1. **Run your first model** Open a terminal window and run the following command to run a model (the model will be downloaded and an interactive prompt will appear):
+1. [**Download Foundry Local Installer**](https://aka.ms/foundry-local-installer) and **install** by following the on-screen prompts.
+> [!TIP]
+> If you're installing on Windows, you can also use `winget` to install Foundry Local. Open a terminal window and run the following command:
+>
+> ```powershell
+> winget install Microsoft.FoundryLocal
+> ```
+1. **Run your first model**: Open a terminal window and run the following command to run a model:
 
 ```bash
-foundry model run phi-3-mini-4k
+foundry model run deepseek-r1-1.5b
 ```
+
+The model downloads, which can take a few minutes depending on your internet speed, and then runs. Once the model is running, you can interact with it using the command line interface (CLI). For example, you can ask:
 
-> [!TIP]
-> You can replace `phi-3-mini-4k` with any model name from the catalog (see `foundry model list` for available models). Foundry Local will download the model variant that best matches your system's hardware and software configuration. For example, if you have an NVIDIA GPU, it will download the CUDA version of the model. If you have an QNN NPU, it will download the NPU variant. If you have no GPU or NPU, it will download the CPU version.
+```text
+Why is the sky blue?
+```
+
+You should see a response from the model in the terminal:
+:::image type="content" source="media/get-started-output.png" alt-text="Screenshot of output from foundry local run command." lightbox="media/get-started-output.png":::
 
-> [!IMPORTANT]
-> **For macOS/Linux users:** Run both components in separate terminals:
-> - Neutron Server (`Inference.Service.Agent`) - Make it executable with `chmod +x Inference.Service.Agent`
-> - Foundry Client (`foundry`) - Make it executable with `chmod +x foundry` and add it to your PATH
 
-## Explore Foundry Local CLI commands
+> [!TIP]
+> You can replace `deepseek-r1-1.5b` with any model name from the catalog (see `foundry model list` for available models). Foundry Local downloads the model variant that best matches your system's hardware and software configuration. For example, if you have an NVIDIA GPU, it downloads the CUDA version of the model. If you have a Qualcomm NPU, it downloads the NPU variant. If you have no GPU or NPU, it downloads the CPU version.
+
+## Explore commands
 
 The Foundry CLI organizes commands into these main categories:
 
@@ -89,9 +96,9 @@ foundry cache --help
 
 ## Next steps
 
-- [Learn how to integrate Foundry Local with your applications](how-to/integrate-with-inference-sdks.md)
+- [Integrate inferencing SDKs with Foundry Local](how-to/how-to-integrate-with-inference-sdks.md)
 - [Explore the Foundry Local documentation](index.yml)
 - [Learn about best practices and troubleshooting](reference/reference-best-practice.md)
 - [Explore the Foundry Local API reference](reference/reference-catalog-api.md)
-- [Learn how to compile Hugging Face models](how-to/how-to-compile-hugging-face-models.md)
+- [Compile Hugging Face models](how-to/how-to-compile-hugging-face-models.md)
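The quickstart above is CLI-only; as a rough companion, this sketch asks the same question programmatically by pairing the Python SDK with the OpenAI client, mirroring the pattern used later in this commit. Package names and the alias are assumptions drawn from that example.

```python
# Sketch: ask the quickstart question ("Why is the sky blue?") from Python.
# Assumes: pip install openai foundry-local-sdk; alias matches the CLI quickstart.
import openai
from foundry_local import FoundryLocalManager

alias = "deepseek-r1-1.5b"
manager = FoundryLocalManager(alias)  # starts the service and loads the model if needed

client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,  # hardware-specific variant chosen by Foundry Local
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
```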
97104

@@ -1,11 +1,11 @@
 ---
-title: Build a chat application with Open Web UI
+title: Integrate Open Web UI with Foundry Local
 titleSuffix: Foundry Local
 description: Learn how to create a chat application using Foundry Local and Open Web UI
 manager: scottpolly
 keywords: Azure AI services, cognitive, AI models, local inference
 ms.service: azure-ai-foundry
-ms.topic: tutorial
+ms.topic: how-to
 ms.date: 02/20/2025
 ms.reviewer: samkemp
 ms.author: samkemp
@@ -14,19 +14,15 @@ ms.custom: build-2025
 #customer intent: As a developer, I want to get started with Foundry Local so that I can run AI models locally.
 ---
 
-# Build a chat application with Open Web UI
+# Integrate Open Web UI with Foundry Local
 
-This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you'll have a working chat interface running entirely on your local device.
+This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you have a working chat interface running entirely on your local device.
 
 ## Prerequisites
 
 Before you start this tutorial, you need:
 
-- **Foundry Local** [installed](../get-started.md) on your computer.
-- **At least one model loaded** with the `foundry model load` command, like this:
-```bash
-foundry model load Phi-4-mini-gpu-int4-rtn-block-32
-```
+- **Foundry Local** installed on your computer. Read the [Get started with Foundry Local](../get-started.md) guide for installation instructions.
 
 ## Set up Open Web UI for chat
 
@@ -46,18 +42,18 @@ Before you start this tutorial, you need:
 2. Select **Connections**
 3. Select **Manage Direct Connections**
 4. Select the **+** icon to add a connection
-5. Enter `http://localhost:5272/v1` for the URL
-6. Type any value (like `test`) for the API Key, since it cannot be empty
+5. For the **URL**, enter `http://localhost:PORT/v1` where `PORT` is replaced with the port of the Foundry Local endpoint, which you can find using the CLI command `foundry service status`. Note that Foundry Local dynamically assigns a port, so it's not always the same.
+6. Type any value (like `test`) for the API Key, since it can't be empty.
 7. Save your connection
 
 5. **Start chatting with your model**:
-1. Your loaded models will appear in the dropdown at the top
+1. Your loaded models appear in the dropdown at the top
 2. Select any model from the list
 3. Type your message in the input box at the bottom
 
 That's it! You're now chatting with an AI model running entirely on your local device.
 
 ## Next steps
 
-- [Build an application with LangChain](use-langchain-with-foundry-local.md)
-- [How to compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
+- [Integrate inferencing SDKs with Foundry Local](how-to-integrate-with-inference-sdks.md)
+- [Compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
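Because the port in step 5 changes between service starts, a small sketch like the following (assuming the `foundry-local-sdk` package) can print the base URL to paste into Open Web UI's **Manage Direct Connections** dialog; `foundry service status` from the CLI works just as well.

```python
# Sketch: print the base URL for Open Web UI's direct connection settings.
# Assumes: pip install foundry-local-sdk; any catalog alias can be used here.
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager("deepseek-r1-1.5b")
# The SDK exposes the service endpoint; confirm it matches the http://localhost:PORT/v1
# form that step 5 of the article expects before pasting it into Open Web UI.
print(f"Foundry Local endpoint: {manager.endpoint}")
```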

articles/ai-foundry/foundry-local/how-to/how-to-compile-hugging-face-models.md

Lines changed: 44 additions & 52 deletions
@@ -1,7 +1,7 @@
 ---
-title: How to compile Hugging Face models to run on Foundry Local
+title: Compile Hugging Face models to run on Foundry Local
 titleSuffix: Foundry Local
-description: Learn how to compile and run Hugging Face models with Foundry Local.
+description: Learn to compile and run Hugging Face models with Foundry Local.
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom: build-2025
@@ -11,7 +11,7 @@ ms.author: samkemp
 author: samuel100
 ---
 
-# How to compile Hugging Face models to run on Foundry Local
+# Compile Hugging Face models to run on Foundry Local
 
 Foundry Local runs ONNX models on your device with high performance. While the model catalog offers _out-of-the-box_ precompiled options, you can use any model in the ONNX format.
 
@@ -219,9 +219,16 @@ foundry cache ls # should show llama-3.2
 foundry cache cd models
 foundry cache ls # should show llama-3.2
 ```
-
 ---
 
+> [!CAUTION]
+> Remember to change the model cache back to the default directory when you're done by running:
+>
+> ```bash
+> foundry cache cd ./foundry/cache/models
+> ```
+
+
 ### Using the Foundry Local CLI
 
 ### [Bash](#tab/Bash)
@@ -235,40 +242,6 @@ foundry model run llama-3.2 --verbose
 ```powershell
 foundry model run llama-3.2 --verbose
 ```
-
----
-
-### Using the REST API
-
-### [Bash](#tab/Bash)
-
-```bash
-curl -X POST http://localhost:5272/v1/chat/completions \
--H "Content-Type: application/json" \
--d '{
-"model": "llama-3.2",
-"messages": [{"role": "user", "content": "What is the capital of France?"}],
-"temperature": 0.7,
-"max_tokens": 50,
-"stream": true
-}'
-```
-
-### [PowerShell](#tab/PowerShell)
-
-```powershell
-Invoke-RestMethod -Uri http://localhost:5272/v1/chat/completions `
--Method Post `
--ContentType "application/json" `
--Body '{
-"model": "llama-3.2",
-"messages": [{"role": "user", "content": "What is the capital of France?"}],
-"temperature": 0.7,
-"max_tokens": 50,
-"stream": true
-}'
-```
-
 ---
 
 ### Using the OpenAI Python SDK
@@ -277,35 +250,54 @@ The OpenAI Python SDK is a convenient way to interact with the Foundry Local RES
 
 ```bash
 pip install openai
+pip install foundry-local-sdk
 ```
 
 Then, you can use the following code to run the model:
 
 ```python
-from openai import OpenAI
+import openai
+from foundry_local import FoundryLocalManager
+
+modelId = "llama-3.2"
+
+# Create a FoundryLocalManager instance. This will start the Foundry
+# Local service if it is not already running and load the specified model.
+manager = FoundryLocalManager(modelId)
 
-client = OpenAI(
-    base_url="http://localhost:5272/v1",
-    api_key="none", # required but not used
+# The remaining code uses the OpenAI Python SDK to interact with the local model.
+
+# Configure the client to use the local Foundry service
+client = openai.OpenAI(
+    base_url=manager.endpoint,
+    api_key=manager.api_key  # API key is not required for local usage
 )
 
+# Set the model to use and generate a streaming response
 stream = client.chat.completions.create(
-    model="llama-3.2",
-    messages=[{"role": "user", "content": "What is the capital of France?"}],
-    temperature=0.7,
-    max_tokens=50,
-    stream=True,
+    model=manager.get_model_info(modelId).id,
+    messages=[{"role": "user", "content": "What is the golden ratio?"}],
+    stream=True
 )
 
-for event in stream:
-    print(event.choices[0].delta.content, end="", flush=True)
-print("\n\n")
+# Print the streaming response
+for chunk in stream:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="", flush=True)
 ```
 
 > [!TIP]
-> You can use any language that supports HTTP requests. See [Integrate with Inferencing SDKs](integrate-with-inference-sdks.md) for more options.
+> You can use any language that supports HTTP requests. For more information, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+
+## Finishing up
+
+After you're done using the custom model, you should reset the model cache to the default directory using:
+
+```bash
+foundry cache cd ./foundry/cache/models
+```
 
 ## Next steps
 
 - [Learn more about Olive](https://microsoft.github.io/Olive/)
-- [Integrate Foundry Local with Inferencing SDKs](integrate-with-inference-sdks.md)
+- [Integrate inferencing SDKs with Foundry Local](how-to-integrate-with-inference-sdks.md)
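The `get_model_info(...).id` call in the Python example above resolves a catalog alias to the variant Foundry Local picked for the local hardware (CUDA, NPU, or CPU, as the earlier tip explains). Here's a hedged sketch that only inspects that resolution, using just the `.id` field shown in this commit:

```python
# Sketch: see which hardware-specific variant an alias resolves to on this machine.
# Assumes: pip install foundry-local-sdk; "llama-3.2" matches the alias used above.
from foundry_local import FoundryLocalManager

alias = "llama-3.2"
manager = FoundryLocalManager(alias)
info = manager.get_model_info(alias)
print(f"Alias {alias!r} resolved to model id: {info.id}")
```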
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+title: Integrate with inference SDKs
+titleSuffix: Foundry Local
+description: This article provides instructions on how to integrate Foundry Local with common Inferencing SDKs.
+manager: scottpolly
+ms.service: azure-ai-foundry
+ms.custom: build-2025
+ms.topic: how-to
+ms.date: 02/12/2025
+ms.author: samkemp
+zone_pivot_groups: foundry-local-sdk
+author: samuel100
+---
+
+# Integrate inferencing SDKs with Foundry Local
+
+Foundry Local integrates with various inferencing SDKs - such as OpenAI, Azure OpenAI, Langchain, etc. This guide shows you how to connect your applications to locally running AI models using popular SDKs.
+
+## Prerequisites
+
+- Foundry Local installed. See the [Get started with Foundry Local](../get-started.md) article for installation instructions.
+
+::: zone pivot="programming-language-python"
+[!INCLUDE [Python](../includes/integrate-examples/python.md)]
+::: zone-end
+::: zone pivot="programming-language-javascript"
+[!INCLUDE [JavaScript](../includes/integrate-examples/javascript.md)]
+::: zone-end
+
+## Next steps
+
+- [Compile Hugging Face models to run on Foundry Local](how-to-compile-hugging-face-models.md)
+- [Explore the Foundry Local CLI reference](../reference/reference-cli.md)
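The Python and JavaScript examples referenced by the include files above aren't part of this diff. As a rough stand-in, here's a non-streaming variant of the Python pattern used earlier in the commit; package names, alias, and prompt are illustrative.

```python
# Sketch: minimal non-streaming chat completion against Foundry Local.
# Assumes: pip install openai foundry-local-sdk; "llama-3.2" is an illustrative alias.
import openai
from foundry_local import FoundryLocalManager

alias = "llama-3.2"
manager = FoundryLocalManager(alias)  # starts the service and loads the model if needed
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
)
print(response.choices[0].message.content)
```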
