Commit bd6e1f4

Merge pull request #4805 from jonburchel/release-build-ai-foundry-non-FDP-features
[SCOPED] Merged human reviewed release-build-foundry-local branch onto release-build-ai-foundry-non-FDP-features branch
2 parents e1345b7 + b3f8788 commit bd6e1f4

23 files changed

+2722 -1 lines changed
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
---
title: Foundry Local architecture
titleSuffix: Foundry Local
description: Learn about the architecture and components of Foundry Local
manager: scottpolly
ms.service: azure-ai-foundry
ms.custom: build-2025
ms.topic: concept-article
ms.date: 02/12/2025
ms.author: samkemp
author: samuel100
---

# Foundry Local architecture
Foundry Local enables efficient, secure, and scalable AI model inference directly on your devices. This article explains the core components of Foundry Local and how they work together to deliver AI capabilities.

Key benefits of Foundry Local include:

> [!div class="checklist"]
>
> - **Low Latency**: Run models locally to minimize processing time and deliver faster results.
> - **Data Privacy**: Process sensitive data locally without sending it to the cloud, helping meet data protection requirements.
> - **Flexibility**: Support for diverse hardware configurations lets you choose the optimal setup for your needs.
> - **Scalability**: Deploy across various devices, from laptops to servers, to suit different use cases.
> - **Cost-Effectiveness**: Reduce cloud computing costs, especially for high-volume applications.
> - **Offline Operation**: Work without an internet connection in remote or disconnected environments.
> - **Seamless Integration**: Easily incorporate into existing development workflows for smooth adoption.
## Key components

The Foundry Local architecture consists of these main components:

:::image type="content" source="../media/architecture/foundry-local-arch.png" alt-text="Diagram of Foundry Local Architecture.":::

### Foundry Local service

The Foundry Local Service includes an OpenAI-compatible REST server that provides a standard interface for working with the inference engine. You can also manage models over REST. Developers use this API to send requests, run models, and get results programmatically, as shown in the sketch after this list.

- **Endpoint**: The endpoint is *dynamically allocated* when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK that automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
- **Use Cases**:
  - Connect Foundry Local to your custom applications
  - Execute models through HTTP requests
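For example, the following is a minimal sketch of calling the REST server directly from Python. The port, route, and model name are assumptions for illustration; get the actual endpoint from `foundry service status` and use a model you've already downloaded.

```python
# Minimal sketch: call the OpenAI-compatible chat completions route directly.
# The port and model name below are assumptions; check `foundry service status`
# for the real endpoint on your machine.
import requests

endpoint = "http://localhost:5272/v1"  # replace with your dynamically allocated endpoint

payload = {
    "model": "deepseek-r1-1.5b",  # any model you've downloaded and run locally
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
}

response = requests.post(f"{endpoint}/chat/completions", json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```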
### ONNX runtime

The ONNX Runtime is a core component that executes AI models. It runs optimized ONNX models efficiently on local hardware like CPUs, GPUs, or NPUs.

**Features**:

- Works with multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and device types (NPUs, CPUs, GPUs)
- Offers a consistent interface for running models across different hardware
- Delivers best-in-class performance
- Supports quantized models for faster inference
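Foundry Local manages the runtime for you, but the following generic ONNX Runtime sketch (not specific to Foundry Local; the model path, input shape, and provider list are placeholders) illustrates how execution providers select the hardware backend:

```python
# Generic ONNX Runtime usage, shown only to illustrate execution providers.
# Foundry Local configures this for you; the model path and input shape are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # first available provider wins
)

input_name = session.get_inputs()[0].name
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)

outputs = session.run(None, {input_name: dummy_input})
print([output.shape for output in outputs])
```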
### Model management

Foundry Local provides robust tools for managing AI models, ensuring that they're readily available for inference and easy to maintain. Model management is handled through the **Model Cache** and the **Command-Line Interface (CLI)**.

#### Model cache

The model cache stores downloaded AI models locally on your device, which ensures models are ready for inference without needing to download them repeatedly. You can manage the cache using either the Foundry CLI or REST API.

- **Purpose**: Speeds up inference by keeping models locally available
- **Key Commands**:
  - `foundry cache list`: Shows all models in your local cache
  - `foundry cache remove <model-name>`: Removes a specific model from the cache
  - `foundry cache cd <path>`: Changes the storage location for cached models
#### Model lifecycle

1. **Download**: Download models from the Azure AI Foundry model catalog and save them to your local disk.
2. **Load**: Load models into the Foundry Local service memory for inference. Set a TTL (time-to-live) to control how long the model stays in memory (default: 10 minutes).
3. **Run**: Execute model inference for your requests.
4. **Unload**: Remove models from memory to free up resources when no longer needed.
5. **Delete**: Remove models from your local cache to reclaim disk space.
#### Model compilation using Olive

Before models can be used with Foundry Local, they must be compiled and optimized in the [ONNX](https://onnx.ai) format. Microsoft provides a selection of published models in the Azure AI Foundry Model Catalog that are already optimized for Foundry Local. However, you aren't limited to those models: you can compile your own models by using [Olive](https://microsoft.github.io/Olive/). Olive is a powerful framework for preparing AI models for efficient inference. It converts models into the ONNX format, optimizes their graph structure, and applies techniques like quantization to improve performance on local hardware.

> [!TIP]
> To learn more about compiling models for Foundry Local, read [How to compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md).
### Hardware abstraction layer

The hardware abstraction layer ensures that Foundry Local can run on various devices by abstracting the underlying hardware. To optimize performance based on the available hardware, Foundry Local supports:

- **multiple _execution providers_**, such as NVIDIA CUDA, AMD, Qualcomm, Intel.
- **multiple _device types_**, such as CPU, GPU, NPU.

### Developer experiences

The Foundry Local architecture is designed to provide a seamless developer experience, enabling easy integration and interaction with AI models.
Developers can choose from various interfaces to interact with the system, including:
#### Command-Line Interface (CLI)

The Foundry CLI is a powerful tool for managing models, the inference engine, and the local cache.

**Examples**:

- `foundry model list`: Lists all available models.
- `foundry model run <model-name>`: Runs a model.
- `foundry service status`: Checks the status of the service.

> [!TIP]
> To learn more about the CLI commands, read [Foundry Local CLI Reference](../reference/reference-cli.md).
#### Inferencing SDK integration

Foundry Local supports integration with various SDKs, such as the OpenAI SDK, enabling developers to use familiar programming interfaces to interact with the local inference engine.

- **Supported SDKs**: Python, JavaScript, C#, and more.
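For example, the following minimal sketch points the OpenAI Python SDK at the local endpoint. The port and model name are assumptions; retrieve the real endpoint with `foundry service status`, and use a placeholder API key since the local endpoint doesn't require a real one.

```python
# Minimal sketch using the OpenAI Python SDK against the local endpoint.
# The base_url port and model name are assumptions; check `foundry service status`.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5272/v1",  # replace with your dynamically allocated endpoint
    api_key="not-needed-locally",         # placeholder; the local endpoint doesn't require a real key
)

response = client.chat.completions.create(
    model="deepseek-r1-1.5b",  # any model you've downloaded and run locally
    messages=[{"role": "user", "content": "Summarize what an execution provider is."}],
)
print(response.choices[0].message.content)
```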
> [!TIP]
> To learn more about integrating with inferencing SDKs, read [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md).
#### AI Toolkit for Visual Studio Code

The AI Toolkit for Visual Studio Code provides a user-friendly interface for developers to interact with Foundry Local. It allows users to run models, manage the local cache, and visualize results directly within the IDE.

- **Features**:
  - Model management: Download, load, and run models from within the IDE.
  - Interactive console: Send requests and view responses in real-time.
  - Visualization tools: Graphical representation of model performance and results.

## Next steps

- [Get started with Foundry Local](../get-started.md)
- [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md)
- [Foundry Local CLI Reference](../reference/reference-cli.md)
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
---
title: Get started with Foundry Local
titleSuffix: Foundry Local
description: Learn how to install, configure, and run your first AI model with Foundry Local
manager: scottpolly
keywords: Azure AI services, cognitive, AI models, local inference
ms.service: azure-ai-foundry
ms.topic: quickstart
ms.date: 02/20/2025
ms.reviewer: samkemp
ms.author: samkemp
author: samuel100
ms.custom: build-2025
#customer intent: As a developer, I want to get started with Foundry Local so that I can run AI models locally.
---

# Get started with Foundry Local
This guide walks you through setting up Foundry Local to run AI models on your device.

## Prerequisites

Your system must meet the following requirements to run Foundry Local:

- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), macOS.
- **Hardware**: Minimum 8GB RAM, 3GB free disk space. Recommended 16GB RAM, 15GB free disk space.
- **Network**: Internet connection for initial model download (optional for offline use).
- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), Qualcomm Snapdragon X Elite (8GB or more of memory), or Apple silicon.

Also, ensure you have administrative privileges to install software on your device.
## Quickstart

Get started with Foundry Local quickly:

1. [**Download Foundry Local Installer**](https://aka.ms/foundry-local-installer) and **install** by following the on-screen prompts.

    > [!TIP]
    > If you're installing on Windows, you can also use `winget` to install Foundry Local. Open a terminal window and run the following command:
    >
    > ```powershell
    > winget install Microsoft.FoundryLocal
    > ```

1. **Run your first model**. Open a terminal window and run the following command to run a model:

    ```bash
    foundry model run deepseek-r1-1.5b
    ```

    The model downloads, which can take a few minutes depending on your internet speed, and then runs. Once the model is running, you can interact with it using the command line interface (CLI). For example, you can ask:

    ```text
    Why is the sky blue?
    ```

    You should see a response from the model in the terminal:

    :::image type="content" source="media/get-started-output.png" alt-text="Screenshot of output from foundry local run command." lightbox="media/get-started-output.png":::

> [!TIP]
> You can replace `deepseek-r1-1.5b` with any model name from the catalog (see `foundry model list` for available models). Foundry Local downloads the model variant that best matches your system's hardware and software configuration. For example, if you have an NVIDIA GPU, it downloads the CUDA version of the model. If you have a Qualcomm NPU, it downloads the NPU variant. If you have no GPU or NPU, it downloads the CPU version.
## Explore commands

The Foundry CLI organizes commands into these main categories:

- **Model**: Commands for managing and running models.
- **Service**: Commands for managing the Foundry Local service.
- **Cache**: Commands for managing the local model cache (downloaded models on local disk).

View all available commands with:

```bash
foundry --help
```

To view available **model** commands, run:

```bash
foundry model --help
```

To view available **service** commands, run:

```bash
foundry service --help
```

To view available **cache** commands, run:

```bash
foundry cache --help
```

> [!TIP]
> For a complete guide to all CLI commands and their usage, see the [Foundry Local CLI Reference](reference/reference-cli.md).
## Next steps

- [Integrate inferencing SDKs with Foundry Local](how-to/how-to-integrate-with-inference-sdks.md)
- [Explore the Foundry Local documentation](index.yml)
- [Learn about best practices and troubleshooting](reference/reference-best-practice.md)
- [Explore the Foundry Local API reference](reference/reference-catalog-api.md)
- [Learn how to compile Hugging Face models](how-to/how-to-compile-hugging-face-models.md)
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
---
title: Integrate Open Web UI with Foundry Local
titleSuffix: Foundry Local
description: Learn how to create a chat application using Foundry Local and Open Web UI
manager: scottpolly
keywords: Azure AI services, cognitive, AI models, local inference
ms.service: azure-ai-foundry
ms.topic: how-to
ms.date: 02/20/2025
ms.reviewer: samkemp
ms.author: samkemp
author: samuel100
ms.custom: build-2025
#customer intent: As a developer, I want to get started with Foundry Local so that I can run AI models locally.
---

# Integrate Open Web UI with Foundry Local
This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you have a working chat interface running entirely on your local device.

## Prerequisites

Before you start this tutorial, you need:

- **Foundry Local** installed on your computer. Read the [Get started with Foundry Local](../get-started.md) guide for installation instructions.
## Set up Open Web UI for chat

1. **Install Open Web UI** by following the instructions from the [Open Web UI GitHub repository](https://github.com/open-webui/open-webui).

2. **Launch Open Web UI** with this command in your terminal:

    ```bash
    open-webui serve
    ```

3. Open your web browser and go to [http://localhost:8080](http://localhost:8080).

4. **Connect Open Web UI to Foundry Local**:

    1. Select **Settings** in the navigation menu.
    2. Select **Connections**.
    3. Select **Manage Direct Connections**.
    4. Select the **+** icon to add a connection.
    5. For the **URL**, enter `http://localhost:PORT/v1`, where `PORT` is replaced with the port of the Foundry Local endpoint, which you can find using the CLI command `foundry service status`. Note that Foundry Local dynamically assigns a port, so it's not always the same. If you want to confirm the endpoint is reachable, see the sketch after these steps.
    6. Type any value (like `test`) for the API Key, since it can't be empty.
    7. Save your connection.

5. **Start chatting with your model**:

    1. Your loaded models appear in the dropdown at the top.
    2. Select any model from the list.
    3. Type your message in the input box at the bottom.
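If the connection fails, you can sanity-check the endpoint before configuring Open Web UI. The following is a minimal sketch, assuming the standard OpenAI-compatible `/v1/models` route and a placeholder port; use the port reported by `foundry service status`.

```python
# Minimal sketch: confirm the Foundry Local endpoint responds before wiring up Open Web UI.
# The port is a placeholder; use the one reported by `foundry service status`.
# The /v1/models route is assumed from the OpenAI-compatible API surface.
import requests

endpoint = "http://localhost:5272/v1"  # replace the port with your own

response = requests.get(f"{endpoint}/models", timeout=10)
response.raise_for_status()
print(response.json())  # lists the models the service currently exposes
```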
That's it! You're now chatting with an AI model running entirely on your local device.

## Next steps

- [Integrate inferencing SDKs with Foundry Local](how-to-integrate-with-inference-sdks.md)
- [Compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
