---
title: Foundry Local architecture
titleSuffix: Foundry Local
description: Learn about the architecture and components of Foundry Local
manager: scottpolly
ms.author: samkemp
author: samuel100
---
# Foundry Local architecture
Foundry Local enables efficient, secure, and scalable AI model inference directly on your devices. This article explains the core components of Foundry Local and how they work together to deliver AI capabilities.

Key benefits of Foundry Local include:

- **Offline Operation**: Work without an internet connection in remote or disconnected environments.
- **Seamless Integration**: Easily incorporate into existing development workflows for smooth adoption.
## Key components
The Foundry Local architecture consists of these main components:

:::image type="content" source="../media/architecture/foundry-local-arch.png" alt-text="Diagram of Foundry Local Architecture":::
### Foundry Local service
The Foundry Local Service is an OpenAI-compatible REST server that provides a standard interface for working with the inference engine and managing models. Developers use this API to send requests, run models, and get results programmatically.
With this API, you can:
- Connect Foundry Local to your custom applications
- Execute models through HTTP requests
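
For a concrete picture, here's a minimal sketch of calling the service's chat completions route over plain HTTP. The port and model alias are assumptions for this example; substitute the endpoint and model name your local installation reports.

```python
# Minimal sketch: call the OpenAI-compatible chat completions route over HTTP.
# The port (5273) and model alias ("phi-3.5-mini") are assumptions; replace
# them with the endpoint and model name your Foundry Local service reports.
import requests

response = requests.post(
    "http://localhost:5273/v1/chat/completions",
    json={
        "model": "phi-3.5-mini",
        "messages": [{"role": "user", "content": "What is ONNX?"}],
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```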
### ONNX runtime
The ONNX Runtime is a core component that executes AI models. It runs optimized ONNX models efficiently on local hardware like CPUs, GPUs, or NPUs.
Key features include:

- Delivers best-in-class performance
- Supports quantized models for faster inference
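
To make the runtime's role concrete, the sketch below calls ONNX Runtime directly, which is the same engine Foundry Local drives on your behalf. The model path and input shape are placeholders for a real ONNX model and its expected input.

```python
# Illustrative sketch: load an optimized ONNX model and run one inference.
# "model.onnx" and the (1, 3, 224, 224) input shape are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: np.zeros((1, 3, 224, 224), dtype=np.float32)})
print(outputs[0].shape)
```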
### Model management
Foundry Local provides robust tools for managing AI models, ensuring that they're readily available for inference and easy to maintain. Model management is handled through the **Model Cache** and the **Command-Line Interface (CLI)**.
#### Model cache
The model cache stores downloaded AI models locally on your device, which ensures models are ready for inference without needing to download them repeatedly. You can manage the cache using either the Foundry CLI or REST API.
Cache management commands include:

- `foundry cache remove <model-name>`: Removes a specific model from the cache
- `foundry cache cd <path>`: Changes the storage location for cached models
#### Model lifecycle
1. **Download**: Get models from the Azure AI Foundry model catalog and save them to your local disk.
2. **Load**: Load models into the Foundry Local service memory for inference. Set a TTL (time-to-live) to control how long the model stays in memory (default: 10 minutes).
3. **Run**: Execute model inference for your requests.
4. **Unload**: Remove models from memory to free up resources when no longer needed.
5. **Delete**: Remove models from your local cache to reclaim disk space.
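
The same lifecycle can be scripted. The sketch below assumes the `foundry-local-sdk` Python package and manager methods that mirror the steps above; the method names and the TTL parameter are assumptions, so check the SDK reference for the exact surface before relying on them.

```python
# Hedged sketch of the lifecycle steps above. The manager methods and ttl
# parameter are assumptions; verify them against the SDK reference.
from foundry_local import FoundryLocalManager

alias = "phi-3.5-mini"              # assumed model alias
manager = FoundryLocalManager()     # attach to (or start) the local service

manager.download_model(alias)       # 1. Download into the local cache
manager.load_model(alias, ttl=600)  # 2. Load into memory (10-minute TTL)
# 3. Run: send requests to the OpenAI-compatible endpoint while loaded
manager.unload_model(alias)         # 4. Unload to free memory
```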
#### Model compilation using Olive
Before models can be used with Foundry Local, they must be compiled and optimized in the [ONNX](https://onnx.ai) format. Microsoft provides a selection of published models in the Azure AI Foundry Model Catalog that are already optimized for Foundry Local. However, you aren't limited to those models: you can compile your own by using [Olive](https://microsoft.github.io/Olive/), a powerful framework for preparing AI models for efficient inference. Olive converts models into the ONNX format, optimizes their graph structure, and applies techniques like quantization to improve performance on local hardware.
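
As a rough sketch, an Olive workflow is described by a configuration you author and can be launched from Python. The config path below is a placeholder; a real recipe names the source model, the optimization passes, and the target hardware, as described in the Olive documentation.

```python
# Schematic only: launch an Olive optimization workflow from Python.
# "olive-config.json" is a placeholder for a workflow config you author.
from olive.workflows import run as olive_run

olive_run("olive-config.json")
```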
> [!TIP]
> To learn more about compiling models for Foundry Local, read [How to compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-huggingface-models.md).
### Hardware abstraction layer
The hardware abstraction layer ensures that Foundry Local can run on various devices by abstracting the underlying hardware. To optimize performance based on the available hardware, Foundry Local supports:
- **multiple _execution providers_**, such as NVIDIA CUDA, AMD, Qualcomm, and Intel.
- **multiple _device types_**, such as CPU, GPU, and NPU.
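
Execution providers are the same mechanism ONNX Runtime exposes directly: a session takes an ordered preference list and falls back through it when a provider isn't available. A small sketch, assuming a CUDA-capable ONNX Runtime build and a local `model.onnx`:

```python
# Sketch of provider fallback: prefer CUDA, fall back to CPU when CUDA
# isn't available in the installed onnxruntime build.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # the providers actually in effect, in order
```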
### Developer experiences
The Foundry Local architecture is designed to provide a seamless developer experience, enabling easy integration and interaction with AI models.
Developers can choose from various interfaces to interact with the system, including the command-line interface (CLI) and inferencing SDKs.
#### Command-Line Interface (CLI)

The Foundry CLI is a powerful tool for managing models, the inference engine, and the local cache.
> [!TIP]
> To learn more about the CLI commands, read [Foundry Local CLI Reference](../reference/reference-cli.md).
#### Inferencing SDK integration
Foundry Local supports integration with various SDKs, such as the OpenAI SDK, enabling developers to use familiar programming interfaces to interact with the local inference engine.

When Foundry Local is running, it exposes an OpenAI-compatible REST API endpoint that makes it easy to integrate with various inferencing SDKs and programming languages, so you can connect your applications to locally running AI models using popular SDKs.
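
Because the endpoint is OpenAI-compatible, the official OpenAI Python client can target it by overriding its base URL. In this sketch the port, placeholder API key, and model alias are all assumptions to swap for the values your local service reports.

```python
# Sketch: point the OpenAI Python SDK at the local endpoint. The base_url
# port and model alias are assumptions; a local service typically ignores
# the API key, but the client requires a non-empty string.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5273/v1",
    api_key="not-needed-locally",
)

completion = client.chat.completions.create(
    model="phi-3.5-mini",
    messages=[{"role": "user", "content": "Summarize what Foundry Local does."}],
)
print(completion.choices[0].message.content)
```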
## Next steps
- [How to compile Hugging Face models to run on Foundry Local](how-to-compile-huggingface-models.md)
- [Explore the Foundry Local CLI reference](../reference/reference-cli.md)