articles/ai-foundry/concepts/models-featured.md (+1 -3: 1 addition & 3 deletions)
@@ -251,9 +251,7 @@ See [the Microsoft model collection in Azure AI Foundry portal](https://ai.azure
Mistral AI offers two categories of models, namely:
-_Premium models_: These include Mistral Large, Mistral Small, Mistral-OCR-2503, Mistral Medium 3 (25.05), and Ministral 3B models, and are available as serverless APIs with pay-as-you-go token-based billing.
-
-_Premium models_: These include Mistral Large, Mistral Small, Mistral-OCR-2503, and Ministral 3B models, and are available as standard deployments.
-
-_Open models_: These include Mistral-small-2503, Codestral, and Mistral Nemo (that are available as standard deployments), and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](../how-to/deploy-models-mistral-open.md)(that are available to download and run on self-hosted managed endpoints).
-
+
-_Open models_: These include Mistral-small-2503, Codestral, and Mistral Nemo (that are available as serverless APIs with pay-as-you-go token-based billing), and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](../how-to/deploy-models-mistral-open.md)(that are available to download and run on self-hosted managed endpoints).
Foundry Local enables efficient, secure, and scalable AI model inference directly on your devices. This article explains the core components of Foundry Local and how they work together to deliver AI capabilities.
Key benefits of Foundry Local include:
@@ -37,7 +39,7 @@ The Foundry Local architecture consists of these main components:
The Foundry Local Service includes an OpenAI-compatible REST server that provides a standard interface for working with the inference engine. It's also possible to manage models over REST. Developers use this API to send requests, run models, and get results programmatically.
-- **Endpoint**: The endpoint is *dynamically allocated* when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK that automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrated inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+- **Endpoint**: The endpoint is _dynamically allocated_ when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK that automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrated inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
- **Use Cases**:
- Connect Foundry Local to your custom applications
- Execute models through HTTP requests
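As an illustration of the HTTP use case above, the following minimal sketch posts a chat completion request to the local OpenAI-compatible endpoint with `fetch`. The endpoint URL and model ID are placeholder assumptions (the port is dynamically allocated), so substitute the values reported by `foundry service status` and `foundry model list`.

```javascript
// Minimal sketch, not the article's own sample: the endpoint and model ID below
// are assumed placeholders. Discover the real endpoint with `foundry service status`.
// Requires Node.js 18+ (global fetch) or a browser context.
const ENDPOINT = "http://localhost:5273/v1"; // example only; the port varies

async function chat() {
  const response = await fetch(`${ENDPOINT}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "<loaded-model-id>", // replace with a model loaded in Foundry Local
      messages: [{ role: "user", content: "Hello from Foundry Local!" }],
    }),
  });
  const data = await response.json();
  console.log(data.choices[0].message.content);
}

chat().catch(console.error);
```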
@@ -109,9 +111,7 @@ The Foundry CLI is a powerful tool for managing models, the inference engine, an
#### Inferencing SDK integration
-Foundry Local supports integration with various SDKs, such as the OpenAI SDK, enabling developers to use familiar programming interfaces to interact with the local inference engine.
-
-**Supported SDKs**: Python, JavaScript, C#, and more.
+Foundry Local supports integration with various SDKs in most languages, such as the OpenAI SDK, enabling developers to use familiar programming interfaces to interact with the local inference engine.
> [!TIP]
> To learn more about integrating with inferencing SDKs, read [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md).
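To make the SDK integration concrete, here is a hedged sketch that points the OpenAI JavaScript client at a locally running Foundry Local service. The base URL, API key placeholder, and model ID are assumptions for illustration; the linked how-to article describes the recommended way to resolve them.

```javascript
// Sketch only: assumes the `openai` npm package and a model already loaded in
// Foundry Local. Base URL, key, and model ID are illustrative placeholders.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:5273/v1", // replace with your local endpoint
  apiKey: "local",                     // placeholder; see the how-to for specifics
});

const completion = await client.chat.completions.create({
  model: "<loaded-model-id>",
  messages: [{ role: "user", content: "Summarize what Foundry Local does." }],
});

console.log(completion.choices[0].message.content);
```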
This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you have a working chat interface running entirely on your local device.
Foundry Local runs ONNX models on your device with high performance. While the model catalog offers _out-of-the-box_ precompiled options, you can use any model in the ONNX format.
To compile existing models in Safetensor or PyTorch format into the ONNX format, you can use [Olive](https://microsoft.github.io/Olive). Olive is a tool that optimizes models to ONNX format, making them suitable for deployment in Foundry Local. It uses techniques like _quantization_ and _graph optimization_ to improve performance.
Foundry Local integrates with various inferencing SDKs, such as OpenAI, Azure OpenAI, and LangChain. This guide shows you how to connect your applications to locally running AI models using popular SDKs.
This tutorial shows you how to create an application using the Foundry Local SDK and [LangChain](https://www.langchain.com/langchain). You build a translation application that uses a local model to translate text from one language to another.
articles/ai-foundry/foundry-local/includes/sdk-reference/javascript.md (+27 -18: 27 additions & 18 deletions)
@@ -33,12 +33,21 @@ Available options:
- `serviceUrl`: Base URL of the Foundry Local service
- `fetch`: (optional) Custom fetch implementation for environments like Node.js
+### A note on aliases
+
+Many methods outlined in this reference have an `aliasOrModelId` parameter in the signature. You can pass into the method either an **alias** or **model ID** as a value. Using an alias will:
+
+- Select the *best model* for the available hardware. For example, if a Nvidia CUDA GPU is available, Foundry Local selects the CUDA model. If a supported NPU is available, Foundry Local selects the NPU model.
+- Allow you to use a shorter name without needing to remember the model ID.
+
+> [!TIP]
+> We recommend passing into the `aliasOrModelId` parameter an **alias** because when you deploy your application, Foundry Local acquires the best model for the end user's machine at run-time.
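To illustrate the alias-versus-ID distinction, the following hypothetical snippet calls the methods listed below once with an alias and once with an explicit model ID. Both the alias `phi-3.5-mini` and the ID shown are assumptions; run `foundry model list` to see the real values on your machine.

```javascript
// Hypothetical illustration of alias vs. model ID; names are not real catalog entries.
import { FoundryLocalManager } from "foundry-local-sdk";

const manager = new FoundryLocalManager(); // assumes default options are acceptable

// Alias: Foundry Local resolves the best variant (CPU, CUDA, or NPU) at run-time.
await manager.downloadModel("phi-3.5-mini");
await manager.loadModel("phi-3.5-mini");

// Model ID: pins one exact variant instead of letting the service choose.
await manager.loadModel("Phi-3.5-mini-instruct-generic-cpu");
```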
-|`downloadModel()`|`(modelAliasOrId: string, force = false, onProgress?) => Promise<FoundryModelInfo>`| Downloads a model to the local cache. |
-|`loadModel()`|`(modelAliasOrId: string, ttl = 600) => Promise<FoundryModelInfo>`| Loads a model into the inference server. |
-|`unloadModel()`|`(modelAliasOrId: string, force = false) => Promise<void>`| Unloads a model from the inference server. |
+|`downloadModel()`|`(aliasOrModelId: string, token?: string, force = false, onProgress?) => Promise<FoundryModelInfo>`| Downloads a model to the local cache. |
+|`loadModel()`|`(aliasOrModelId: string, ttl = 600) => Promise<FoundryModelInfo>`| Loads a model into the inference server. |
+|`unloadModel()`|`(aliasOrModelId: string, force = false) => Promise<void>`| Unloads a model from the inference server. |
|`listLoadedModels()`|`() => Promise<FoundryModelInfo[]>`| Lists all models currently loaded in the service.|
## Example Usage
@@ -83,12 +92,12 @@ import { FoundryLocalManager } from "foundry-local-sdk";
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
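The visible diff context cuts the example off, so here is a self-contained sketch that exercises the methods from the table above under the same assumptions (placeholder alias, default manager options); it is not the article's full sample.

```javascript
import { FoundryLocalManager } from "foundry-local-sdk";

const alias = "phi-3.5-mini"; // assumed alias; run `foundry model list` for real values
const manager = new FoundryLocalManager();

// Download to the local cache, then load into the inference server (ttl in seconds).
await manager.downloadModel(alias);
await manager.loadModel(alias, 600);

// Inspect what's currently loaded, then unload when finished.
console.log(await manager.listLoadedModels());
await manager.unloadModel(alias);
```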