
Commit 43cf3ee

Merge pull request #4739 from samuel100/samuel100/foundry-local-sdk-updates
Samuel100/foundry local sdk updates
2 parents 495d9f5 + a04ff9e commit 43cf3ee

24 files changed: 1,030 additions and 682 deletions

articles/ai-foundry/foundry-local/concepts/foundry-local-architecture.md

Lines changed: 6 additions & 6 deletions
@@ -35,9 +35,9 @@ The Foundry Local architecture consists of these main components:
 
 ### Foundry Local service
 
-The Foundry Local Service is an OpenAI-compatible REST server that provides a standard interface for working with the inference engine and managing models. Developers use this API to send requests, run models, and get results programmatically.
+The Foundry Local Service includes an OpenAI-compatible REST server that provides a standard interface for working with the inference engine. It's also possible to manage models over REST. Developers use this API to send requests, run models, and get results programmatically.
 
-- **Endpoint**: `http://localhost:5272/v1`
+- **Endpoint**: The endpoint is *dynamically allocated* when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK, which automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
 - **Use Cases**:
 - Connect Foundry Local to your custom applications
 - Execute models through HTTP requests
@@ -48,7 +48,7 @@ The ONNX Runtime is a core component that executes AI models. It runs optimized
 
 **Features**:
 
-- Works with multiple hardware providers (NVIDIA, AMD, Intel) and device types (NPUs, CPUs, GPUs)
+- Works with multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and device types (NPUs, CPUs, GPUs)
 - Offers a consistent interface for running models across different hardware
 - Delivers best-in-class performance
 - Supports quantized models for faster inference
@@ -69,7 +69,7 @@ The model cache stores downloaded AI models locally on your device, which ensure
 
 #### Model lifecycle
 
-1. **Download**: Get models from the Azure AI Foundry model catalog and save them to your local disk.
+1. **Download**: Download models from the Azure AI Foundry model catalog and save them to your local disk.
 2. **Load**: Load models into the Foundry Local service memory for inference. Set a TTL (time-to-live) to control how long the model stays in memory (default: 10 minutes).
 3. **Run**: Execute model inference for your requests.
 4. **Unload**: Remove models from memory to free up resources when no longer needed.
@@ -114,7 +114,7 @@ Foundry Local supports integration with various SDKs, such as the OpenAI SDK, en
 - **Supported SDKs**: Python, JavaScript, C#, and more.
 
 > [!TIP]
-> To learn more about integrating with inferencing SDKs, read [Integrate Foundry Local with Inferencing SDKs](../how-to/integrate-with-inference-sdks.md).
+> To learn more about integrating with inferencing SDKs, read [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md).
 
 #### AI Toolkit for Visual Studio Code
 
@@ -128,5 +128,5 @@ The AI Toolkit for Visual Studio Code provides a user-friendly interface for dev
 ## Next Steps
 
 - [Get started with Foundry Local](../get-started.md)
-- [Integrate with Inference SDKs](../how-to/integrate-with-inference-sdks.md)
+- [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md)
 - [Foundry Local CLI Reference](../reference/reference-cli.md)
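For reference, here's a minimal sketch of resolving the dynamically allocated endpoint described above with the Python SDK introduced in this change. It assumes the `foundry-local-sdk` package is installed and uses an illustrative model alias; `foundry service status` from the CLI gives the same information.

```python
# Sketch: resolve the dynamically allocated Foundry Local endpoint via the SDK.
# Assumes: pip install foundry-local-sdk; "deepseek-r1-1.5b" is an illustrative alias.
from foundry_local import FoundryLocalManager

# Starts the Foundry Local service if it isn't running and loads the model.
manager = FoundryLocalManager("deepseek-r1-1.5b")

print(manager.endpoint)  # base URL that includes the dynamically assigned port
print(manager.api_key)   # placeholder key; not required for local usage
```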

articles/ai-foundry/foundry-local/get-started.md

Lines changed: 26 additions & 19 deletions
@@ -16,43 +16,50 @@ ms.custom: build-2025
 
 # Get started with Foundry Local
 
-This guide walks you through setting up Foundry Local to run AI models on your device. Follow these clear steps to install the tool, discover available models, and launch your first local AI model.
+This guide walks you through setting up Foundry Local to run AI models on your device.
 
 ## Prerequisites
 
 Your system must meet the following requirements to run Foundry Local:
 
-- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), macOS, or Linux (x64/ARM)
+- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), or macOS.
 - **Hardware**: Minimum 8GB RAM, 3GB free disk space. Recommended 16GB RAM, 15GB free disk space.
 - **Network**: Internet connection for initial model download (optional for offline use)
-- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), or Qualcomm Snapdragon X Elite, with 8GB or more of memory (RAM).
+- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), Qualcomm Snapdragon X Elite (8GB or more of memory), or Apple silicon.
 
 Also, ensure you have administrative privileges to install software on your device.
 
 ## Quickstart
 
 Get started with Foundry Local quickly:
 
-1. **Download** Foundry Local for your platform:
-- [Windows](https://aka.ms/foundry-local-windows)
-- [macOS](https://aka.ms/foundry-local-macos)
-- [Linux](https://aka.ms/foundry-local-linux)
-1. **Install** the package by following the on-screen prompts.
-1. **Run your first model** Open a terminal window and run the following command to run a model (the model will be downloaded and an interactive prompt will appear):
+1. [**Download Foundry Local Installer**](https://aka.ms/foundry-local-installer) and **install** by following the on-screen prompts.
+> [!TIP]
+> If you're installing on Windows, you can also use `winget` to install Foundry Local. Open a terminal window and run the following command:
+>
+> ```powershell
+> winget install Microsoft.FoundryLocal
+> ```
+1. **Run your first model**: Open a terminal window and run the following command to run a model:
 
 ```bash
-foundry model run phi-3-mini-4k
+foundry model run deepseek-r1-1.5b
 ```
+
+The model downloads, which can take a few minutes depending on your internet speed, and then runs. Once the model is running, you can interact with it using the command line interface (CLI). For example, you can ask:
 
-> [!TIP]
-> You can replace `phi-3-mini-4k` with any model name from the catalog (see `foundry model list` for available models). Foundry Local will download the model variant that best matches your system's hardware and software configuration. For example, if you have an NVIDIA GPU, it will download the CUDA version of the model. If you have an QNN NPU, it will download the NPU variant. If you have no GPU or NPU, it will download the CPU version.
+```text
+Why is the sky blue?
+```
+
+You should see a response from the model in the terminal:
+:::image type="content" source="media/get-started-output.png" alt-text="Screenshot of output from foundry local run command." lightbox="media/get-started-output.png":::
 
-> [!IMPORTANT]
-> **For macOS/Linux users:** Run both components in separate terminals:
-> - Neutron Server (`Inference.Service.Agent`) - Make it executable with `chmod +x Inference.Service.Agent`
-> - Foundry Client (`foundry`) - Make it executable with `chmod +x foundry` and add it to your PATH
 
-## Explore Foundry Local CLI commands
+> [!TIP]
+> You can replace `deepseek-r1-1.5b` with any model name from the catalog (see `foundry model list` for available models). Foundry Local downloads the model variant that best matches your system's hardware and software configuration. For example, if you have an NVIDIA GPU, it downloads the CUDA version of the model. If you have a Qualcomm NPU, it downloads the NPU variant. If you have no GPU or NPU, it downloads the CPU version.
+
+## Explore commands
 
 The Foundry CLI organizes commands into these main categories:
 
@@ -89,9 +96,9 @@ foundry cache --help
 
 ## Next steps
 
-- [Learn how to integrate Foundry Local with your applications](how-to/integrate-with-inference-sdks.md)
+- [Integrate inferencing SDKs with Foundry Local](how-to/how-to-integrate-with-inference-sdks.md)
 - [Explore the Foundry Local documentation](index.yml)
 - [Learn about best practices and troubleshooting](reference/reference-best-practice.md)
 - [Explore the Foundry Local API reference](reference/reference-catalog-api.md)
-- [Learn how to compile Hugging Face models](how-to/how-to-compile-hugging-face-models.md)
+- [Compile Hugging Face models](how-to/how-to-compile-hugging-face-models.md)
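The quickstart above is CLI-only; as a rough companion, this sketch asks the same question programmatically by pairing the Python SDK with the OpenAI client, mirroring the pattern used later in this commit. Package names and the alias are assumptions drawn from that example.

```python
# Sketch: ask the quickstart question ("Why is the sky blue?") from Python.
# Assumes: pip install openai foundry-local-sdk; alias matches the CLI quickstart.
import openai
from foundry_local import FoundryLocalManager

alias = "deepseek-r1-1.5b"
manager = FoundryLocalManager(alias)  # starts the service and loads the model if needed

client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,  # hardware-specific variant chosen by Foundry Local
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response.choices[0].message.content)
```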
97104

@@ -1,11 +1,11 @@
 ---
-title: Build a chat application with Open Web UI
+title: Integrate Open Web UI with Foundry Local
 titleSuffix: Foundry Local
 description: Learn how to create a chat application using Foundry Local and Open Web UI
 manager: scottpolly
 keywords: Azure AI services, cognitive, AI models, local inference
 ms.service: azure-ai-foundry
-ms.topic: tutorial
+ms.topic: how-to
 ms.date: 02/20/2025
 ms.reviewer: samkemp
 ms.author: samkemp
@@ -14,19 +14,15 @@ ms.custom: build-2025
 #customer intent: As a developer, I want to get started with Foundry Local so that I can run AI models locally.
 ---
 
-# Build a chat application with Open Web UI
+# Integrate Open Web UI with Foundry Local
 
-This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you'll have a working chat interface running entirely on your local device.
+This tutorial shows you how to create a chat application using Foundry Local and Open Web UI. When you finish, you have a working chat interface running entirely on your local device.
 
 ## Prerequisites
 
 Before you start this tutorial, you need:
 
-- **Foundry Local** [installed](../get-started.md) on your computer.
-- **At least one model loaded** with the `foundry model load` command, like this:
-```bash
-foundry model load Phi-4-mini-gpu-int4-rtn-block-32
-```
+- **Foundry Local** installed on your computer. Read the [Get started with Foundry Local](../get-started.md) guide for installation instructions.
 
 ## Set up Open Web UI for chat
 
@@ -46,18 +42,18 @@ Before you start this tutorial, you need:
 2. Select **Connections**
 3. Select **Manage Direct Connections**
 4. Select the **+** icon to add a connection
-5. Enter `http://localhost:5272/v1` for the URL
-6. Type any value (like `test`) for the API Key, since it cannot be empty
+5. For the **URL**, enter `http://localhost:PORT/v1` where `PORT` is replaced with the port of the Foundry Local endpoint, which you can find using the CLI command `foundry service status`. Note that Foundry Local dynamically assigns a port, so it's not always the same.
+6. Type any value (like `test`) for the API Key, since it can't be empty.
 7. Save your connection
 
 5. **Start chatting with your model**:
-1. Your loaded models will appear in the dropdown at the top
+1. Your loaded models appear in the dropdown at the top
 2. Select any model from the list
 3. Type your message in the input box at the bottom
 
 That's it! You're now chatting with an AI model running entirely on your local device.
 
 ## Next steps
 
-- [Build an application with LangChain](use-langchain-with-foundry-local.md)
-- [How to compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
+- [Integrate inferencing SDKs with Foundry Local](how-to-integrate-with-inference-sdks.md)
+- [Compile Hugging Face models to run on Foundry Local](../how-to/how-to-compile-hugging-face-models.md)
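Because the port in step 5 changes between service starts, a small sketch like the following (assuming the `foundry-local-sdk` package) can print the base URL to paste into Open Web UI's **Manage Direct Connections** dialog; `foundry service status` from the CLI works just as well.

```python
# Sketch: print the base URL for Open Web UI's direct connection settings.
# Assumes: pip install foundry-local-sdk; any catalog alias can be used here.
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager("deepseek-r1-1.5b")
# The SDK exposes the service endpoint; confirm it matches the http://localhost:PORT/v1
# form that step 5 of the article expects before pasting it into Open Web UI.
print(f"Foundry Local endpoint: {manager.endpoint}")
```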

articles/ai-foundry/foundry-local/how-to/how-to-compile-hugging-face-models.md

Lines changed: 44 additions & 52 deletions
@@ -1,7 +1,7 @@
 ---
-title: How to compile Hugging Face models to run on Foundry Local
+title: Compile Hugging Face models to run on Foundry Local
 titleSuffix: Foundry Local
-description: Learn how to compile and run Hugging Face models with Foundry Local.
+description: Learn to compile and run Hugging Face models with Foundry Local.
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.custom: build-2025
@@ -11,7 +11,7 @@ ms.author: samkemp
 author: samuel100
 ---
 
-# How to compile Hugging Face models to run on Foundry Local
+# Compile Hugging Face models to run on Foundry Local
 
 Foundry Local runs ONNX models on your device with high performance. While the model catalog offers _out-of-the-box_ precompiled options, you can use any model in the ONNX format.
 
@@ -219,9 +219,16 @@ foundry cache ls # should show llama-3.2
 foundry cache cd models
 foundry cache ls # should show llama-3.2
 ```
-
 ---
 
+> [!CAUTION]
+> Remember to change the model cache back to the default directory when you're done by running:
+>
+> ```bash
+> foundry cache cd ./foundry/cache/models
+> ```
+
+
 ### Using the Foundry Local CLI
 
 ### [Bash](#tab/Bash)
@@ -235,40 +242,6 @@ foundry model run llama-3.2 --verbose
 ```powershell
 foundry model run llama-3.2 --verbose
 ```
-
----
-
-### Using the REST API
-
-### [Bash](#tab/Bash)
-
-```bash
-curl -X POST http://localhost:5272/v1/chat/completions \
--H "Content-Type: application/json" \
--d '{
-"model": "llama-3.2",
-"messages": [{"role": "user", "content": "What is the capital of France?"}],
-"temperature": 0.7,
-"max_tokens": 50,
-"stream": true
-}'
-```
-
-### [PowerShell](#tab/PowerShell)
-
-```powershell
-Invoke-RestMethod -Uri http://localhost:5272/v1/chat/completions `
--Method Post `
--ContentType "application/json" `
--Body '{
-"model": "llama-3.2",
-"messages": [{"role": "user", "content": "What is the capital of France?"}],
-"temperature": 0.7,
-"max_tokens": 50,
-"stream": true
-}'
-```
-
 ---
 
 ### Using the OpenAI Python SDK
@@ -277,35 +250,54 @@ The OpenAI Python SDK is a convenient way to interact with the Foundry Local RES
 
 ```bash
 pip install openai
+pip install foundry-local-sdk
 ```
 
 Then, you can use the following code to run the model:
 
 ```python
-from openai import OpenAI
+import openai
+from foundry_local import FoundryLocalManager
+
+modelId = "llama-3.2"
+
+# Create a FoundryLocalManager instance. This will start the Foundry
+# Local service if it is not already running and load the specified model.
+manager = FoundryLocalManager(modelId)
 
-client = OpenAI(
-    base_url="http://localhost:5272/v1",
-    api_key="none", # required but not used
+# The remaining code uses the OpenAI Python SDK to interact with the local model.
+
+# Configure the client to use the local Foundry service
+client = openai.OpenAI(
+    base_url=manager.endpoint,
+    api_key=manager.api_key  # API key is not required for local usage
 )
 
+# Set the model to use and generate a streaming response
 stream = client.chat.completions.create(
-    model="llama-3.2",
-    messages=[{"role": "user", "content": "What is the capital of France?"}],
-    temperature=0.7,
-    max_tokens=50,
-    stream=True,
+    model=manager.get_model_info(modelId).id,
+    messages=[{"role": "user", "content": "What is the golden ratio?"}],
+    stream=True
 )
 
-for event in stream:
-    print(event.choices[0].delta.content, end="", flush=True)
-print("\n\n")
+# Print the streaming response
+for chunk in stream:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="", flush=True)
 ```
 
 > [!TIP]
-> You can use any language that supports HTTP requests. See [Integrate with Inferencing SDKs](integrate-with-inference-sdks.md) for more options.
+> You can use any language that supports HTTP requests. For more information, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+
+## Finishing up
+
+After you're done using the custom model, you should reset the model cache to the default directory using:
+
+```bash
+foundry cache cd ./foundry/cache/models
+```
 
 ## Next steps
 
 - [Learn more about Olive](https://microsoft.github.io/Olive/)
-- [Integrate Foundry Local with Inferencing SDKs](integrate-with-inference-sdks.md)
+- [Integrate inferencing SDKs with Foundry Local](how-to-integrate-with-inference-sdks.md)
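The `get_model_info(...).id` call in the Python example above resolves a catalog alias to the variant Foundry Local picked for the local hardware (CUDA, NPU, or CPU, as the earlier tip explains). Here's a hedged sketch that only inspects that resolution, using just the `.id` field shown in this commit:

```python
# Sketch: see which hardware-specific variant an alias resolves to on this machine.
# Assumes: pip install foundry-local-sdk; "llama-3.2" matches the alias used above.
from foundry_local import FoundryLocalManager

alias = "llama-3.2"
manager = FoundryLocalManager(alias)
info = manager.get_model_info(alias)
print(f"Alias {alias!r} resolved to model id: {info.id}")
```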
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+---
+title: Integrate with inference SDKs
+titleSuffix: Foundry Local
+description: This article provides instructions on how to integrate Foundry Local with common Inferencing SDKs.
+manager: scottpolly
+ms.service: azure-ai-foundry
+ms.custom: build-2025
+ms.topic: how-to
+ms.date: 02/12/2025
+ms.author: samkemp
+zone_pivot_groups: foundry-local-sdk
+author: samuel100
+---
+
+# Integrate inferencing SDKs with Foundry Local
+
+Foundry Local integrates with various inferencing SDKs - such as OpenAI, Azure OpenAI, Langchain, etc. This guide shows you how to connect your applications to locally running AI models using popular SDKs.
+
+## Prerequisites
+
+- Foundry Local installed. See the [Get started with Foundry Local](../get-started.md) article for installation instructions.
+
+::: zone pivot="programming-language-python"
+[!INCLUDE [Python](../includes/integrate-examples/python.md)]
+::: zone-end
+::: zone pivot="programming-language-javascript"
+[!INCLUDE [JavaScript](../includes/integrate-examples/javascript.md)]
+::: zone-end
+
+## Next steps
+
+- [Compile Hugging Face models to run on Foundry Local](how-to-compile-hugging-face-models.md)
+- [Explore the Foundry Local CLI reference](../reference/reference-cli.md)
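The Python and JavaScript examples referenced by the include files above aren't part of this diff. As a rough stand-in, here's a non-streaming variant of the Python pattern used earlier in the commit; package names, alias, and prompt are illustrative.

```python
# Sketch: minimal non-streaming chat completion against Foundry Local.
# Assumes: pip install openai foundry-local-sdk; "llama-3.2" is an illustrative alias.
import openai
from foundry_local import FoundryLocalManager

alias = "llama-3.2"
manager = FoundryLocalManager(alias)  # starts the service and loads the model if needed
client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
)
print(response.choices[0].message.content)
```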
