
Commit 5578a85

address feedback
1 parent 81ad354 commit 5578a85

8 files changed: +164 additions, -285 deletions

articles/ai-foundry/foundry-local/concepts/foundry-local-architecture.md

Lines changed: 4 additions & 4 deletions
@@ -35,9 +35,9 @@ The Foundry Local architecture consists of these main components:

### Foundry Local service

-The Foundry Local Service is an OpenAI-compatible REST server that provides a standard interface for working with the inference engine and managing models. Developers use this API to send requests, run models, and get results programmatically.
+The Foundry Local Service includes an OpenAI-compatible REST server that provides a standard interface for working with the inference engine. It's also possible to manage models over REST. Developers use this API to send requests, run models, and get results programmatically.

-- **Endpoint**: `http://localhost:5272/v1`
+- **Endpoint**: The endpoint is *dynamically allocated* when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK that automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrated inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
- **Use Cases**:
  - Connect Foundry Local to your custom applications
  - Execute models through HTTP requests

@@ -48,7 +48,7 @@ The ONNX Runtime is a core component that executes AI models. It runs optimized

**Features**:

-- Works with multiple hardware providers (NVIDIA, AMD, Intel) and device types (NPUs, CPUs, GPUs)
+- Works with multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and device types (NPUs, CPUs, GPUs)
- Offers a consistent interface for running models across different hardware
- Delivers best-in-class performance
- Supports quantized models for faster inference

@@ -69,7 +69,7 @@ The model cache stores downloaded AI models locally on your device, which ensure

#### Model lifecycle

-1. **Download**: Get models from the Azure AI Foundry model catalog and save them to your local disk.
+1. **Download**: Download models from the Azure AI Foundry model catalog and save them to your local disk.
2. **Load**: Load models into the Foundry Local service memory for inference. Set a TTL (time-to-live) to control how long the model stays in memory (default: 10 minutes).
3. **Run**: Execute model inference for your requests.
4. **Unload**: Remove models from memory to free up resources when no longer needed.
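The dynamically allocated endpoint described in the new bullet can be found from the command line with `foundry service status`, or resolved programmatically through the SDK. A minimal sketch, assuming the `foundry-local-sdk` Python package referenced elsewhere in this commit:

```python
from foundry_local import FoundryLocalManager

# Starts the Foundry Local service if it isn't already running and
# resolves its dynamically allocated endpoint (no hard-coded port).
manager = FoundryLocalManager()

print(manager.endpoint)  # base URL to pass to an OpenAI-compatible client
print(manager.api_key)   # placeholder key; not required for local usage
```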

articles/ai-foundry/foundry-local/get-started.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Your system must meet the following requirements to run Foundry Local:
- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), macOS.
- **Hardware**: Minimum 8GB RAM, 3GB free disk space. Recommended 16GB RAM, 15GB free disk space.
- **Network**: Internet connection for initial model download (optional for offline use)
-- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), or Qualcomm Snapdragon X Elite, with 8GB or more of memory (RAM).
+- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), Qualcomm Snapdragon X Elite (8GB or more of memory), or Apple silicon.

Also, ensure you have administrative privileges to install software on your device.

articles/ai-foundry/foundry-local/how-to/how-to-compile-hugging-face-models.md

Lines changed: 40 additions & 48 deletions
@@ -219,9 +219,16 @@ foundry cache ls # should show llama-3.2
foundry cache cd models
foundry cache ls # should show llama-3.2
```
-
---

+> [!CAUTION]
+> Remember to change the model cache back to the default directory when you're done by running:
+>
+> ```bash
+> foundry cache cd ./foundry/cache/models
+> ```
+
+
### Using the Foundry Local CLI

### [Bash](#tab/Bash)
@@ -235,40 +242,6 @@ foundry model run llama-3.2 --verbose
```powershell
foundry model run llama-3.2 --verbose
```
-
----
-
-### Using the REST API
-
-### [Bash](#tab/Bash)
-
-```bash
-curl -X POST http://localhost:5272/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "llama-3.2",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}],
-    "temperature": 0.7,
-    "max_tokens": 50,
-    "stream": true
-  }'
-```
-
-### [PowerShell](#tab/PowerShell)
-
-```powershell
-Invoke-RestMethod -Uri http://localhost:5272/v1/chat/completions `
-  -Method Post `
-  -ContentType "application/json" `
-  -Body '{
-    "model": "llama-3.2",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}],
-    "temperature": 0.7,
-    "max_tokens": 50,
-    "stream": true
-  }'
-```
-
---

### Using the OpenAI Python SDK
@@ -277,33 +250,52 @@ The OpenAI Python SDK is a convenient way to interact with the Foundry Local RES

```bash
pip install openai
+pip install foundry-local-sdk
```

Then, you can use the following code to run the model:

```python
-from openai import OpenAI
+import openai
+from foundry_local import FoundryLocalManager
+
+modelId = "llama-3.2"
+
+# Create a FoundryLocalManager instance. This will start the Foundry
+# Local service if it is not already running and load the specified model.
+manager = FoundryLocalManager(modelId)

-client = OpenAI(
-    base_url="http://localhost:5272/v1",
-    api_key="none", # required but not used
+# The remaining code uses the OpenAI Python SDK to interact with the local model.
+
+# Configure the client to use the local Foundry service
+client = openai.OpenAI(
+    base_url=manager.endpoint,
+    api_key=manager.api_key # API key is not required for local usage
)

+# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
-    model="llama-3.2",
-    messages=[{"role": "user", "content": "What is the capital of France?"}],
-    temperature=0.7,
-    max_tokens=50,
-    stream=True,
+    model=manager.get_model_info(modelId).id,
+    messages=[{"role": "user", "content": "What is the golden ratio?"}],
+    stream=True
)

-for event in stream:
-    print(event.choices[0].delta.content, end="", flush=True)
-print("\n\n")
+# Print the streaming response
+for chunk in stream:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="", flush=True)
```

> [!TIP]
-> You can use any language that supports HTTP requests. For more information, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+> You can use any language that supports HTTP requests. For more information, read the [Integrated inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+
+## Finishing up
+
+After you're done using the custom model, you should reset the model cache to the default directory using:
+
+```bash
+foundry cache cd ./foundry/cache/models
+```

## Next steps
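If you prefer a single response instead of streaming, the same client supports a non-streaming call. A minimal sketch that reuses the manager and client setup from the block above (the prompt is illustrative):

```python
# Non-streaming variant: collect the full completion before printing it.
response = client.chat.completions.create(
    model=manager.get_model_info(modelId).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
)
print(response.choices[0].message.content)
```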

articles/ai-foundry/foundry-local/includes/sdk-reference/python.md

Lines changed: 6 additions & 6 deletions
@@ -14,20 +14,20 @@ author: maanavd
Install the Python package:

```bash
-pip install foundry-manager-sdk
+pip install foundry-local-sdk
```

-### FoundryManager Class
+### FoundryLocalManager Class

-The `FoundryManager` class provides methods to manage models, cache, and the Foundry Local service.
+The `FoundryLocalManager` class provides methods to manage models, cache, and the Foundry Local service.

#### Initialization

```python
-from foundry_manager import FoundryManager
+from foundry_local import FoundryLocalManager

# Initialize and optionally bootstrap with a model
-manager = FoundryManager(model_id_or_alias=None, bootstrap=True)
+manager = FoundryLocalManager(model_id_or_alias=None, bootstrap=True)
```

- `model_id_or_alias`: (optional) Model ID or alias to download and load at startup.
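For illustration, a short usage sketch that combines the initialization above with the `endpoint`, `get_model_info`, and `unload_model` members that appear elsewhere in this commit's diff; the alias is a hypothetical example:

```python
from foundry_local import FoundryLocalManager

alias = "llama-3.2"  # hypothetical alias used only for illustration

# Start the service if needed and download/load the model for this alias.
manager = FoundryLocalManager(model_id_or_alias=alias, bootstrap=True)

print(manager.endpoint)                  # dynamically allocated service endpoint
print(manager.get_model_info(alias).id)  # resolved model ID for the alias

# Unload the model to free memory when finished.
manager.unload_model(alias)
```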

@@ -104,7 +104,7 @@ manager.unload_model(alias)

### Integrate with OpenAI SDK

-Install the openai package:
+Install the OpenAI package:

```bash
pip install openai

articles/ai-foundry/foundry-local/index.yml

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ landingContent:
- text: Integrate with LangChain
  url: how-to/how-to-use-langchain-with-foundry-local.md
- text: Integrate with Open Web UI
-  url: how-to/how-to-use-langchain-with-foundry-local.md
+  url: how-to/how-to-chat-application-with-open-web-ui.md
- text: Compile Hugging Face models to run on Foundry Local
  url: how-to/how-to-compile-hugging-face-models.md
