
Commit 5578a85

address feedback
1 parent 81ad354 commit 5578a85

8 files changed: +164 additions, -285 deletions

articles/ai-foundry/foundry-local/concepts/foundry-local-architecture.md

Lines changed: 4 additions & 4 deletions
@@ -35,9 +35,9 @@ The Foundry Local architecture consists of these main components:

### Foundry Local service

-The Foundry Local Service is an OpenAI-compatible REST server that provides a standard interface for working with the inference engine and managing models. Developers use this API to send requests, run models, and get results programmatically.
+The Foundry Local Service includes an OpenAI-compatible REST server that provides a standard interface for working with the inference engine. It's also possible to manage models over REST. Developers use this API to send requests, run models, and get results programmatically.

-- **Endpoint**: `http://localhost:5272/v1`
+- **Endpoint**: The endpoint is *dynamically allocated* when the service starts. You can find the endpoint by running the `foundry service status` command. When using Foundry Local in your applications, we recommend using the SDK that automatically handles the endpoint for you. For more details on how to use the Foundry Local SDK, read the [Integrated inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
- **Use Cases**:
  - Connect Foundry Local to your custom applications
  - Execute models through HTTP requests

@@ -48,7 +48,7 @@ The ONNX Runtime is a core component that executes AI models. It runs optimized

**Features**:

-- Works with multiple hardware providers (NVIDIA, AMD, Intel) and device types (NPUs, CPUs, GPUs)
+- Works with multiple hardware providers (NVIDIA, AMD, Intel, Qualcomm) and device types (NPUs, CPUs, GPUs)
- Offers a consistent interface for running models across different hardware
- Delivers best-in-class performance
- Supports quantized models for faster inference

@@ -69,7 +69,7 @@ The model cache stores downloaded AI models locally on your device, which ensure

#### Model lifecycle

-1. **Download**: Get models from the Azure AI Foundry model catalog and save them to your local disk.
+1. **Download**: Download models from the Azure AI Foundry model catalog and save them to your local disk.
2. **Load**: Load models into the Foundry Local service memory for inference. Set a TTL (time-to-live) to control how long the model stays in memory (default: 10 minutes).
3. **Run**: Execute model inference for your requests.
4. **Unload**: Remove models from memory to free up resources when no longer needed.
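The dynamically allocated endpoint described in the new bullet can be found from the command line with `foundry service status`, or resolved programmatically through the SDK. A minimal sketch, assuming the `foundry-local-sdk` Python package referenced elsewhere in this commit:

```python
from foundry_local import FoundryLocalManager

# Starts the Foundry Local service if it isn't already running and
# resolves its dynamically allocated endpoint (no hard-coded port).
manager = FoundryLocalManager()

print(manager.endpoint)  # base URL to pass to an OpenAI-compatible client
print(manager.api_key)   # placeholder key; not required for local usage
```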

articles/ai-foundry/foundry-local/get-started.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ Your system must meet the following requirements to run Foundry Local:
- **Operating System**: Windows 10 (x64), Windows 11 (x64/ARM), macOS.
- **Hardware**: Minimum 8GB RAM, 3GB free disk space. Recommended 16GB RAM, 15GB free disk space.
- **Network**: Internet connection for initial model download (optional for offline use)
-- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), or Qualcomm Snapdragon X Elite, with 8GB or more of memory (RAM).
+- **Acceleration (optional)**: NVIDIA GPU (2,000 series or newer), AMD GPU (6,000 series or newer), Qualcomm Snapdragon X Elite (8GB or more of memory), or Apple silicon.

Also, ensure you have administrative privileges to install software on your device.

articles/ai-foundry/foundry-local/how-to/how-to-compile-hugging-face-models.md

Lines changed: 40 additions & 48 deletions
@@ -219,9 +219,16 @@ foundry cache ls # should show llama-3.2
foundry cache cd models
foundry cache ls # should show llama-3.2
```
-
---

+> [!CAUTION]
+> Remember to change the model cache back to the default directory when you're done by running:
+>
+> ```bash
+> foundry cache cd ./foundry/cache/models
+> ```
+
+
### Using the Foundry Local CLI

### [Bash](#tab/Bash)
@@ -235,40 +242,6 @@ foundry model run llama-3.2 --verbose
```powershell
foundry model run llama-3.2 --verbose
```
-
----
-
-### Using the REST API
-
-### [Bash](#tab/Bash)
-
-```bash
-curl -X POST http://localhost:5272/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": "llama-3.2",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}],
-    "temperature": 0.7,
-    "max_tokens": 50,
-    "stream": true
-  }'
-```
-
-### [PowerShell](#tab/PowerShell)
-
-```powershell
-Invoke-RestMethod -Uri http://localhost:5272/v1/chat/completions `
-  -Method Post `
-  -ContentType "application/json" `
-  -Body '{
-    "model": "llama-3.2",
-    "messages": [{"role": "user", "content": "What is the capital of France?"}],
-    "temperature": 0.7,
-    "max_tokens": 50,
-    "stream": true
-  }'
-```
-
---

### Using the OpenAI Python SDK
@@ -277,33 +250,52 @@ The OpenAI Python SDK is a convenient way to interact with the Foundry Local RES

```bash
pip install openai
+pip install foundry-local-sdk
```

Then, you can use the following code to run the model:

```python
-from openai import OpenAI
+import openai
+from foundry_local import FoundryLocalManager
+
+modelId = "llama-3.2"
+
+# Create a FoundryLocalManager instance. This will start the Foundry
+# Local service if it is not already running and load the specified model.
+manager = FoundryLocalManager(modelId)

-client = OpenAI(
-    base_url="http://localhost:5272/v1",
-    api_key="none", # required but not used
+# The remaining code uses the OpenAI Python SDK to interact with the local model.
+
+# Configure the client to use the local Foundry service
+client = openai.OpenAI(
+    base_url=manager.endpoint,
+    api_key=manager.api_key # API key is not required for local usage
)

+# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
-    model="llama-3.2",
-    messages=[{"role": "user", "content": "What is the capital of France?"}],
-    temperature=0.7,
-    max_tokens=50,
-    stream=True,
+    model=manager.get_model_info(modelId).id,
+    messages=[{"role": "user", "content": "What is the golden ratio?"}],
+    stream=True
)

-for event in stream:
-    print(event.choices[0].delta.content, end="", flush=True)
-print("\n\n")
+# Print the streaming response
+for chunk in stream:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="", flush=True)
```

> [!TIP]
-> You can use any language that supports HTTP requests. For more information, read the [Integrate inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+> You can use any language that supports HTTP requests. For more information, read the [Integrated inferencing SDKs with Foundry Local](../how-to/how-to-integrate-with-inference-sdks.md) article.
+
+## Finishing up
+
+After you're done using the custom model, you should reset the model cache to the default directory using:
+
+```bash
+foundry cache cd ./foundry/cache/models
+```

## Next steps
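If you prefer a single response instead of streaming, the same client supports a non-streaming call. A minimal sketch that reuses the manager and client setup from the block above (the prompt is illustrative):

```python
# Non-streaming variant: collect the full completion before printing it.
response = client.chat.completions.create(
    model=manager.get_model_info(modelId).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
)
print(response.choices[0].message.content)
```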

articles/ai-foundry/foundry-local/includes/sdk-reference/python.md

Lines changed: 6 additions & 6 deletions
@@ -14,20 +14,20 @@ author: maanavd
Install the Python package:

```bash
-pip install foundry-manager-sdk
+pip install foundry-local-sdk
```

-### FoundryManager Class
+### FoundryLocalManager Class

-The `FoundryManager` class provides methods to manage models, cache, and the Foundry Local service.
+The `FoundryLocalManager` class provides methods to manage models, cache, and the Foundry Local service.

#### Initialization

```python
-from foundry_manager import FoundryManager
+from foundry_local import FoundryLocalManager

# Initialize and optionally bootstrap with a model
-manager = FoundryManager(model_id_or_alias=None, bootstrap=True)
+manager = FoundryLocalManager(model_id_or_alias=None, bootstrap=True)
```

- `model_id_or_alias`: (optional) Model ID or alias to download and load at startup.
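For illustration, a short usage sketch that combines the initialization above with the `endpoint`, `get_model_info`, and `unload_model` members that appear elsewhere in this commit's diff; the alias is a hypothetical example:

```python
from foundry_local import FoundryLocalManager

alias = "llama-3.2"  # hypothetical alias used only for illustration

# Start the service if needed and download/load the model for this alias.
manager = FoundryLocalManager(model_id_or_alias=alias, bootstrap=True)

print(manager.endpoint)                  # dynamically allocated service endpoint
print(manager.get_model_info(alias).id)  # resolved model ID for the alias

# Unload the model to free memory when finished.
manager.unload_model(alias)
```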

@@ -104,7 +104,7 @@ manager.unload_model(alias)

### Integrate with OpenAI SDK

-Install the openai package:
+Install the OpenAI package:

```bash
pip install openai

articles/ai-foundry/foundry-local/index.yml

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ landingContent:
- text: Integrate with LangChain
  url: how-to/how-to-use-langchain-with-foundry-local.md
- text: Integrate with Open Web UI
-  url: how-to/how-to-use-langchain-with-foundry-local.md
+  url: how-to/how-to-chat-application-with-open-web-ui.md
- text: Compile Hugging Face models to run on Foundry Local
  url: how-to/how-to-compile-hugging-face-models.md
