Running AI Models Locally: AI Toolkit, Docker, and Foundry Local

Table of Contents

Introduction
AI Toolkit for Visual Studio Code
- Key Features
- Getting Started
Docker Model Runner
- Key Features - Docker Model Runner
- Getting Started - Docker Model Runner
Foundry Local
- Key Features - Foundry Local
- Getting Started - Foundry Local
Sample Code: Using AI Toolkit for Visual Studio Code with .NET
- Semantic Kernel with AI Toolkit
- Microsoft Extensions for AI with AI Toolkit
Sample Code: Using Docker Models with .NET
- Semantic Kernel with Docker Models
- Microsoft Extensions for AI with Docker Models
Sample Code: Using Foundry Local with .NET
- Semantic Kernel with Foundry Local
- Microsoft Extensions for AI with Foundry Local
Running the Samples
Comparing Local Model Runners
Additional Resources
Summary
Next Steps

In this lesson, you'll learn how to run AI models locally using three popular approaches:

AI Toolkit for Visual Studio Code – A suite of tools for Visual Studio Code that enables running AI models locally
Docker Model Runner – A containerized approach for running AI models with Docker
Foundry Local – A cross-platform, open-source solution for running Microsoft AI models locally

Running models locally provides several benefits:

Data privacy – Your data never leaves your machine
Cost efficiency – No usage charges for API calls
Offline availability – Use AI even without internet connectivity
Customization – Fine-tune models for specific use cases

AI Toolkit for Visual Studio Code

The AI Toolkit for Visual Studio Code is a collection of tools and technologies that help you build and run AI applications locally on your PC. It leverages platform capabilities to optimize AI workloads.

Key Features

DirectML – Hardware-accelerated machine learning primitives
Windows AI Runtime (WinRT) – Runtime environment for AI models
ONNX Runtime – Cross-platform inference accelerator
Local model downloads – Access to optimized models for Windows

Getting Started

Install the AI Toolkit for Visual Studio Code
Download a supported model
Use the APIs through .NET or other supported languages

📝 Note: AI Toolkit for Visual Studio Code requires Visual Studio Code and compatible hardware for optimal performance.

Docker Model Runner

Docker Model Runner is a tool for running AI models in containers, making it easy to deploy and run inference workloads consistently across different environments.

Key Features - Docker Model Runner

Containerized models – Package models with their dependencies
Cross-platform – Run on Windows, macOS, and Linux
Built-in API – RESTful API for model interaction
Resource management – Control CPU and memory usage

Getting Started - Docker Model Runner

Install Docker Desktop
Pull a model image
Run the model container
Interact with the model through the API

# Run a DeepSeek chat model in a container (Docker pulls the image on first run)
docker run -d -p 12434:8080 \
  --name deepseek-model \
  --runtime=nvidia \
  ghcr.io/huggingface/dockerfiles/model-runner:latest \
  deepseek-ai/deepseek-llm-7b-chat

Foundry Local

Foundry Local is an open-source, cross-platform solution for running Microsoft AI models on your own hardware. It supports Windows, Linux, and macOS, and is designed for privacy, performance, and flexibility.

Official documentation: https://learn.microsoft.com/azure/ai-foundry/foundry-local/
GitHub repository: https://github.com/microsoft/Foundry-Local/tree/main

Key Features - Foundry Local

Cross-platform – Windows, Linux, and macOS
Microsoft models – Run models from Microsoft Foundry locally
REST API – Interact with models using a local API endpoint
No cloud dependency – All inference runs on your machine

Getting Started - Foundry Local

Read the official Foundry Local documentation
Download and install Foundry Local for your OS
Start the Foundry Local server and download a model
Use the REST API to interact with the model
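
The last step can also be done without any SDK. The sketch below is a minimal example, assuming the local server accepts OpenAI-style chat completion requests at `{baseUrl}/chat/completions` (the samples later in this lesson use base URLs ending in `/v1`). The helper names `BuildChatRequest`, `ExtractReply`, and `AskAsync`, and the `LOCAL_MODEL_URL` environment variable, are hypothetical and exist only for illustration. The program prints the request body offline and contacts a server only when that variable is set.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Build the JSON body for an OpenAI-style chat completion request.
static string BuildChatRequest(string model, string systemPrompt, string userPrompt)
{
    var body = new
    {
        model,
        messages = new[]
        {
            new { role = "system", content = systemPrompt },
            new { role = "user", content = userPrompt }
        }
    };
    return JsonSerializer.Serialize(body);
}

// Pull the assistant text out of an OpenAI-style chat completion response.
static string ExtractReply(string responseJson)
{
    using var doc = JsonDocument.Parse(responseJson);
    return doc.RootElement
        .GetProperty("choices")[0]
        .GetProperty("message")
        .GetProperty("content")
        .GetString() ?? string.Empty;
}

// POST the request to {baseUrl}/chat/completions and return the reply text.
static async Task<string> AskAsync(HttpClient client, string baseUrl, string model, string question)
{
    var json = BuildChatRequest(model, "You are a helpful assistant.", question);
    using var content = new StringContent(json, Encoding.UTF8, "application/json");
    using var response = await client.PostAsync($"{baseUrl.TrimEnd('/')}/chat/completions", content);
    response.EnsureSuccessStatusCode();
    return ExtractReply(await response.Content.ReadAsStringAsync());
}

// Offline demo: show the request body that would be sent.
var demoModel = "Phi-3.5-mini-instruct-cuda-gpu";
Console.WriteLine(BuildChatRequest(demoModel, "You are a helpful assistant.", "Hello!"));

// Contact a server only when LOCAL_MODEL_URL (a hypothetical variable) is set,
// for example to http://localhost:5273/v1.
var localModelUrl = Environment.GetEnvironmentVariable("LOCAL_MODEL_URL");
if (!string.IsNullOrEmpty(localModelUrl))
{
    using var httpClient = new HttpClient();
    Console.WriteLine(await AskAsync(httpClient, localModelUrl, demoModel, "Hello!"));
}
```

Keeping the request building and response parsing in separate functions makes them easy to check without a running model; the same two helpers should work against any of the three runners as long as the endpoint follows the OpenAI chat completion format.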

Sample Code: Using AI Toolkit for Visual Studio Code with .NET

The AI Toolkit for Visual Studio Code provides a way to run AI models locally on your machine. We have two examples that demonstrate how to interact with AI Toolkit models using .NET:

1. Semantic Kernel with AI Toolkit

The AIToolkit-01-SK-Chat project shows how to use Semantic Kernel to chat with a model running via AI Toolkit for Visual Studio Code.

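// Example: Semantic Kernel with a model served locally by AI Toolkit for Visual Studio Code.
// AI Toolkit serves models through an OpenAI-compatible endpoint, so the OpenAI connector
// is used here rather than the Azure OpenAI one.
// modelId, endpoint, and apiKey are placeholders that you define for your own setup.
#pragma warning disable SKEXP0010 // the custom-endpoint overload is marked experimental
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(
    modelId: modelId,
    endpoint: new Uri(endpoint),
    apiKey: apiKey);
var kernel = builder.Build();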

2. Microsoft Extensions for AI with AI Toolkit

The AIToolkit-02-MEAI-Chat project demonstrates how to use Microsoft Extensions for AI to interact with AI Toolkit for Visual Studio Code models.

// Example: calling a model served by AI Toolkit for Visual Studio Code through the OpenAI .NET client
// (endpoint, apiKey, and modelId are placeholders that you define for your own setup)
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

OpenAIClientOptions options = new OpenAIClientOptions();
options.Endpoint = new Uri(endpoint);
ApiKeyCredential credential = new ApiKeyCredential(apiKey);
// Create a chat client that targets the local model served by AI Toolkit
ChatClient client = new OpenAIClient(credential, options).GetChatClient(modelId);

Sample Code: Using Docker Models with .NET

In this repository, we have two examples that demonstrate how to interact with Docker-based models using .NET:

1. Semantic Kernel with Docker Models

The DockerModels-01-SK-Chat project shows how to use Semantic Kernel to chat with a model running in Docker.

var model = "ai/deepseek-r1-distill-llama";
var baseUrl = "http://localhost:12434/engines/llama.cpp/v1";
var apiKey = "unused";

// Create a chat completion service
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(modelId: model, apiKey: apiKey, endpoint: new Uri(baseUrl));
var kernel = builder.Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddSystemMessage("You are a useful chatbot. Always reply in a funny way with short answers.");

// ... continue with chat functionality

2. Microsoft Extensions for AI with Docker Models

The DockerModels-02-MEAI-Chat project demonstrates how to use Microsoft Extensions for AI to interact with Docker-based models.

var model = "ai/deepseek-r1-distill-llama";
var baseUrl = "http://localhost:12434/engines/llama.cpp/v1";
var apiKey = "unused";

OpenAIClientOptions options = new OpenAIClientOptions();
options.Endpoint = new Uri(baseUrl);
ApiKeyCredential credential = new ApiKeyCredential(apiKey);

ChatClient client = new OpenAIClient(credential, options).GetChatClient(model);

// Build and send a prompt
StringBuilder prompt = new StringBuilder();
prompt.AppendLine("You will analyze the sentiment of the following product reviews...");
// ... add more text to the prompt

var response = await client.CompleteChatAsync(prompt.ToString());
Console.WriteLine(response.Value.Content[0].Text);

Sample Code: Using Foundry Local with .NET

This repository includes two demos for Foundry Local:

1. Semantic Kernel with Foundry Local

The AIFoundryLocal-01-SK-Chat project shows how to use Semantic Kernel to chat with a model running via Foundry Local.

#pragma warning disable SKEXP0001, SKEXP0003, SKEXP0010, SKEXP0011, SKEXP0050, SKEXP0052
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Text;

var model = "Phi-3.5-mini-instruct-cuda-gpu";
var baseUrl = "http://localhost:5273/v1";
var apiKey = "unused";

// Create a chat completion service
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: model, apiKey: apiKey, endpoint: new Uri(baseUrl))
    .Build();

var chat = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddSystemMessage("You are a useful chatbot. Always reply in a funny way with short answers.");

var settings = new OpenAIPromptExecutionSettings
{
    MaxTokens = 50000,
    Temperature = 1
};

while (true)
{
    Console.Write("Q: ");
    var userQuestion = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(userQuestion))
    {
        break;
    }
    history.AddUserMessage(userQuestion);

    var responseBuilder = new StringBuilder();
    Console.Write("AI: ");
    await foreach (var message in chat.GetStreamingChatMessageContentsAsync(history, settings, kernel))
    {
        responseBuilder.Append(message.Content);
        Console.Write(message.Content);
    }
    Console.WriteLine();

    history.AddAssistantMessage(responseBuilder.ToString());
}

2. Microsoft Extensions for AI with Foundry Local

The AIFoundryLocal-01-MEAI-Chat project demonstrates how to use Microsoft Extensions for AI to interact with Foundry Local models.

using OpenAI;
using OpenAI.Chat;
using System.ClientModel;
using System.Text;

var model = "Phi-3.5-mini-instruct-cuda-gpu";
var baseUrl = "http://localhost:5273/v1";
var apiKey = "unused";

OpenAIClientOptions options = new OpenAIClientOptions();
options.Endpoint = new Uri(baseUrl);
ApiKeyCredential credential = new ApiKeyCredential(apiKey);

ChatClient client = new OpenAIClient(credential, options).GetChatClient(model);

// here we're building the prompt
StringBuilder prompt = new StringBuilder();
prompt.AppendLine("You will analyze the sentiment of the following product reviews. Each line is its own review. Output the sentiment of each review in a bulleted list and then provide a general sentiment of all reviews.");
prompt.AppendLine("I bought this product and it's amazing. I love it!");
prompt.AppendLine("This product is terrible. I hate it.");
prompt.AppendLine("I'm not sure about this product. It's okay.");
prompt.AppendLine("I found this product based on the other reviews. It worked for a bit, and then it didn't.");

// send the prompt to the model and wait for the text completion
var response = await client.CompleteChatAsync(prompt.ToString());

// display the response
Console.WriteLine(response.Value.Content[0].Text);

Running the Samples

To run the samples in this repository:

Install Docker Desktop, AI Toolkit for Visual Studio Code, or Foundry Local as needed
Pull or download the required model
Start the local model server (Docker, AI Toolkit, or Foundry Local)
Navigate to one of the sample project directories
Run the project with dotnet run
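
Before running a sample, it can help to confirm that the local model server is actually listening. The sketch below is one way to do that, assuming the server implements the OpenAI-style `GET {baseUrl}/models` route; not every runner is guaranteed to expose it. The two base URLs are the ones used by the Docker and Foundry Local samples in this lesson, and the helper names `ModelsUri` and `IsServerUpAsync` are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Build the URL of the model-listing route for an OpenAI-compatible base URL.
static Uri ModelsUri(string baseUrl) => new Uri($"{baseUrl.TrimEnd('/')}/models");

// Return true when the server answers the model-listing request with a success status.
static async Task<bool> IsServerUpAsync(HttpClient client, string baseUrl)
{
    try
    {
        using var response = await client.GetAsync(ModelsUri(baseUrl));
        return response.IsSuccessStatusCode;
    }
    catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException)
    {
        // Connection refused or timed out: nothing is listening at this address.
        return false;
    }
}

// Use a short timeout so the check fails fast when no server is running.
using var probeClient = new HttpClient { Timeout = TimeSpan.FromSeconds(3) };
var candidateUrls = new[]
{
    "http://localhost:12434/engines/llama.cpp/v1", // Docker sample endpoint
    "http://localhost:5273/v1"                     // Foundry Local sample endpoint
};

foreach (var candidate in candidateUrls)
{
    var status = await IsServerUpAsync(probeClient, candidate) ? "reachable" : "not reachable";
    Console.WriteLine($"{candidate}: {status}");
}
```

If a server reports "not reachable", check that the runner is started and that the port matches the one configured in the sample project.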

Comparing Local Model Runners

Feature	AI Toolkit for Visual Studio Code	Docker Model Runner	Foundry Local
Platform	Windows, macOS, Linux	Windows, macOS, Linux	Windows, macOS, Linux
Integration	Visual Studio Code APIs	REST API	REST API
Deployment	VS Code Extension	Container-based	Local installation
Hardware Acceleration	DirectML, DirectX	CPU, GPU	CPU, GPU
Models	Optimized for VS Code	Any containerized model	Microsoft Foundry models
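
Because the samples in this lesson all talk to a local HTTP endpoint, switching runners in your own code mostly comes down to changing the base URL and model name. The sketch below illustrates this with a small lookup. The Docker and Foundry Local URLs are taken from the samples above; the AI Toolkit URL is an assumed default that may differ on your machine, and the runner keys and the `ResolveEndpoint` helper are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Base URL per runner. Keys are matched case-insensitively.
// The "aitoolkit" value is an assumption; check the port shown by your installation.
var endpoints = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["docker"] = "http://localhost:12434/engines/llama.cpp/v1",
    ["foundry"] = "http://localhost:5273/v1",
    ["aitoolkit"] = "http://localhost:5272/v1",
};

// Resolve a runner name to its base URL, failing loudly on unknown names.
Uri ResolveEndpoint(string runnerName) =>
    endpoints.TryGetValue(runnerName, out var url)
        ? new Uri(url)
        : throw new ArgumentException(
            $"Unknown runner '{runnerName}'. Expected one of: {string.Join(", ", endpoints.Keys)}");

// The runner name could come from a command-line argument or from configuration.
var selectedRunner = args.Length > 0 ? args[0] : "docker";
Console.WriteLine($"{selectedRunner} -> {ResolveEndpoint(selectedRunner)}");
```

The resolved `Uri` can be passed to `AddOpenAIChatCompletion` or `OpenAIClientOptions.Endpoint` exactly as the samples do with their hard-coded base URLs.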

Additional Resources

Summary

Running AI models locally with AI Toolkit for Visual Studio Code, Docker Model Runner, or Foundry Local offers flexibility, privacy, and cost benefits. The samples in this repository demonstrate how to integrate these local models with your .NET applications using Semantic Kernel and Microsoft Extensions for AI.

Next Steps

You've learned how to run AI models locally using AI Toolkit for Visual Studio Code, Docker Model Runner, and Foundry Local. Next, you'll explore the latest Azure OpenAI models for image and video generation.

👉 Image and Video Generation with New Azure OpenAI Models
