Learn to install, configure, and run your first AI models using Microsoft Foundry Local. This hands-on session provides a step-by-step introduction to local inference, from installation through building your first chat application using models like Phi-4, Qwen, and DeepSeek.
By the end of this session, you will:
- Install and Configure: Set up Foundry Local with proper installation verification
- Master CLI Operations: Use Foundry Local CLI for model management and deployment
- Run Your First Model: Successfully deploy and interact with a local AI model
- Build a Chat App: Create a basic chat application using the Foundry Local Python SDK
- Understand Local AI: Grasp the fundamentals of local inference and model management
- Operating System: Windows 11 (22H2 or later) or macOS 11+ (limited support)
- RAM: 8GB minimum, 16GB+ recommended
- Storage: 10GB+ free space for models
- Python: 3.10 or later installed
- Admin Access: Administrator privileges for installation
- Visual Studio Code with Python extension (recommended)
- Command line access (PowerShell on Windows, Terminal on macOS)
- Git for cloning repositories (optional)
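If you want to sanity-check these prerequisites programmatically, here is a small stdlib-only sketch. It checks only the Python version and free space on the current drive; a portable RAM check would need a third-party package such as psutil, so it is omitted here. The 10 GB threshold mirrors the storage requirement above.

```python
# Stdlib-only check of the Python and disk-space prerequisites.
import shutil
import sys

def check_prereqs(min_free_gb: float = 10, min_python=(3, 10)):
    """Return (python_ok, disk_ok, free_gb) for the current machine."""
    free_gb = shutil.disk_usage(".").free / 1e9
    return sys.version_info >= min_python, free_gb >= min_free_gb, free_gb

py_ok, disk_ok, free_gb = check_prereqs()
print(f"Python >= 3.10: {py_ok}")
print(f"Free disk: {free_gb:.1f} GB (need 10+): {disk_ok}")
```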
Install Foundry Local using the Windows package manager:
```shell
# Install via winget (recommended)
winget install Microsoft.FoundryLocal
```

Alternative: Download directly from Microsoft Learn.
Note
macOS support is currently in preview. Check official documentation for the latest availability.
If available, install using Homebrew:
```shell
# If Homebrew formula is available
brew update
brew install foundry-local

# Or manual download (check official docs for latest)
curl -L -o foundry-local.tar.gz "https://download.microsoft.com/foundry-local/latest/macos/foundry-local.tar.gz"
tar -xzf foundry-local.tar.gz
sudo ./install.sh
```

Alternative for macOS users:
- Use a Windows 11 VM (Parallels/UTM) and follow Windows steps
- Run via a container if available and configure `FOUNDRY_LOCAL_ENDPOINT`
After installation, restart your terminal and verify Foundry Local is working:
```shell
# Check if Foundry Local is installed correctly
foundry --version

# View available commands
foundry --help
```

Expected output should show version information and available commands.
Create a dedicated Python environment for this workshop:
Windows:
```powershell
# Create virtual environment
py -m venv .venv

# Activate environment
.\.venv\Scripts\Activate.ps1

# Upgrade pip and install dependencies
python -m pip install --upgrade pip
pip install foundry-local-sdk openai
```

macOS/Linux:
```shell
# Create virtual environment
python3 -m venv .venv

# Activate environment
source .venv/bin/activate

# Upgrade pip and install dependencies
python -m pip install --upgrade pip
pip install foundry-local-sdk openai
```

Now let's run our first AI model locally!
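Before starting a model, you can optionally confirm that the two Python packages installed correctly. This stdlib-only sketch avoids importing them outright, so it prints a friendly message instead of crashing if one is missing:

```python
# Check that the workshop's Python dependencies are importable.
import importlib.util

def check_packages(names):
    # Maps each package name to True when it can be imported
    return {n: importlib.util.find_spec(n) is not None for n in names}

for pkg, ok in check_packages(["foundry_local", "openai"]).items():
    print(f"{pkg}: {'installed' if ok else 'missing - re-run pip install'}")
```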
```shell
# Download and start phi-4-mini (lightweight, fast)
foundry model run phi-4-mini

# Test the model with a simple prompt
foundry model run phi-4-mini --prompt "Hello, introduce yourself in one sentence"
```

Tip
This command downloads the model (first time) and starts the Foundry Local service automatically.
```shell
# List available models (shows downloaded models)
foundry model list

# Check service status
foundry service status

# See what models are cached locally
foundry cache list
```

Once phi-4-mini is working, experiment with other models:
```shell
# Larger model with better capabilities
foundry model run gpt-oss-20b --prompt "Explain edge AI in simple terms"

# Fast, efficient model
foundry model run qwen2.5-0.5b --prompt "What are the benefits of local AI inference?"
```

Now let's create a Python application that uses the models we just started.
Create a new file called `my_first_chat.py` (or use the provided sample):
```python
#!/usr/bin/env python3
"""
My First Foundry Local Chat Application
Using FoundryLocalManager for automatic service management
"""
import os

from foundry_local import FoundryLocalManager
from openai import OpenAI


def main():
    # Get model alias from environment or use default
    alias = os.getenv("FOUNDRY_LOCAL_ALIAS", "phi-4-mini")

    try:
        # Initialize Foundry Local Manager (auto-starts service, downloads model)
        manager = FoundryLocalManager(alias)

        # Create OpenAI client pointing to local endpoint
        client = OpenAI(
            base_url=manager.endpoint,
            api_key=manager.api_key or "not-needed"
        )

        # Get the actual model ID for this alias
        model_id = manager.get_model_info(alias).id

        print("🤖 Welcome to your first local AI chat!")
        print(f"📦 Using model: {alias} -> {model_id}")
        print(f"🌐 Endpoint: {manager.endpoint}")
        print("💡 Type 'quit' to exit\n")
    except Exception as e:
        print(f"❌ Failed to initialize Foundry Local: {e}")
        print("💡 Make sure Foundry Local is installed: foundry --version")
        return

    while True:
        # Get user input
        user_message = input("You: ").strip()

        if user_message.lower() in ['quit', 'exit', 'bye']:
            print("👋 Goodbye!")
            break

        if not user_message:
            continue

        try:
            # Send message to local AI model
            response = client.chat.completions.create(
                model=model_id,
                messages=[
                    {"role": "system", "content": "You are a helpful AI assistant running locally."},
                    {"role": "user", "content": user_message}
                ],
                max_tokens=200,
                temperature=0.7
            )

            # Display the response
            ai_response = response.choices[0].message.content
            print(f"🤖 AI: {ai_response}\n")
        except Exception as e:
            print(f"❌ Error: {e}")
            print("💡 Check service status: foundry service status\n")


if __name__ == "__main__":
    main()
```

Tip
Related Examples: For more advanced usage, see:
- Python Sample: `Workshop/samples/session01/chat_bootstrap.py` - Includes streaming responses and error handling
- Jupyter Notebook: `Workshop/notebooks/session01_chat_bootstrap.ipynb` - Interactive version with detailed explanations
```shell
# No need to manually start models - FoundryLocalManager handles this!
# Just run your chat application
python my_first_chat.py
```

Alternative: Use the provided samples directly
```shell
# Try the complete sample with streaming support
cd Workshop/samples
python -m session01.chat_bootstrap "Your question here"
```

Or explore the interactive notebook: open `Workshop/notebooks/session01_chat_bootstrap.ipynb` in VS Code.
Try these example conversations:
- "What is Microsoft Foundry Local?"
- "List 3 benefits of running AI models locally"
- "Help me understand edge AI"
Congratulations! You've successfully:
- ✅ Installed Foundry Local and verified it's working
- ✅ Started your first AI model (phi-4-mini) locally
- ✅ Tested different models via command line
- ✅ Built a chat application that connects to your local AI
- ✅ Experienced local AI inference without cloud dependencies
- Your AI models run entirely on your computer
- No data is sent to the cloud
- Responses are generated locally using your CPU/GPU
- Privacy and security are maintained
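The "no data leaves your machine" point can be sanity-checked in code. The sketch below is a minimal illustration, not part of the workshop samples: it simply inspects whether an endpoint URL points at the local machine (the port 5273 default appears later in the troubleshooting section; your endpoint may differ).

```python
# Check whether an inference endpoint URL targets the local machine.
from urllib.parse import urlparse

def is_local_endpoint(url: str) -> bool:
    # True when the endpoint host is this machine (requests never leave it)
    return urlparse(url).hostname in ("localhost", "127.0.0.1", "::1")

print(is_local_endpoint("http://localhost:5273/v1"))   # True
print(is_local_endpoint("https://api.openai.com/v1"))  # False
```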
- `foundry model run` downloads and starts models
- The FoundryLocalManager SDK automatically handles service startup and model loading
- Models are cached locally for future use
- Multiple models can be downloaded but typically one runs at a time
- The service automatically manages model lifecycle
- CLI Approach: Manual model management with `foundry model run <model>`
- SDK Approach: Automatic service and model management with `FoundryLocalManager(alias)`
- Recommendation: Use the SDK for applications, the CLI for testing and exploration
```shell
# Installation & Setup
foundry --version                          # Check installation
foundry --help                             # View all commands

# Model Management
foundry model list                         # List available models
foundry model run <model>                  # Download and start a model
foundry model run <model> --prompt "text"  # One-shot prompt
foundry cache list                         # Show downloaded models

# Service Management
foundry service status                     # Check if service is running
foundry service start                      # Start the service manually
foundry service stop                       # Stop the service
```

- phi-4-mini: Best starter model - fast, lightweight, good quality
- qwen2.5-0.5b: Fastest inference, minimal memory usage
- gpt-oss-20b: Higher quality responses, needs more resources
- deepseek-coder-1.3b: Optimized for programming and code tasks
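A simple way to act on this list is to match a starter model to the memory you have. The thresholds below are illustrative assumptions based on the RAM guidance earlier in this session, not official sizing guidance:

```python
# Suggest a starter model from available RAM (thresholds are assumptions).
MODELS_BY_RAM_GB = [
    (4, "qwen2.5-0.5b"),   # minimal memory usage
    (8, "phi-4-mini"),     # balanced starter model
    (16, "gpt-oss-20b"),   # higher quality, more resources
]

def suggest_model(ram_gb: float) -> str:
    # Pick the largest model whose threshold fits within available RAM
    choice = MODELS_BY_RAM_GB[0][1]
    for threshold, name in MODELS_BY_RAM_GB:
        if ram_gb >= threshold:
            choice = name
    return choice

print(suggest_model(8))   # phi-4-mini
print(suggest_model(32))  # gpt-oss-20b
```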
Solution:
```powershell
# Restart your terminal after installation
# Or manually add to PATH (Windows)
$env:PATH += ";C:\Program Files\Microsoft\FoundryLocal"
```

Solution:
```shell
# Check available system memory
foundry service status

# Try a smaller model first
foundry model run phi-4-mini

# Check disk space for model downloads
# Models are stored in: %USERPROFILE%\.foundry\models (Windows)
```

Solution:
```shell
# Check if service is running
foundry service status

# Start service if needed
foundry service start

# Verify the port (default is 5273)
# Check for port conflicts with: netstat -an | findstr 5273
```

- Experiment with different models and prompts
- Modify your chat application to try different models
- Create your own prompts and test responses
- Explore Session 2: Building RAG applications
- Session 2: Build AI solutions with RAG (Retrieval-Augmented Generation)
- Session 3: Compare different open-source models
- Session 4: Work with cutting-edge models
- Session 5: Build multi-agent AI systems
For more advanced usage, you can set these environment variables:
| Variable | Purpose | Example |
|---|---|---|
| `FOUNDRY_LOCAL_ALIAS` | Default model to use | `phi-4-mini` |
| `FOUNDRY_LOCAL_ENDPOINT` | Override endpoint URL | `http://localhost:5273/v1` |
Create a .env file in your project directory:
```
FOUNDRY_LOCAL_ALIAS=phi-4-mini
FOUNDRY_LOCAL_ENDPOINT=auto
```
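Note that Python does not read a .env file automatically; you would either export the variables in your shell or load the file with a helper such as python-dotenv (an assumption, not one of the workshop's listed dependencies). Either way, the fallback logic matches what the chat application does with `os.getenv`:

```python
# Resolve workshop configuration: environment variables win, defaults otherwise.
import os

def resolve_config():
    alias = os.getenv("FOUNDRY_LOCAL_ALIAS", "phi-4-mini")
    endpoint = os.getenv("FOUNDRY_LOCAL_ENDPOINT", "auto")
    return alias, endpoint

alias, endpoint = resolve_config()
print(f"alias={alias} endpoint={endpoint}")
```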
- Session01 Python Sample: `Workshop/samples/session01/chat_bootstrap.py` - Complete chat app with streaming
- Session01 Notebook: `Workshop/notebooks/session01_chat_bootstrap.ipynb` - Interactive tutorial
- Module08 Sample 01 - REST Chat Quickstart
- Module08 Sample 02 - OpenAI SDK Integration
- Module08 Sample 03 - Model Discovery & Benchmarking
Session Duration: 30 minutes hands-on + 15 minutes Q&A
Difficulty Level: Beginner
Prerequisites: Windows 11/macOS 11+, Python 3.10+, Admin access
Scenario: An enterprise IT team needs to evaluate on-device AI inference for processing sensitive employee feedback without sending data to external services.
Your Goal: Demonstrate that local AI models can provide quality responses with sub-second latency while maintaining complete data privacy.
Use these prompts to validate your setup:
```json
[
  "List two benefits of local inference.",
  "Summarize why keeping data on device improves privacy.",
  "Give one trade-off when choosing a small model over a large model."
]
```

- ✅ All prompts get responses in under 2 seconds
- ✅ No data leaves your local machine
- ✅ Responses are relevant and helpful
- ✅ Your chat application works smoothly
This validation ensures your Foundry Local setup is ready for the advanced workshops in Sessions 2-6.
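To measure the 2-second criterion, a small stdlib timing helper is enough: wrap your chat-completion call in it and compare the elapsed time. The lambda below is a stand-in placeholder for your real `client.chat.completions.create(...)` call:

```python
# Time any callable; use it to check response latency against the 2s target.
import time

def time_call(fn, *args, **kwargs):
    # Returns (result, elapsed_seconds) for the wrapped call
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

result, elapsed = time_call(lambda: "stub response")
print(f"elapsed: {elapsed:.3f}s, under 2s: {elapsed < 2.0}")
```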