docs: Added initial documentation for how to configure and launch llama stack #3450
# Configuring and Launching Llama Stack

This guide walks you through the two primary methods for setting up and running Llama Stack: using Docker containers and configuring the server manually.

## Method 1: Using the Starter Docker Container

The easiest way to get started with Llama Stack is using the pre-built Docker container. This approach eliminates the need for manual dependency management and provides a consistent environment across different systems.

### Prerequisites

- Docker installed and running on your system
- Access to external model providers (e.g., Ollama running locally); see the quick check below
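```bash
# Quick sanity check for the prerequisites above (a minimal sketch; it assumes
# Ollama is serving on its default port 11434 -- adjust the URL if yours differs)
docker --version                          # Docker CLI is installed
docker info > /dev/null && echo "Docker daemon is running"
curl -s http://localhost:11434/api/tags   # lists local models if Ollama is reachable
```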
### Basic Docker Usage

Here's an example of spinning up the Llama Stack server using Docker:

```bash
docker run -it \
  -v ~/.llama:/root/.llama \
  --network=host \
  -e OLLAMA_URL=http://localhost:11434 \
  llamastack/distribution-starter
```

### Docker Command Breakdown

- `-it`: Run in interactive mode with a TTY allocated
- `-v ~/.llama:/root/.llama`: Mount your local Llama Stack configuration directory
- `--network=host`: Use host networking so the container can reach local services like Ollama
- `-e OLLAMA_URL=http://localhost:11434`: Set the environment variable for the Ollama URL (note that `-e` must come before the image name)
- `llamastack/distribution-starter`: The official Llama Stack Docker image; a quick check that the container is running follows below
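```bash
# Optional: confirm the container is up after launching (assumes the stock
# llamastack/distribution-starter image from the example above)
docker ps --filter ancestor=llamastack/distribution-starter
```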
### Advanced Docker Configuration

You can customize the Docker deployment with additional environment variables. Note that without `--network=host`, `localhost` inside the container refers to the container itself, so on Docker Desktop you may need to point `OLLAMA_URL` at `host.docker.internal` instead to reach a host-local Ollama.

```bash
docker run -it \
  -v ~/.llama:/root/.llama \
  -p 8321:8321 \
  -e OLLAMA_URL=http://localhost:11434 \
  -e BRAVE_SEARCH_API_KEY=your_api_key_here \
  -e TAVILY_SEARCH_API_KEY=your_api_key_here \
  llamastack/distribution-starter \
  --port 8321
```
### Environment Variables

Common environment variables you can set (an example of supplying them via an env file follows the table):

| Variable | Description | Example |
|----------|-------------|---------|
| `OLLAMA_URL` | URL for the Ollama service | `http://localhost:11434` |
| `BRAVE_SEARCH_API_KEY` | API key for Brave Search | `your_brave_api_key` |
| `TAVILY_SEARCH_API_KEY` | API key for Tavily Search | `your_tavily_api_key` |
| `TOGETHER_API_KEY` | API key for Together AI | `your_together_api_key` |
| `OPENAI_API_KEY` | API key for OpenAI | `your_openai_api_key` |
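```bash
# One way to supply the variables above without a long command line: keep them
# in an env file and pass it to Docker. The file name and values here are
# placeholders, not part of the official setup.
cat > llama-stack.env <<'EOF'
OLLAMA_URL=http://localhost:11434
BRAVE_SEARCH_API_KEY=your_brave_api_key
EOF

docker run -it \
  -v ~/.llama:/root/.llama \
  -p 8321:8321 \
  --env-file llama-stack.env \
  llamastack/distribution-starter \
  --port 8321
```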
## Method 2: Manual Server Configuration and Launch

For more control over your Llama Stack deployment, you can configure and run the server manually.
### Prerequisites

1. **Install Llama Stack**:

   Using pip:
   ```bash
   pip install llama-stack
   ```

   Using uv (alternative):
   ```bash
   # Initialize a new project (if starting fresh)
   uv init

   # Add llama-stack as a dependency
   uv add llama-stack

   # Note: If using uv, prefix subsequent commands with 'uv run'
   # Example: uv run llama stack build --list-distros
   ```
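Before building, you can confirm the installation took effect; this is a quick sketch using standard tooling rather than an official verification step:

```bash
# Confirm the package is installed and the CLI is on your PATH
pip show llama-stack     # or: uv pip show llama-stack
llama stack --help       # should print the available stack subcommands
```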
### Step 1: Build a Distribution

Choose a distro and build your Llama Stack distribution:

```bash
# List available distributions
llama stack build --list-distros

# Build with a specific distro
llama stack build --distro watsonx --image-type venv --image-name watsonx-stack

# Or build with a meta-reference distro
llama stack build --distro meta-reference-gpu --image-type venv --image-name meta-reference-gpu-stack
```
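After a build finishes, you can confirm the new stack was registered using the same `llama stack list` command covered later in this guide:

```bash
# The freshly built distribution (e.g. watsonx-stack) should appear here
llama stack list
```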
#### Advanced: Custom Provider Selection (Step 1.a)

If you know the specific providers you want to use, you can supply them directly on the command line instead of using a pre-built distribution:

```bash
llama stack build --providers inference=remote::ollama,agents=inline::meta-reference,safety=inline::llama-guard,vector_io=inline::faiss,tool_runtime=inline::rag-runtime --image-type venv --image-name custom-stack
```

**Discover Available Options:**

```bash
# List all available APIs
llama stack list-apis

# List all available providers
llama stack list-providers
```

This approach gives you complete control over which providers are included in your stack, so you can tailor the configuration to your specific needs.
### Available Distributions
- **ci-tests**: CI tests for Llama Stack
- **dell**: Dell's distribution of Llama Stack. TGI inference via Dell's custom container
- **meta-reference-gpu**: Use Meta Reference for running LLM inference
- **nvidia**: Use NVIDIA NIM for running LLM inference, evaluation and safety
- **open-benchmark**: Distribution for running open benchmarks
- **postgres-demo**: Quick start template for running Llama Stack with several popular providers
- **starter**: Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments (see the build example below)
- **starter-gpu**: Quick start template for running Llama Stack with several popular providers. This distribution is intended for GPU-enabled environments
- **watsonx**: Use watsonx for running LLM inference
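```bash
# For example, to build the CPU-only starter distribution from the list above
# (the image name "starter-stack" is just an illustrative choice):
llama stack build --distro starter --image-type venv --image-name starter-stack
```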
### Step 2: Configure Your Stack

After building, you can customize the configuration files:

#### Configuration File Locations

- Build config: `~/.llama/distributions/{stack-name}/{stack-name}-build.yaml`
- Runtime config: `~/.llama/distributions/{stack-name}/{stack-name}-run.yaml` (see below for how to inspect these files)
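```bash
# To inspect the generated files before editing them, substitute the stack
# name you chose at build time for {stack-name}:
ls ~/.llama/distributions/{stack-name}/
cat ~/.llama/distributions/{stack-name}/{stack-name}-run.yaml
```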
#### Sample Runtime Configuration

```yaml
version: 2

apis:
- inference
- safety
- embeddings
- tool_runtime

providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  safety:
  - provider_id: llama-guard
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  embeddings:
  - provider_id: ollama-embeddings
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  tool_runtime:
  - provider_id: brave-search
    provider_type: remote::brave-search
    config:
      api_key: ${env.BRAVE_SEARCH_API_KEY:=}
```
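The `${env.BRAVE_SEARCH_API_KEY:=}` entry reads the key from the server's environment at startup, with `:=` supplying a default (here, empty) if the variable is unset. A minimal sketch of wiring this up, assuming the sample config above:

```bash
# Export the key referenced by the tool_runtime provider, then start the server
export BRAVE_SEARCH_API_KEY=your_brave_api_key
llama stack run {stack-name} --port 8321
```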
### Step 3: Launch the Server

Start your configured Llama Stack server:

```bash
# Run with specific port
llama stack run {stack-name} --port 8321

# Run with environment variables
OLLAMA_URL=http://localhost:11434 llama stack run starter --port 8321

# Run in background
nohup llama stack run starter --port 8321 > llama_stack.log 2>&1 &
```
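If you used the backgrounded `nohup` form, you can follow the log and stop the server with ordinary shell tooling; the `pkill` pattern below is an assumption about how the process name appears and may need adjusting:

```bash
# Follow the server log
tail -f llama_stack.log

# Stop the backgrounded server
pkill -f "llama stack run"
```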
### Step 4: Verify Installation

Test your Llama Stack server:

#### Basic HTTP Health Checks

```bash
# Check server health
curl http://localhost:8321/health

# List available models
curl http://localhost:8321/v1/models
```

#### Comprehensive Verification (Recommended)

Use the official Llama Stack client for better verification:

```bash
# List all configured providers (recommended)
uv run --with llama-stack-client llama-stack-client providers list

# Alternative if you have llama-stack-client installed
llama-stack-client providers list
```

#### Test Chat Completion

```bash
# Basic HTTP test
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Or using the client (more robust)
uv run --with llama-stack-client llama-stack-client inference chat-completion \
  --model llama3.1:8b \
  --message "Hello!"
```
## Configuration Management

### Managing Multiple Stacks

You can maintain multiple stack configurations:

```bash
# List all built stacks
llama stack list

# Remove a stack
llama stack rm {stack-name}

# Rebuild with updates
llama stack build --distro starter --image-type venv --image-name starter-v2
```
### Common Configuration Issues

#### Port Conflicts

If port 8321 is already in use:

```bash
# Check what's using the port
netstat -tlnp | grep :8321

# Use a different port
llama stack run starter --port 8322
```
## Troubleshooting

### Common Issues

1. **Docker Permission Denied**:

   ```bash
   sudo docker run -it \
     -v ~/.llama:/root/.llama \
     --network=host \
     llamastack/distribution-starter
   ```

2. **Provider Connection Issues**:

   - Verify external services (Ollama, APIs) are running
   - Check network connectivity and firewall settings
   - Validate API keys and URLs (quick checks below)
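```bash
# Quick connectivity checks for the items above (a sketch; the hosts and
# variable names follow the examples in this guide and may differ in your setup)
curl -s http://localhost:11434/api/tags      # is Ollama reachable?
echo "${BRAVE_SEARCH_API_KEY:+key is set}"   # is the search API key exported?
curl -s http://localhost:8321/v1/models      # does the stack answer on its port?
```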
### Logs and Debugging

Enable detailed logging:

```bash
# Run with debug logging
llama stack run starter --port 8321 --log-level DEBUG

# Check logs in Docker
docker logs <container-id>
```
## Next Steps

Once your Llama Stack server is running:

1. **Explore the APIs**: Test inference, safety, and embeddings endpoints
2. **Integrate with Applications**: Use the server with LangChain, custom applications, or API clients
3. **Scale Your Deployment**: Consider load balancing and high-availability setups
4. **Monitor Performance**: Set up logging and monitoring for production use

For more advanced configurations and production deployments, refer to the [Advanced Configuration Guide](advanced_configuration.md) and [Production Deployment Guide](production_deployment.md).