303 changes: 303 additions & 0 deletions docs/docs/getting_started/configuring_and_launching_llama_stack.md
@@ -0,0 +1,303 @@
# Configuring and Launching Llama Stack

This guide walks you through the two primary methods for setting up and running Llama Stack: using Docker containers and configuring the server manually.

## Method 1: Using the Starter Docker Container
Collaborator (@mattf):

let's structure from the approach that needs the least infrastructure & knowledge to the most -

Prerequisites

  • Ollama running at http://localhost:11434 (include link to ollama getting started docs)
  • export OLLAMA_URL=http://localhost:11434

Using llama stack CLI

pip install llama-stack
llama stack build --distro starter --image-type venv --run

Using docker or podman

...

Contributor Author:

Thanks, @mattf. I have restructured the doc.


The easiest way to get started with Llama Stack is to use the pre-built Docker container. This approach eliminates manual dependency management and provides a consistent environment across systems.

### Prerequisites

- Docker installed and running on your system
- Access to external model providers, e.g. Ollama running locally (see the quick checks below)
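
Before launching the container, it can help to confirm both prerequisites are in place. A minimal sketch, assuming Ollama is listening on its default port (11434):

```bash
# Confirm Docker is installed and the daemon is reachable
docker info > /dev/null && echo "Docker is running"

# Confirm Ollama is responding on its default port
curl -s http://localhost:11434/api/tags | head -c 200
```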

### Basic Docker Usage

Here's an example of spinning up the Llama Stack server with Docker:

```bash
docker run -it \
  -v ~/.llama:/root/.llama \
  --network=host \
  llamastack/distribution-starter \
  --env OLLAMA_URL=http://localhost:11434
```

### Docker Command Breakdown

- `-it`: Run in interactive mode with TTY allocation
- `-v ~/.llama:/root/.llama`: Mount your local Llama Stack configuration directory
- `--network=host`: Use host networking to access local services like Ollama
- `llamastack/distribution-starter`: The official Llama Stack Docker image
- `--env OLLAMA_URL=http://localhost:11434`: Set the Ollama URL environment variable for the server (passed to the containerized `llama stack run`, after the image name)

### Advanced Docker Configuration

You can customize the Docker deployment with additional environment variables:

```bash
docker run -it \
  -v ~/.llama:/root/.llama \
  -p 8321:8321 \
  -e OLLAMA_URL=http://localhost:11434 \
  -e BRAVE_SEARCH_API_KEY=your_api_key_here \
  -e TAVILY_SEARCH_API_KEY=your_api_key_here \
  llamastack/distribution-starter \
  --port 8321
```

### Environment Variables

Common environment variables you can set:

| Variable | Description | Example |
|----------|-------------|---------|
| `OLLAMA_URL` | URL for Ollama service | `http://localhost:11434` |
| `BRAVE_SEARCH_API_KEY` | API key for Brave search | `your_brave_api_key` |
| `TAVILY_SEARCH_API_KEY` | API key for Tavily search | `your_tavily_api_key` |
| `TOGETHER_API_KEY` | API key for Together AI | `your_together_api_key` |
| `OPENAI_API_KEY` | API key for OpenAI | `your_openai_api_key` |
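
If you need several of these at once, one option is to keep them in an env file and pass it to Docker with `--env-file`. A sketch, assuming a local file named `llama-stack.env` (the file name is arbitrary):

```bash
# llama-stack.env (example contents; replace the placeholder values)
# OLLAMA_URL=http://localhost:11434
# TAVILY_SEARCH_API_KEY=your_tavily_api_key

docker run -it \
  -v ~/.llama:/root/.llama \
  -p 8321:8321 \
  --env-file llama-stack.env \
  llamastack/distribution-starter \
  --port 8321
```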

## Method 2: Manual Server Configuration and Launch

For more control over your Llama Stack deployment, you can configure and run the server manually.

### Prerequisites

1. **Install Llama Stack**:

Using pip:
```bash
pip install llama-stack
```

Using uv (alternative):
```bash
# Initialize a new project (if starting fresh)
uv init

# Add llama-stack as a dependency
uv add llama-stack

# Note: If using uv, prefix subsequent commands with 'uv run'
# Example: uv run llama stack build --list-distros
```

### Step 1: Build a Distribution

Choose a distro and build your Llama Stack distribution:

```bash
# List available distributions
llama stack build --list-distros

# Build with a specific distro
llama stack build --distro watsonx --image-type venv --image-name watsonx-stack

# Or build with the meta-reference-gpu distro
llama stack build --distro meta-reference-gpu --image-type venv --image-name meta-reference-gpu-stack
```

#### Advanced: Custom Provider Selection (Step 1.a)

If you know the specific providers you want to use, you can supply them directly on the command-line instead of using a pre-built distribution:

```bash
llama stack build \
  --providers inference=remote::ollama,agents=inline::meta-reference,safety=inline::llama-guard,vector_io=inline::faiss,tool_runtime=inline::rag-runtime \
  --image-type venv \
  --image-name custom-stack
```

**Discover Available Options:**

```bash
# List all available APIs
llama stack list-apis

# List all available providers
llama stack list-providers
```

This approach gives you complete control over which providers are included in your stack, so you can tailor the configuration to your needs.

### Available Distributions

- **ci-tests**: CI tests for Llama Stack
- **dell**: Dell's distribution of Llama Stack. TGI inference via Dell's custom container
- **meta-reference-gpu**: Use Meta Reference for running LLM inference
- **nvidia**: Use NVIDIA NIM for running LLM inference, evaluation and safety
- **open-benchmark**: Distribution for running open benchmarks
- **postgres-demo**: Quick start template for running Llama Stack with several popular providers
- **starter**: Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments
- **starter-gpu**: Quick start template for running Llama Stack with several popular providers. This distribution is intended for GPU-enabled environments
- **watsonx**: Use watsonx for running LLM inference

### Step 2: Configure Your Stack

After building, you can customize the configuration files:

#### Configuration File Locations

- Build config: `~/.llama/distributions/{stack-name}/{stack-name}-build.yaml`
- Runtime config: `~/.llama/distributions/{stack-name}/{stack-name}-run.yaml`
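
For example, assuming you built the starter distribution with `--image-name starter`, you can inspect the generated files before editing them:

```bash
# See what was generated for the starter stack
ls ~/.llama/distributions/starter/

# Review the runtime configuration before editing it
cat ~/.llama/distributions/starter/starter-run.yaml
```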

#### Sample Runtime Configuration

```yaml
version: 2

apis:
- inference
- safety
- embeddings
- tool_runtime

providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  safety:
  - provider_id: llama-guard
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  embeddings:
  - provider_id: ollama-embeddings
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  tool_runtime:
  - provider_id: brave-search
    provider_type: remote::brave-search
    config:
      api_key: ${env.BRAVE_SEARCH_API_KEY:=}
```

### Step 3: Launch the Server

Start your configured Llama Stack server:

```bash
# Run with specific port
llama stack run {stack-name} --port 8321

# Run with environment variables
OLLAMA_URL=http://localhost:11434 llama stack run starter --port 8321

# Run in background
nohup llama stack run starter --port 8321 > llama_stack.log 2>&1 &
```

### Step 4: Verify Installation

Test your Llama Stack server:

#### Basic HTTP Health Checks
```bash
# Check server health
curl http://localhost:8321/health

# List available models
curl http://localhost:8321/v1/models
```

#### Comprehensive Verification (Recommended)
Use the official Llama Stack client for better verification:

```bash
# List all configured providers (recommended)
uv run --with llama-stack-client llama-stack-client providers list

# Alternative if you have llama-stack-client installed
llama-stack-client providers list
```

#### Test Chat Completion
```bash
# Basic HTTP test
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Or using the client (more robust)
uv run --with llama-stack-client llama-stack-client inference chat-completion \
  --model llama3.1:8b \
  --message "Hello!"
```
Collaborator (@mattf):

verify w/ client -

uv run --with llama-stack-client llama-stack-client providers list

Contributor Author:

Thanks, @mattf. Updated that section.


## Configuration Management

### Managing Multiple Stacks

You can maintain multiple stack configurations:

```bash
# List all built stacks
llama stack list

# Remove a stack
llama stack rm {stack-name}

# Rebuild with updates
llama stack build --distro starter --image-type venv --image-name starter-v2
```

### Common Configuration Issues

#### Port Conflicts

If port 8321 is already in use:

```bash
# Check what's using the port
netstat -tlnp | grep :8321

# Use a different port
llama stack run starter --port 8322
```

## Troubleshooting

### Common Issues

1. **Docker Permission Denied**:
```bash
sudo docker run -it \
  -v ~/.llama:/root/.llama \
  --network=host \
  llamastack/distribution-starter
```

2. **Provider Connection Issues** (see the quick checks below):
- Verify external services (Ollama, APIs) are running
- Check network connectivity and firewall settings
- Validate API keys and URLs
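
A few quick checks for these, sketched for the starter setup with Ollama and the search providers from the environment variables table above:

```bash
# Is Ollama reachable on its default port?
curl -s http://localhost:11434/api/tags > /dev/null && echo "Ollama OK" || echo "Ollama unreachable"

# Are the API keys the tool providers expect actually set in this shell?
[ -n "$BRAVE_SEARCH_API_KEY" ] && echo "BRAVE_SEARCH_API_KEY is set" || echo "BRAVE_SEARCH_API_KEY is missing"
[ -n "$TAVILY_SEARCH_API_KEY" ] && echo "TAVILY_SEARCH_API_KEY is set" || echo "TAVILY_SEARCH_API_KEY is missing"
```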

### Logs and Debugging

Enable detailed logging:

```bash
# Run with debug logging
llama stack run starter --port 8321 --log-level DEBUG

# Check logs in Docker
docker logs <container-id>
```

## Next Steps

Once your Llama Stack server is running:

1. **Explore the APIs**: Test inference, safety, and embeddings endpoints
2. **Integrate with Applications**: Use the server with LangChain, custom applications, or API clients
3. **Scale Your Deployment**: Consider load balancing and high-availability setups
4. **Monitor Performance**: Set up logging and monitoring for production use

For more advanced configurations and production deployments, refer to the [Advanced Configuration Guide](advanced_configuration.md) and [Production Deployment Guide](production_deployment.md).
2 changes: 2 additions & 0 deletions docs/docs/index.mdx
@@ -45,7 +45,9 @@ Llama Stack consists of a server (with multiple pluggable API providers) and Cli

## Quick Links


- Ready to build? Check out the [Getting Started Guide](/docs/getting_started/quickstart) to get started.
- Need help with setup? See the [Configuration and Launch Guide](./getting_started/configuring_and_launching_llama_stack) for detailed Docker and manual installation instructions.
- Want to contribute? See the [Contributing Guide](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md).
- Explore [Example Applications](https://github.com/llamastack/llama-stack-apps) built with Llama Stack.
