| title | category | description | order | icon |
|---|---|---|---|---|
| Inference Providers | Advanced | Configure OpenRouter, vLLM, NVIDIA NIMs, Together AI, and other providers | 30 | zap |
This guide demonstrates how to configure prompt-ops to work with various inference providers, including OpenRouter, vLLM, and NVIDIA NIMs. By changing the model configuration in your YAML files, you can easily switch between different backends without modifying your code.
In prompt-ops, model configuration is specified in the `model` section of your YAML configuration file. The basic configuration looks like this:

```yaml
model:
  name: "openrouter/meta-llama/llama-3.1-8b-instruct"
  temperature: 0.0
  max_tokens: 40960
```

prompt-ops uses LiteLLM as the unified API client to handle all LLM API calls. LiteLLM provides automatic provider detection based on the model name prefix (e.g., `openrouter/`, `groq/`, `together_ai/`) and looks for the corresponding environment variable (e.g., `OPENROUTER_API_KEY`, `GROQ_API_KEY`, `TOGETHERAI_API_KEY`).
OpenRouter provides access to a wide range of models from different providers through a unified API.
```yaml
model:
  name: "openrouter/meta-llama/llama-3.1-8b-instruct"
  temperature: 0.0
  max_tokens: 40960
```

Set your API key as an environment variable (LiteLLM will auto-detect it):

```bash
export OPENROUTER_API_KEY=your_openrouter_api_key_here
```

vLLM is an open-source library for fast LLM inference. It's particularly useful for running models locally or on your own infrastructure.
```yaml
model:
  name: "hosted_vllm/meta-llama/Llama-3.1-8B-Instruct"
  api_base: "http://localhost:8000/v1"
  temperature: 0.0
  max_tokens: 4096
```

To run vLLM locally, first start the vLLM server:

```bash
pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size=1
```

NVIDIA NIMs (NVIDIA Inference Microservices) provide optimized containers for running LLMs on NVIDIA GPUs.
model:
name: "openai/meta/llama-3.1-8b-instruct" # Format: openai/{model_name}
api_base: "http://localhost:8000/v1"
api_key: "any_string_for_localhost" # Can be any string for local deployments
temperature: 0.0
max_tokens: 4096To run a NIM container locally:
docker run -it --rm --name=nim \
--runtime=nvidia \
--gpus 1 \
--shm-size=16GB \
-e NGC_API_KEY=<YOUR NGC API KEY> \
-v "~/.cache/nim:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
nvcr.io/nim/meta/llama-3.1-8b-instruct:1.5.0Together AI provides a platform for running various open-source models with optimized performance and competitive pricing.
```yaml
model:
  name: "together_ai/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
  temperature: 0.0
  max_tokens: 4096
```

To use Together AI, you'll need to:

- Sign up for an account at Together AI
- Generate an API key from your account dashboard
- Set the API key as an environment variable (LiteLLM will auto-detect it):

```bash
export TOGETHERAI_API_KEY=your_api_key_here
```

Then run the optimization:
```bash
prompt-ops migrate
```

Groq serves open models through an OpenAI-compatible endpoint:

```yaml
model:
  task_model: groq/meta-llama/llama-4-maverick-17b-128e-instruct
  proposer_model: groq/meta-llama/llama-4-maverick-17b-128e-instruct
  api_base: https://api.groq.com/openai/v1
```

Set your API key as an environment variable:

```bash
export GROQ_API_KEY=your_api_key_here
```

Then run the optimization:

```bash
prompt-ops migrate
```

prompt-ops allows you to specify different models for the task execution and the prompt proposal process:
```yaml
model:
  task_model: "openrouter/meta-llama/llama-3.1-8b-instruct"
  proposer_model: "openrouter/meta-llama/llama-3.3-70b-instruct"
  api_base: "https://openrouter.ai/api/v1"
  temperature: 0.0
  max_tokens: 4096
```

To run prompt-ops with your configuration:
```bash
# Set your provider-specific API key
export OPENROUTER_API_KEY=your_key   # For OpenRouter models (openrouter/...)
export GROQ_API_KEY=your_key         # For Groq models (groq/...)
export TOGETHERAI_API_KEY=your_key   # For Together AI models (together_ai/...)

# Run with any configuration
prompt-ops migrate --config configs/your_config.yaml
```

How LiteLLM works: LiteLLM automatically detects the provider from your model name prefix (e.g., `openrouter/model`, `groq/model`, `together_ai/model`) and looks for the corresponding environment variable (`OPENROUTER_API_KEY`, `GROQ_API_KEY`, `TOGETHERAI_API_KEY`). No manual API routing is required.
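If you want a quick sanity check of a `model` section before launching a run, you can validate it programmatically. The validator below is a hypothetical sketch, not part of prompt-ops; it only reflects the fields used in the examples in this guide:

```python
def validate_model_config(cfg: dict) -> list[str]:
    """Hypothetical helper: return a list of problems with a `model`
    config section (empty list means it looks OK)."""
    problems = []
    # Every example above sets either `name` or the task/proposer pair.
    if not (cfg.get("name") or cfg.get("task_model")):
        problems.append("need `name` or `task_model`")
    if cfg.get("task_model") and not cfg.get("proposer_model"):
        problems.append("`task_model` set but `proposer_model` missing")
    # When present, `api_base` is a URL in all examples above.
    api_base = cfg.get("api_base")
    if api_base and not api_base.startswith(("http://", "https://")):
        problems.append("`api_base` should be an http(s) URL")
    return problems

print(validate_model_config({"name": "openrouter/meta-llama/llama-3.1-8b-instruct"}))  # []
```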
For more information on supported providers, environment variables, and configuration options, refer to the LiteLLM documentation.