225 changes: 225 additions & 0 deletions WATSONX_README.md
@@ -0,0 +1,225 @@
# Watson X Integration with Granite Models

This branch adds support for IBM Watson X AI with Granite models as an alternative to Ollama for running LocalGPT.

## Overview

LocalGPT now supports two LLM backends:
1. **Ollama** (default): Run models locally using Ollama
2. **Watson X**: Use IBM's Granite models hosted on Watson X AI

## What Changed

- Added `WatsonXClient` class in `rag_system/utils/watsonx_client.py` that provides an Ollama-compatible interface for Watson X (a sketch of this interface follows this list)
- Updated `factory.py` and `main.py` to support backend switching via environment variable
- Added `ibm-watsonx-ai` SDK dependency to `requirements.txt`
- Configuration now supports both backends through environment variables
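
For orientation, here is a minimal sketch of the adapter idea, not the actual implementation in `watsonx_client.py`: a thin wrapper that exposes Ollama-style methods while delegating to the `ibm-watsonx-ai` SDK's `ModelInference`. The return-dict shape is an assumption based on Ollama's response format.

```python
# Minimal sketch of an Ollama-compatible Watson X adapter.
# Assumption: the real WatsonXClient is more complete; this only
# illustrates the shape of the interface.
from typing import Optional

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference


class WatsonXAdapterSketch:
    def __init__(self, api_key: str, project_id: str,
                 url: str = "https://us-south.ml.cloud.ibm.com"):
        self._credentials = Credentials(url=url, api_key=api_key)
        self._project_id = project_id

    def generate_completion(self, model: str, prompt: str,
                            options: Optional[dict] = None) -> dict:
        # Delegate to the watsonx.ai SDK, then repackage the plain-text
        # result into an Ollama-style response dict.
        inference = ModelInference(
            model_id=model,
            credentials=self._credentials,
            project_id=self._project_id,
            params=options,  # e.g. {"max_new_tokens": 512, "temperature": 0.2}
        )
        text = inference.generate_text(prompt=prompt)
        return {"model": model, "response": text, "done": True}
```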

## Prerequisites

To use Watson X with Granite models, you need:

1. IBM Cloud account with Watson X access
2. Watson X API key
3. Watson X project ID

### Getting Your Credentials

1. Go to [IBM Cloud](https://cloud.ibm.com/)
2. Navigate to Watson X AI service
3. Create or select a project
4. Get your API key from IBM Cloud IAM
5. Copy your project ID from the Watson X project settings

## Configuration

### Environment Variables

Create a `.env` file or set these environment variables:

```bash
# Choose LLM backend (default: ollama)
LLM_BACKEND=watsonx

# Watson X Configuration
WATSONX_API_KEY=your_api_key_here
WATSONX_PROJECT_ID=your_project_id_here
WATSONX_URL=https://us-south.ml.cloud.ibm.com

# Model Configuration
WATSONX_GENERATION_MODEL=ibm/granite-13b-chat-v2
WATSONX_ENRICHMENT_MODEL=ibm/granite-8b-japanese
```
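
These variables are read at startup (the factory calls `load_dotenv()`). Here is a hedged sketch of how a `WATSONX_CONFIG` dict like the one used in `factory.py` could be assembled; the `api_key`, `project_id`, and `url` keys match the diff below, while the model keys and defaults are assumptions mirroring this README:

```python
# Sketch: assembling the Watson X config from environment variables.
import os
from dotenv import load_dotenv

load_dotenv()  # pick up a local .env file if present

LLM_BACKEND = os.getenv("LLM_BACKEND", "ollama")

WATSONX_CONFIG = {
    "api_key": os.getenv("WATSONX_API_KEY", ""),
    "project_id": os.getenv("WATSONX_PROJECT_ID", ""),
    "url": os.getenv("WATSONX_URL", "https://us-south.ml.cloud.ibm.com"),
    # Model keys below are illustrative; check main.py for the real names.
    "generation_model": os.getenv("WATSONX_GENERATION_MODEL", "ibm/granite-13b-chat-v2"),
    "enrichment_model": os.getenv("WATSONX_ENRICHMENT_MODEL", "ibm/granite-8b-japanese"),
}
```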

### Available Granite Models

Watson X offers several Granite models:
- `ibm/granite-13b-chat-v2` - General purpose chat model
- `ibm/granite-13b-instruct-v2` - Instruction-following model
- `ibm/granite-20b-multilingual` - Multilingual support
- `ibm/granite-8b-japanese` - Lightweight Japanese model
- `ibm/granite-3b-code-instruct` - Code generation model

For a full list of available models, visit the [Watson X documentation](https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-supported-foundation-models).

## Installation

1. Install the Watson X SDK:
```bash
pip install "ibm-watsonx-ai>=1.3.39"
```

Or install all dependencies:
```bash
pip install -r rag_system/requirements.txt
```
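
If you want to confirm the SDK installed correctly before going further, a quick check using only the standard library:

```python
# Verify the SDK is installed and report its version.
import importlib.metadata

print(importlib.metadata.version("ibm-watsonx-ai"))
```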

## Usage

### Running with Watson X

Once configured, simply set the environment variable and run as normal:

```bash
export LLM_BACKEND=watsonx
python -m rag_system.main api
```

Or in Python:

```python
import os
os.environ['LLM_BACKEND'] = 'watsonx'

from rag_system.factory import get_agent

# Get agent with Watson X backend
agent = get_agent(mode="default")

# Use as normal
result = agent.run("What is artificial intelligence?")
print(result)
```

### Switching Between Backends

You can easily switch between Ollama and Watson X:

```bash
# Use Ollama (local)
export LLM_BACKEND=ollama
python -m rag_system.main api

# Use Watson X (cloud)
export LLM_BACKEND=watsonx
python -m rag_system.main api
```

## Features

The Watson X client supports all the key features used by LocalGPT:

- ✅ Text generation / completion
- ✅ Async generation
- ✅ Streaming responses
- ✅ Embeddings (if using Watson X embedding models)
- ✅ Custom generation parameters (temperature, max_tokens, top_p, top_k; example under API Compatibility below)
- ⚠️ Image/multimodal support (limited, depends on model availability)

## API Compatibility

The `WatsonXClient` provides the same interface as `OllamaClient`:

```python
from rag_system.utils.watsonx_client import WatsonXClient

client = WatsonXClient(
    api_key="your_api_key",
    project_id="your_project_id"
)

# Generate completion
response = client.generate_completion(
    model="ibm/granite-13b-chat-v2",
    prompt="Explain quantum computing"
)

print(response['response'])

# Stream completion
for chunk in client.stream_completion(
    model="ibm/granite-13b-chat-v2",
    prompt="Write a story about AI"
):
    print(chunk, end='', flush=True)
```
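
The custom generation parameters listed under Features can be passed the same way. An illustrative call, assuming the client forwards an Ollama-style `options` dict; whether your version of `watsonx_client.py` accepts exactly this keyword is an assumption worth checking:

```python
# Hedged example: pass sampling parameters through the client.
response = client.generate_completion(
    model="ibm/granite-13b-chat-v2",
    prompt="Summarize the plot of Hamlet in two sentences.",
    options={
        "temperature": 0.2,   # lower = more deterministic
        "max_tokens": 256,    # cap on generated tokens
        "top_p": 0.9,
        "top_k": 50,
    },
)
print(response["response"])
```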

## Limitations

1. **Embedding Models**: Watson X uses different embedding models than Ollama. Make sure to configure embedding models appropriately in `main.py` if needed.

2. **Multimodal Support**: Image support varies by model availability in Watson X. Not all Granite models support multimodal inputs.

3. **Streaming**: Streaming support depends on the Watson X SDK version and may fall back to returning the full response at once; a defensive consumption pattern is sketched after this list.

4. **Rate Limits**: Watson X has API rate limits that may differ from local Ollama usage. Monitor your usage accordingly.
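
Because of limitation 3, callers may want to consume the stream defensively. A minimal sketch using the `ibm-watsonx-ai` SDK directly, falling back to a single blocking call if streaming fails at the start; the broad exception catch is an assumption, so narrow it for your SDK version:

```python
# Sketch: stream tokens when possible, fall back to one blocking call.
# Assumes `inference` is an ibm_watsonx_ai ModelInference instance.
def generate_with_fallback(inference, prompt: str):
    try:
        # generate_text_stream yields text chunks as they arrive
        for chunk in inference.generate_text_stream(prompt=prompt):
            yield chunk
    except Exception:  # assumption: broad catch; narrow per SDK version
        # Fall back to returning the full response as a single chunk
        yield inference.generate_text(prompt=prompt)
```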

## Troubleshooting

### Authentication Errors

If you see authentication errors:
- Verify your API key is correct
- Check that your project ID matches an existing Watson X project
- Ensure your IBM Cloud account has Watson X access
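
A quick way to isolate credential problems from the rest of the pipeline is to authenticate with the SDK directly. A sketch; a failed token exchange typically raises during client construction or on the first call:

```python
# Sanity-check Watson X credentials outside of LocalGPT.
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # use your region's URL
    api_key="your_api_key",
)
client = APIClient(credentials, project_id="your_project_id")
print("Authenticated OK")
```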

### Model Not Found

If you get model not found errors:
- Verify the model ID is correct (e.g., `ibm/granite-13b-chat-v2`)
- Check that the model is available in your Watson X instance
- Some models may require additional permissions
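
To see which model IDs your instance can actually serve, the SDK can enumerate the available foundation models. The method name and response shape below follow the 1.x SDK; treat both as assumptions if you are on another version:

```python
# List foundation models visible to your Watson X instance.
# Assumes `client` is an authenticated APIClient (see above).
specs = client.foundation_models.get_model_specs()
for spec in specs.get("resources", []):
    print(spec["model_id"])
```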

### Connection Errors

If you experience connection issues:
- Check your internet connection
- Verify the Watson X URL is correct for your region
- Check IBM Cloud status page for service outages
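
To rule out local network problems, check that the regional endpoint is reachable at all: any HTTP response proves connectivity, while a timeout or DNS error points at your network. Assumes the `requests` package is available:

```python
# Reachability check for the Watson X endpoint.
import requests

try:
    resp = requests.get("https://us-south.ml.cloud.ibm.com", timeout=10)
    print(f"Endpoint reachable (HTTP {resp.status_code})")
except requests.exceptions.RequestException as exc:
    print(f"Connection problem: {exc}")
```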

## Cost Considerations

Unlike local Ollama, Watson X is a cloud service with usage-based pricing:
- Token-based pricing for generation
- Consider your query volume
- Monitor usage through IBM Cloud dashboard

## Reverting to Ollama

To switch back to local Ollama:

```bash
unset LLM_BACKEND # or set LLM_BACKEND=ollama
python -m rag_system.main api
```

## Support

For Watson X specific issues:
- [IBM Watson X Documentation](https://www.ibm.com/docs/en/watsonx/saas)
- [Watson X Developer Hub](https://www.ibm.com/watsonx/developer/)
- [IBM Cloud Support](https://cloud.ibm.com/docs/get-support)

For LocalGPT issues:
- [LocalGPT GitHub Issues](https://github.com/PromtEngineer/localGPT/issues)

## Contributing

If you find issues with the Watson X integration or want to add features:
1. Create an issue describing the problem/feature
2. Submit a pull request with your changes
3. Ensure all tests pass

## License

This integration follows the same license as LocalGPT (MIT License).
61 changes: 61 additions & 0 deletions env.example.watsonx
@@ -0,0 +1,61 @@
# ====================================================================
# LocalGPT Watson X Configuration Example
# ====================================================================
# This file shows how to configure LocalGPT to use IBM Watson X AI
# with Granite models instead of local Ollama.
#
# Copy this file to .env and fill in your credentials:
# cp env.example.watsonx .env
# ====================================================================

# LLM Backend Selection
# Options: "ollama" (default) or "watsonx"
LLM_BACKEND=watsonx

# ====================================================================
# Watson X Credentials
# ====================================================================
# Get these from your IBM Cloud Watson X project:
# 1. Go to https://cloud.ibm.com/
# 2. Navigate to Watson X AI service
# 3. Create or select a project
# 4. Get API key from IBM Cloud IAM
# 5. Copy project ID from project settings

# Your IBM Cloud API key
WATSONX_API_KEY=your_api_key_here

# Your Watson X project ID
WATSONX_PROJECT_ID=your_project_id_here

# Watson X service URL (default: us-south region)
# Options:
# - https://us-south.ml.cloud.ibm.com (US South)
# - https://eu-de.ml.cloud.ibm.com (Frankfurt)
# - https://eu-gb.ml.cloud.ibm.com (London)
# - https://jp-tok.ml.cloud.ibm.com (Tokyo)
WATSONX_URL=https://us-south.ml.cloud.ibm.com

# ====================================================================
# Model Configuration
# ====================================================================
# Granite models available on Watson X

# Main generation model for answering queries
# Options:
# - ibm/granite-13b-chat-v2 (recommended for chat)
# - ibm/granite-13b-instruct-v2 (for instructions)
# - ibm/granite-20b-multilingual (for multilingual)
# - ibm/granite-3b-code-instruct (for code)
WATSONX_GENERATION_MODEL=ibm/granite-13b-chat-v2

# Lightweight model for enrichment and routing
# Use a smaller model for better performance on simple tasks
WATSONX_ENRICHMENT_MODEL=ibm/granite-8b-japanese

# ====================================================================
# Optional: Ollama Configuration (fallback)
# ====================================================================
# These settings are used if LLM_BACKEND=ollama

OLLAMA_HOST=http://localhost:11434
51 changes: 45 additions & 6 deletions rag_system/factory.py
@@ -7,11 +7,30 @@ def get_agent(mode: str = "default"):
     """
     from rag_system.agent.loop import Agent
     from rag_system.utils.ollama_client import OllamaClient
-    from rag_system.main import PIPELINE_CONFIGS, OLLAMA_CONFIG
+    from rag_system.main import PIPELINE_CONFIGS, OLLAMA_CONFIG, LLM_BACKEND, WATSONX_CONFIG
 
     load_dotenv()
 
-    llm_client = OllamaClient(host=OLLAMA_CONFIG["host"])
+    # Initialize the appropriate LLM client based on backend configuration
+    if LLM_BACKEND.lower() == "watsonx":
+        from rag_system.utils.watsonx_client import WatsonXClient
+
+        if not WATSONX_CONFIG["api_key"] or not WATSONX_CONFIG["project_id"]:
+            raise ValueError(
+                "Watson X configuration incomplete. Please set WATSONX_API_KEY and WATSONX_PROJECT_ID "
+                "environment variables."
+            )
+
+        llm_client = WatsonXClient(
+            api_key=WATSONX_CONFIG["api_key"],
+            project_id=WATSONX_CONFIG["project_id"],
+            url=WATSONX_CONFIG["url"]
+        )
+        llm_config = WATSONX_CONFIG
+    else:
+        llm_client = OllamaClient(host=OLLAMA_CONFIG["host"])
+        llm_config = OLLAMA_CONFIG
 
     config = PIPELINE_CONFIGS.get(mode, PIPELINE_CONFIGS['default'])
 
     if 'storage' not in config:
@@ -24,7 +43,7 @@ def get_agent(mode: str = "default"):
     agent = Agent(
         pipeline_configs=config,
         llm_client=llm_client,
-        ollama_config=OLLAMA_CONFIG
+        ollama_config=llm_config
     )
     return agent
 
@@ -33,11 +52,31 @@ def get_indexing_pipeline(mode: str = "default"):
     Factory function to get an instance of the Indexing Pipeline.
     """
     from rag_system.pipelines.indexing_pipeline import IndexingPipeline
-    from rag_system.main import PIPELINE_CONFIGS, OLLAMA_CONFIG
+    from rag_system.main import PIPELINE_CONFIGS, OLLAMA_CONFIG, LLM_BACKEND, WATSONX_CONFIG
     from rag_system.utils.ollama_client import OllamaClient
 
     load_dotenv()
-    llm_client = OllamaClient(host=OLLAMA_CONFIG["host"])
+
+    # Initialize the appropriate LLM client based on backend configuration
+    if LLM_BACKEND.lower() == "watsonx":
+        from rag_system.utils.watsonx_client import WatsonXClient
+
+        if not WATSONX_CONFIG["api_key"] or not WATSONX_CONFIG["project_id"]:
+            raise ValueError(
+                "Watson X configuration incomplete. Please set WATSONX_API_KEY and WATSONX_PROJECT_ID "
+                "environment variables."
+            )
+
+        llm_client = WatsonXClient(
+            api_key=WATSONX_CONFIG["api_key"],
+            project_id=WATSONX_CONFIG["project_id"],
+            url=WATSONX_CONFIG["url"]
+        )
+        llm_config = WATSONX_CONFIG
+    else:
+        llm_client = OllamaClient(host=OLLAMA_CONFIG["host"])
+        llm_config = OLLAMA_CONFIG
 
     config = PIPELINE_CONFIGS.get(mode, PIPELINE_CONFIGS['default'])
 
-    return IndexingPipeline(config, llm_client, OLLAMA_CONFIG)
+    return IndexingPipeline(config, llm_client, llm_config)