
Commit 7a9dd44

feat: added minikube, oke deployment, standard k8s readme, and updated local rag agent with more debug options

1 parent 8ce9da8
File tree

2 files changed: +32 −171 lines

agentic_rag/gradio_app.py

Lines changed: 28 additions & 171 deletions
```diff
@@ -295,196 +295,53 @@ def create_interface():
 
         # Create model choices list for reuse
         model_choices = []
-        # HF models first if token is available
-        if hf_token:
-            model_choices.extend([
-                "mistral",
-                "mistral-4bit",
-                "mistral-8bit",
-            ])
-        # Then Ollama models (don't require HF token)
+        # Only Ollama models (no more local Mistral deployments)
         model_choices.extend([
-            "llama3",
-            "phi-3",
-            "qwen2",
-            # New Ollama models
-            "gemma3:1b",
-            "gemma3",
-            "gemma3:12b",
-            "gemma3:27b",
             "qwq",
-            "deepseek-r1",
-            "deepseek-r1:671b",
+            "gemma3",
             "llama3.3",
-            "llama3.2",
-            "llama3.2:1b",
-            "llama3.2-vision",
-            "llama3.2-vision:90b",
-            "llama3.1",
-            "llama3.1:405b",
             "phi4",
-            "phi4-mini",
             "mistral",
-            "moondream",
-            "neural-chat",
-            "starling-lm",
-            "codellama",
-            "llama2-uncensored",
             "llava",
-            "granite3.2"
+            "phi3",
+            "deepseek-r1"
         ])
         if openai_key:
             model_choices.append("openai")
 
-        # Set default model to qwen2
-        default_model = "qwen2"
+        # Set default model to qwq
+        default_model = "qwq"
 
         # Model Management Tab (First Tab)
         with gr.Tab("Model Management"):
             gr.Markdown("""
-            ## Model Management
-
-            Download models in advance to prepare them for use in the chat interface.
-
-            ### Hugging Face Models
-
-            For Hugging Face models (Mistral), you'll need a Hugging Face token in your config.yaml file.
-
-            ### Ollama Models (Default)
-
-            Ollama models are used by default. For Ollama models, this will pull the model using the Ollama client.
-            Make sure Ollama is installed and running on your system.
-            You can download Ollama from [ollama.com/download](https://ollama.com/download)
+            ## Model Selection
+            Choose your preferred model for the conversation.
             """)
 
-            with gr.Row():
-                with gr.Column():
-                    model_dropdown = gr.Dropdown(
-                        choices=model_choices,
-                        value=default_model if default_model in model_choices else model_choices[0] if model_choices else None,
-                        label="Select Model to Download",
-                        interactive=True
-                    )
-                    download_button = gr.Button("Download Selected Model")
-                    model_status = gr.Textbox(
-                        label="Download Status",
-                        placeholder="Select a model and click Download to begin...",
-                        interactive=False
-                    )
-
-                with gr.Column():
-                    gr.Markdown("""
-                    ### Model Information
-
-                    **Ollama - qwen2** (DEFAULT): Alibaba's Qwen2 model via Ollama.
-                    - Size: ~4GB
-                    - Requires Ollama to be installed and running
-                    - High-quality model with good performance
-
-                    **Ollama - llama3**: Meta's Llama 3 model via Ollama.
-                    - Size: ~4GB
-                    - Requires Ollama to be installed and running
-                    - Excellent performance and quality
-
-                    **Ollama - phi-3**: Microsoft's Phi-3 model via Ollama.
-                    - Size: ~4GB
-                    - Requires Ollama to be installed and running
-                    - Efficient small model with good performance
-
-                    **Local (Mistral)**: The default Mistral-7B-Instruct-v0.2 model.
-                    - Size: ~14GB
-                    - VRAM Required: ~8GB
-                    - Good balance of quality and speed
-
-                    **Local (Mistral) - 4-bit Quantized**: 4-bit quantized version of Mistral-7B.
-                    - Size: ~4GB
-                    - VRAM Required: ~4GB
-                    - Faster inference with minimal quality loss
-
-                    **Local (Mistral) - 8-bit Quantized**: 8-bit quantized version of Mistral-7B.
-                    - Size: ~7GB
-                    - VRAM Required: ~6GB
-                    - Balance between quality and memory usage
-
-                    For a complete list of supported models and specifications, see the **Model FAQ** tab.
-                    """)
-
-        # Model FAQ Tab
-        with gr.Tab("Model FAQ"):
-            gr.Markdown("""
-            ## Model Information & Technical Requirements
-
-            This page provides detailed information about all supported models, including size, parameter count, and hardware requirements.
-
-            ### Memory Requirements
+            model_dropdown = gr.Dropdown(
+                choices=model_choices,
+                value=default_model,
+                label="Select Model",
+                info="Choose the model to use for the conversation"
+            )
 
-            As a general guideline:
-            - You should have at least 8 GB of RAM available to run 7B parameter models
-            - You should have at least 16 GB of RAM available to run 13B parameter models
-            - You should have at least 32 GB of RAM available to run 33B+ parameter models
-            - For vision models, additional memory is required for image processing
-
-            ### Ollama Models
-
-            | Model | Parameters | Size | Download Command | Description | Pulls | Tags | Last Updated |
-            |-------|------------|------|-----------------|-------------|-------|------|--------------|
-            | Gemma 3 | 1B | 815MB | gemma3:1b | The current, most capable model that runs on a single GPU | 3.4M | 17 | 2 weeks ago |
-            | Gemma 3 | 4B | 3.3GB | gemma3 | The current, most capable model that runs on a single GPU | 3.4M | 17 | 2 weeks ago |
-            | Gemma 3 | 12B | 8.1GB | gemma3:12b | The current, most capable model that runs on a single GPU | 3.4M | 17 | 2 weeks ago |
-            | Gemma 3 | 27B | 17GB | gemma3:27b | The current, most capable model that runs on a single GPU | 3.4M | 17 | 2 weeks ago |
-            | QwQ | 32B | 20GB | qwq | QwQ is the reasoning model of the Qwen series | 1.2M | 8 | 4 weeks ago |
-            | DeepSeek-R1 | 7B | 4.7GB | deepseek-r1 | DeepSeek's first-generation of reasoning models with comparable performance to OpenAI-o1 | 35.5M | 29 | 2 months ago |
-            | DeepSeek-R1 | 671B | 404GB | deepseek-r1:671b | DeepSeek's first-generation of reasoning models with comparable performance to OpenAI-o1 | 35.5M | 29 | 2 months ago |
-            | Llama 3.3 | 70B | 43GB | llama3.3 | New state of the art 70B model. Llama 3.3 70B offers similar performance compared to the Llama 3.1 405B model | 1.7M | 14 | 4 months ago |
-            | Llama 3.2 | 3B | 2.0GB | llama3.2 | Meta's Llama 3.2 goes small with 1B and 3B models | 12.8M | 63 | 6 months ago |
-            | Llama 3.2 | 1B | 1.3GB | llama3.2:1b | Meta's Llama 3.2 goes small with 1B and 3B models | 12.8M | 63 | 6 months ago |
-            | Llama 3.2 Vision | 11B | 7.9GB | llama3.2-vision | Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models | 1.8M | 9 | 5 months ago |
-            | Llama 3.2 Vision | 90B | 55GB | llama3.2-vision:90b | Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models | 1.8M | 9 | 5 months ago |
-            | Llama 3.1 | 8B | 4.7GB | llama3.1 | Llama 3.1 is a new state-of-the-art model from Meta | 89.6M | 93 | 4 months ago |
-            | Llama 3.1 | 405B | 231GB | llama3.1:405b | Llama 3.1 is a new state-of-the-art model from Meta | 89.6M | 93 | 4 months ago |
-            | Phi 4 | 14B | 9.1GB | phi4 | Phi-4 is a 14B parameter, state-of-the-art open model from Microsoft | 1.5M | 5 | 3 months ago |
-            | Phi 4 Mini | 3.8B | 2.5GB | phi4-mini | Phi-4 is a 14B parameter, state-of-the-art open model from Microsoft | 1.5M | 5 | 3 months ago |
-            | Mistral | 7B | 4.1GB | mistral | The 7B model released by Mistral AI, updated to version 0.3 | 11.6M | 84 | 8 months ago |
-            | Moondream 2 | 1.4B | 829MB | moondream | A series of multimodal LLMs (MLLMs) designed for vision-language understanding | 946.6K | 17 | 4 months ago |
-            | Neural Chat | 7B | 4.1GB | neural-chat | A state-of-the-art 12B model with 128k context length | 1.5M | 17 | 8 months ago |
-            | Starling | 7B | 4.1GB | starling-lm | A state-of-the-art 12B model with 128k context length | 1.5M | 17 | 8 months ago |
-            | Code Llama | 7B | 3.8GB | codellama | A large language model that can use text prompts to generate and discuss code | 1.9M | 199 | 8 months ago |
-            | Llama 2 Uncensored | 7B | 3.8GB | llama2-uncensored | Uncensored Llama 2 model by George Sung and Jarrad Hope | 913.2K | 34 | 17 months ago |
-            | LLaVA | 7B | 4.5GB | llava | LLaVA is a novel end-to-end trained large multimodal model for visual and language understanding | 4.8M | 98 | 14 months ago |
-            | Granite-3.2 | 8B | 4.9GB | granite3.2 | A high-performing and efficient model | 3.9M | 94 | 8 months ago |
-            | Llama 3 | 8B | 4.7GB | llama3 | Meta Llama 3: The most capable openly available LLM to date | 7.8M | 68 | 10 months ago |
-            | Phi 3 | 4B | 4.0GB | phi3 | Phi-3 is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models | 3M | 72 | 8 months ago |
-            | Qwen 2 | 7B | 4.1GB | qwen2 | Qwen2 is a new series of large language models from Alibaba group | 4.2M | 97 | 7 months ago |
-
-            ### HuggingFace Models
-
-            | Model | Parameters | Size | Quantization | VRAM Required |
-            |-------|------------|------|--------------|---------------|
-            | Mistral | 7B | 14GB | None | 8GB |
-            | Mistral | 7B | 4GB | 4-bit | 4GB |
-            | Mistral | 7B | 7GB | 8-bit | 6GB |
-
-            ### Recommended Models
-
-            **Best Overall Performance**:
-            - Ollama - llama3
-            - Ollama - llama3.2-vision (for image processing)
-            - Ollama - phi4
-
-            **Best for Limited Hardware (8GB RAM)**:
-            - Ollama - llama3.2:1b
-            - Ollama - gemma3:1b
-            - Ollama - phi4-mini
-            - Ollama - moondream
+            # Add model FAQ section
+            gr.Markdown("""
+            ## Model FAQ
 
-            **Best for Code Tasks**:
-            - Ollama - codellama
-            - Ollama - deepseek-r1
+            | Model | Parameters | Size | Download Command |
+            |-------|------------|------|------------------|
+            | qwq | 7B | 4.1GB | qwq:latest |
+            | gemma3 | 7B | 4.1GB | gemma3:latest |
+            | llama3.3 | 7B | 4.1GB | llama3.3:latest |
+            | phi4 | 7B | 4.1GB | phi4:latest |
+            | mistral | 7B | 4.1GB | mistral:latest |
+            | llava | 7B | 4.1GB | llava:latest |
+            | phi3 | 7B | 4.1GB | phi3:latest |
+            | deepseek-r1 | 7B | 4.1GB | deepseek-r1:latest |
 
-            **Best for Enterprise Use**:
-            - Ollama - qwen2
-            - Ollama - granite3.2
-            - Ollama - neural-chat
+            Note: All models are available through Ollama. Make sure Ollama is running on your system.
             """)
 
         # Document Processing Tab
```
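
The new Model Management tab only lists names in a dropdown; the removed help text noted that Ollama models are pulled via the Ollama client. Below is a minimal sketch, not part of this commit, of how a selected choice could be checked and fetched with the official `ollama` Python client; the helper name `ensure_model` is hypothetical, and it assumes a local Ollama server is running.

```python
# Hypothetical helper (not from this commit): verify a dropdown choice is
# available locally and pull it if not, using the official ollama client.
import ollama


def ensure_model(model_name: str) -> str:
    """Return a tag-pinned model name, pulling the model if it is missing."""
    # Follow the commit's convention of pinning bare names to :latest.
    if ":" not in model_name:
        model_name = f"{model_name}:latest"
    try:
        ollama.show(model_name)  # raises ollama.ResponseError if absent
    except ollama.ResponseError:
        ollama.pull(model_name)  # downloads via the local Ollama server
    return model_name


# Example: make sure the new default is available before starting a chat.
print(ensure_model("qwq"))  # -> "qwq:latest"
```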

agentic_rag/local_rag_agent.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -186,6 +186,10 @@ def __init__(self, vector_store: VectorStore = None, model_name: str = None,
         if model_name and model_name.startswith("ollama:"):
             model_name = model_name.replace("ollama:", "")
 
+        # Always append :latest to Ollama model names
+        if not model_name.endswith(":latest"):
+            model_name = f"{model_name}:latest"
+
         # Load Ollama model
         print("\nLoading Ollama model...")
         print(f"Model: {model_name}")
```
