Adding RAG and new output mode #2

Open

mistrjirka wants to merge 4 commits into stratosphereips:main from mistrjirka:main

Conversation


mistrjirka commented Oct 18, 2025

Description

This PR adds two major features to bsy-clippy:

  1. Vector Database (RAG) Support: Implements Retrieval-Augmented Generation using vector embeddings to handle large stdin inputs more effectively
  2. Hide Thinking Mode: Adds a --hide-thinking flag to hide LLM reasoning (<think> tags) and show a spinner instead

Motivation and Context

Vector Database Feature

When processing large files through stdin, sending the entire content to the LLM can:

  • Exceed context window limits
  • Result in less relevant responses
  • Waste tokens on irrelevant sections

The vector database solves this by:

  • Chunking text intelligently (paragraph-aware with overlap)
  • Creating semantic embeddings using BAAI/bge-small-en-v1.5
  • Retrieving only the most relevant chunks for each query
  • Using HNSW indexing for fast similarity search
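
The retrieval flow above can be sketched in miniature. The sketch below is illustrative only, not code from this PR: a simple bag-of-words embedder stands in for BAAI/bge-small-en-v1.5, and a brute-force cosine search stands in for the HNSW index (which hnswlib would replace at scale):

```python
import numpy as np

def build_vocab(texts: list[str]) -> dict[str, int]:
    """Map each word seen in the corpus to a vector dimension."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}

def embed(text: str, vocab: dict[str, int]) -> np.ndarray:
    """Bag-of-words vector, L2-normalised so a dot product gives cosine similarity."""
    vec = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            vec[vocab[w]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute-force cosine)."""
    vocab = build_vocab(chunks)
    scores = np.stack([embed(c, vocab) for c in chunks]) @ embed(query, vocab)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

chunks = [
    "The Solar System contains eight planets.",
    "Artificial Intelligence is a branch of computer science.",
    "Python is a programming language.",
]
print(retrieve(chunks, "artificial intelligence", k=1))
# -> ['Artificial Intelligence is a branch of computer science.']
```

Only the top-k chunks are sent to the LLM, which is how the feature avoids exceeding the context window.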

Hide Thinking Feature

Some users prefer cleaner output without the LLM's reasoning process. This feature:

  • Hides <think>...</think> sections completely
  • Shows an animated spinner in stream mode to indicate processing
  • Works in both batch and stream modes
  • Compatible with all other features (vector mode, interactive mode, etc.)
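
A minimal sketch of the batch-mode filtering idea (an illustration, not the PR's actual implementation; in stream mode the real code must additionally handle tags split across streamed chunks):

```python
import re

# Match a complete <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_thinking(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from a complete response."""
    return THINK_RE.sub("", text).strip()

raw = "<think>\nOkay, the user asked...\n</think>\n\nHi! 😊"
print(strip_thinking(raw))  # -> Hi! 😊
```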

API Key Handling Fix

Fixed an issue where localhost endpoints (127.0.0.1, localhost) required an API key even though local LLM servers like Ollama don't need authentication.
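
The fix can be illustrated with a small helper (hypothetical names; the actual check in the PR may differ in detail):

```python
from urllib.parse import urlparse

# Hosts that are assumed to run unauthenticated local LLM servers (e.g. Ollama).
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}

def requires_api_key(base_url: str) -> bool:
    """Only remote endpoints need an API key; local servers accept any request."""
    host = urlparse(base_url).hostname or ""
    return host not in LOCAL_HOSTS

print(requires_api_key("http://127.0.0.1:11434/v1"))  # -> False
print(requires_api_key("https://api.openai.com/v1"))  # -> True
```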

Dependencies

New dependencies added to requirements.txt and pyproject.toml:

  • fastembed>=0.3.0 - CPU-based text embeddings
  • hnswlib>=0.8.0 - Fast approximate nearest neighbor search
  • numpy>=1.24.0 - Array operations for embeddings

Type of change

  • New feature (non-breaking change which adds functionality)
  • Bug fix (localhost API key handling)
  • This change requires a documentation update

How Has This Been Tested?

All tests were run on Python 3.13.7 with Ollama (qwen3:1.7b) on localhost.

Test 1: Vector Mode Without Chat Continuation

Command:

cat /tmp/test.txt | python bsy-clippy.py --vector --profile localollama -u "What is AI" --mode batch

Expected: Should build vector index, answer the question, and exit (no interactive mode).

Result: ✅ PASSED


Creating vector embeddings for 1 chunks...
Vector index ready (1 chunks, 384 dimensions)
Vector database ready with 1 chunks
<think>
Okay, the user asked "What is AI?" and provided some context. Let me see. The context mentions that the Solar System has eight planets, AI is a branch of computer science, and Python is a programming language.

So, the user wants a short explanation of AI. From the context, AI is defined as a branch of computer science. The other info about planets and Python doesn't directly relate to AI, but maybe the user wants a concise answer. I should focus on the main point from the context. Make sure it's brief and to the point. Avoid any extra info not in the context. So, the answer is "Artificial Intelligence is a branch of computer science." That's concise and uses the relevant context. Check if it's very short and brief. Yes, it's just a sentence. Alright, that should do it.
</think>

Artificial Intelligence is a branch of computer science.

Test 2: Hide-Thinking Flag in Batch Mode

Command:

cat /tmp/test.txt | python bsy-clippy.py --profile localollama -u "Explain this" --mode batch --hide-thinking

Expected: Should process input and show only the answer (no <think> tags).

Result: ✅ PASSED

The Solar System has eight planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune). 
AI is a branch of computer science focused on creating intelligent machines. 
Python is a programming language used for tasks like web development, data analysis, and automation.

Test 3: Hide-Thinking Flag in Stream Mode

Command:

echo "test" | python bsy-clippy.py --profile localollama -u "Say hello" --hide-thinking

Expected: Should show animated spinner during thinking, then display only the final answer.

Result: ✅ PASSED

                  

Hello!

Note: the whitespace above marks where the spinner was displayed and then cleared.

Test 4: Normal Mode (Default Behavior)

Command:

echo "test" | python bsy-clippy.py --profile localollama -u "Say hi" --mode batch

Expected: Should show colored <think> tags with the reasoning process.

Result: ✅ PASSED

<think>
Okay, the user said "Say hi" and then wrote "test". I need to respond in a brief and short way. 
Let's see, the main action is to say hi, but the test part is probably a prompt. 
So a simple "Hi!" would work. Maybe add a smiley to keep it friendly. 
But since it's short, just "Hi!" is better. Make sure it's concise.
</think>

Hi! 😊

Test 5: Vector Mode With Chat Continuation

Command:

cat /tmp/test.txt | python bsy-clippy.py --vector --profile localollama -u "What is AI" -c

Expected: Should answer question first, then enter interactive mode with RAG enabled.

Result: ✅ PASSED - Answers question, then prompts "You can now ask questions about the input data." and enters interactive mode.

Test 6: Localhost API Key Handling

Command:

unset OPENAI_API_KEY
echo "test" | python bsy-clippy.py --profile localollama -u "Say hello"

Expected: Should work without API key for localhost endpoints.

Result: ✅ PASSED - No error, processes request successfully.

Test 7: Remote Endpoint API Key Requirement

Command:

unset OPENAI_API_KEY
echo "test" | python bsy-clippy.py --base-url https://api.openai.com/v1 -u "test"

Expected: Should show error requiring API key for remote endpoints.

Result: ✅ PASSED

[Error] OPENAI_API_KEY is not set. Create a .env file with OPENAI_API_KEY=<token> or export it.

Test Configuration

  • Python Version: 3.13.7
  • LLM: Ollama with qwen3:1.7b model
  • Endpoint: http://127.0.0.1:11434/v1
  • Test Data: /tmp/test.txt containing:
    The Solar System contains eight planets.
    Artificial Intelligence is a branch of computer science.
    Python is a programming language.
    

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have tested all new features with multiple scenarios
  • Dependencies are properly documented in requirements.txt and pyproject.toml
  • Backward compatibility maintained (all existing functionality still works)

Additional Notes

Architecture Changes

Vector Index Implementation

  • Class: VectorIndex - Manages embeddings and HNSW index
  • Function: chunk_text() - Intelligent paragraph-aware chunking with configurable overlap
  • Function: build_vector_index() - Wrapper to create index from text
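
A simplified version of the paragraph-aware chunking idea (a sketch assuming individual paragraphs fit within the chunk size; the PR's chunk_text() may differ in detail):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Pack whole paragraphs into chunks up to chunk_size characters,
    carrying the tail of the previous chunk forward as overlap."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # keep overlap for context continuity
        current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

parts = chunk_text("First paragraph.\n\nSecond paragraph.", chunk_size=20, overlap=6)
print(len(parts))  # -> 2
```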

Spinner Implementation

  • Class: Spinner - Threaded spinner with animated frames
  • Frames: Uses Braille patterns (⠋ ⠙ ⠹ ⠸ ⠼ ⠴ ⠦ ⠧ ⠇ ⠏) for smooth animation
  • Thread Safety: Daemon thread with proper cleanup
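
A minimal sketch of such a spinner (illustrative; the PR's Spinner class may differ):

```python
import itertools
import sys
import threading
import time

class Spinner:
    """Braille-frame spinner on a daemon thread, cleared from the line on stop."""
    FRAMES = "⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏"

    def __init__(self, interval: float = 0.08):
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._spin, daemon=True)

    def _spin(self) -> None:
        for frame in itertools.cycle(self.FRAMES):
            if self._stop.is_set():
                break
            sys.stderr.write(f"\r{frame} thinking...")
            sys.stderr.flush()
            time.sleep(self.interval)
        sys.stderr.write("\r" + " " * 20 + "\r")  # clear the spinner line

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

spinner = Spinner(interval=0.02)
spinner.start()
time.sleep(0.1)  # stand-in for waiting on the LLM's <think> phase
spinner.stop()
```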

Refactoring

  • Function: handle_stdin_with_vector() - Separated logic for vector-enabled stdin processing
  • Function: handle_stdin_without_vector() - Normal stdin processing
  • Main function: Reduced from 120+ lines to ~60 lines by extracting helper functions

New CLI Arguments

--vector                Enable RAG mode for large stdin inputs
--chunk-size N          Chunk size for vector database (default: 500 characters)
--retrieve-chunks N     Number of chunks to retrieve (default: 4)
--hide-thinking         Hide <think> sections and show spinner instead
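
These flags could be registered with argparse roughly as follows (a sketch using the names and defaults listed above; everything else is assumed):

```python
import argparse

parser = argparse.ArgumentParser(prog="bsy-clippy")
parser.add_argument("--vector", action="store_true",
                    help="Enable RAG mode for large stdin inputs")
parser.add_argument("--chunk-size", type=int, default=500, metavar="N",
                    help="Chunk size for vector database (characters)")
parser.add_argument("--retrieve-chunks", type=int, default=4, metavar="N",
                    help="Number of chunks to retrieve per query")
parser.add_argument("--hide-thinking", action="store_true",
                    help="Hide <think> sections and show a spinner instead")

args = parser.parse_args(["--vector", "--chunk-size", "800"])
print(args.vector, args.chunk_size, args.retrieve_chunks)  # -> True 800 4
```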

Performance Considerations

  • Vector embedding is CPU-based (no GPU required)
  • HNSW index provides O(log n) search complexity
  • Spinner runs in background thread (no blocking)
  • Chunk overlap ensures context preservation across boundaries
