-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Continue implementation of experimental Redis vector search #2430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request continues the implementation of experimental Redis vector search functionality for ChatterBot, introducing semantic similarity search capabilities using vector embeddings. The PR implements a "storage-aware architecture" where storage adapters can specify their preferred tagger and search algorithm, enabling automatic optimization based on backend capabilities.
Key Changes:
- Added
NoOpTaggerclass that skips spaCy processing for vector-based storage adapters - Implemented
SemanticVectorSearchalgorithm that relies on storage adapter's native vector similarity - Enhanced Redis storage adapter with confidence scoring from cosine distance and improved timestamp handling
- Added storage adapter methods
get_preferred_tagger()andget_preferred_search_algorithm()for automatic configuration
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/storage/redis.rst | Extensive documentation updates explaining semantic vector search, comparison tables, performance considerations, and usage examples |
| docs/storage/index.rst | Added metadata tags and storage adapter comparison guide with recommendations for different use cases |
| docs/index.rst | Updated main page with modern AI capabilities including semantic vector search |
| chatterbot/tagging.py | Introduced NoOpTagger class for storage adapters that don't require POS-lemma indexing |
| chatterbot/storage/storage_adapter.py | Added base methods for storage adapters to specify preferred tagger and search algorithm |
| chatterbot/storage/redis.py | Implemented preferred tagger/search algorithm methods, enhanced confidence calculation from vector scores, improved timestamp handling, and added search_text_contains parameter support |
| chatterbot/search.py | Added SemanticVectorSearch class that delegates similarity calculation to storage adapter |
| chatterbot/chatterbot.py | Integrated storage-aware architecture with automatic tagger and search algorithm selection based on storage adapter preferences |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| def as_nlp_pipeline( | ||
| self, | ||
| texts: Union[List[str], Tuple[str, dict]], |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation Union[List[str], Tuple[str, dict]] for the texts parameter is incorrect. Based on line 36 where texts[0] is checked with isinstance(texts[0], tuple), the actual expected type should be Union[List[str], List[Tuple[str, dict]]] (a list of strings OR a list of tuples). The current annotation suggests a single tuple, but the implementation expects a list/iterable.
docs/storage/index.rst
Outdated
|
|
||
| **For Semantic AI Chatbots** (Recommended for modern conversational AI): | ||
|
|
||
| Note that as of December 2025, the Redis Vector Storage Adapter is still an experimental betafeature. |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space between "beta" and "feature". Should be "beta feature" instead of "betafeature".
| Note that as of December 2025, the Redis Vector Storage Adapter is still an experimental betafeature. | |
| Note that as of December 2025, the Redis Vector Storage Adapter is still an experimental beta feature. |
| if 'search_text' in kwargs: | ||
| # For now, we'll treat search_text as a filter-only parameter | ||
| # and fall through to the regular query_search below. | ||
| # This prevents the broken behavior of embedding POS tags. | ||
| # The proper fix requires BestMatch to pass additional context | ||
| # or use search_text_contains instead. | ||
| pass |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The search_text parameter is now being ignored (lines 366-372) with just a pass statement. However, the code at line 280 now handles search_text_contains instead. This creates a potential issue where legacy code or logic adapters that pass search_text will get no results. Consider logging a deprecation warning or handling backward compatibility more explicitly to help users migrate from search_text to search_text_contains.
| def get_text_index_string(self, text: Union[str, List[str]]): | ||
| """ | ||
| Return the text unchanged (no indexing applied). | ||
| """ | ||
| return text |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation for the get_text_index_string parameter includes List[str], but the function returns text unchanged. If a list is passed, it will return the list as-is rather than a string. This could cause type inconsistency issues. Consider either removing List[str] from the annotation or handling the list case explicitly (e.g., by joining list elements into a string).
| - ✅ Yes (similar meaning) | ||
| * - Confidence Scoring | ||
| - Levenshtein distance | ||
| - Cosine similarity (1 - distance/2) |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The documentation states confidence is calculated as 1 - distance/2 for cosine similarity, but the actual implementation in the code uses 1.0 - (float(vector_score) / 2.0) where vector_score is the cosine distance. This is correct, but the documentation should be clearer that distance refers to the cosine distance (ranging from 0 to 2), not a normalized value. Consider clarifying: "Cosine similarity (1 - cosine_distance/2, where distance ranges 0-2)".
| - Cosine similarity (1 - distance/2) | |
| - Cosine similarity (1 - cosine_distance/2, where distance ranges 0-2) |
| 'created_at': kwargs.get('created_at') or int(_default_date.strftime('%y%m%d')), | ||
| # Store created_at as Unix timestamp with microseconds (float) | ||
| # This provides full datetime precision while maintaining Redis NUMERIC field compatibility | ||
| 'created_at': kwargs.get('created_at') or _default_date.timestamp(), |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code kwargs.get('created_at') or _default_date.timestamp() has a potential bug. If kwargs.get('created_at') returns a datetime object, it won't have .timestamp() called on it, and a datetime object will be stored instead of a Unix timestamp. This should be: kwargs.get('created_at').timestamp() if kwargs.get('created_at') else _default_date.timestamp() or handle the datetime-to-timestamp conversion explicitly.
| 'created_at': kwargs.get('created_at') or _default_date.timestamp(), | |
| 'created_at': ( | |
| kwargs.get('created_at').timestamp() if isinstance(kwargs.get('created_at'), datetime) | |
| else kwargs.get('created_at') if kwargs.get('created_at') is not None | |
| else _default_date.timestamp() | |
| ), |
chatterbot/chatterbot.py
Outdated
| if isinstance(self.tagger, NoOpTagger): | ||
| # NoOpTagger returns text unchanged, so we can skip the tagging call | ||
| if not input_statement.search_text: | ||
| input_statement.search_text = input_statement.text | ||
| if not input_statement.search_in_response_to and input_statement.in_response_to: | ||
| input_statement.search_in_response_to = input_statement.in_response_to |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The check isinstance(self.tagger, NoOpTagger) at line 215 creates a tight coupling between the ChatBot class and the NoOpTagger implementation. A more maintainable approach would be to add a method like needs_text_indexing() to the tagger interface that returns False for NoOpTagger and True for others. This would make the code more extensible if other taggers with similar behavior are added in the future.
docs/index.rst
Outdated
| **Modern AI Capabilities** (2025): | ||
|
|
||
| - **Semantic Vector Search**: Advanced context understanding using vector embeddings and Redis vector database | ||
| - **Large Language Model (LLM) Integration**: Direct support for Ollama and OpenAI models (in development) |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statement "Large Language Model (LLM) Integration: Direct support for Ollama and OpenAI models (in development)" mentions features that are in development. This could be misleading in documentation as it suggests features that don't yet exist. Consider removing this line or moving it to a separate "Roadmap" or "Planned Features" section to avoid confusion.
docs/storage/redis.rst
Outdated
| # These inputs find similar responses despite different words: | ||
| response1 = chatbot.get_response("What's the weather like?") | ||
| response2 = chatbot.get_response("How's the climate today?") |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a trailing space after the period on line 167. This should be removed for consistency.
| response2 = chatbot.get_response("How's the climate today?") | |
| response2 = chatbot.get_response("How's the climate today?") |
chatterbot/chatterbot.py
Outdated
| from chatterbot.search import SemanticVectorSearch | ||
|
|
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] There's a blank line after the import statement that creates inconsistent spacing. Consider removing this trailing blank line for cleaner code formatting.
No description provided.