Skip to content

Conversation

@gunthercox
Copy link
Owner

No description provided.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request continues the implementation of experimental Redis vector search functionality for ChatterBot, introducing semantic similarity search capabilities using vector embeddings. The PR implements a "storage-aware architecture" where storage adapters can specify their preferred tagger and search algorithm, enabling automatic optimization based on backend capabilities.

Key Changes:

  • Added NoOpTagger class that skips spaCy processing for vector-based storage adapters
  • Implemented SemanticVectorSearch algorithm that relies on storage adapter's native vector similarity
  • Enhanced Redis storage adapter with confidence scoring from cosine distance and improved timestamp handling
  • Added storage adapter methods get_preferred_tagger() and get_preferred_search_algorithm() for automatic configuration

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
docs/storage/redis.rst Extensive documentation updates explaining semantic vector search, comparison tables, performance considerations, and usage examples
docs/storage/index.rst Added metadata tags and storage adapter comparison guide with recommendations for different use cases
docs/index.rst Updated main page with modern AI capabilities including semantic vector search
chatterbot/tagging.py Introduced NoOpTagger class for storage adapters that don't require POS-lemma indexing
chatterbot/storage/storage_adapter.py Added base methods for storage adapters to specify preferred tagger and search algorithm
chatterbot/storage/redis.py Implemented preferred tagger/search algorithm methods, enhanced confidence calculation from vector scores, improved timestamp handling, and added search_text_contains parameter support
chatterbot/search.py Added SemanticVectorSearch class that delegates similarity calculation to storage adapter
chatterbot/chatterbot.py Integrated storage-aware architecture with automatic tagger and search algorithm selection based on storage adapter preferences

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


def as_nlp_pipeline(
self,
texts: Union[List[str], Tuple[str, dict]],
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation Union[List[str], Tuple[str, dict]] for the texts parameter is incorrect. Based on line 36 where texts[0] is checked with isinstance(texts[0], tuple), the actual expected type should be Union[List[str], List[Tuple[str, dict]]] (a list of strings OR a list of tuples). The current annotation suggests a single tuple, but the implementation expects a list/iterable.

Copilot uses AI. Check for mistakes.

**For Semantic AI Chatbots** (Recommended for modern conversational AI):

Note that as of December 2025, the Redis Vector Storage Adapter is still an experimental betafeature.
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space between "beta" and "feature". Should be "beta feature" instead of "betafeature".

Suggested change
Note that as of December 2025, the Redis Vector Storage Adapter is still an experimental betafeature.
Note that as of December 2025, the Redis Vector Storage Adapter is still an experimental beta feature.

Copilot uses AI. Check for mistakes.
Comment on lines +366 to +372
if 'search_text' in kwargs:
# For now, we'll treat search_text as a filter-only parameter
# and fall through to the regular query_search below.
# This prevents the broken behavior of embedding POS tags.
# The proper fix requires BestMatch to pass additional context
# or use search_text_contains instead.
pass
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The search_text parameter is now being ignored (lines 366-372) with just a pass statement. However, the code at line 280 now handles search_text_contains instead. This creates a potential issue where legacy code or logic adapters that pass search_text will get no results. Consider logging a deprecation warning or handling backward compatibility more explicitly to help users migrate from search_text to search_text_contains.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +20
def get_text_index_string(self, text: Union[str, List[str]]):
"""
Return the text unchanged (no indexing applied).
"""
return text
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type annotation for the get_text_index_string parameter includes List[str], but the function returns text unchanged. If a list is passed, it will return the list as-is rather than a string. This could cause type inconsistency issues. Consider either removing List[str] from the annotation or handling the list case explicitly (e.g., by joining list elements into a string).

Copilot uses AI. Check for mistakes.
- ✅ Yes (similar meaning)
* - Confidence Scoring
- Levenshtein distance
- Cosine similarity (1 - distance/2)
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The documentation states confidence is calculated as 1 - distance/2 for cosine similarity, but the actual implementation in the code uses 1.0 - (float(vector_score) / 2.0) where vector_score is the cosine distance. This is correct, but the documentation should be clearer that distance refers to the cosine distance (ranging from 0 to 2), not a normalized value. Consider clarifying: "Cosine similarity (1 - cosine_distance/2, where distance ranges 0-2)".

Suggested change
- Cosine similarity (1 - distance/2)
- Cosine similarity (1 - cosine_distance/2, where distance ranges 0-2)

Copilot uses AI. Check for mistakes.
'created_at': kwargs.get('created_at') or int(_default_date.strftime('%y%m%d')),
# Store created_at as Unix timestamp with microseconds (float)
# This provides full datetime precision while maintaining Redis NUMERIC field compatibility
'created_at': kwargs.get('created_at') or _default_date.timestamp(),
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code kwargs.get('created_at') or _default_date.timestamp() has a potential bug. If kwargs.get('created_at') returns a datetime object, it won't have .timestamp() called on it, and a datetime object will be stored instead of a Unix timestamp. This should be: kwargs.get('created_at').timestamp() if kwargs.get('created_at') else _default_date.timestamp() or handle the datetime-to-timestamp conversion explicitly.

Suggested change
'created_at': kwargs.get('created_at') or _default_date.timestamp(),
'created_at': (
kwargs.get('created_at').timestamp() if isinstance(kwargs.get('created_at'), datetime)
else kwargs.get('created_at') if kwargs.get('created_at') is not None
else _default_date.timestamp()
),

Copilot uses AI. Check for mistakes.
Comment on lines 215 to 220
if isinstance(self.tagger, NoOpTagger):
# NoOpTagger returns text unchanged, so we can skip the tagging call
if not input_statement.search_text:
input_statement.search_text = input_statement.text
if not input_statement.search_in_response_to and input_statement.in_response_to:
input_statement.search_in_response_to = input_statement.in_response_to
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The check isinstance(self.tagger, NoOpTagger) at line 215 creates a tight coupling between the ChatBot class and the NoOpTagger implementation. A more maintainable approach would be to add a method like needs_text_indexing() to the tagger interface that returns False for NoOpTagger and True for others. This would make the code more extensible if other taggers with similar behavior are added in the future.

Copilot uses AI. Check for mistakes.
docs/index.rst Outdated
**Modern AI Capabilities** (2025):

- **Semantic Vector Search**: Advanced context understanding using vector embeddings and Redis vector database
- **Large Language Model (LLM) Integration**: Direct support for Ollama and OpenAI models (in development)
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statement "Large Language Model (LLM) Integration: Direct support for Ollama and OpenAI models (in development)" mentions features that are in development. This could be misleading in documentation as it suggests features that don't yet exist. Consider removing this line or moving it to a separate "Roadmap" or "Planned Features" section to avoid confusion.

Copilot uses AI. Check for mistakes.
# These inputs find similar responses despite different words:
response1 = chatbot.get_response("What's the weather like?")
response2 = chatbot.get_response("How's the climate today?")
Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a trailing space after the period on line 167. This should be removed for consistency.

Suggested change
response2 = chatbot.get_response("How's the climate today?")
response2 = chatbot.get_response("How's the climate today?")

Copilot uses AI. Check for mistakes.
Comment on lines 113 to 114
from chatterbot.search import SemanticVectorSearch

Copy link

Copilot AI Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] There's a blank line after the import statement that creates inconsistent spacing. Consider removing this trailing blank line for cleaner code formatting.

Copilot uses AI. Check for mistakes.
@gunthercox gunthercox merged commit c3b0b6c into master Dec 7, 2025
8 checks passed
@gunthercox gunthercox deleted the redis branch December 7, 2025 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants