game-by-virtuals · Ang-dot · Mar 25, 2025 · Mar 15, 2025 · Mar 25, 2025 · Mar 25, 2025
diff --git a/.github/workflows/validate-new-plugin-metadata.yml b/.github/workflows/validate-new-plugin-metadata.yml
@@ -16,7 +16,7 @@ jobs:
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
-          ref: ${{ github.event.pull_request.head.ref }}
+          ref: ${{ github.event.pull_request.head.sha }}
 
       - name: Identify New Plugin Directories
         id: find_new_plugins

diff --git a/plugins/RAGPinecone/README.md b/plugins/RAGPinecone/README.md
@@ -0,0 +1,219 @@
+# RAGPinecone Plugin for GAME SDK
+
+A Retrieval Augmented Generation (RAG) plugin using Pinecone as the vector database for the GAME SDK.
+
+## Features
+
+- Query a knowledge base for relevant context
+- Advanced hybrid search (vector + BM25) for better retrieval
+- AI-generated answers based on retrieved documents
+- Add documents to the knowledge base
+- Delete documents from the knowledge base
+- Chunk documents for better retrieval
+- Process documents from a folder automatically
+- Integrate with a Telegram bot for RAG-powered conversations
+
+## Installation
+
+### From Source
+
+1. Clone the `game-python` repository (if you have not already) and navigate to the plugin directory:
+```bash
+cd game-python/plugins/RAGPinecone
+```
+
+2. Install the plugin in development mode:
+```bash
+pip install -e .
+```
+
+This will install all required dependencies and make the plugin available in your environment.
+
+## Setup and Configuration
+
+1. Set the following environment variables:
+   - `PINECONE_API_KEY`: Your Pinecone API key
+   - `OPENAI_API_KEY`: Your OpenAI API key (for embeddings)
+   - `GAME_API_KEY`: Your GAME API key
+   - `TELEGRAM_BOT_TOKEN`: Your Telegram bot token (if using with Telegram)
+
+2. Import and initialize the plugin, then add its functions to your agent's action space:
+
+```python
+from rag_pinecone_gamesdk.rag_pinecone_plugin import RAGPineconePlugin
+from rag_pinecone_gamesdk.rag_pinecone_game_functions import query_knowledge_fn, add_document_fn
+
+# Initialize the plugin
+rag_plugin = RAGPineconePlugin(
+    pinecone_api_key="your-pinecone-api-key",
+    openai_api_key="your-openai-api-key",
+    index_name="your-index-name",
+    namespace="your-namespace"
+)
+
+# Add the functions to your agent's action space
+agent_action_space = [
+    query_knowledge_fn(rag_plugin),
+    add_document_fn(rag_plugin),
+    # ... other functions
+]
+```
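Hard-coding keys as in the snippet above is convenient for a quick test, but in practice you will usually read them from the environment variables listed in step 1. The sketch below is a minimal, stdlib-only helper that collects the required keys and fails early with one clear message if any are missing or empty. The function name `load_required_env` is hypothetical and is not part of the plugin.

```python
import os


def load_required_env(names, environ=None):
    """Return {name: value} for every requested environment variable.

    Raises RuntimeError listing all missing or empty variables at once,
    so misconfiguration is reported before any Pinecone or OpenAI client
    is created. `environ` defaults to os.environ (a plain dict also works).
    """
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

For example, `keys = load_required_env(["PINECONE_API_KEY", "OPENAI_API_KEY"])` followed by `RAGPineconePlugin(pinecone_api_key=keys["PINECONE_API_KEY"], openai_api_key=keys["OPENAI_API_KEY"], ...)`. The bundled `examples/populate_knowledge_base.py` performs a similar check with `os.environ.get` after calling `load_dotenv()`.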
+
+## Available Functions
+
+### Basic RAG Functions
+
+1. `query_knowledge(query: str, num_results: int = 3)` - Query the knowledge base for relevant context
+2. `add_document(content: str, metadata: dict = None)` - Add a document to the knowledge base
+
+### Advanced RAG Functions
+
+1. `advanced_query_knowledge(query: str)` - Query the knowledge base using hybrid retrieval (vector + BM25) and get an AI-generated answer
+2. `get_relevant_documents(query: str)` - Get relevant documents using hybrid retrieval without generating an answer
+
+Example usage of advanced functions:
+
+```python
+from rag_pinecone_gamesdk.search_rag import RAGSearcher
+from rag_pinecone_gamesdk.rag_pinecone_game_functions import advanced_query_knowledge_fn, get_relevant_documents_fn
+
+# Initialize the RAG searcher
+rag_searcher = RAGSearcher(
+    pinecone_api_key="your-pinecone-api-key",
+    openai_api_key="your-openai-api-key",
+    index_name="your-index-name",
+    namespace="your-namespace"
+)
+
+# Add the advanced functions to your agent's action space
+agent_action_space = [
+    advanced_query_knowledge_fn(rag_searcher),
+    get_relevant_documents_fn(rag_searcher),
+    # ... other functions
+]
+```
+
+## Populating the Knowledge Base
+
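+### Using the Example Script
+
+The quickest way to populate the knowledge base is to run the provided script:
+
+```bash
+cd game-python/plugins/RAGPinecone
+python examples/populate_knowledge_base.py
+```
+
+As written, the script downloads the files in the Google Drive folder set in `google_drive_url` to a temporary directory, adds every supported file to the knowledge base, and logs the per-file results. It reads `PINECONE_API_KEY` and `OPENAI_API_KEY` from the environment (or a `.env` file) and writes to the default index and namespace. To index your own documents, change `google_drive_url` or use the API described below.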
+
+Supported file types:
+- `.txt` - Text files
+- `.pdf` - PDF documents
+- `.docx` - Word documents
+- `.doc` - Word documents
+- `.csv` - CSV files
+- `.md` - Markdown files
+- `.html` - HTML files
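To preview which files in a folder match the extensions listed above, a small stdlib sketch like the one below can help. It is only an illustration: the plugin's own file discovery may differ (for example, in whether it recurses into subfolders or how it treats upper-case extensions), and `find_supported_files` is a hypothetical name.

```python
from pathlib import Path

# Extensions listed as supported in this README
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".docx", ".doc", ".csv", ".md", ".html"}


def find_supported_files(folder):
    """Return the sorted paths of files under `folder` with a supported extension.

    Recurses into subfolders and compares extensions case-insensitively,
    so "report.PDF" is included.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```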
+
+### Using the API
+
+You can also populate the knowledge base programmatically:
+
+```python
+from rag_pinecone_gamesdk.populate_rag import RAGPopulator
+
+# Initialize the populator
+populator = RAGPopulator(
+    pinecone_api_key="your-pinecone-api-key",
+    openai_api_key="your-openai-api-key",
+    index_name="your-index-name",
+    namespace="your-namespace"
+)
+
+# Add a document
+content = "Your document content here"
+metadata = {
+    "title": "Document Title",
+    "author": "Author Name",
+    "source": "Source Name",
+}
+
+status, message, results = populator.add_document(content, metadata)
+print(f"Status: {status}")
+print(f"Message: {message}")
+print(f"Results: {results}")
+
+# Process all documents in a folder
+status, message, results = populator.process_documents_folder()
+print(f"Status: {status}")
+print(f"Message: {message}")
+print(f"Processed {results.get('total_files', 0)} files, {results.get('successful_files', 0)} successful")
+```
+
+## Testing the Advanced Search
+
+You can test the advanced search functionality using the provided example script:
+
+```bash
+cd game-python/plugins/RAGPinecone
+python examples/test_advanced_search.py
+```
+
+This will run a series of test queries using the advanced hybrid retrieval system.
+
+## Integration with Telegram
+
+See the `examples/test_rag_pinecone_telegram.py` file for an example of how to integrate the RAGPinecone plugin with a Telegram bot.
+
+To run the Telegram bot with advanced RAG capabilities:
+
+```bash
+cd game-python/plugins/RAGPinecone
+python examples/test_rag_pinecone_telegram.py
+```
+
+## Advanced Usage
+
+### Hybrid Retrieval
+
+The advanced search functionality uses a hybrid retrieval approach that combines:
+
+1. **Vector Search**: Uses embeddings to find semantically similar documents
+2. **BM25 Search**: Uses keyword matching to find documents with relevant terms
+
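+Vector search captures paraphrases and related concepts, while BM25 rewards exact matches on rare terms such as names, identifiers, or jargon. Combining the two often retrieves better context than either method alone, especially for queries that mix both kinds of signal.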
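The idea can be illustrated in plain Python: a textbook Okapi BM25 scorer over whitespace tokens, cosine similarity over precomputed embedding vectors, and a weighted sum of the two after min-max normalising each score list. This is a sketch, not the plugin's implementation. The fusion rule, the weight `alpha`, the BM25 constants `k1` and `b`, and the toy vectors are all assumptions chosen for illustration; many retrieval libraries use reciprocal rank fusion instead of a weighted score sum.

```python
import math
from collections import Counter


def tokenize(text):
    # Lower-case whitespace tokenisation; real pipelines usually also strip punctuation
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each document in `docs`."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def min_max(values):
    # Rescale to [0, 1] so BM25 and cosine scores are on a comparable scale
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5, k=3):
    """Indices of the top-k docs by alpha * vector score + (1 - alpha) * BM25 score."""
    if not docs:
        return []
    vec = min_max([cosine(query_vec, dv) for dv in doc_vecs])
    kw = min_max(bm25_scores(query, docs))
    combined = [alpha * v + (1 - alpha) * s for v, s in zip(vec, kw)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)[:k]
```

With the documents `["pinecone is a vector database", "bm25 ranks documents by keyword overlap", "the cat sat on the mat"]` and one-hot vectors `[1, 0, 0]`, `[0, 1, 0]`, `[0, 0, 1]`, the query `"bm25"` with embedding `[1, 0, 0]` ranks document 1 first at `alpha=0.3` (keyword-heavy) and document 0 first at `alpha=0.7` (vector-heavy), which shows how the weight trades semantic similarity against exact term matches.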
+
+### Custom Document Processing
+
+You can customize how documents are processed by extending the `RAGPopulator` class:
+
+```python
+from rag_pinecone_gamesdk.populate_rag import RAGPopulator
+
+class CustomRAGPopulator(RAGPopulator):
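+    def chunk_document(self, content, metadata):
+        # Replace this with your own chunking logic. As a placeholder,
+        # fall back to the default implementation in RAGPopulator.
+        chunked_docs = super().chunk_document(content, metadata)
+        return chunked_docs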
+```
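For reference, the sketch below shows one common strategy you might call from such an override: fixed-size character windows with overlap, each chunk carrying a copy of the source metadata plus its position. The chunk size, the overlap, the `chunk_index` metadata key, and the `(text, metadata)` pair output are assumptions; the plugin's default chunker and the exact return type `chunk_document` must produce are not documented here, so adapt the output shape to what `RAGPopulator` expects.

```python
def chunk_text(content, metadata, chunk_size=1000, overlap=200):
    """Split `content` into overlapping windows of at most `chunk_size` characters.

    Returns a list of (chunk, chunk_metadata) pairs. Each chunk_metadata is a
    copy of `metadata` with a `chunk_index` key added; the input dict is not modified.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not content:
        return []
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        end = start + chunk_size
        chunks.append((content[start:end], dict(metadata, chunk_index=len(chunks))))
        if end >= len(content):  # this window reached the end of the text
            break
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. Plain character windows can still split words mid-token; splitters such as LangChain's `RecursiveCharacterTextSplitter` instead prefer breaking at paragraph and sentence boundaries.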
+
+### Custom Embedding Models
+
+You can use different embedding models by specifying the `embedding_model` parameter:
+
+```python
+rag_plugin = RAGPineconePlugin(
+    embedding_model="sentence-transformers/all-mpnet-base-v2"
+)
+```
+
+## Requirements
+
+- Python 3.9+
+- Pinecone account
+- OpenAI API key
+- GAME SDK
+- langchain
+- langchain_community
+- langchain_pinecone
+- langchain_openai 
diff --git a/plugins/RAGPinecone/examples/populate_knowledge_base.py b/plugins/RAGPinecone/examples/populate_knowledge_base.py
@@ -0,0 +1,142 @@
+import os
+import logging
+import tempfile
+import re
+from dotenv import load_dotenv
+import gdown
+
+from rag_pinecone_gamesdk.populate_rag import RAGPopulator
+from rag_pinecone_gamesdk import DEFAULT_INDEX_NAME, DEFAULT_NAMESPACE
+
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+def download_from_google_drive(folder_url, download_folder):
+    """
+    Download all files from a Google Drive folder
+
+    Args:
+        folder_url: URL of the Google Drive folder
+        download_folder: Local folder to download files to
+
+    Returns:
+        List of downloaded file paths
+    """
+    logger.info(f"Downloading files from Google Drive folder: {folder_url}")
+
+    # Extract folder ID from URL
+    folder_id_match = re.search(r'folders/([a-zA-Z0-9_-]+)', folder_url)
+    if not folder_id_match:
+        logger.error(f"Could not extract folder ID from URL: {folder_url}")
+        return []
+
+    folder_id = folder_id_match.group(1)
+    logger.info(f"Folder ID: {folder_id}")
+
+    # Create download folder if it doesn't exist
+    os.makedirs(download_folder, exist_ok=True)
+
+    # Download all files in the folder
+    try:
+        # Use gdown to download all files in the folder
+        downloaded_files = gdown.download_folder(
+            id=folder_id,
+            output=download_folder,
+            quiet=False,
+            use_cookies=False
+        )
+
+        if not downloaded_files:
+            logger.warning("No files were downloaded from Google Drive")
+            return []
+
+        logger.info(f"Downloaded {len(downloaded_files)} files from Google Drive")
+        return downloaded_files
+
+    except Exception as e:
+        logger.error(f"Error downloading files from Google Drive: {str(e)}")
+        return []
+
+def main():
+    # Load environment variables
+    load_dotenv()
+
+    # Check for required environment variables
+    pinecone_api_key = os.environ.get("PINECONE_API_KEY")
+    openai_api_key = os.environ.get("OPENAI_API_KEY")
+
+    if not pinecone_api_key:
+        logger.error("PINECONE_API_KEY environment variable is not set")
+        return
+
+    if not openai_api_key:
+        logger.error("OPENAI_API_KEY environment variable is not set")
+        return
+
+    # Google Drive folder URL
+    google_drive_url = "https://drive.google.com/drive/folders/1dKYDQxenDkthF0MPr-KOsdPNqEmrAq1c?usp=sharing"
+
+    # Create a temporary directory for downloaded files
+    with tempfile.TemporaryDirectory() as temp_dir:
+        logger.info(f"Created temporary directory for downloaded files: {temp_dir}")
+
+        # Download files from Google Drive
+        downloaded_files = download_from_google_drive(google_drive_url, temp_dir)
+
+        if not downloaded_files:
+            logger.error("No files were downloaded from Google Drive. Exiting.")
+            return
+
+        # Get the Documents folder path for local processing
+        documents_folder = os.path.join(
+            os.path.dirname(os.path.dirname(os.path.abspath(__file__))),
+            "Documents"
+        )
+
+        # Ensure the Documents folder exists
+        if not os.path.exists(documents_folder):
+            os.makedirs(documents_folder)
+            logger.info(f"Created Documents folder at: {documents_folder}")
+
+        # Initialize the RAGPopulator
+        logger.info("Initializing RAGPopulator...")
+        populator = RAGPopulator(
+            pinecone_api_key=pinecone_api_key,
+            openai_api_key=openai_api_key,
+            index_name=DEFAULT_INDEX_NAME,
+            namespace=DEFAULT_NAMESPACE,
+            documents_folder=temp_dir,  # Use the temp directory with downloaded files
+        )
+
+        # Process all documents in the temporary folder
+        logger.info(f"Processing downloaded documents from: {temp_dir}")
+        status, message, results = populator.process_documents_folder()
+
+        # Log the results
+        logger.info(f"Status: {status}")
+        logger.info(f"Message: {message}")
+        logger.info(f"Processed {results.get('total_files', 0)} files, {results.get('successful_files', 0)} successful")
+
+        # Get all document IDs
+        ids = populator.fetch_all_ids()
+        logger.info(f"Total vectors in database: {len(ids)}")
+
+        # Print detailed results for each file
+        if 'results' in results:
+            logger.info("\nDetailed results:")
+            for result in results['results']:
+                file_path = result.get('file_path', 'Unknown file')
+                status = result.get('status', 'Unknown status')
+                message = result.get('message', 'No message')
+                logger.info(f"File: {os.path.basename(file_path)}")
+                logger.info(f"Status: {status}")
+                logger.info(f"Message: {message}")
+                logger.info("---")
+
+if __name__ == "__main__":
+    main()