Skip to content

Conversation

@nullchimp
Copy link
Owner

This pull request introduces several enhancements and refactorings across multiple files to improve configurability, error handling, and code organization. Key changes include the addition of configuration classes for loader scripts, updates to retry logic in embedding requests, and improvements to the agent's query handling. New tests have also been added to validate the configuration and logic changes.

Configuration Management Enhancements:

  • Added LoaderConfig and WebLoaderConfig dataclasses in scripts/loader_config.py to centralize configuration for document and web loaders. Introduced DOC_LOADER_CONFIGS and WEB_LOADER_CONFIGS to store predefined configurations.
  • Updated scripts/doc_loader.py and scripts/url_loader.py to use the new configuration classes, enabling dynamic loading paths and URI replacements. [1] [2]

Embedding Retry Logic Improvements:

  • Refined retry logic in async def _make_embedding_request within src/core/rag/embedder/__init__.py to handle retries more robustly. Reduced initial wait time to 5 seconds and added a cumulative wait for 429 errors.
  • Updated corresponding test in tests/test_core_rag_embedder.py to validate the new retry behavior, ensuring proper handling of multiple sleep intervals.

Agent Behavior Refinements:

  • Enhanced src/agent.py to maintain a query history for better context management during interactions. Added logic to append user and assistant messages to the history and print it for debugging.
  • Updated the agent's system role instructions to emphasize tool usage and discourage fabricated responses, ensuring accurate and reliable outputs.

Testing Additions:

  • Added comprehensive tests in tests/test_scripts_loader_config.py to validate the structure and logic of DOC_LOADER_CONFIGS and WEB_LOADER_CONFIGS, including URI replacement behavior and configuration integrity.

File Renaming and Directory Adjustments:

  • Renamed several files for consistency (doc-loader.pydoc_loader.py, url-loader.pyurl_loader.py, docker/memgraph.shscripts/memgraph.sh) and updated paths accordingly. [1] [2] [3]

Enhances data loading by including additional documentation paths and correcting source URIs for improved context.

Refines the agent's system role to emphasize the use of the GitHub Knowledgebase and external tools for accurate information retrieval, preventing fabricated responses.

Adds chat history to the agent for persistent conversations.

Addresses embedding request failures with retry mechanism and rate limit handling.

Moves memgraph.sh to the scripts directory and updates its path resolution.
Introduces a configuration file for data loaders to improve maintainability and reduce code duplication.

Moves hardcoded paths and URI replacements into configurable objects, making the scripts more flexible and easier to manage.

Adds tests for the new configuration structure and loading logic.
Copilot AI review requested due to automatic review settings June 23, 2025 15:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors loader scripts to use centralized configuration classes, improves embedding retry logic, enhances agent session history handling, and adds corresponding tests.

  • Introduces LoaderConfig and WebLoaderConfig in scripts/loader_config.py and updates doc_loader.py/url_loader.py to use them.
  • Updates _make_embedding_request retry logic to split the 60s wait into two intervals and adds cumulative retry error handling.
  • Enhances src/agent.py with query history tracking and prints history; adds tests for loader configurations.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
scripts/loader_config.py New dataclasses for loader configuration and constants.
scripts/doc_loader.py Uses DOC_LOADER_CONFIGS and applies uri_replacement.
scripts/url_loader.py Uses WEB_LOADER_CONFIGS and applies uri_replacement.
src/core/rag/embedder/init.py Refactored _make_embedding_request retry/backoff logic.
src/core/rag/embedder/text_embedding_3_small.py Removed unused uuid import.
src/agent.py Added history state, updated process_query to use it.
scripts/memgraph.sh Adjusted working directory path.
tests/test_scripts_loader_config.py New tests for loader config classes and URI logic.
tests/test_core_rag_embedder.py Updated tests to validate the new retry sleep calls.
Comments suppressed due to low confidence (2)

src/agent.py:82

  • assistant_message could be None if there are no choices in the response, leading to an AttributeError. Add a guard to ensure assistant_message is not None before calling get().
        result = assistant_message.get("content", "")

scripts/memgraph.sh:10

  • This cd command assumes a 'docker' directory at PROJECT_ROOT, but memgraph.sh was moved to 'scripts'. Confirm the 'docker' folder's location or update this path to match the current layout.
cd "$PROJECT_ROOT/docker"

Today is {date.today().strftime("%d %B %Y")}.
"""

self.history = [
Copy link

Copilot AI Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new history tracking in process_query isn't covered by existing tests. Please add unit tests to verify that history is correctly updated when requests succeed or fail.

Copilot generated this review using guidance from repository custom instructions.
@nullchimp nullchimp merged commit 77dc42e into main Jun 23, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants