
Implement Semantic Memory Store for Distilled Knowledge #140

@matiasmolinas

Description

Problem/Motivation

(Solution inspired by langmem.)

Currently, the EAT framework's "Smart Memory" is purely episodic. The eat_agent_experiences collection stores a raw, chronological log of everything an agent does. While this is invaluable for detailed analysis, it is inefficient for the SystemAgent to sift through raw episodes every time it needs to make a decision.

The system lacks a mechanism for storing generalized knowledge, or "wisdom," derived from these experiences. For example, if the system successfully processes invoices using a specific toolchain five times, this learning should be distilled into a single, high-confidence "fact" or "best practice."

This new layer of memory, inspired by the "Semantic Memory" concept in the langmem project, will allow agents to quickly access established patterns and facts, making their planning and decision-making faster and more effective.

Proposed Solution

We will introduce a new "Semantic Memory" store backed by a dedicated MongoDB collection, eat_semantic_memory. This collection will store distilled pieces of knowledge (facts, patterns, heuristics) that are generated by a background process (to be implemented in a separate issue).

This issue covers the creation of the storage backend, the data model, and the necessary tools to interact with it.

Implementation Details

  1. Define MongoDB Schema:

    • Create a new schema definition file: eat_semantic_memory_schema.md.
    • The schema for the eat_semantic_memory collection should include the following fields:
      | Field Name | Data Type | Description |
      | --- | --- | --- |
      | fact_id | String (UUID) | Primary key. Unique identifier for the semantic fact. |
      | fact_text | String | The distilled piece of knowledge in natural language (e.g., "Using ToolA followed by ToolB is an effective pattern for 'invoice data extraction'."). |
      | fact_embedding | Array (Vector) | The vector embedding of fact_text for semantic search. |
      | confidence_score | Float | A score from 0.0 to 1.0 indicating the system's confidence in this fact. |
      | source_experience_ids | List of Strings | A list of experience_ids from eat_agent_experiences that were used to derive this fact. |
      | domain | String | The operational domain this fact applies to (e.g., 'finance', 'code_generation'). Indexed. |
      | tags | List of Strings | Keywords for filtering and categorization. Indexed. |
      | created_at | ISODate | Timestamp of when the fact was created. |
      | last_accessed_at | ISODate | Timestamp of when the fact was last retrieved. |
  2. Create Pydantic Data Model:

    • In a new file, evolving_agents/memory/models.py, create a Pydantic model SemanticFact that mirrors the MongoDB schema. This will ensure type safety when working with the data in Python. (A model sketch appears after this list.)
  3. Implement the Storage Tool:

    • Create a new tool: evolving_agents/tools/internal/mongo_semantic_memory_tool.py.
    • This tool, MongoSemanticMemoryTool, will be responsible for all interactions with the eat_semantic_memory collection.
    • It should be initialized with an LLMService and a MongoDBClient.
    • Implement the following async methods (a sketch of the core methods follows this list):
      • add_fact(fact_text: str, confidence: float, source_ids: List[str], ...): This method will take the fact details, generate an embedding for fact_text using the LLMService, create a SemanticFact object, and insert it into MongoDB.
      • search_facts(query: str, top_k: int = 5, ...): This method will embed the query and perform a $vectorSearch on the fact_embedding field in MongoDB to retrieve the most relevant facts.
      • update_fact_confidence(fact_id: str, new_confidence: float): Allows for updating the confidence of an existing fact.
      • find_fact_by_id(fact_id: str): Retrieves a single fact by its ID.
  4. Update Database Setup:

    • Modify docs/MONGO-SETUP.md to include instructions for creating the new eat_semantic_memory collection.
    • Crucially, add the definition for the new Atlas Vector Search index on the fact_embedding field. The index should be named something like vector_index_semantic_facts_default. (An example index definition appears after this list.)
  5. Integration with Dependency Container:

    • Update dependency_container.py or the main application setup to instantiate MongoSemanticMemoryTool and register it in the container so other components can access it (see the registration sketch below).
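
To make step 2 concrete, here is a minimal sketch of the SemanticFact model. Field names mirror the schema table above; the defaults and the use of Pydantic v2 APIs are assumptions.

```python
# evolving_agents/memory/models.py -- minimal sketch, assuming Pydantic v2.
from datetime import datetime, timezone
from typing import List
from uuid import uuid4

from pydantic import BaseModel, Field


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


class SemanticFact(BaseModel):
    """A distilled piece of knowledge stored in eat_semantic_memory."""

    fact_id: str = Field(default_factory=lambda: str(uuid4()))
    fact_text: str
    fact_embedding: List[float] = Field(default_factory=list)
    confidence_score: float = Field(ge=0.0, le=1.0)
    source_experience_ids: List[str] = Field(default_factory=list)
    domain: str
    tags: List[str] = Field(default_factory=list)
    created_at: datetime = Field(default_factory=_utcnow)
    last_accessed_at: datetime = Field(default_factory=_utcnow)
```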
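For step 3, a sketch of the tool's core methods. The llm_service.embed() call and the mongodb_client.get_collection() accessor are assumed interfaces (the real EAT services may differ), and the collection calls follow Motor's async API.

```python
# evolving_agents/tools/internal/mongo_semantic_memory_tool.py -- sketch only;
# embed() and get_collection() are assumed interfaces, adapt to the EAT services.
from typing import Any, Dict, List, Optional

from evolving_agents.memory.models import SemanticFact


class MongoSemanticMemoryTool:
    COLLECTION_NAME = "eat_semantic_memory"
    VECTOR_INDEX_NAME = "vector_index_semantic_facts_default"

    def __init__(self, llm_service, mongodb_client):
        self.llm_service = llm_service
        self.collection = mongodb_client.get_collection(self.COLLECTION_NAME)

    async def add_fact(
        self,
        fact_text: str,
        confidence: float,
        source_ids: List[str],
        domain: str,
        tags: Optional[List[str]] = None,
    ) -> str:
        # Embed the fact text, then persist the full SemanticFact document.
        embedding = await self.llm_service.embed(fact_text)
        fact = SemanticFact(
            fact_text=fact_text,
            fact_embedding=embedding,
            confidence_score=confidence,
            source_experience_ids=source_ids,
            domain=domain,
            tags=tags or [],
        )
        await self.collection.insert_one(fact.model_dump())
        return fact.fact_id

    async def search_facts(self, query: str, top_k: int = 5) -> List[Dict[str, Any]]:
        # Embed the query and run an Atlas $vectorSearch over fact_embedding.
        query_vector = await self.llm_service.embed(query)
        pipeline = [
            {
                "$vectorSearch": {
                    "index": self.VECTOR_INDEX_NAME,
                    "path": "fact_embedding",
                    "queryVector": query_vector,
                    "numCandidates": max(100, top_k * 20),
                    "limit": top_k,
                }
            },
            # Inclusion projection: keep the light fields and surface the
            # similarity score; the bulky embedding vector is omitted.
            {
                "$project": {
                    "_id": 0,
                    "fact_id": 1,
                    "fact_text": 1,
                    "confidence_score": 1,
                    "domain": 1,
                    "tags": 1,
                    "score": {"$meta": "vectorSearchScore"},
                }
            },
        ]
        return await self.collection.aggregate(pipeline).to_list(length=top_k)

    async def update_fact_confidence(self, fact_id: str, new_confidence: float) -> None:
        await self.collection.update_one(
            {"fact_id": fact_id}, {"$set": {"confidence_score": new_confidence}}
        )

    async def find_fact_by_id(self, fact_id: str) -> Optional[Dict[str, Any]]:
        return await self.collection.find_one({"fact_id": fact_id})
```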
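For step 4, one way to express the vector index in docs/MONGO-SETUP.md is the programmatic form below (pymongo 4.6+). The database name, the 1536-dimension setting, and cosine similarity are assumptions; numDimensions must match the embedding model used by the LLMService.

```python
# Sketch: create the Atlas Vector Search index (requires pymongo >= 4.6 and an
# Atlas cluster). Database name and embedding dimensions are assumptions.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")
collection = client["evolving_agents"]["eat_semantic_memory"]

index_model = SearchIndexModel(
    name="vector_index_semantic_facts_default",
    type="vectorSearch",
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "fact_embedding",
                "numDimensions": 1536,  # must match the embedding model's output size
                "similarity": "cosine",
            },
            # Optional pre-filter fields for scoped searches.
            {"type": "filter", "path": "domain"},
            {"type": "filter", "path": "tags"},
        ]
    },
)
collection.create_search_index(model=index_model)
```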
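Finally, for step 5, a sketch of the wiring. The container API shown (get()/register() with string keys) is an assumption; adapt it to the DependencyContainer's actual interface.

```python
# Application setup sketch -- get()/register() and the "llm_service" /
# "mongodb_client" keys are assumptions about the EAT container API.
llm_service = container.get("llm_service")
mongodb_client = container.get("mongodb_client")

semantic_memory_tool = MongoSemanticMemoryTool(llm_service, mongodb_client)
container.register("semantic_memory_tool", semantic_memory_tool)
```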

Acceptance Criteria

  • The eat_semantic_memory_schema.md file is created and defines the new collection structure.
  • The MongoSemanticMemoryTool is implemented with methods for adding and searching for semantic facts.
  • Unit tests are created for MongoSemanticMemoryTool to verify that facts can be added and semantically searched. (A test sketch follows this list.)
  • The docs/MONGO-SETUP.md guide is updated with instructions for the new collection and its vector index.
  • The new tool is successfully registered and retrievable from the DependencyContainer.
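
A starting point for the unit tests mentioned above, using pytest-asyncio and mocks in place of the real LLM and Mongo services; the tool interface is the one sketched in the implementation section.

```python
# tests/test_mongo_semantic_memory_tool.py -- sketch; assumes pytest-asyncio
# and the MongoSemanticMemoryTool interface sketched above.
from unittest.mock import AsyncMock, MagicMock

import pytest

from evolving_agents.tools.internal.mongo_semantic_memory_tool import (
    MongoSemanticMemoryTool,
)


@pytest.mark.asyncio
async def test_add_fact_embeds_text_and_inserts_document():
    llm_service = MagicMock()
    llm_service.embed = AsyncMock(return_value=[0.1, 0.2, 0.3])
    collection = MagicMock()
    collection.insert_one = AsyncMock()
    mongodb_client = MagicMock()
    mongodb_client.get_collection.return_value = collection

    tool = MongoSemanticMemoryTool(llm_service, mongodb_client)
    fact_id = await tool.add_fact(
        fact_text="ToolA then ToolB extracts invoice data reliably.",
        confidence=0.9,
        source_ids=["exp-123"],
        domain="finance",
    )

    llm_service.embed.assert_awaited_once_with(
        "ToolA then ToolB extracts invoice data reliably."
    )
    collection.insert_one.assert_awaited_once()
    inserted = collection.insert_one.await_args.args[0]
    assert inserted["confidence_score"] == 0.9
    assert fact_id == inserted["fact_id"]
```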

Metadata

Labels: enhancement (New feature or request)