- Gabriel Braga Ladislau
- Guilherme Silveira Gomes Brotto
- Marlon Moratti de Amaral
The goal of this project is to enhance Generative Pre-Trained Transformers (GPTs) with episodic memory, allowing them to recall specific facts from past interactions. We implement a memory system that stores and retrieves information, enabling the GPT to provide more contextually relevant responses based on previous exchanges.
We use two pre-trained LLMs: one with an episodic memory system and another without. Both models are fed with factual data, and later, we prompt them with questions about these facts. The answers are then compared against expected results using Sentence-BERT (SBERT) and cosine similarity to measure accuracy and relevance.
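At its core, the SBERT comparison reduces to cosine similarity between the embedding of a generated answer and the embedding of the expected one. As a minimal sketch (in pure Python over toy vectors; the project itself would compute real embeddings with Sentence-BERT):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for SBERT sentence vectors.
generated = [0.2, 0.8, 0.1]
expected = [0.25, 0.75, 0.05]
score = cosine_similarity(generated, expected)  # near 1.0 when answers agree
```

A score close to 1.0 means the generated answer is semantically close to the expected one; scores near 0 indicate unrelated answers.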
The episodic memory system is built using ChromaDB as a database for storing and retrieving facts efficiently.
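The store-and-retrieve pattern behind the memory module can be sketched with a minimal in-memory class. Note this is illustrative only: the real system uses ChromaDB collections ranked by embedding distance, whereas this toy version ranks by word overlap.

```python
class EpisodicMemory:
    """In-memory stand-in for the ChromaDB-backed fact store.
    (Illustrative sketch; the actual project uses ChromaDB.)"""

    def __init__(self):
        self.facts = []  # stored fact strings

    def store(self, fact):
        self.facts.append(fact)

    def retrieve(self, query, k=1):
        # Rank stored facts by word overlap with the query.
        # (ChromaDB would rank by embedding distance instead.)
        q = set(query.lower().split())
        ranked = sorted(self.facts,
                        key=lambda f: len(q & set(f.lower().split())),
                        reverse=True)
        return ranked[:k]

memory = EpisodicMemory()
memory.store("The user's favorite color is blue.")
memory.store("The meeting is scheduled for Friday.")
best = memory.retrieve("What is the user's favorite color?")
# → ["The user's favorite color is blue."]
```

Retrieved facts are then injected into the LLM's context so it can answer questions about earlier exchanges.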
- ChromaDB – for storing episodic memory
- PyTorch – for working with LLMs
- Sentence-BERT (SBERT) – for evaluating generated answers
- NarrativeQA – as an optional dataset for testing
```bash
git clone git@github.com:gbladislau/LLM-Episodic-Memory.git
cd LLM-Episodic-Memory
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

- You can use your own dataset or NarrativeQA.
- NarrativeQA is fully available on HuggingFace.
- To use different models from HuggingFace, you need to have all of their dependencies installed beforehand.
- By default, all dependencies of `hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4` are included in our requirements.
- For the SBERT models, any of those under Sentence Transformers can be used.
The `app.py` script allows you to launch the LLM and interact with it, with or without the episodic memory module.
In chat mode, you can exit by typing `exit`. If you want to exit without saving any data to the memory module, simply type `exit_quiet`.
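A minimal chat loop might handle the two exit commands as in this sketch (`llm_reply`, `memory`, and `read_input` are hypothetical stand-ins, not the actual `app.py` internals):

```python
def chat_loop(llm_reply, memory, read_input):
    """Toy chat loop illustrating the `exit` / `exit_quiet` convention."""
    transcript = []
    while True:
        user_text = read_input()
        if user_text == "exit":
            memory.extend(transcript)  # persist the session before leaving
            return "saved"
        if user_text == "exit_quiet":
            return "discarded"         # leave without touching memory
        reply = llm_reply(user_text)
        transcript.append((user_text, reply))

# Simulated session: one message, then a saving exit.
inputs = iter(["Alice lives in Paris.", "exit"])
saved = []
status = chat_loop(lambda t: "noted", saved, lambda: next(inputs))  # → "saved"
```

With `exit`, the session transcript is handed to the memory module; with `exit_quiet`, it is simply discarded.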
```text
usage: app.py [-h] [-m MODEL] [-r REFLECTION_PROMPT] [--results RESULTS] [-e] [-v] [-s SBERT] [-i]

Run the LLM and begin your conversation

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Model's Name
  -r REFLECTION_PROMPT, --reflection_prompt REFLECTION_PROMPT
                        Reflection Prompt Template
  --results RESULTS     Results output path
  -e, --episodic        Run the LLM with the episodic memory module
  -v, --verbose         Verbose
  -s SBERT, --sbert SBERT
                        SBERT Model to evaluate the similarity scores
  -i, --inference_mode  Use inference mode instead of chat mode
```
- `-m, --model`: Specifies the name of the model to be used.
- `-r, --reflection_prompt`: Defines the reflection prompt template to be used.
- `--results`: Specifies the file path where results should be saved.
- `-e, --episodic`: Runs the LLM with the episodic memory module enabled.
- `-v, --verbose`: Enables verbose mode for additional logging and debugging information.
- `-s, --sbert`: Specifies the SBERT model to be used for evaluating similarity scores.
- `-i, --inference_mode`: Runs the model in inference mode instead of interactive chat mode.
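The flag set above corresponds to a standard `argparse` parser; this sketch mirrors the usage string (the wiring is illustrative and may differ from the actual `app.py`):

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Run the LLM and begin your conversation")
    parser.add_argument("-m", "--model", help="Model's Name")
    parser.add_argument("-r", "--reflection_prompt",
                        help="Reflection Prompt Template")
    parser.add_argument("--results", help="Results output path")
    parser.add_argument("-e", "--episodic", action="store_true",
                        help="Run the LLM with the episodic memory module")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose")
    parser.add_argument("-s", "--sbert",
                        help="SBERT Model to evaluate the similarity scores")
    parser.add_argument("-i", "--inference_mode", action="store_true",
                        help="Use inference mode instead of chat mode")
    return parser

# Example: episodic memory enabled with a hypothetical model name.
args = build_parser().parse_args(["-e", "-m", "demo-model"])
```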
To run the model in standard chat mode:

```bash
python app.py -m hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
```

To run the model with episodic memory enabled:

```bash
python app.py -e -m hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
```

To specify a reflection prompt template:

```bash
python app.py -r prompts/reflection_prompt_template.txt
```

To use a specific SBERT model for evaluation:

```bash
python app.py -s all-mpnet-base-v2
```

To save results to a specific file:

```bash
python app.py --results results/results.json
```

The evaluation script (`evaluate.py`) includes several command-line options:
```text
usage: evaluate.py [-h] [--plot] [--dont_rerun] [--gen_prompt] [--result RESULT] [--prompt PROMPT]

Run the evaluation module to calculate the LLMs' results

options:
  -h, --help       show this help message and exit
  --plot, -p       Generate plot for quantitative analysis
  --dont_rerun     Use the mem.npy and no_mem.npy scores previously calculated
  --gen_prompt     Generate qualitative prompt
  --result RESULT  LLM answers results input file path (json)
  --prompt PROMPT  Prompt file output path
```
- `--plot, -p`: Generates a boxplot comparing the similarity scores of the memory-enabled and memoryless models.
- `--dont_rerun`: Uses precomputed scores stored in `mem.npy` and `no_mem.npy` instead of recalculating them.
- `--gen_prompt`: Generates a qualitative prompt for evaluation.
- `--result`: Specifies the path to the JSON file containing both LLMs' generated answers (default: `results/results.json`).
- `--prompt`: Specifies the output path for the qualitative prompt file (default: `prompts/evaluation_prompt.txt`).
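The quantitative comparison boils down to contrasting the two score distributions. The following sketch uses toy numbers in place of the real scores loaded from `mem.npy` and `no_mem.npy`:

```python
def summarize(scores):
    """Mean and standard deviation of a list of similarity scores."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, var ** 0.5

# Toy similarity scores standing in for mem.npy / no_mem.npy contents.
mem_scores = [0.91, 0.88, 0.95, 0.84]
no_mem_scores = [0.62, 0.70, 0.55, 0.66]

mem_mean, _ = summarize(mem_scores)
no_mem_mean, _ = summarize(no_mem_scores)
gap = mem_mean - no_mem_mean  # positive gap favors the memory-enabled model
```

The boxplot produced by `--plot` visualizes exactly these two distributions side by side.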
To run the evaluation with a plot:

```bash
python evaluate.py --plot
```

To use previously calculated scores:

```bash
python evaluate.py --dont_rerun
```

To generate a qualitative prompt:

```bash
python evaluate.py --gen_prompt
```

The paper is available in this repository: Episodic Memory in Large Language Models