…se correct LLM instance
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   90.08%   90.45%   +0.36%
==========================================
  Files          19       19
  Lines        1503     1540      +37
==========================================
+ Hits         1354     1393      +39
+ Misses        149      147       -2
```
Hey @colinfrisch @sanika-n @wang-boyu,

While working on fixes for this file I noticed that the grading logic in episodic memory currently calculates the difference between the step at which an event happened and the current step, and then subtracts that linear time penalty from the importance score. The issue is that since importance is on a fixed 1-5 scale, this linear penalty quickly overwhelms the score. In practice, this means high-importance memories are effectively "forgotten" by the retrieval logic almost immediately.

Following the "Generative Agents" paper, I suggest we move toward a normalized/weighted scoring approach, using exponential decay for recency and scaling importance to a [0, 1] range. This would ensure that vital memories remain retrievable over longer periods.

Does this look like something we should address? I'm happy to fix this with a more robust scoring implementation.
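To make the proposal concrete, here is a minimal sketch of the suggested scoring scheme — exponential decay for recency plus importance rescaled from the fixed 1-5 scale into [0, 1]. All names, the decay constant, and the weights are illustrative assumptions, not the actual mesa-llm API.

```python
# Hypothetical sketch of normalized memory scoring (names and constants are
# illustrative, not the actual mesa-llm API).

def recency_score(event_step: int, current_step: int, decay: float = 0.995) -> float:
    """Exponential decay: recent events score near 1, old events approach 0."""
    return decay ** (current_step - event_step)

def importance_score(importance: int) -> float:
    """Rescale the fixed 1-5 importance into [0, 1]."""
    return (importance - 1) / 4

def combined_score(event_step: int, current_step: int, importance: int,
                   w_recency: float = 1.0, w_importance: float = 1.0) -> float:
    """Weighted sum of the normalized factors (relevance omitted here)."""
    return (w_recency * recency_score(event_step, current_step)
            + w_importance * importance_score(importance))

# Unlike a linear step penalty, a maximally important memory from 100 steps
# ago still outranks a brand-new trivial one:
old_vital = combined_score(event_step=0, current_step=100, importance=5)
new_trivial = combined_score(event_step=100, current_step=100, importance=1)
```

With the linear penalty, the 100-step-old memory would have been driven far below zero regardless of its importance; here its recency term decays smoothly instead.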
We can probably do it together in this PR. Two questions -
Based on my understanding, in this file the memory entries are created in a way that suggests 'importance' is stored as an entry inside `MemoryEntry.content['importance']`
Yup, that's exactly the purpose. Additionally, I suggest we implement methods to calculate all three factors and refactor the entire grading logic currently present.
I'm not sure how to implement this; cosine similarity could be something to start with maybe? And then finally the function could return the sum as the result to produce the ranking result. @boyu what do you think about this method?
Not quite. The paper uses min-max scaling: during retrieval, all scores are collected and scaled together: https://github.com/joonspk-research/generative_agents/blob/fe05a71d3e4ed7d10bf68aa4eda6dd995ec070f4/reverie/backend_server/persona/cognitive_modules/retrieve.py#L234 In fact they did the same scaling process for the recency and relevance scores too (a separate process for each of them).
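For reference, the min-max scaling step being described could look like the following minimal sketch — a plain-Python illustration under stated assumptions, not the actual retrieve.py code:

```python
def min_max_scale(scores: dict, lo: float = 0.0, hi: float = 1.0) -> dict:
    """Scale all collected scores into [lo, hi] together at retrieval time."""
    vals = list(scores.values())
    mn, mx = min(vals), max(vals)
    if mx == mn:  # all scores equal: avoid division by zero
        return {k: hi for k in scores}
    return {k: lo + (v - mn) * (hi - lo) / (mx - mn) for k, v in scores.items()}

# Applied separately to recency, importance, and relevance before summing,
# so no single factor dominates just because of its raw range:
recency = min_max_scale({"e1": 0.2, "e2": 0.9, "e3": 0.5})
```

Because the scaling happens over the whole candidate set at retrieval time, each factor contributes on an equal footing regardless of its original units.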
This looks very similar to what was implemented in the paper: https://github.com/joonspk-research/generative_agents/blob/fe05a71d3e4ed7d10bf68aa4eda6dd995ec070f4/reverie/backend_server/persona/cognitive_modules/retrieve.py#L145 Similarly, we could also have an adjustable parameter such as their
Yes, it seems a text embedding model was used, followed by a cosine similarity score: https://github.com/joonspk-research/generative_agents/blob/fe05a71d3e4ed7d10bf68aa4eda6dd995ec070f4/reverie/backend_server/persona/cognitive_modules/retrieve.py#L175 But unlike the other two scores,
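The cosine-similarity part of that relevance score is straightforward to sketch in plain Python; the embedding model itself is assumed to be external (any sentence-embedding API would do), so only the similarity computation is shown here:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # degenerate zero vector: define similarity as 0
    return dot / (norm_a * norm_b)

# With a real embedding model (an assumption here), relevance would be
# cosine_similarity(embed(query), embed(memory_text)).
```

Like recency and importance, the raw similarity would then go through the same min-max scaling pass before being summed into the final ranking score.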
Hey @wang-boyu, hope you are doing well :) Also added a private function. Noticed this same issue recurring in other PRs; would you want me to add this to all memory files? Also, may I open a new issue for handling the relevance logic? :)
Thanks for the updates @psbuilds
I'm a bit confused. Are we using importance scores in other memory types? For this PR it seems that EpisodicMemory adds memory entries per event, unlike other memory types, where memory entries are consolidated and added per step. I've updated this PR to reflect that, but this means EpisodicMemory does not use This may link to #137. If it is expected behavior, then EpisodicMemory doesn't have the issue of overwriting memory entries of the same type, since it's simply appending new entries for each individual event. However, this does mean that the internal attributes are inconsistent now:
We might need to come up with consistent, unified APIs for memory retrieval. But again, not in this PR.
Yup, this had come to my notice while working on it. From my understanding, EpisodicMemory is designed around per-event granularity, with each individual event stored as its own discrete memory with an importance score. Because of this, I think `process_step` being a no-op makes sense.
Regarding #137, you're correct. Since EpisodicMemory appends each event as a separate `MemoryEntry`, #137 may be closed since the issue doesn't exist.
Yup, this is something that should definitely be added, happy to work on it :)
Hey @wang-boyu, after a quick review of #165 I noticed an issue with the implementation used there. I think that approach can lead to future breaks and is way too complicated. Although I believe cascading changes will be necessary for a potential fix for this issue, the mixed types of I think I can come up with something better, but it may take a few days; if you're okay with that, happy to proceed :)
Thanks for flagging this. I haven't had the chance to look into #165 yet, but I agree that having mixed types (either Since #165 is already open to address #137, I would suggest syncing with its author first to discuss alternatives and see whether the implementations can be adjusted accordingly, so that we avoid duplicated effort on the same task. If it turns out that some broader refactor / fix is required, then maybe we can have a follow-up as a separate PR.
Summary
In `EpisodicMemory`, graded entries were calculated but never stored as `MemoryEntry` objects, causing `get_prompt_ready()` to return empty memory context to the LLM. Also fixed the grading methods using the wrong LLM instance.

Bug / Issue
Fixes: #108
`add_to_memory` never creates `MemoryEntry` objects:
`add_to_memory()` grades each event via an LLM call, but only stores the result in `step_content` (via `super().add_to_memory()`), which is immediately cleared. No `MemoryEntry` is ever created, so `memory_entries` stays empty and `retrieve_top_k_entries()` / `get_prompt_ready()` return nothing to the LLM.

Wrong LLM instance for grading:
`grade_event_importance()` sets the system prompt on `self.llm` but calls `self.agent.llm.generate()`. This means the grading system prompt is never used and the agent's own system prompt gets silently overwritten during grading.

Implementation
`episodic_memory.py`:
`add_to_memory()` / `aadd_to_memory()`: Now creates a `MemoryEntry` with the graded content and appends it to `memory_entries`. Uses `{**content, "importance": grade}` to avoid mutating the input dict.

`grade_event_importance()` / `agrade_event_importance()`: Changed `self.agent.llm.generate()` → `self.llm.generate()` so the grading system prompt is actually used.

Testing
The existing test files were updated to verify this fix.
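The shape of the fix can be sketched in a few lines. The class and helper below are simplified stand-ins based on the discussion above, not the actual mesa-llm source; the point is that a graded event is wrapped in a `MemoryEntry` and appended to `memory_entries`, with `{**content, ...}` copying the dict so the caller's input is not mutated:

```python
# Simplified sketch of the fix; MemoryEntry and the grading call are
# stand-ins for the real mesa-llm classes, not the exact source.
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    content: dict
    step: int

class EpisodicMemory:
    def __init__(self) -> None:
        self.memory_entries: list[MemoryEntry] = []

    def grade_event_importance(self, content: dict) -> int:
        # In the real code this calls self.llm.generate() (the memory's own
        # LLM, not self.agent.llm) so the grading system prompt is used.
        return 3  # placeholder grade on the 1-5 scale

    def add_to_memory(self, content: dict, step: int) -> MemoryEntry:
        grade = self.grade_event_importance(content)
        # {**content, ...} copies the dict, leaving the caller's input intact
        entry = MemoryEntry(content={**content, "importance": grade}, step=step)
        self.memory_entries.append(entry)  # persists the entry for retrieval
        return entry

mem = EpisodicMemory()
event = {"event": "met agent 7"}
mem.add_to_memory(event, step=4)
```

After this change, `retrieve_top_k_entries()` / `get_prompt_ready()` have stored entries to work with instead of an always-empty list.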