Hi Mem0 team,
I enjoyed reading your paper "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory".
While reviewing the baseline comparisons, I noticed a discrepancy in Table 1 regarding the reported performance of the A-Mem system.
In Table 1 of the Mem0 paper, the F1 and BLEU-1 scores for A-Mem are listed as follows:
| Question Type | A-Mem F1 | A-Mem BLEU-1 |
|---------------|----------|--------------|
| Single Hop    | 27.02    | 20.09        |
| Multi-Hop     | 12.14    | 12.00        |
| Open Domain   | 44.65    | 37.06        |
| Temporal      | 45.85    | 36.67        |
However, these values appear to contradict the data reported in the original A-Mem paper (A-MEM: Agentic Memory for LLM Agents, Table 1).
Comparing the two tables, it appears that the results for Single Hop, Multi-Hop, and Open Domain may have been swapped or mislabeled in the Mem0 paper. For example, the values listed under Single Hop in Mem0's Table 1 seem to correspond to a different question type in the original A-Mem results (and vice versa).
Reference from A-Mem Paper (Table 1):
Could you please verify whether this is a transcription error introduced while compiling the results?
Thank you!