> 💡 **Temporal reasoning accuracy improved by 159% compared to the OpenAI baseline.**
### Details of End-to-End Evaluation on LOCOMO
> [!NOTE]
> Comparison of LLM Judge Scores across five major tasks in the LOCOMO benchmark. Each bar shows the mean evaluation score judged by LLMs for a given method-task pair, with standard deviation as error bars. MemOS-0630 consistently outperforms baseline methods (LangMem, Zep, OpenAI, Mem0) across all task types, especially in multi-hop and temporal reasoning scenarios.

- We use gpt-4o-mini as the processing and judging LLM and bge-m3 as the embedding model in the MemOS evaluation (see the judge-call sketch after this list).
- The evaluation was conducted with settings aligned as closely as possible across all methods. Reproduce the results with our scripts at [`evaluation`](./evaluation).
- Check the full search and response details on Hugging Face: https://huggingface.co/datasets/MemTensor/MemOS_eval_result.
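
For illustration, the snippet below sketches the kind of LLM-as-judge request this evaluation relies on, issued against the standard OpenAI Chat Completions REST API. The judge prompt here is a placeholder; the actual prompts, parsing, and scoring logic live in the [`evaluation`](./evaluation) scripts.

```bash
# Illustrative judge call only; the real prompts and scoring are in ./evaluation.
# Assumes OPENAI_API_KEY is exported in your environment.
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a strict grader. Given a question, a reference answer, and a candidate answer, reply with a single score between 0 and 1."},
      {"role": "user", "content": "Question: ...\nReference: ...\nCandidate: ..."}
    ]
  }'
```
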
> 💡 **MemOS outperforms all other methods (Mem0, Zep, Memobase, SuperMemory, and others) across all benchmarks!**
# Evaluation Memory Framework
This repository provides tools and scripts for evaluating the `LoCoMo`, `LongMemEval`, `PrefEval`, and `personaMem` datasets using various models and APIs.
## Installation
### PrefEval Evaluation
Download `benchmark_dataset/filtered_inter_turns.json` from https://github.com/amazon-science/PrefEval/blob/main/benchmark_dataset/filtered_inter_turns.json and save it as `./data/prefeval/filtered_inter_turns.json`.
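
One way to fetch it from the command line; the raw URL below is simply the blob link above rewritten to its `raw.githubusercontent.com` form:

```bash
# Create the target directory and download the benchmark file.
mkdir -p ./data/prefeval
curl -L -o ./data/prefeval/filtered_inter_turns.json \
  https://raw.githubusercontent.com/amazon-science/PrefEval/main/benchmark_dataset/filtered_inter_turns.json
```
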
To evaluate the **PrefEval** dataset using one of the supported memory frameworks, run the following [script](./scripts/run_prefeval_eval.sh):
```bash
# Edit the configuration in ./scripts/run_prefeval_eval.sh, then run it:
bash ./scripts/run_prefeval_eval.sh
```