Commit 6efe419

Authored by Nyakult, 2Rant, and fridayL

add pm and pref eval scripts (#385)
* feat: check nodes existence
* feat: use different template for different language input
* fix: eval script
* feat: memos-api eval scripts
* feat: mem reader
* feat: implement prefeval memos-api evaluation scripts
* refactor: format code
* feat: add PersonaMem eval scripts
* docs(evaluation): update PersonaMem eval readme
* feat: memos-api ingest batch message
* feat: refactor search
* update: add api for memory
* feat: add memory api return memory and memory type
* refactor(server): restructure the server routing module to improve memory management
* format: ruff format code
* feat(server): increase the LLM max token count
* test
* fix: user query embedding for search
* count memory_size by user
* fix(server): fix the list-flattening bug in the memory-read logic
* feat(nebular): optimize graph database query performance
* refactor(memory): remove the call to `_refresh_memory_size`; keep the original logic for later restoration or refactoring
* feat: remove user idx_memory_user_name
* feat(graph): optimize Nebula graph database query performance
* feat: rollback remove_oldest_memory
* feat: add index to Nebula GQL
* feat: align code
* feat: update memos_api
* feat: update default options
* feat: memory client
* feat: refactor lme
* feat: memu & supermemory client
* feat: locomo memu
* feat: locomo supermemory
* New 'add' and 'process' modes
* feat: lme supermemory & memu
* feat: default args
* api and local
* memobase fix
* memos fix
* fix memos-api search data
* prefeval pipeline
* fix lme memos-api
* personamem pipeline
* lme scripts
* align dev
* refactor: remove old files
* pm and prefeval pipeline
* pref pipeline
* add search response mode
* update readme and example
* update mem0 api
* pm mem0
* fix MEMOBASE api
* update pm and prefeval pipeline for frames
* update pm and prefeval readme
* fix memobase api
* fix format

Co-authored-by: 2Rant <[email protected]>
Co-authored-by: fridayL <[email protected]>
1 parent b5ea7e6 commit 6efe419

21 files changed: +2397 additions, -455 deletions

evaluation/.env-example

Lines changed: 14 additions & 7 deletions
@@ -3,21 +3,28 @@ MODEL="gpt-4o-mini"
 OPENAI_API_KEY="sk-***REDACTED***"
 OPENAI_BASE_URL="http://***.***.***.***:3000/v1"
 
-MEM0_API_KEY="m0-***REDACTED***"
-
-ZEP_API_KEY="z_***REDACTED***"
 
 # response model
 CHAT_MODEL="gpt-4o-mini"
 CHAT_MODEL_BASE_URL="http://***.***.***.***:3000/v1"
 CHAT_MODEL_API_KEY="sk-***REDACTED***"
 
+# memos
 MEMOS_KEY="Token mpg-xxxxx"
-MEMOS_URL="https://apigw-pre.memtensor.cn/api/openmem/v1"
-PRE_SPLIT_CHUNK=false # pre split chunk in client end
+MEMOS_URL="http://127.0.0.1:8001"
+MEMOS_ONLINE_URL="https://memos.memtensor.cn/api/openmem/v1"
+
+# other memory agents
+MEM0_API_KEY="m0-xxx"
+ZEP_API_KEY="z_xxx"
+MEMU_API_KEY="mu_xxx"
+SUPERMEMORY_API_KEY="sm_xxx"
+MEMOBASE_API_KEY="xxx"
+MEMOBASE_PROJECT_URL="http://***.***.***.***:8019"
+
+# eval settings
+PRE_SPLIT_CHUNK=false
 
-MEMOBASE_API_KEY="xxxxx"
-MEMOBASE_PROJECT_URL="http://xxx.xxx.xxx.xxx:8019"
 
 # Configuration Only For Scheduler
 # RabbitMQ Configuration
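The evaluation scripts consume these settings via `os.getenv`, as `pref_eval.py` does below for `OPENAI_API_KEY` and `OPENAI_BASE_URL`. A minimal sketch of loading them; the `load_eval_env` helper name and the boolean parsing of `PRE_SPLIT_CHUNK` are illustrative assumptions, not code from this commit:

```python
import os


def load_eval_env() -> dict:
    # Hypothetical helper: gather the settings the eval scripts read via os.getenv.
    return {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "api_url": os.getenv("OPENAI_BASE_URL"),
        # Default to the local server started by `uvicorn memos.api.server_api:app`.
        "memos_url": os.getenv("MEMOS_URL", "http://127.0.0.1:8001"),
        # Env values are strings, so a flag like PRE_SPLIT_CHUNK needs explicit parsing.
        "pre_split_chunk": os.getenv("PRE_SPLIT_CHUNK", "false").lower() == "true",
    }
```

Note that a `.env` file is not loaded automatically by `os.getenv`; the variables must be exported into the process environment (or loaded with a dotenv-style helper) before the scripts run.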
evaluation/README.md

Lines changed: 19 additions & 2 deletions
@@ -21,7 +21,14 @@ This repository provides tools and scripts for evaluating the LoCoMo dataset usi
 
 2. Copy the `configs-example/` directory to a new directory named `configs/`, and modify the configuration files inside it as needed. This directory contains model and API-specific settings.
 
+## Setup MemOS
+```bash
+# start server
+uvicorn memos.api.server_api:app --host 0.0.0.0 --port 8001 --workers 8
 
+# modify .env file
+MEMOS_URL="http://127.0.0.1:8001"
+```
 ## Evaluation Scripts
 
 ### LoCoMo Evaluation
@@ -45,10 +52,20 @@ First prepare the dataset `longmemeval_s` from https://huggingface.co/datasets/x
 ./scripts/run_lme_eval.sh
 ```
 
-### prefEval Evaluation
+### PrefEval Evaluation
+To evaluate the **PrefEval** dataset using one of the supported memory frameworks (`memos`, `mem0`, or `zep`), run the following [script](./scripts/run_prefeval_eval.sh):
 
-### personaMem Evaluation
+```bash
+# Edit the configuration in ./scripts/run_prefeval_eval.sh
+# Specify the model and memory backend you want to use (e.g., mem0, zep, etc.)
+./scripts/run_prefeval_eval.sh
+```
+
+### PersonaMem Evaluation
 get `questions_32k.csv` and `shared_contexts_32k.jsonl` from https://huggingface.co/datasets/bowen-upenn/PersonaMem and save them at `data/personamem/`
 ```bash
+# Edit the configuration in ./scripts/run_pm_eval.sh
+# Specify the model and memory backend you want to use (e.g., mem0, zep, etc.)
+# If you want to use MIRIX, edit the configuration in ./scripts/personamem/config.yaml
 ./scripts/run_pm_eval.sh
 ```

evaluation/scripts/PrefEval/pref_eval.py

Lines changed: 31 additions & 15 deletions
@@ -15,10 +15,6 @@
 API_KEY = os.getenv("OPENAI_API_KEY")
 API_URL = os.getenv("OPENAI_BASE_URL")
 
-INPUT_FILE = "./results/prefeval/pref_memos_process.jsonl"
-OUTPUT_FILE = "./results/prefeval/eval_pref_memos.jsonl"
-OUTPUT_EXCEL_FILE = "./results/prefeval/eval_pref_memos_summary.xlsx"
-
 
 async def call_gpt4o_mini_async(client: OpenAI, prompt: str) -> str:
     messages = [{"role": "user", "content": prompt}]
@@ -255,9 +251,10 @@ def generate_excel_summary(
     avg_search_time: float,
     avg_context_tokens: float,
     avg_add_time: float,
+    output_excel_file: str,
     model_name: str = "gpt-4o-mini",
 ):
-    print(f"Generating Excel summary at {OUTPUT_EXCEL_FILE}...")
+    print(f"Generating Excel summary at {output_excel_file}...")
 
     def get_pct(key):
         return summary_results.get(key, {}).get("percentage", 0)
@@ -282,7 +279,7 @@ def get_pct(key):
 
     df = pd.DataFrame(data)
 
-    with pd.ExcelWriter(OUTPUT_EXCEL_FILE, engine="xlsxwriter") as writer:
+    with pd.ExcelWriter(output_excel_file, engine="xlsxwriter") as writer:
         df.to_excel(writer, index=False, sheet_name="Summary")
 
         workbook = writer.book
@@ -300,10 +297,10 @@ def get_pct(key):
     bold_pct_format = workbook.add_format({"num_format": "0.0%", "bold": True})
     worksheet.set_column("F:F", 18, bold_pct_format)
 
-    print(f"Successfully saved summary to {OUTPUT_EXCEL_FILE}")
+    print(f"Successfully saved summary to {output_excel_file}")
 
 
-async def main(concurrency_limit: int):
+async def main(concurrency_limit: int, input_file: str, output_file: str, output_excel_file: str):
     semaphore = asyncio.Semaphore(concurrency_limit)
     error_counter = Counter()
 
@@ -313,17 +310,17 @@ async def main(concurrency_limit: int):
     total_add_time = 0
 
     print(f"Starting evaluation with a concurrency limit of {concurrency_limit}...")
-    print(f"Input file: {INPUT_FILE}")
-    print(f"Output JSONL: {OUTPUT_FILE}")
-    print(f"Output Excel: {OUTPUT_EXCEL_FILE}")
+    print(f"Input file: {input_file}")
+    print(f"Output JSONL: {output_file}")
+    print(f"Output Excel: {output_excel_file}")
 
     client = OpenAI(api_key=API_KEY, base_url=API_URL)
 
     try:
-        with open(INPUT_FILE, "r", encoding="utf-8") as f:
+        with open(input_file, "r", encoding="utf-8") as f:
             lines = f.readlines()
     except FileNotFoundError:
-        print(f"Error: Input file not found at '{INPUT_FILE}'")
+        print(f"Error: Input file not found at '{input_file}'")
         return
 
     if not lines:
@@ -332,7 +329,7 @@ async def main(concurrency_limit: int):
 
     tasks = [process_line(line, client, semaphore) for line in lines]
 
-    with open(OUTPUT_FILE, "w", encoding="utf-8") as outfile:
+    with open(output_file, "w", encoding="utf-8") as outfile:
         pbar = tqdm(
             asyncio.as_completed(tasks),
             total=len(tasks),
@@ -382,13 +379,19 @@ async def main(concurrency_limit: int):
             avg_search_time,
             avg_context_tokens,
             avg_add_time,
+            output_excel_file,
         )
     except Exception as e:
        print(f"\nFailed to generate Excel file: {e}")
 
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="Evaluate assistant responses from a JSONL file.")
+
+    parser.add_argument(
+        "--input", type=str, required=True, help="Path to the input JSONL file from pref_memos.py."
+    )
+
     parser.add_argument(
         "--concurrency-limit",
         type=int,
@@ -397,4 +400,17 @@ async def main(concurrency_limit: int)
     )
     args = parser.parse_args()
 
-    asyncio.run(main(concurrency_limit=args.concurrency_limit))
+    input_path = args.input
+    output_dir = os.path.dirname(input_path)
+
+    output_jsonl_path = os.path.join(output_dir, "eval_pref_memos.jsonl")
+    output_excel_path = os.path.join(output_dir, "eval_pref_memos_summary.xlsx")
+
+    asyncio.run(
+        main(
+            concurrency_limit=args.concurrency_limit,
+            input_file=input_path,
+            output_file=output_jsonl_path,
+            output_excel_file=output_excel_path,
+        )
+    )
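With this refactor, the output paths are no longer hard-coded module constants: both the results JSONL and the Excel summary are derived from the directory of `--input`. A self-contained sketch of that derivation (the `derive_output_paths` helper name is illustrative; the logic mirrors the `__main__` block in the diff):

```python
import os


def derive_output_paths(input_path: str) -> tuple[str, str]:
    # Outputs are written next to the input file, as in the refactored __main__ block.
    output_dir = os.path.dirname(input_path)
    output_jsonl = os.path.join(output_dir, "eval_pref_memos.jsonl")
    output_excel = os.path.join(output_dir, "eval_pref_memos_summary.xlsx")
    return output_jsonl, output_excel
```

For example, running the script with `--input ./results/prefeval/pref_memos_process.jsonl` places `eval_pref_memos.jsonl` and `eval_pref_memos_summary.xlsx` in `./results/prefeval/`, which keeps each run's artifacts grouped with its input.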
