Merged
Changes from all commits
Commits
25 commits
c42770e
fix n4j cypher query
Nyakult Jul 9, 2025
36a06cb
feat: add llm extra body
Nyakult Jul 9, 2025
0f6095f
feat: update memory extraction prompt and result parser
Nyakult Jul 9, 2025
66c510a
fix: evaluation locomo search
Nyakult Jul 9, 2025
6eedff2
ci: fix format and update test
Nyakult Jul 9, 2025
327c7af
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 9, 2025
4f5cfd1
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 9, 2025
e379c57
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 9, 2025
a0abdd7
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 10, 2025
e5afc13
feat: update result json parser
Nyakult Jul 11, 2025
1869ae4
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 11, 2025
9091d84
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 11, 2025
6932f8b
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 14, 2025
2cea7da
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 15, 2025
cfb0d39
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 16, 2025
e664a2e
feat: recursively cluster nodes to max_cluster_size
Nyakult Jul 16, 2025
5238c8d
fix: fix template
Nyakult Jul 16, 2025
3cbf46d
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 17, 2025
e1ce945
feat: keep default min-group-size 3
Nyakult Jul 17, 2025
818d115
feat: keep default min-group-size 3
Nyakult Jul 17, 2025
1703197
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 17, 2025
2edf155
Merge remote-tracking branch 'upstream/dev' into dev
Nyakult Jul 18, 2025
010ea92
feat: update doc mem reader
Nyakult Jul 18, 2025
a3bc126
test: fix test
Nyakult Jul 18, 2025
79a58a8
Merge branch 'dev' into dev
CaralHsi Jul 19, 2025
6 changes: 3 additions & 3 deletions src/memos/mem_reader/simple_struct.py
@@ -208,15 +208,15 @@ def _process_doc_data(self, scene_data_info, info):
for i, chunk_res in enumerate(processed_chunks):
if chunk_res:
node_i = TextualMemoryItem(
memory=chunk_res["summary"],
memory=chunk_res["value"],
metadata=TreeNodeTextualMemoryMetadata(
user_id=info.get("user_id"),
session_id=info.get("session_id"),
memory_type="LongTermMemory",
status="activated",
tags=chunk_res["tags"],
key="",
embedding=self.embedder.embed([chunk_res["summary"]])[0],
key=chunk_res["key"],
embedding=self.embedder.embed([chunk_res["value"]])[0],
usage=[],
sources=[f"{scene_data_info['file']}_{i}"],
background="",
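This hunk switches `_process_doc_data` from `chunk_res["summary"]` to `chunk_res["value"]` and `chunk_res["key"]`, matching the doc-reader prompt's new JSON schema. A minimal sketch of a response parser that yields a `chunk_res` with those keys — the helper name and fence handling are assumptions, not taken from the PR:

```python
import json

REQUIRED_KEYS = {"key", "value", "tags"}

def parse_chunk_response(raw: str):
    """Parse a doc-reader LLM reply into the dict that
    _process_doc_data reads ("key", "value", "tags").
    Hypothetical helper; the PR's actual parser may differ."""
    text = raw.strip()
    if text.startswith("```"):
        # Tolerate fenced replies like ```json ... ```
        text = text.strip("`").removeprefix("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Skip chunks missing any field the caller indexes into.
    if not REQUIRED_KEYS <= data.keys():
        return None
    return data
```

Returning `None` for malformed replies lets the caller keep its existing `if chunk_res:` guard.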
78 changes: 38 additions & 40 deletions src/memos/templates/mem_reader_prompts.py
@@ -1,8 +1,5 @@
SIMPLE_STRUCT_MEM_READER_PROMPT = """You are a memory extraction expert.
Always respond in the same language as the conversation. If the conversation is in Chinese, respond in Chinese.

Your task is to extract memories from the perspective of ${user_a}, based on a conversation between ${user_a} and ${user_b}. This means identifying what ${user_a} would plausibly remember — including their own experiences, thoughts, plans, or relevant statements and actions made by others (such as ${user_b}) that impacted or were acknowledged by ${user_a}.

Your task is to extract memories from the perspective of the user, based on a conversation between the user and the assistant. This means identifying what the user would plausibly remember — including their own experiences, thoughts, plans, or relevant statements and actions made by others (such as the assistant) that impacted or were acknowledged by the user.
Please perform:
1. Identify information that reflects the user's experiences, beliefs, concerns, decisions, plans, or reactions — including meaningful input from the assistant that the user acknowledged or responded to.
2. Resolve all time, person, and event references clearly:
@@ -27,20 +24,16 @@
{
"key": <string, a unique, concise memory title>,
"memory_type": <string, Either "LongTermMemory" or "UserMemory">,
"value": <A detailed, self-contained, and unambiguous memory statement
— written in English if the input conversation is in English,
or in Chinese if the conversation is in Chinese, or any language which
align with the conversation language>,
"value": <A detailed, self-contained, and unambiguous memory statement — written in English if the input conversation is in English, or in Chinese if the conversation is in Chinese>,
"tags": <A list of relevant thematic keywords (e.g., ["deadline", "team", "planning"])>
},
...
],
"summary": <a natural paragraph summarizing the above memories from user's
perspective, 120–200 words, **same language** as the input>
"summary": <a natural paragraph summarizing the above memories from user's perspective, 120–200 words, same language as the input>
}

Language rules:
- The `key`, `value`, `tags`, `summary` fields must match the language of the input conversation.
- The `key`, `value`, `tags`, `summary` fields must match the predominant language of the input conversation. **如果输入是中文,请输出中文**
- Keep `memory_type` in English.

Example:
@@ -92,37 +85,42 @@

Your Output:"""

SIMPLE_STRUCT_DOC_READER_PROMPT = """
**ABSOLUTE, NON-NEGOTIABLE, CRITICAL RULE: The language of your entire JSON output's string values (specifically `summary` and `tags`) MUST be identical to the language of the input `[DOCUMENT_CHUNK]`. There are absolutely no exceptions. Do not translate. If the input is Chinese, the output must be Chinese. If English, the output must be English. Any deviation from this rule constitutes a failure to follow instructions.**

You are an expert text analyst for a search and retrieval system. Your task is to process a document chunk and generate a single, structured JSON object.
Written in English if the input conversation is in English, or in Chinese if
the conversation is in Chinese, or any language which align with the
conversation language. 如果输入语言是中文,请务必输出中文。

The input is a single piece of text: `[DOCUMENT_CHUNK]`.
You must generate a single JSON object with two top-level keys: `summary` and `tags`.
Written in English if the input conversation is in English, or in Chinese if
the conversation is in Chinese, or any language which align with the conversation language.

1. `summary`:
- A dense, searchable summary of the ENTIRE `[DOCUMENT_CHUNK]`.
- The purpose is for semantic search embedding.
- A clear and accurate sentence that comprehensively summarizes the main points, arguments, and information within the `[DOCUMENT_CHUNK]`.
- The goal is to create a standalone overview that allows a reader to fully understand the essence of the chunk without reading the original text.
- The summary should be **no more than 50 words**.
2. `tags`:
- A concise list of **3 to 5 high-level, summative tags**.
- **Each tag itself should be a short phrase, ideally 2 to 4 words long.**
- These tags must represent the core abstract themes of the text, suitable for broad categorization.
- **Crucially, prioritize abstract concepts** over specific entities or phrases mentioned in the text. For example, prefer "Supply Chain Resilience" over "Reshoring Strategies".

Here is the document chunk to process:
`[DOCUMENT_CHUNK]`
SIMPLE_STRUCT_DOC_READER_PROMPT = """You are an expert text analyst for a search and retrieval system.
Your task is to process a document chunk and generate a single, structured JSON object.

Please perform:
1. Identify key information that reflects factual content, insights, decisions, or implications from the documents — including any notable themes, conclusions, or data points. Allow a reader to fully understand the essence of the chunk without reading the original text.
2. Resolve all time, person, location, and event references clearly:
- Convert relative time expressions (e.g., “last year,” “next quarter”) into absolute dates if context allows.
- Clearly distinguish between event time and document time.
- If uncertainty exists, state it explicitly (e.g., “around 2024,” “exact date unclear”).
- Include specific locations if mentioned.
- Resolve all pronouns, aliases, and ambiguous references into full names or identities.
- Disambiguate entities with the same name if applicable.
3. Always write from a third-person perspective, referring to the subject or content clearly rather than using first-person ("I", "me", "my").
4. Do not omit any information that is likely to be important or memorable from the document summaries.
- Include all key facts, insights, emotional tones, and plans — even if they seem minor.
- Prioritize completeness and fidelity over conciseness.
- Do not generalize or skip details that could be contextually meaningful.

Return a single valid JSON object with the following structure:

Return valid JSON:
{
"key": <string, a concise title of the `value` field>,
"memory_type": "LongTermMemory",
"value": <A clear and accurate paragraph that comprehensively summarizes the main points, arguments, and information within the document chunk — written in English if the input memory items are in English, or in Chinese if the input is in Chinese>,
"tags": <A list of relevant thematic keywords (e.g., ["deadline", "team", "planning"])>
}

Language rules:
- The `key`, `value`, `tags` fields must match the predominant language of the input document chunk. **如果输入是中文,请输出中文**
- Keep `memory_type` in English.

Document chunk:
{chunk_text}

Produce ONLY the JSON object as your response.
"""
Your Output:"""
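The rewritten doc-reader prompt ends with a `{chunk_text}` placeholder, but the same template contains literal braces in its JSON schema example, so `str.format()` would misparse them. A sketch of one safe way to fill the slot — the helper name and the abbreviated template are illustrative assumptions, not code from the PR:

```python
def build_doc_reader_prompt(template: str, chunk_text: str) -> str:
    # The template's JSON example contains literal { and }, which
    # str.format() would try to interpret as replacement fields;
    # replacing the single placeholder directly avoids that.
    return template.replace("{chunk_text}", chunk_text)

# Abbreviated stand-in for SIMPLE_STRUCT_DOC_READER_PROMPT.
TEMPLATE = 'Return valid JSON:\n{\n  "key": <string>\n}\n\nDocument chunk:\n{chunk_text}\n'
prompt = build_doc_reader_prompt(TEMPLATE, "Q3 revenue grew 12%.")
```

Calling `TEMPLATE.format(chunk_text=...)` here would raise on the lone `{` in the schema example, which is why plain substitution is the safer pattern for prompt templates that embed JSON.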

SIMPLE_STRUCT_MEM_READER_EXAMPLE = """Example:
Conversation:
38 changes: 38 additions & 0 deletions src/memos/templates/tree_reorganize_prompts.py
@@ -37,6 +37,44 @@

"""

DOC_REORGANIZE_PROMPT = """You are a document summarization and knowledge extraction expert.

Given the following summarized document items:

{memory_items_text}

Please perform:
1. Identify key information that reflects factual content, insights, decisions, or implications from the documents — including any notable themes, conclusions, or data points.
2. Resolve all time, person, location, and event references clearly:
- Convert relative time expressions (e.g., “last year,” “next quarter”) into absolute dates if context allows.
- Clearly distinguish between event time and document time.
- If uncertainty exists, state it explicitly (e.g., “around 2024,” “exact date unclear”).
- Include specific locations if mentioned.
- Resolve all pronouns, aliases, and ambiguous references into full names or identities.
- Disambiguate entities with the same name if applicable.
3. Always write from a third-person perspective, referring to the subject or content clearly rather than using first-person ("I", "me", "my").
4. Do not omit any information that is likely to be important or memorable from the document summaries.
- Include all key facts, insights, emotional tones, and plans — even if they seem minor.
- Prioritize completeness and fidelity over conciseness.
- Do not generalize or skip details that could be contextually meaningful.
5. Summarize all document summaries into one integrated memory item.

Language rules:
- The `key`, `value`, `tags`, `summary` fields must match the predominant language of the input document summaries. **如果输入是中文,请输出中文**
- Keep `memory_type` in English.

Return valid JSON:
{
"key": <string, a concise title of the `value` field>,
"memory_type": "LongTermMemory",
"value": <A detailed, self-contained, and unambiguous memory statement that contains only detailed, unaltered information extracted and consolidated from the input `value` fields and excludes summary content — written in English if the input memory items are in English, or in Chinese if the input is in Chinese>,
"tags": <A list of relevant thematic keywords (e.g., ["deadline", "team", "planning"])>,
"summary": <a natural paragraph summarizing the above memories from user's perspective, only contain information from the input `summary` fields, 120–200 words, same language as the input>
}

"""
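DOC_REORGANIZE_PROMPT consumes a pre-rendered `{memory_items_text}` block built from the summarized items. A sketch of one plausible rendering, assuming items shaped like the JSON schema above — the function itself is hypothetical, not taken from the PR:

```python
def build_memory_items_text(items: list[dict]) -> str:
    """Render summarized document items for the {memory_items_text}
    slot. Hypothetical formatting; the PR may render differently."""
    lines = []
    for i, item in enumerate(items, 1):
        # Number each item so the LLM can reference them unambiguously.
        lines.append(f"{i}. key: {item['key']}")
        lines.append(f"   value: {item['value']}")
        lines.append(f"   summary: {item.get('summary', '')}")
    return "\n".join(lines)
```

Numbering the items and labeling each field keeps the consolidated `value`/`summary` sources separable, which the prompt's output rules depend on.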


LOCAL_SUBCLUSTER_PROMPT = """You are a memory organization expert.

You are given a cluster of memory items, each with an ID and content.
4 changes: 3 additions & 1 deletion tests/mem_reader/test_simple_structure.py
@@ -75,7 +75,9 @@ def test_process_doc_data(self):
info = {"user_id": "user1", "session_id": "session1"}

# Mock LLM response
mock_response = '{"summary": "A sample document about testing.", "tags": ["document"]}'
mock_response = (
'{"value": "A sample document about testing.", "tags": ["document"], "key": "title"}'
)
self.reader.llm.generate.return_value = mock_response
self.reader.chunker.chunk.return_value = [
Chunk(text="Parsed document text", token_count=3, sentences=["Parsed document text"])
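The updated mock now mirrors the doc-reader's JSON schema. A standalone sanity check (not part of the PR's test file) that the mock's fields line up with what `_process_doc_data` reads, with the mapping taken from the simple_struct.py hunk:

```python
import json

mock_response = '{"value": "A sample document about testing.", "tags": ["document"], "key": "title"}'
chunk_res = json.loads(mock_response)

memory = chunk_res["value"]   # -> TextualMemoryItem.memory
key = chunk_res["key"]        # -> metadata.key
tags = chunk_res["tags"]      # -> metadata.tags
```

Had the mock kept the old `"summary"` key, the reader's new `chunk_res["value"]` lookup would raise `KeyError`, which is why the test fixture had to change alongside the code.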