Commit 27b1cc8

feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic (#178)
* fix(chart): update Helm chart helpers and values for improved configuration
* feat(SynthesisTaskTab): enhance task table with tooltip support and improved column widths
* feat(CreateTask, SynthFileTask): improve task creation and detail view with enhanced payload handling and UI updates
* feat(SynthFileTask): enhance file display with progress tracking and delete action
* feat(SynthFileTask): enhance file display with progress tracking and delete action
* feat(SynthDataDetail): add delete action for chunks with confirmation prompt
* feat(SynthDataDetail): update edit and delete buttons to icon-only format
* feat(SynthDataDetail): add confirmation modals for chunk and synthesis data deletion
* feat(DocumentSplitter): add enhanced document splitting functionality with CJK support and metadata detection
* feat(DataSynthesis): refactor data synthesis models and update task handling logic
* feat(DataSynthesis): streamline synthesis task handling and enhance chunk processing logic
* feat(DataSynthesis): refactor data synthesis models and update task handling logic
* fix(generation_service): ensure processed chunks are incremented regardless of question generation success
* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
* feat(CreateTask): enhance task creation with new synthesis templates and improved configuration options
* feat(model_chat): enhance JSON parsing by removing additional thought tags and improving fallback logic
1 parent e0e9b1d commit 27b1cc8

1 file changed: +26 -3 lines changed

runtime/datamate-python/app/module/shared/util/model_chat.py

Lines changed: 26 additions & 3 deletions
@@ -24,18 +24,41 @@ def extract_json_substring(raw: str) -> str:
     - then scan backwards for the last '}' or ']' to use as the end;
     - if no suitable boundary is found, fall back to the original string.
     - Some models include `<think>...</think>` internal reasoning in their replies; it should be removed before parsing.
+    - Some models also add tags such as <reasoning>...</reasoning> or <analysis>...</analysis> before or after the JSON; this method removes them as well.
     This method does not guarantee that the extracted text is valid JSON, but it significantly improves the success rate of json.loads.
     """
     if not raw:
         return raw

-    # First remove all <think>...</think> blocks (including ones spanning multiple lines)
     try:
         import re

-        raw = re.sub(r"<think>[\s\S]*?</think>", "", raw, flags=re.IGNORECASE)
+        # 1. First remove every complete thought-tag block as a whole: <think>...</think>, etc.
+        thought_tags = [
+            "think",
+            "thinking",
+            "analysis",
+            "reasoning",
+            "reflection",
+            "inner_thoughts",
+        ]
+        for tag in thought_tags:
+            pattern = rf"<{tag}>[\s\S]*?</{tag}>"
+            raw = re.sub(pattern, "", raw, flags=re.IGNORECASE)
+
+        # 2. Then, as a fallback, keep only the content after the last </think> (or other thought-tag closing tag),
+        # so that the real answer at the tail survives even if tags are unpaired or abnormally nested
+        last_pos = -1
+        for tag in thought_tags:
+            # Matches e.g. </think> or </THINK>
+            m = list(re.finditer(rf"</{tag}>", raw, flags=re.IGNORECASE))
+            if m:
+                last_pos = max(last_pos, m[-1].end())
+        if last_pos != -1 and last_pos < len(raw):
+            raw = raw[last_pos:]
+
     except Exception:
-        # A regex error does not affect the subsequent logic; keep using the original text
+        # A regex error does not affect the subsequent logic; keep using the current text
         pass

     start = None
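
To make the two passes in this change easier to try out in isolation, here is a minimal standalone sketch. It assumes the tag list and regex patterns shown in the diff; the helper name `strip_thought_tags` and the sample strings are hypothetical and are not part of `model_chat.py`. It omits the surrounding `try`/`except` and the later bracket-scanning step of `extract_json_substring`.

```python
import json
import re

# Tag names copied from the diff; the helper name below is hypothetical.
THOUGHT_TAGS = [
    "think",
    "thinking",
    "analysis",
    "reasoning",
    "reflection",
    "inner_thoughts",
]


def strip_thought_tags(raw: str) -> str:
    """Drop thought-tag blocks, then keep only the text after the last stray closing tag."""
    # Pass 1: remove every complete <tag>...</tag> block (non-greedy, spans newlines).
    for tag in THOUGHT_TAGS:
        raw = re.sub(rf"<{tag}>[\s\S]*?</{tag}>", "", raw, flags=re.IGNORECASE)

    # Pass 2: if an unpaired closing tag survives, keep only what follows the last one.
    last_pos = -1
    for tag in THOUGHT_TAGS:
        matches = list(re.finditer(rf"</{tag}>", raw, flags=re.IGNORECASE))
        if matches:
            last_pos = max(last_pos, matches[-1].end())
    if last_pos != -1 and last_pos < len(raw):
        raw = raw[last_pos:]
    return raw


# A well-formed block, and a reply whose opening tag was lost.
paired = '<think>plan the answer</think>{"ok": true}'
unpaired = 'leftover reasoning</THINK>\n{"ok": true}'

print(json.loads(strip_thought_tags(paired)))
print(json.loads(strip_thought_tags(unpaired)))
```

In the second sample the first pass finds nothing to remove because there is no opening tag, so the fallback is what discards the leading text. One consequence of this design, as sketched here, is that a closing tag appearing inside a JSON string value would also trigger the truncation.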
