Feat/refractor deepresearch #858
Merged: alcholiclg merged 12 commits into modelscope:main from alcholiclg:feat/refractor_deepresearch on Feb 5, 2026.
Changes from all commits (12):
- 6e8250d — deepresearch v2.0.0 - testing on deep research bench
- 33cda15 — support for memory compression but unused
- ce553af — prevent exa from returning parsed content
- a8e68e1 — enhance stability and efficiency; fix bug in quitting thread; add eva…
- 9215c91 — Merge branch 'main' of https://github.com/modelscope/ms-agent into fe…
- 32697e3 — add readme
- 7e16dde — Merge branch 'main' of https://github.com/modelscope/ms-agent into fe…
- 58235fd — Merge branch 'main' of https://github.com/modelscope/ms-agent into fe…
- c876da4 — add webui for deep research v2 and refine deep research v2
- c0f96d7 — Merge branch 'main' of https://github.com/modelscope/ms-agent into fe…
- 57f9442 — fix cache_indice
- 00c2e90 — Merge branch 'main' of https://github.com/modelscope/ms-agent into fe…
```
@@ -160,3 +160,6 @@ neo4j.lock
**/temp_workspace/
ms_agent/app/temp_workspace/

# webui
webui/work_dir/
```
```python
# @@ -30,6 +30,9 @@ class OpenAI(LLM):
        'role', 'content', 'tool_calls', 'partial', 'prefix', 'tool_call_id'
    }

    # Providers that support cache_control in structured content blocks
    CACHE_CONTROL_PROVIDERS = ['dashscope', 'anthropic']

    def __init__(
            self,
            config: DictConfig,
```
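The detection behind `CACHE_CONTROL_PROVIDERS` is a plain substring match on the configured base URL. A minimal standalone sketch (the URLs below are only illustrations, not part of the diff):

```python
from typing import Optional

# Providers that support explicit cache_control markers (list from the diff above)
CACHE_CONTROL_PROVIDERS = ['dashscope', 'anthropic']


def detect_cache_provider(base_url: str) -> Optional[str]:
    """Return the first provider whose name appears in the base URL."""
    base_url_lower = base_url.lower()
    for provider in CACHE_CONTROL_PROVIDERS:
        if provider in base_url_lower:
            return provider
    # Anything else is treated as native OpenAI: prefixes are cached
    # automatically server-side, so no cache_control marker is emitted.
    return None


print(detect_cache_provider('https://dashscope.aliyuncs.com/compatible-mode/v1'))  # dashscope
print(detect_cache_provider('https://api.openai.com/v1'))  # None
```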
```python
# @@ -51,9 +54,85 @@ def __init__(
            api_key=api_key,
            base_url=base_url,
        )
        self.base_url = base_url or ''
        self.args: Dict = OmegaConf.to_container(
            getattr(config, 'generation_config', DictConfig({})))

        # Prefix cache configuration
        # - force_prefix_cache: enable structured content with cache_control for explicit caching
        # - prefix_cache_roles: which messages to cache (only these are converted to structured format)
        #   Supports:
        #   - Role names: 'system', 'user', 'assistant', 'tool'
        #   - Special values: 'last_message' (only cache the last message in the list)
        #   Default: ['system'] - system prompt is usually the longest stable prefix
        self._prefix_cache_enabled = self.args.get('force_prefix_cache', False)
        self._prefix_cache_roles = set(
            self.args.get('prefix_cache_roles', ['system']))
        self._prefix_cache_provider = self._detect_cache_provider()

    def _detect_cache_provider(self) -> Optional[str]:
        """
        Detect which provider-specific cache_control format to use based on base_url.

        Returns:
            Provider name (e.g. 'dashscope', 'anthropic') or None for native OpenAI
            (which uses automatic prefix caching without explicit cache_control).
        """
        if not self._prefix_cache_enabled:
            return None
        base_url_lower = self.base_url.lower()
        for provider in self.CACHE_CONTROL_PROVIDERS:
            if provider in base_url_lower:
                return provider
        # Native OpenAI: automatic prefix caching, no need for cache_control
        return None

    @staticmethod
    def _to_structured_content(
        content: Any,
        add_cache_control: bool = False,
        provider: Optional[str] = None,
    ) -> Any:
        """
        Convert message content to structured content blocks for prefix caching.

        This method is idempotent: already-structured content is returned as-is
        (with optional cache_control addition for dashscope/anthropic).

        Args:
            content: Original content (str or list)
            add_cache_control: Whether to add cache_control to text blocks
            provider: Provider name controlling the cache_control format

        Returns:
            Structured content list or original content if not applicable
        """
        if not add_cache_control:
            return content

        # Case 1: plain string -> wrap in structured block
        if isinstance(content, str):
            block: Dict[str, Any] = {'type': 'text', 'text': content}
            if provider in {'dashscope', 'anthropic'}:
                block['cache_control'] = {'type': 'ephemeral'}
            return [block]

        # Case 2: already a list (multimodal or pre-structured)
        if isinstance(content, list):
            # Add cache_control to text blocks that don't have it
            new_list = []
            for item in content:
                if (isinstance(item, dict) and item.get('type') == 'text'
                        and 'cache_control' not in item):
                    new_item = dict(item)
                    new_item['cache_control'] = {'type': 'ephemeral'}
                    new_list.append(new_item)
                else:
                    new_list.append(item)
            return new_list

        # Other types: return as-is
        return content

    def format_tools(self,
                     tools: Optional[List[Tool]] = None
                     ) -> List[Dict[str, Any]]:
```
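Pulled out of the class, the content conversion can be exercised directly. This standalone sketch mirrors the diff's logic so the resulting block shape is visible:

```python
from typing import Any, Dict, Optional


def to_structured_content(content: Any,
                          add_cache_control: bool = False,
                          provider: Optional[str] = None) -> Any:
    """Standalone sketch of the _to_structured_content logic from the diff."""
    if not add_cache_control:
        return content
    if isinstance(content, str):
        # Plain string -> single structured text block
        block: Dict[str, Any] = {'type': 'text', 'text': content}
        if provider in {'dashscope', 'anthropic'}:
            block['cache_control'] = {'type': 'ephemeral'}
        return [block]
    if isinstance(content, list):
        # Add cache_control to text blocks that don't already carry it
        new_list = []
        for item in content:
            if (isinstance(item, dict) and item.get('type') == 'text'
                    and 'cache_control' not in item):
                new_item = dict(item)
                new_item['cache_control'] = {'type': 'ephemeral'}
                new_list.append(new_item)
            else:
                new_list.append(item)
        return new_list
    return content


result = to_structured_content('You are a helpful assistant.', True, 'dashscope')
print(result)
# [{'type': 'text', 'text': 'You are a helpful assistant.',
#   'cache_control': {'type': 'ephemeral'}}]
```

Running it twice over its own output is a no-op for the text blocks, which is the idempotence the docstring promises.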
```python
# @@ -143,6 +222,34 @@ def _call_llm(self,
        return self.client.chat.completions.create(
            model=self.model, messages=messages, tools=tools, **kwargs)

    @staticmethod
    def _extract_cache_info(usage_obj: Any) -> tuple:
        """
        Extract cache info from an OpenAI-compatible usage object.

        Returns:
            tuple: (cached_tokens, cache_creation_input_tokens)
                - cached_tokens: tokens that hit existing cache
                - cache_creation_input_tokens: tokens used to create new cache (explicit cache only)

        OpenAI/DashScope format: usage.prompt_tokens_details.{cached_tokens, cache_creation_input_tokens}
        """
        if not usage_obj:
            return 0, 0
        details = getattr(usage_obj, 'prompt_tokens_details', None)
        if details is None and isinstance(usage_obj, dict):
            details = usage_obj.get('prompt_tokens_details')
        if details is None:
            return 0, 0
        if isinstance(details, dict):
            cached = int(details.get('cached_tokens', 0) or 0)
            created = int(details.get('cache_creation_input_tokens', 0) or 0)
        else:
            cached = int(getattr(details, 'cached_tokens', 0) or 0)
            created = int(
                getattr(details, 'cache_creation_input_tokens', 0) or 0)
        return cached, created

    def _merge_stream_message(self, pre_message_chunk: Optional[Message],
                              message_chunk: Message) -> Optional[Message]:
        """Merges a new chunk of message into the previous chunks during streaming.
```
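Because the helper falls back from attribute access to dict lookup, it works on both SDK objects and plain dicts. A quick sketch with a dict-shaped usage payload (values are illustrative):

```python
from typing import Any, Tuple


def extract_cache_info(usage_obj: Any) -> Tuple[int, int]:
    """Standalone sketch of _extract_cache_info from the diff above."""
    if not usage_obj:
        return 0, 0
    # Try attribute access first (SDK objects), then dict lookup
    details = getattr(usage_obj, 'prompt_tokens_details', None)
    if details is None and isinstance(usage_obj, dict):
        details = usage_obj.get('prompt_tokens_details')
    if details is None:
        return 0, 0
    if isinstance(details, dict):
        cached = int(details.get('cached_tokens', 0) or 0)
        created = int(details.get('cache_creation_input_tokens', 0) or 0)
    else:
        cached = int(getattr(details, 'cached_tokens', 0) or 0)
        created = int(getattr(details, 'cache_creation_input_tokens', 0) or 0)
    return cached, created


usage = {'prompt_tokens': 1200,
         'prompt_tokens_details': {'cached_tokens': 1024,
                                   'cache_creation_input_tokens': 0}}
print(extract_cache_info(usage))  # (1024, 0)
print(extract_cache_info(None))   # (0, 0)
```

The `or 0` guards matter: some providers return `None` for absent detail fields, which would otherwise crash the `int()` conversion.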
```python
# @@ -227,6 +334,10 @@ def _stream_continue_generate(self,
            try:
                next_chunk = next(completion)
                message.prompt_tokens += next_chunk.usage.prompt_tokens
                cached, created = self._extract_cache_info(
                    getattr(next_chunk, 'usage', None))
                message.cached_tokens += cached
                message.cache_creation_input_tokens += created
                message.completion_tokens += next_chunk.usage.completion_tokens
            except (StopIteration, AttributeError):
                # The stream may end without a final usage chunk, which is acceptable.
```
```python
# @@ -323,13 +434,17 @@ def _format_output_message(completion) -> Message:
                tool_name=tool_call.function.name) for idx, tool_call in
            enumerate(completion.choices[0].message.tool_calls)
        ]
        cached, created = OpenAI._extract_cache_info(
            getattr(completion, 'usage', None))
        return Message(
            role='assistant',
            content=content,
            reasoning_content=reasoning_content,
            tool_calls=tool_calls,
            id=completion.id,
            prompt_tokens=completion.usage.prompt_tokens,
            cached_tokens=cached,
            cache_creation_input_tokens=created,
            completion_tokens=completion.usage.completion_tokens)

    @staticmethod
```
```python
# @@ -343,6 +458,9 @@ def _merge_partial_message(messages: List[Message], new_message: Message):
            messages[-1].reasoning_content += new_message.reasoning_content
            messages[-1].content += new_message.content
            messages[-1].prompt_tokens += new_message.prompt_tokens
            messages[-1].cached_tokens += new_message.cached_tokens
            messages[-1].cache_creation_input_tokens += \
                new_message.cache_creation_input_tokens
            messages[-1].completion_tokens += new_message.completion_tokens
            if new_message.tool_calls:
                if messages[-1].tool_calls:
```
```python
# @@ -432,12 +550,44 @@ def _format_input_message(self,
        Returns:
            List[Dict[str, Any]]: List of dictionaries compatible with OpenAI's input format.
        """
        # Determine if we need to add cache_control (for dashscope/anthropic)
        add_cache_control = self._prefix_cache_provider is not None

        # Determine which message index should have cache_control (the last matching one)
        cache_indice = None
        if self._prefix_cache_enabled and add_cache_control:
            cache_indices = set()
            # Check for 'last_message' special value
            if 'last_message' in self._prefix_cache_roles and messages:
                cache_indices.add(len(messages) - 1)
            # Check for role-based caching
            role_cache = self._prefix_cache_roles - {'last_message'}
            for idx, msg in enumerate(messages):
                msg_role = msg.role if isinstance(msg, Message) else msg.get(
                    'role', '')
                if msg_role in role_cache:
                    cache_indices.add(idx)
            cache_indice = max(cache_indices) if cache_indices else None

        openai_messages = []
        for idx, message in enumerate(messages):
            if isinstance(message, Message):
                if isinstance(message.content, str):
                    message.content = message.content.strip()
                message = message.to_dict_clean()
            else:
                message = dict(message)

            content = message.get('content', '')
            if isinstance(content, str):
                content = content.strip()

            # Apply prefix cache structured content transformation
            if cache_indice is not None and idx == cache_indice:
                content = self._to_structured_content(
                    content,
                    add_cache_control=True,
                    provider=self._prefix_cache_provider)

            message = {
                key: value.strip() if isinstance(value, str) else value
```
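The index selection deserves a closer look: only one message receives the `cache_control` marker, namely the last one whose role matches (or the final message when `'last_message'` is configured). A hypothetical standalone sketch with plain-dict messages:

```python
from typing import List, Optional, Set


def pick_cache_index(messages: List[dict],
                     prefix_cache_roles: Set[str]) -> Optional[int]:
    """Sketch of the cache_indice selection from the diff above: the marker
    goes on the LAST matching message, since caching covers the whole prefix
    up to and including that message."""
    cache_indices = set()
    if 'last_message' in prefix_cache_roles and messages:
        cache_indices.add(len(messages) - 1)
    role_cache = prefix_cache_roles - {'last_message'}
    for idx, msg in enumerate(messages):
        if msg.get('role', '') in role_cache:
            cache_indices.add(idx)
    return max(cache_indices) if cache_indices else None


msgs = [{'role': 'system', 'content': 's'},
        {'role': 'user', 'content': 'u'},
        {'role': 'assistant', 'content': 'a'}]
print(pick_cache_index(msgs, {'system'}))          # 0
print(pick_cache_index(msgs, {'last_message'}))    # 2
print(pick_cache_index(msgs, {'system', 'user'}))  # 1
```

Taking the `max` rather than marking every match keeps the marked prefix as long as possible, which is what prefix caching rewards.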
```python
# @@ -446,6 +596,7 @@ def _format_input_message(self,
            }
            if 'content' not in message:
                message['content'] = ''
            message['content'] = content if content else ''

            openai_messages.append(message)
```
Review comment: The default value for `reasoning_output` in the code is `'stdout'`, but the docstring states that the default is `'stderr'`. This discrepancy should be corrected so the code and documentation are consistent. The docstring's choice of `stderr` seems the more sensible default, as it keeps the main output clean.
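The reviewer's preference can be illustrated with a small hedged sketch; the `emit` function and its parameters are hypothetical, not the project's actual API:

```python
import sys


def emit(answer: str, reasoning: str, reasoning_output: str = 'stderr') -> None:
    """Hypothetical sketch: routing reasoning text to stderr keeps stdout
    clean so the final answer can be piped or captured on its own."""
    stream = sys.stderr if reasoning_output == 'stderr' else sys.stdout
    print(reasoning, file=stream)  # diagnostic channel
    print(answer)                  # main result stays on stdout


emit('42', '[thinking] worked through the steps')
```

With this default, `python script.py > answer.txt` captures only the answer while the reasoning still appears on the terminal.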