Feature/modelscope support #898
base: master
Conversation
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
app/core/llm/check_llm.py
Outdated
model=model,
messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", 'content': 'Just respond with "Hello"!'},
Copilot AI · Nov 28, 2025
Inconsistent quote style: This line uses single quotes while the corresponding line in the else branch (line 58) uses double quotes. For consistency, both should use the same quote style.
| {"role": "user", 'content': 'Just respond with "Hello"!'}, | |
| {"role": "user", "content": "Just respond with \"Hello\"!"}, |
app/core/llm/check_llm.py
Outdated
)
return True, response.choices[0].message.content
# Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameters
if "modelscope" in base_url:
Copilot AI · Nov 28, 2025
The ModelScope detection logic relies on a simple substring check of "modelscope" in the base_url. This approach is fragile and could match unintended URLs (e.g., "example.com/modelscope-test"). Consider using a more robust detection mechanism, such as checking against the configured ModelScope base URL from the config, or using the LLMServiceEnum to properly identify the service type.
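A minimal sketch of the host-based check this comment is pointing toward; the helper name and the exact ModelScope domain are assumptions for illustration, not code from this PR:

```python
from urllib.parse import urlparse


def is_modelscope_endpoint(base_url: str) -> bool:
    """Match on the URL's hostname instead of a raw substring, so URLs such as
    "https://example.com/modelscope-test" are not misclassified."""
    host = urlparse(base_url).hostname or ""
    return host == "modelscope.cn" or host.endswith(".modelscope.cn")
```

Checking against the configured ModelScope base URL or an LLMServiceEnum value, as the comment suggests, would be even more direct if that information is available at the call site.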
app/core/llm/check_llm.py
Outdated
for chunk in response_stream:
    if chunk.choices and chunk.choices[0].delta.content:
        full_answer += chunk.choices[0].delta.content
Copilot AI · Nov 28, 2025
Missing error handling for empty streaming responses. If the streaming response yields no content chunks (full_answer remains empty), this will return an empty string. Consider adding validation to ensure a meaningful response was received, similar to how the non-streaming path validates response.choices[0].message.content.
Suggested change:
if not full_answer.strip():
    return False, "No content received from streaming response."
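For illustration, the guard could live next to the streaming loop it protects; the sketch below is a hypothetical helper that reuses the variable names from the excerpt above and is not part of the PR:

```python
from typing import Iterable, Tuple


def collect_stream_answer(response_stream: Iterable) -> Tuple[bool, str]:
    """Concatenate streamed delta content and fail the check explicitly when
    the stream finishes without yielding any content, mirroring the validation
    done on the non-streaming path."""
    full_answer = ""
    for chunk in response_stream:
        if chunk.choices and chunk.choices[0].delta.content:
            full_answer += chunk.choices[0].delta.content
    if not full_answer.strip():
        return False, "No content received from streaming response."
    return True, full_answer.strip()
```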
app/core/llm/client.py
Outdated
fake_message = SimpleNamespace(content=full_content)
fake_choice = SimpleNamespace(message=fake_message)
response = SimpleNamespace(choices=[fake_choice])
Copilot AI · Nov 28, 2025
The SimpleNamespace approach creates a fake response object that may not fully conform to the OpenAI API response structure. This could break if the validation logic or downstream code expects additional attributes or methods from the actual ChatCompletion object. Consider using the official OpenAI response classes or ensuring all necessary attributes are present on the SimpleNamespace objects.
Suggested change:
- fake_message = SimpleNamespace(content=full_content)
- fake_choice = SimpleNamespace(message=fake_message)
- response = SimpleNamespace(choices=[fake_choice])
+ import time
+ fake_message = SimpleNamespace(content=full_content)
+ fake_choice = SimpleNamespace(message=fake_message, finish_reason="stop", index=0)
+ response = SimpleNamespace(
+     id="chatcmpl-fake-modelscope",
+     object="chat.completion",
+     created=int(time.time()),
+     model=model,
+     choices=[fake_choice],
+ )
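The comment also mentions the official OpenAI response classes as an alternative to a hand-rolled namespace. A sketch of that approach, assuming the openai v1 Python SDK's pydantic models (field requirements can vary across SDK versions, and the id below is a placeholder):

```python
import time

from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice


def completion_from_stream_text(full_content: str, model: str) -> ChatCompletion:
    """Wrap streamed text in a real ChatCompletion so downstream code that
    expects the SDK's response type keeps working unchanged."""
    return ChatCompletion(
        id="chatcmpl-modelscope-stream",  # placeholder, not an API-issued id
        object="chat.completion",
        created=int(time.time()),
        model=model,
        choices=[
            Choice(
                index=0,
                finish_reason="stop",
                logprobs=None,
                message=ChatCompletionMessage(role="assistant", content=full_content),
            )
        ],
    )
```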
app/core/llm/client.py
Outdated
for chunk in response_stream:
    if chunk.choices and chunk.choices[0].delta.content:
        full_content += chunk.choices[0].delta.content
Copilot AI · Nov 28, 2025
Missing error handling for empty streaming responses. If the streaming response yields no content chunks (full_content remains empty), the validation at lines 171-179 may fail or pass with an empty string. Consider adding explicit validation after the streaming loop to ensure meaningful content was received.
Suggested change:
if not full_content:
    raise ValueError("Streaming response yielded no content")
app/core/llm/check_llm.py
Outdated
extra_body = {"enable_thinking": True}
response_stream = openai.OpenAI(
    base_url=base_url, api_key=api_key, timeout=60
).chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", 'content': 'Just respond with "Hello"!'},
    ],
    stream=True,
    extra_body=extra_body,
    timeout=30,
)

full_answer = ""
for chunk in response_stream:
    if chunk.choices and chunk.choices[0].delta.content:
        full_answer += chunk.choices[0].delta.content

return True, full_answer.strip()
Copilot AI · Nov 28, 2025
The comment states "some models require the stream and enable_thinking parameters", but the code unconditionally applies these settings to all ModelScope API calls. If only certain models require these parameters, this could cause errors or unexpected behavior for models that don't support them. Consider making this behavior configurable per model or checking model compatibility before applying these parameters.
Suggested change (replace the unconditional streaming block shown above with a per-model check):

# List of ModelScope models that require stream and enable_thinking
models_require_thinking = [
    "qwen-72b-chat", "qwen-7b-chat", "qwen-14b-chat", "qwen-1.8b-chat"
    # Add other models as needed
]
if model in models_require_thinking:
    extra_body = {"enable_thinking": True}
    response_stream = openai.OpenAI(
        base_url=base_url, api_key=api_key, timeout=60
    ).chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", 'content': 'Just respond with "Hello"!'},
        ],
        stream=True,
        extra_body=extra_body,
        timeout=30,
    )
    full_answer = ""
    for chunk in response_stream:
        if chunk.choices and chunk.choices[0].delta.content:
            full_answer += chunk.choices[0].delta.content
    return True, full_answer.strip()
else:
    response = openai.OpenAI(
        base_url=base_url, api_key=api_key, timeout=60
    ).chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", 'content': 'Just respond with "Hello"!'},
        ],
        timeout=30,
    )
    return True, response.choices[0].message.content
app/core/llm/client.py
Outdated
| """ | ||
| client = get_llm_client() | ||
| # Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameters | ||
| if "modelscope" in str(client.base_url): |
Copilot AI · Nov 28, 2025
The ModelScope detection logic relies on a simple substring check of "modelscope" in the base_url. This approach is fragile and could match unintended URLs (e.g., "example.com/modelscope-test"). Consider using a more robust detection mechanism, such as checking against the configured ModelScope base URL from the config, or using the LLMServiceEnum to properly identify the service type.
app/core/llm/client.py
Outdated
| if "modelscope" in str(client.base_url): | ||
| logger.info("Detected ModelScope API, using stream mode with enable_thinking=True.") | ||
|
|
||
| extra_body = {"enable_thinking": True} | ||
|
|
||
| response_stream = client.chat.completions.create( | ||
| model=model, | ||
| messages=messages, # pyright: ignore[reportArgumentType] | ||
| temperature=temperature, | ||
| stream=True, | ||
| extra_body=extra_body, | ||
| **kwargs, | ||
| ) | ||
|
|
||
| response = client.chat.completions.create( | ||
| model=model, | ||
| messages=messages, # pyright: ignore[reportArgumentType] | ||
| temperature=temperature, | ||
| **kwargs, | ||
| ) | ||
| full_content = "" | ||
| for chunk in response_stream: | ||
| if chunk.choices and chunk.choices[0].delta.content: | ||
| full_content += chunk.choices[0].delta.content | ||
|
|
||
| fake_message = SimpleNamespace(content=full_content) | ||
| fake_choice = SimpleNamespace(message=fake_message) | ||
| response = SimpleNamespace(choices=[fake_choice]) |
Copilot AI · Nov 28, 2025
There is significant code duplication between this ModelScope-specific logic and the similar implementation in check_llm.py (lines 30-50). Consider extracting this into a shared helper function to improve maintainability and ensure consistent behavior across both locations.
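One possible shape for such a helper; the function name, placement, and signature below are illustrative assumptions rather than what this PR implements:

```python
from typing import Any, Dict, List

import openai


def stream_modelscope_completion(
    client: openai.OpenAI,
    model: str,
    messages: List[Dict[str, Any]],
    **kwargs: Any,
) -> str:
    """Run a chat completion in streaming mode with enable_thinking set and
    return the concatenated text, so check_llm.py and client.py can share one
    code path (the empty-stream guard from the earlier comment could also live
    here)."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
        extra_body={"enable_thinking": True},
        **kwargs,
    )
    full_content = ""
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            full_content += chunk.choices[0].delta.content
    return full_content
```

check_llm.py could then wrap the returned text in its (ok, message) tuple while client.py wraps it in a response object, keeping the streaming logic in a single place.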
app/core/llm/client.py
Outdated
    ValueError: If response is invalid (empty choices or content)
"""
client = get_llm_client()
 # Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameters
Copilot AI · Nov 28, 2025
Incorrect indentation: This comment has an extra leading space. It should be aligned with the code at the same indentation level (4 spaces, not 5).
Suggested change:
-  # Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameters
+ # Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameters
app/core/llm/client.py
Outdated
# Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameters
if "modelscope" in str(client.base_url):
    logger.info("Detected ModelScope API, using stream mode with enable_thinking=True.")

    extra_body = {"enable_thinking": True}
Copilot AI · Nov 28, 2025
The comment states "some models require the stream and enable_thinking parameters", but the code unconditionally applies these settings to all ModelScope API calls. If only certain models require these parameters, this could cause errors or unexpected behavior for models that don't support them. Consider making this behavior configurable per model or checking model compatibility before applying these parameters.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
README.md
Outdated
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| SiliconCloud | [SiliconCloud 官网](https://cloud.siliconflow.cn/i/onCHcaDx)配置方法请参考[配置文档](./docs/llm_config.md)<br>该并发较低,建议把线程设置为5以下。 |
| DeepSeek | [DeepSeek 官网](https://platform.deepseek.com),建议使用 `deepseek-v3` 模型,<br>官方网站最近服务好像并不太稳定。 |
| ModelScope | [ModelScope 官网](https://modelscope.cn/models?filter=inference_type&page=1&tabKey=task)配置方法请参考[配置文档](https://modelscope.cn/docs/model-service/API-Inference/intro)<br>该并发较低,建议把线程设置为5以下。 |
Copilot AI · Nov 28, 2025
[nitpick] Inconsistent spacing in the table. "ModelScope" has extra spaces before the pipe character compared to other rows. The cell should be formatted consistently.

Change:
| ModelScope | [ModelScope 官网]...

to match the pattern of other rows like:
| ModelScope | [ModelScope 官网]...

This ensures proper alignment when the markdown is rendered.
Suggested change (normalize the cell spacing to match the other rows):
| ModelScope | [ModelScope 官网](https://modelscope.cn/models?filter=inference_type&page=1&tabKey=task)配置方法请参考[配置文档](https://modelscope.cn/docs/model-service/API-Inference/intro)<br>该并发较低,建议把线程设置为5以下。 |
app/core/llm/client.py
Outdated
from app.core.utils.cache import get_llm_cache, memoize
from app.core.utils.logger import setup_logger
from types import SimpleNamespace
Copilot AI · Nov 28, 2025
Import statement is not in the standard position. According to PEP 8, standard library imports should be grouped together at the top of the file. The from types import SimpleNamespace import should be moved up to line 5-6, grouped with other standard library imports like os and threading, rather than being placed after third-party imports.
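For illustration, the grouping PEP 8 asks for would look roughly like this; the os and threading imports are taken from the comment's description of the file, and any third-party imports (for example openai) would form a middle group, so treat the exact list as an assumption:

```python
# Standard library imports, grouped first
import os
import threading
from types import SimpleNamespace

# Local application imports, grouped last
from app.core.utils.cache import get_llm_cache, memoize
from app.core.utils.logger import setup_logger
```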
| "api_base_cfg": cfg.modelscope_api_base, | ||
| "model_cfg": cfg.modelscope_model, | ||
| "default_base": "https://api-inference.modelscope.cn/v1", | ||
| "default_models": ["Qwen/Qwen3-8B", "Qwen/Qwen3-30B-A3B-Instruct-2507", "deepseek-ai/DeepSeek-V3.1"], |
Copilot AI · Nov 28, 2025
The model name "Qwen/Qwen3-30B-A3B-Instruct-2507" appears to be non-standard. Based on Qwen's naming conventions, this should likely be verified:
- Standard Qwen3 models follow patterns like "Qwen/Qwen2.5-72B-Instruct" or "Qwen/Qwen3-8B"
- The "A3B" designation is unusual and may be a typo or placeholder
- The "-2507" suffix (likely representing July 2025) seems far in the future
Please verify this model name exists in ModelScope's model registry, or update to a valid model name like "Qwen/Qwen2.5-32B-Instruct" if this was intended.
| "default_models": ["Qwen/Qwen3-8B", "Qwen/Qwen3-30B-A3B-Instruct-2507", "deepseek-ai/DeepSeek-V3.1"], | |
| "default_models": ["Qwen/Qwen3-8B", "Qwen/Qwen2.5-32B-Instruct", "deepseek-ai/DeepSeek-V3.1"], |
| "api_base_cfg": cfg.modelscope_api_base, | ||
| "model_cfg": cfg.modelscope_model, | ||
| "default_base": "https://api-inference.modelscope.cn/v1", | ||
| "default_models": ["Qwen/Qwen3-8B", "Qwen/Qwen3-30B-A3B-Instruct-2507", "deepseek-ai/DeepSeek-V3.1"], |
Copilot AI · Nov 28, 2025
The model name "deepseek-ai/DeepSeek-V3.1" may be incorrect. Based on DeepSeek's release history and the documentation in this repository (which references "deepseek-ai/DeepSeek-V3"), the current latest version is V3, not V3.1.
Please verify this model exists in ModelScope's registry. If it doesn't exist, consider using "deepseek-ai/DeepSeek-V3" instead, which is the verified model name used in the documentation.
| "default_models": ["Qwen/Qwen3-8B", "Qwen/Qwen3-30B-A3B-Instruct-2507", "deepseek-ai/DeepSeek-V3.1"], | |
| "default_models": ["Qwen/Qwen3-8B", "Qwen/Qwen3-30B-A3B-Instruct-2507", "deepseek-ai/DeepSeek-V3"], |
app/core/llm/check_llm.py
Outdated
)
return True, response.choices[0].message.content
# Check whether it is the ModelScope platform, as some models require the stream and enable_thinking parameter
if "modelscope" in base_url:
Don't do this kind of special-case check inside this function; it is inelegant. Besides, this is only a check function, so it just needs to confirm the check passes.
> Don't do this kind of special-case check inside this function; it is inelegant. Besides, this is only a check function, so it just needs to confirm the check passes.

OK, I'll revise it and remove this part of the change.
app/core/llm/client.py
Outdated
extra_body = {"enable_thinking": True}

response_stream = client.chat.completions.create(
Why use streaming here? It surely supports non-streaming as well, right?
> Why use streaming here? It surely supports non-streaming as well, right?

Yes. In my testing, it seems only the QwQ-32B model forces streaming, while some other models force the enable_thinking parameter to False for non-streaming responses. If we add an extra_body setting option to the config, I think we can drop QwQ-32B, and the change would be simpler and cleaner. Would that be acceptable? 😊
app/core/llm/client.py
Outdated
| if "modelscope" in str(client.base_url): | ||
| logger.info("Detected ModelScope API, using stream mode with enable_thinking=True.") | ||
|
|
||
| extra_body = {"enable_thinking": True} |
I don't recommend handling this with a special-case check inside this function. Instead, add an extra_body setting option to the config and pass extra_body in, and **kwargs will receive it.
> I don't recommend handling this with a special-case check inside this function. Instead, add an extra_body setting option to the config and pass extra_body in, and **kwargs will receive it.

Hello, I've updated the extra_body setting. Please take a look when you have time 🙏, and feel free to contact me if there are any issues.
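A rough sketch of the config-driven wiring being discussed; the helper signature and the cfg.llm_extra_body attribute are hypothetical and only show how an extra_body value could flow through **kwargs without any ModelScope-specific branching:

```python
from typing import Any, Dict, List, Optional

import openai


def chat_completion(
    client: openai.OpenAI,
    model: str,
    messages: List[Dict[str, Any]],
    temperature: float = 0.0,
    extra_body: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
):
    """Forward an optional extra_body (e.g. {"enable_thinking": True} for the
    ModelScope models that need it) straight to the OpenAI-compatible client,
    so callers decide via configuration rather than URL sniffing."""
    if extra_body:
        kwargs["extra_body"] = extra_body
    return client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        **kwargs,
    )


# Hypothetical call site: the value comes from configuration, e.g.
# response = chat_completion(client, cfg.model, messages, extra_body=cfg.llm_extra_body)
```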
This PR primarily adds support for the ModelScope API. Users can now utilize models supported by ModelScope API inference by setting an API key.
How to obtain an API key
Note: Please bind your Alibaba Cloud account before using this feature.