Description
例行检查 / Checklist
- 我已确认目前没有类似 issue (I have checked for similar issues)
- 我已确认我已升级到最新版本 (I have updated to the latest version)
- 我已完整查看过项目 README,尤其是常见问题部分 (I have read the README, especially the FAQ section)
- 我理解并愿意跟进此 issue,协助测试和提供反馈 (I am willing to follow up on this issue, assist with testing, and provide feedback)
- 我理解并认可上述内容,并理解项目维护者精力有限,不遵循规则的 issue 可能会被无视或直接关闭 (I understand and agree to the above, and I understand that the maintainers have limited time, so issues that do not follow the rules may be ignored or closed directly)
问题描述 / Bug Description
Hello. Thank you for your excellent work. I'd like to share a bug/issue I've encountered.
When using LibreChat with the standard Google endpoint, API requests are always made in streaming mode. This results in two responses instead of one, so the model repeats the same (or nearly the same) message in the chat. It appears the proxy sends the same request twice, without waiting for a response from the model, when the request is in streamGenerateContent?alt=sse mode (the key part being alt=sse, for Server-Sent Events).
The proxy doesn't wait for these events and repeats the request, which leads to a duplicated response. I can see two responses concatenated in the response body, both in the proxy logs and in my chat interface.
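For reference, here is a minimal sketch of how an alt=sse response is structured (the payload fields are illustrative, not actual Gemini output): the stream is a sequence of data: lines separated by blank lines, and a correct client keeps reading until the server closes the connection rather than reissuing the request mid-stream.

```python
import json

def parse_sse(raw: str):
    """Split a raw SSE stream into decoded JSON events.

    Each event is one or more "data:" lines followed by a blank line;
    the stream is finished only when the server closes the connection,
    so a proxy must not treat a pause between events as a failure.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        data_lines = [line[len("data:"):].strip()
                      for line in block.splitlines()
                      if line.startswith("data:")]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events

# Illustrative stream: two chunks of ONE answer, not two answers.
raw = (
    'data: {"text": "Hello"}\n\n'
    'data: {"text": " world"}\n\n'
)
chunks = parse_sse(raw)
assert "".join(c["text"] for c in chunks) == "Hello world"
```

A proxy that resends the request while chunks like these are still arriving would naturally end up with two answers concatenated, which matches what I see in the logs.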
It's not possible to disable streaming for the default Google endpoint in LibreChat. However, the duplication disappears when I switch to the OpenAI-compatible Google API endpoint, configure the proxy to accept OpenAI-compatible requests (/chat/completions), and disable streaming in the interface.
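For comparison, the OpenAI-compatible workaround boils down to sending a single non-streaming request to /chat/completions. A sketch of the request body (the model name is an example from my setup, not prescribed by the proxy):

```python
import json

# Hypothetical request body for the OpenAI-compatible endpoint;
# with "stream": false the server returns exactly one JSON response,
# so there is no SSE stream for the proxy to mishandle.
payload = {
    "model": "gemini-1.5-pro",  # example model name
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,  # disables SSE, avoiding the duplication
}
body = json.dumps(payload)
assert json.loads(body)["stream"] is False
```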
None of the available delay settings in the proxy interface solve this problem.
Would it be possible to add a delay setting (or another mechanism) so that the proxy waits for the model's response on the SSE stream? This would prevent it from resending the request or closing the connection prematurely, which appears to be what causes the LibreChat agents to duplicate the request.
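As a rough illustration of the suggested fix (all names here are hypothetical, not the proxy's actual code): a retry should fire only if no SSE data arrives at all within the delay window, and never once chunks have started flowing.

```python
def read_stream_with_grace(chunks, first_chunk_deadline=5.0):
    """Consume a simulated SSE stream; signal a retry only if no data arrives.

    `chunks` is an iterator of (arrival_time_seconds, data) pairs standing
    in for network reads. A retry is warranted only when the first chunk
    misses `first_chunk_deadline`; once data is flowing, the proxy must
    wait for the stream to finish instead of reissuing the request.
    """
    received = []
    for arrival, data in chunks:
        if not received and arrival > first_chunk_deadline:
            return None, "retry"  # nothing before the deadline: safe to retry
        received.append(data)
    if not received:
        return None, "retry"      # stream ended with no data at all
    return "".join(received), "done"  # stream finished: never retry

# Chunks arriving promptly are assembled into one response, no retry.
text, status = read_stream_with_grace(iter([(0.1, "Hel"), (0.2, "lo")]))
assert (text, status) == ("Hello", "done")
```

This is only a sketch of the policy I am asking for; the actual implementation would live in the proxy's streaming path.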
复现步骤 / Steps to Reproduce
- Use LibreChat
- Set up the default Google Endpoint
- Add an Agent/Assistant with a Google model as the backend
预期结果 / Expected Behavior
The expected result of the fix is that the proxy does not repeat the request before a configured delay has elapsed. The model then has sufficient time to compile and deliver complex responses without answering the same query multiple times.
相关截图 / Screenshots