The original `max_tokens` is capped at 1024, which causes problems with some Chinese models.

For example, when I use Zhipu's GLM 4.6, it enables thinking mode, and the thinking process also consumes tokens. When the thinking tokens exceed 1024, the returned reply is an empty string.

As a result, profile extraction and memory extraction become very unstable: sometimes they succeed, sometimes they don't.

After I removed the `max_tokens` limit, everything works fine.

I suggest removing the `max_tokens` limit. Correct behavior should take priority over cost control (and capping `max_tokens` feels like the wrong way to control cost anyway).