Adding ThinkingBlock to Ollama and Bedrock Converse #19936
Conversation
Ok, this should now be good to go. I tested e2e multiple times with both Ollama and Bedrock Converse, and both work well with thinking :))
additional_kwargs={
    "tool_calls": all_tool_calls,
    "thinking": thinking_txt,
    "tool_calls": list(set(all_tool_calls)),
This looks like a revert of the fix made about a day back: 8079db7
Mmh maybe the merge did this wrong, but technically we are addressing the tool_calls kwargs in this PR: #19947
Technically tool_calls won't be needed anymore :)
I think this will still break though; we should probably go back to all_tool_calls to avoid the error in the PR that tyson linked.
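For context, here is one way the set()-based dedup can fail, assuming the tool-call entries are unhashable objects such as plain dicts (the actual error in the linked PR may be something else):

# Illustration only: hypothetical tool-call entries; the real objects in the integration may differ.
all_tool_calls = [
    {"name": "multiply", "args": {"a": 2.0, "b": 3.0}},
    {"name": "multiply", "args": {"a": 2.0, "b": 3.0}},  # duplicate entry
]

try:
    deduped = list(set(all_tool_calls))
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'

# Order-preserving dedup that also works for unhashable items:
deduped = [tc for i, tc in enumerate(all_tool_calls) if tc not in all_tool_calls[:i]]
print(deduped)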
Looks like some merge conflicts got created (sorry, that might be my fault lol)
if content_block_delta := chunk.get("contentBlockDelta"):
    content_delta = content_block_delta["delta"]
    content = join_two_dicts(content, content_delta)
    thinking = ""
Is it only returning the full thinking/reasoning text, or is it streaming?
If it's streaming, setting thinking = "" means we are removing the complete thinking string, right?
Might want to update the tests to check if the thinking is over, like, 50 chars?
As far as I can tell from testing the streaming behavior, this basically means we get a separate thinking chunk for each streamed response, instead of an incrementally growing one. So, rather than this:
The quick brown fox j
The quick brown fox jumps over th
The quick brown fox jumps over the lazy dog
We would have this:
The quick brown fox
jumps over the
lazy dog
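In other words, with deltas the consumer has to concatenate the chunks itself to rebuild the full thinking text. A toy illustration of the two shapes (plain strings only, not the actual chunk objects):

cumulative_chunks = [
    "The quick brown fox j",
    "The quick brown fox jumps over th",
    "The quick brown fox jumps over the lazy dog",
]
delta_chunks = ["The quick brown fox ", "jumps over the ", "lazy dog"]

# With cumulative chunks, the last chunk already holds the full text;
# with delta chunks, concatenation rebuilds it.
assert "".join(delta_chunks) == cumulative_chunks[-1]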
Also, the test for the streaming of thinking blocks does not check the character length; it checks the number of thinking blocks produced (which should be non-zero). I will add a test for character length, tho, to cover the streaming.
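Something like this could serve as that length check; it is only a sketch, and both the llm fixture and the additional_kwargs["thinking"] field are assumptions about the streaming interface rather than the final test:

import pytest

@pytest.mark.asyncio  # requires pytest-asyncio
async def test_stream_thinking_length(llm):  # hypothetical fixture providing a thinking-enabled LLM
    thinking = ""
    gen = await llm.astream_complete("Think step by step: what is 1244 * 12?")
    async for chunk in gen:
        # assumes each chunk carries only its new thinking text in additional_kwargs["thinking"]
        thinking += (chunk.additional_kwargs or {}).get("thinking") or ""
    assert len(thinking) > 50  # the model should emit a non-trivial reasoning trace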
delta=content_delta.get("text", None)
    or content_delta.get("thinking", None)
    or "",
This is a bit of a breaking change, right? Because now it's streaming the thinking and the regular text deltas in the same string field.
We might want to save this for when we have proper streaming content blocks.
(The other async streaming method also doesn't do this)
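A toy illustration of the concern, using plain dicts in place of the real content deltas (the "text" and "thinking" keys are taken from the diff above; everything else is made up):

content_deltas = [
    {"thinking": "First multiply 1244 by 12... "},
    {"text": "The result is 63.65."},
]

# Pattern from the diff: thinking falls back into the same string field as text,
# so a consumer that prints deltas as the answer now also prints the reasoning.
for content_delta in content_deltas:
    delta = content_delta.get("text", None) or content_delta.get("thinking", None) or ""
    print(delta, end="")
print()

# Keeping the two streams separate leaves the choice to the consumer:
for content_delta in content_deltas:
    if content_delta.get("thinking"):
        print("[thinking]", content_delta["thinking"])
    if content_delta.get("text"):
        print("[text]", content_delta["text"])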
llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py (resolved)
Ok @logan-markewich, just implemented your requested changes and merged main, hopefully I did not break anything ^_^
Made quite a few changes lol but got it working
fyi I was testing with this e2e script. It will probably be helpful for the tool calling block PR (and maybe for double-checking the other thinking LLMs):
from llama_index.core.agent import FunctionAgent, AgentStream
from llama_index.llms.ollama import Ollama
from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core.workflow import Context

# Fill in with an Ollama or BedrockConverse instance (with thinking enabled) to test against.
llm = ...


async def get_the_greeting() -> str:
    """Useful for getting the current greeting for new people/users."""
    return "Good day sir, top of the morning to ye."


async def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return float(a) * float(b)


async def divide(a: float, b: float) -> float:
    """Useful for dividing two numbers."""
    return float(a) / float(b)


agent = FunctionAgent(
    llm=llm,
    tools=[get_the_greeting, multiply, divide],
)
ctx = Context(agent)


async def main():
    # Plain (non-agent) sanity checks
    resp = llm.complete("Hello!")
    resp = await llm.acomplete("Hello!")

    # Simple tool-calling turn, streaming the agent output
    handler = agent.run("Hello! I am a new user!", ctx=ctx)
    async for ev in handler.stream_events():
        if isinstance(ev, AgentStream):
            print(ev.delta, end="", flush=True)
    print()

    resp = await handler
    print(str(resp))
    print()
    for block in resp.response.blocks:
        print(block)
    print()

    # Turn that should trigger thinking plus multiple tool calls
    handler = agent.run("What is 1244 * 12 / 234.5? Think carefully", ctx=ctx)
    async for ev in handler.stream_events():
        if isinstance(ev, AgentStream):
            print(ev.delta, end="", flush=True)
    print()

    resp = await handler
    print(str(resp))
    print()
    for block in resp.response.blocks:
        print(block)
    print()

    resp = await agent.run("thanks!", ctx=ctx)
    print(resp)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
weird,
With this PR we add: