
Conversation

AstraBert (Member)

With this PR we add:

  • Support for thinking blocks in Ollama
  • Support for thinking and thinking blocks in Bedrock Converse (a quick usage sketch follows below)
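A minimal usage sketch for reference (my own, not from the PR; it assumes the flag is exposed as an Ollama constructor argument and that the reasoning text lands in additional_kwargs, as the diff discussed below suggests; the model name is a placeholder):

from llama_index.llms.ollama import Ollama

# Hypothetical setup: a reasoning-capable model with thinking enabled.
llm = Ollama(model="deepseek-r1:8b", thinking=True)

resp = llm.complete("What is 12 * 34?")
print(resp.text)
# Per this PR, the reasoning text should be retrievable from the kwargs:
print(resp.additional_kwargs.get("thinking"))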

@AstraBert AstraBert marked this pull request as ready for review September 25, 2025 10:19
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Sep 25, 2025
@AstraBert (Member Author)

Ok, this should now be good to go. I tested e2e multiple times with both Ollama and Bedrock Converse, and both are good with thinking :))

additional_kwargs={
-    "tool_calls": all_tool_calls,
+    "thinking": thinking_txt,
+    "tool_calls": list(set(all_tool_calls)),
Contributor

This looks like a revert of the fix made about a day back: 8079db7

Member Author

Mmh, maybe the merge did this wrong, but technically we are addressing the tool_calls kwarg in this PR: #19947

Member Author

Technically tool_calls won't be needed anymore :)

Collaborator

I think this will still break though; we should probably go back to all_tool_calls to avoid the error in the PR that Tyson linked
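For context, a plausible failure mode (an assumption on my part; the linked PR has the actual error): tool calls are dict-like and unhashable, so calling set() over them raises a TypeError. If dedup is still wanted, an order-preserving variant that avoids hashing could look like this sketch:

# Hypothetical order-preserving dedup that does not require hashable items.
def dedup_tool_calls(all_tool_calls: list) -> list:
    seen = set()
    unique = []
    for call in all_tool_calls:
        key = repr(call)  # stand-in identity; a real fix might key on the call id
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique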

@logan-markewich (Collaborator)

Looks like some merge conflicts got created (sorry, that might be my fault lol)

if content_block_delta := chunk.get("contentBlockDelta"):
content_delta = content_block_delta["delta"]
content = join_two_dicts(content, content_delta)
thinking = ""
Collaborator

Is it only returning the full thinking/reasoning text? Or is it streaming?

If it's streaming, setting thinking = "" means we are removing the complete thinking string, right?

Collaborator

Might want to update the tests to check if the thinking is over, like, 50 chars?

@AstraBert (Member Author), Sep 30, 2025

As far as I tested the streaming behavior, this basically means we get a separate thinking chunk for each streamed response, instead of an incrementally growing one. So, rather than this:

The quick brown fox j
The quick brown fox jumps over th
The quick brown fox jumps over the lazy dog

We would have this:

The quick brown fox
jumps over the
lazy dog
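So a consumer that wants the full reasoning text has to accumulate the per-chunk deltas itself. A rough sketch of that pattern (assuming each streamed chunk exposes its thinking slice via additional_kwargs, as in the diff above):

full_thinking = ""
for chunk in llm.stream_complete("Think step by step: what is 17 * 23?"):
    # Each chunk carries only its own slice of the reasoning text.
    full_thinking += chunk.additional_kwargs.get("thinking", "") or ""
print(full_thinking)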

Member Author

Also, the test for the streaming of thinking blocks does not check character length; it checks the number of thinking blocks produced (which should be non-zero). I will add a character-length test, though, to exercise the streaming.
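Something along these lines, roughly (a sketch only: the model name is a placeholder and the 50-char threshold is the one suggested above):

import pytest
from llama_index.llms.ollama import Ollama


@pytest.mark.asyncio
async def test_streamed_thinking_length():
    llm = Ollama(model="deepseek-r1:8b", thinking=True)  # placeholder model
    thinking = ""
    async for chunk in await llm.astream_complete("Reason carefully: 12 * 34?"):
        thinking += chunk.additional_kwargs.get("thinking", "") or ""
    # The accumulated reasoning should be non-trivial, not just non-empty.
    assert len(thinking) > 50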

Comment on lines 533 to 535
delta=content_delta.get("text", None)
    or content_delta.get("thinking", None)
    or "",
Collaborator

This is a bit of a breaking change, right? Because now it's streaming the thinking and the text deltas in the same string field.

We might want to save this for when we have proper streaming content blocks
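One non-breaking shape for this (a sketch of the idea only, not necessarily what was merged) would be to keep delta text-only and carry the thinking slice in the kwargs:

delta=content_delta.get("text", None) or "",
additional_kwargs={
    # Hypothetical: surface the reasoning slice separately, so existing
    # consumers of `delta` keep seeing only user-visible text.
    "thinking": content_delta.get("thinking", None) or "",
},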

Collaborator

(The other async streaming method also doesn't do this)

@AstraBert (Member Author)

Ok @logan-markewich, just implemented your requested changes and merged main; hopefully I did not break anything ^_^

@logan-markewich (Collaborator) left a comment

Made quite a few changes lol but got it working

FYI, I was testing with this e2e script. It will probably be helpful for the tool calling block PR (and maybe for double-checking the other thinking LLMs):

from llama_index.core.agent import FunctionAgent, AgentStream
from llama_index.llms.ollama import Ollama
from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core.workflow import Context


# Instantiate one of the imported LLMs here (Ollama or BedrockConverse).
llm = ...


async def get_the_greeting() -> str:
    """Useful for getting the current greeting for new people/users."""
    return "Good day sir, top of the morning to ye."


async def multiply(a: float, b: float) -> float:
    """Useful for multiplying two numbers."""
    return float(a) * float(b)


async def divide(a: float, b: float) -> float:
    """Useful for dividing two numbers."""
    return float(a) / float(b)


agent = FunctionAgent(
    llm=llm,
    tools=[get_the_greeting, multiply, divide],
)
ctx = Context(agent)


async def main():
    resp = llm.complete("Hello!")
    resp = await llm.acomplete("Hello!")
    handler = agent.run("Hello! I am a new user!", ctx=ctx)
    async for ev in handler.stream_events():
        if isinstance(ev, AgentStream):
            print(ev.delta, end="", flush=True)

    print()
    resp = await handler
    print(str(resp))
    print()
    for block in resp.response.blocks:
        print(block)
    print()

    handler = agent.run("What is 1244 * 12 / 234.5? Think carefully", ctx=ctx)
    async for ev in handler.stream_events():
        if isinstance(ev, AgentStream):
            print(ev.delta, end="", flush=True)

    print()
    resp = await handler
    print(str(resp))
    print()
    for block in resp.response.blocks:
        print(block)
    print()

    resp = await agent.run("thanks!", ctx=ctx)
    print(resp)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 4, 2025
@logan-markewich (Collaborator)

Weird, make lint modified a ton of docs files. Gonna go with that for now lol

@logan-markewich logan-markewich merged commit 6957c5b into main Oct 4, 2025
10 of 11 checks passed
@logan-markewich logan-markewich deleted the clelia/ollama-and-bedrock-thinking branch October 4, 2025 05:29