
Conversation

@jakelorocco
Contributor

Discussion: #103

Currently, the draft PR adds the capability only to ollama, to highlight what changes were necessary.

Changes

  • m.act and m.validate had to be changed to support async calls
  • generate_from_context now returns a model output thunk that is ready for generation
    • the model output thunk gets functions for generating and processing the output
    • generate returns a model output thunk with awaitable values but is not itself an async function
    • this also sets up support for lazy computation
  • by default, validation will run asynchronously
  • sampling strategies had to be changed to support async generation

Note: I'm not happy with where/how some functions are defined (mostly the processing functions); I'm planning on moving those. These changes also set us up for simplifying backends: a generic backend could define the control flow of generating and prepping a model output thunk while calling the processing methods implemented by a specific backend. A sketch of this thunk-based flow is below.
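
As a rough illustration of that flow, here is a minimal sketch in Python. All names here (ModelOutputThunk, avalue, generate_from_context) are assumptions based on the description above, not the actual mellea implementation:

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical sketch only: the names and structure are illustrative and
# may not match the actual mellea implementation.
class ModelOutputThunk:
    def __init__(self, generate_fn: Callable[[], Awaitable[str]]):
        self._generate_fn = generate_fn  # attached by the backend
        self._value: str | None = None

    async def avalue(self) -> str:
        # Lazy: generation runs only when the value is first awaited.
        if self._value is None:
            self._value = await self._generate_fn()
        return self._value

def generate_from_context(prompt: str) -> ModelOutputThunk:
    # Synchronous entry point: returns a thunk that is ready for
    # generation but has not generated anything yet.
    async def _generate() -> str:
        await asyncio.sleep(0)  # stand-in for the real model call
        return f"response to: {prompt}"
    return ModelOutputThunk(_generate)

async def main() -> None:
    thunk = generate_from_context("hello")  # no generation yet
    print(await thunk.avalue())             # generation happens here

asyncio.run(main())
```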

@mergify

mergify bot commented Sep 11, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:
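
For a quick local check, a title can be tested against this rule with Python's re module (a sketch; the pattern is copied verbatim from the rule above):

```python
import re

# Pattern copied from the merge protection rule above.
PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:"
)

for title in ("feat: add async and streaming", "feat(ollama): add streaming", "add streaming"):
    print(f"{title!r} -> {bool(PATTERN.match(title))}")
# 'feat: add async and streaming' -> True
# 'feat(ollama): add streaming' -> True
# 'add streaming' -> False
```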

Contributor

@nrfulton left a comment


The overall approach looks good to me. I do wonder about everything becoming async but I think it's okay.

@jakelorocco
Contributor Author

> The overall approach looks good to me. I do wonder about everything becoming async but I think it's okay.

@nrfulton, if we want to keep synchronous versions of functions around, we definitely can. generate_from_context would just need a parameter that selects the correct generate function depending on what's desired. It just makes higher-level abstractions like sampling strategies and result validation a bit more complicated, since they have to juggle the async and sync versions.
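
A minimal sketch of what that could look like, assuming a use_async flag (the parameter name and return shape are illustrative, not the actual mellea API):

```python
import asyncio
from typing import Coroutine, Union

# Assumed-for-illustration signature: a flag on generate_from_context picks
# the sync or async generate function; not the actual mellea API.
def generate_from_context(prompt: str, use_async: bool = True) -> Union[str, Coroutine]:
    async def agenerate() -> str:
        await asyncio.sleep(0)  # placeholder for an async model call
        return f"async response to: {prompt}"

    def generate() -> str:
        return f"sync response to: {prompt}"  # placeholder for a blocking call

    if use_async:
        return agenerate()  # caller must await the result
    return generate()       # plain value, no event loop required

# Callers such as sampling strategies now have to branch on which flavor
# they received, which is the juggling mentioned above.
sync_result = generate_from_context("hello", use_async=False)
async_result = asyncio.run(generate_from_context("hello"))
print(sync_result, async_result)
```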

Contributor

@HendrikStrobelt left a comment


I think I understand the logic... but currently, from the outside, .chat and .instruct act the same for streaming and non-streaming, right?

@jakelorocco
Contributor Author

> I think I understand the logic... but currently, from the outside, .chat and .instruct act the same for streaming and non-streaming, right?

Yes; once we introduce partial requirement checking / logging, you will be able to notice differences with streaming, but there are no real functional differences between the two.
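
For illustration, a hypothetical caller's view (the stream parameter and .value attribute are assumptions, not confirmed API; the session setup follows mellea's usual pattern):

```python
# Hypothetical illustration; the stream parameter and .value attribute are
# assumptions, not the confirmed mellea API.
import mellea

m = mellea.start_session()

non_streamed = m.chat("tell me a joke", stream=False)
streamed = m.chat("tell me a joke", stream=True)

# From the outside, the caller reads the completed output the same way in
# both cases; internally, the streaming call would consume partial chunks
# (e.g., for future partial requirement checking / logging) before completion.
print(non_streamed.value)
print(streamed.value)
```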

@jakelorocco changed the title from "feat: add async and streaming (wip; only for ollama)" to "feat: add async and streaming" on Sep 17, 2025
@jakelorocco
Contributor Author

I still need to make a few tweaks and push my changes to fix aloras / logging.

@jakelorocco
Contributor Author

@nrfulton @HendrikStrobelt
I was unable to test the vllm/openai aloras because there is some issue with our vllm script (I will open an issue for that as well). All the other tests passed locally for me (but we'll see what the GitHub Actions say).

Still need to finish up and push the documentation, but the code is ready for review. I'll monitor for test failures; I believe I fixed all of the things that were failing due to merging main back into my branch.

@jakelorocco marked this pull request as ready for review on September 22, 2025 at 20:22
@jakelorocco
Contributor Author

I am trying to debug why the tests are failing; it's some out-of-space issue, so I'm not sure if I just happened to be the unlucky one who pushed us over the edge or if some change I made is actually causing the issue.

@jakelorocco
Contributor Author

Tested that the new code works in Colab by installing my specific branch of mellea. Didn't see any issues in the notebook.

@jakelorocco merged commit 4ee56a9 into main on Sep 23, 2025
4 checks passed
@jakelorocco deleted the jal/async-streaming branch on September 23, 2025 at 20:37