Conversation

@jakelorocco jakelorocco commented Nov 4, 2025

I ran into enough issues that I decided to fix them all. All tests pass locally now.

Fixes: #222 (comment)

List of changes:

  • vllm versioning: the vllm package was defaulting to a lower version when installing with .[all] vs .[vllm]; I could not figure out why but forcing the version works.
  • vllm bug with format response
    • some vllm servers don't like it when you specify a plain text response_format, so the default was removed from the openai and litellm backends
  • vllm issues
    • changed generate_from_raw to use the event_loop helper, like the Ollama backend
    • added some minor event loop handling to vllm so that it can be used across both async and sync mfunc calls
  • vllm tests
    • changed the skip condition to run at the fixture level so that all other vllm tests are skipped if it fails (see the sketch after this list)
    • fixed the calls to generate_from_raw that were missed during the refactor
  • add ctx to val result
    • added example for accessing and utilizing these values
    • fixed issue with aloras not getting added to context properly
    • added tests
  • remove the β tests that always fail
  • add warning to huggingface generate_from_raw with mps
    • there's a bug with our version of pytorch that causes batched requests to only populate the last item in the batch
  • fixed a doc folder that had .py in its name
  • removed pytest-xdist since it wasn't helping and was causing issues when running locally
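
For the fixture-level test skip mentioned above, the pattern looks roughly like this. This is a minimal sketch, not the actual test code: the fixture name, environment variable, and import path are placeholders.

```python
import os

import pytest


@pytest.fixture(scope="module")
def vllm_backend():
    """Skip every test that requests this fixture when the backend cannot be created."""
    if not os.environ.get("VLLM_MODEL_ID"):  # hypothetical env var name
        pytest.skip("VLLM_MODEL_ID not set; skipping vllm tests")
    try:
        from mellea.backends.vllm import LocalVLLMBackend  # hypothetical import path

        return LocalVLLMBackend()
    except Exception as e:  # engine/server setup failed
        pytest.skip(f"could not create vllm backend: {e}")


def test_generate_from_raw(vllm_backend):
    # Any test that uses the fixture is skipped automatically when the setup above fails.
    ...
```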

mergify bot commented Nov 4, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@jakelorocco jakelorocco marked this pull request as ready for review November 4, 2025 18:52
Comment on lines +519 to +522
FancyLogger.get_logger().warning(
    "utilizing device mps with a `generate_from_raw` request; you may see issues when submitting batches of prompts to a huggingface backend; ensure all ModelOutputThunks have non-empty values."
)

Contributor

Ah, that's what this was!! I was having issues when running the hf tests for this, but it disappeared when I stepped into it while debugging. Thanks for adding this!

Contributor

@avinash2692 avinash2692 left a comment

LGTM, except I'm not sure why we're moving response_format into extra_params. Let me know if there is a reason for it, and it can be documented here for posterity.

  extra_params: dict[str, Any] = {}
  if _format is not None:
-     response_format = {
+     extra_params["response_format"] = {
Contributor

any reason for the additional abstraction?

Contributor

I agree with avi; response_format = None is better if the old value causes errors

Contributor Author

response_format = None sometimes causes issues as well with some backends (at least with the OpenAI backend I believe it used to). It's best to just not pass a response_format parameter if possible.
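
For posterity, the idea is to omit the key entirely rather than pass response_format=None. A minimal sketch of the pattern, assuming an OpenAI-compatible chat-completions client and a pydantic model for _format; the helper name and schema layout here are illustrative, not the backend's actual code:

```python
from typing import Any

from pydantic import BaseModel


def build_extra_params(_format: type[BaseModel] | None) -> dict[str, Any]:
    """Only include response_format when a structured output format was requested.

    Passing response_format=None (or a default text format) upsets some
    OpenAI-compatible servers, so the key is omitted entirely otherwise.
    """
    extra_params: dict[str, Any] = {}
    if _format is not None:
        extra_params["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": _format.__name__,
                "schema": _format.model_json_schema(),
            },
        }
    return extra_params


# usage (illustrative):
# client.chat.completions.create(model=model_id, messages=messages, **build_extra_params(fmt))
```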

@avinash2692
Contributor

Fixes issues related to vllm-project/vllm#26639

Contributor

@guicho271828 guicho271828 left a comment

Approved, but it would be nice to fix the minor requests.

# if switching between async and sync calls.
if el != self._event_loop:
    self._underlying_model.shutdown_background_loop()
    self._underlying_model.start_background_loop()
Contributor Author

They call that a background_loop, but it's not an event loop; it's actually a Future. Even the _background_loop_unshielded is a Task object.

I think it's fine to manage the reference to the event loop on our side. We only ever have the one AsyncLLMEngine per LocalVLLMBackend so there shouldn't be issues with us tracking it this way. Happy to change it if it causes issues later on.
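
A rough sketch of the pattern being described, assuming an AsyncLLMEngine-style object that exposes start_background_loop() / shutdown_background_loop(); the class and attribute names here are illustrative rather than the backend's actual code:

```python
import asyncio
from typing import Any


class _LoopAwareEngine:
    """Track the event loop a request runs on and restart the engine's
    background task when the loop changes (e.g. alternating sync and async calls)."""

    def __init__(self, underlying_model: Any) -> None:
        self._underlying_model = underlying_model
        self._event_loop: asyncio.AbstractEventLoop | None = None

    def ensure_background_loop(self, el: asyncio.AbstractEventLoop) -> None:
        # vllm's "background loop" is really a Future/Task bound to the loop that
        # started it, so it has to be shut down and recreated on a new loop.
        if el is not self._event_loop:
            if self._event_loop is not None:
                self._underlying_model.shutdown_background_loop()
            self._underlying_model.start_background_loop()
            self._event_loop = el
```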

@jakelorocco jakelorocco merged commit 7fa0891 into main Nov 4, 2025
3 of 4 checks passed
@jakelorocco jakelorocco deleted the jal/minor-fixes branch November 4, 2025 21:08
tuliocoppola pushed a commit to tuliocoppola/mellea that referenced this pull request Nov 5, 2025
* fix: enforce minimum vllm version

* fix: remove tests that look for "β"

* fix: remove default response_format from litellm and openai backends

* fix: remove xdist from pytests

* fix: fix vllm tests

* fix: vllm async event loop

* feat: add contexts to validation results

* fix: add warning for mps with huggingface generate from raw

* fix: remove .py from folder name

* fix: remove pytest-xdist specific args

* fix: add exception with vllm backend when env var not set

Successfully merging this pull request may close these issues:

ValidationResult should have its generation_ctx