
Conversation

@derekhiggins
Contributor

- Introduces vLLM provider support to the record/replay testing framework.
- Enables both recording and replay of vLLM API interactions alongside the existing Ollama support.

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API surface
including vision features.
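
A minimal sketch of the record/replay pattern described above; the environment variable name, cache layout, and helper are illustrative assumptions, not the project's actual API:

```python
import json
import os
from pathlib import Path


def load_or_record(cache_path: Path, make_request):
    """Replay a stored API response if present, otherwise record a live one."""
    mode = os.environ.get("INFERENCE_TEST_MODE", "replay")
    if mode == "replay" and cache_path.exists():
        # Replay: serve the stored response; no live vLLM/Ollama server needed.
        return json.loads(cache_path.read_text())
    # Record: hit the live endpoint and persist the response for later replay.
    response = make_request()
    cache_path.write_text(json.dumps(response))
    return response
```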

--
This is an alternative to #3128, using Qwen3 instead of Llama 3.2 1B; Qwen3 appears to be more capable at structured output and tool calls.

@@ -168,6 +168,11 @@ class Setup(BaseModel):
roots=base_roots,
default_setup="ollama",
),
"base-vllm-subset": Suite(
Collaborator

is this needed anymore?

Contributor Author

My intent here was to add this job with only the tests in "tests/integration/inference" and then, once we're happy we haven't caused any major disruption, we could expand to the entire suite.
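
For illustration, a hedged sketch of what such a subset suite could look like; the Suite stand-in and field values below are assumptions based on the diff and the comment above, not the repo's exact code:

```python
from pydantic import BaseModel


class Suite(BaseModel):
    # Stand-in definition for illustration; field names mirror the diff above.
    roots: list[str]
    default_setup: str


SUITES = {
    "base-vllm-subset": Suite(
        # Start with only the inference tests, per the intent stated above;
        # the suite can grow to the full set once it proves non-disruptive.
        roots=["tests/integration/inference"],
        default_setup="vllm",
    ),
}
```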

Contributor

@derekhiggins you feel like this is ready?

Contributor Author

Yes, I believe so, although the ollama record job is broken; I've had to rebase with this commit #3898 in order to get new ollama recordings.

@derekhiggins force-pushed the vllm-ci-qwen branch 8 times, most recently from 1c3ea50 to 0f0d986 on October 15, 2025 09:58
It performs better in tool calling and structured output tests.

Signed-off-by: Derek Higgins <[email protected]>
Add vLLM provider support to integration test CI workflows alongside
existing Ollama support. Configure provider-specific test execution
where vLLM runs only inference-specific tests (excluding vision tests) while
Ollama continues to run the full test suite.

This enables comprehensive CI testing of both inference providers while
keeping the vLLM footprint small; it can be expanded later if it proves
not to be too disruptive.

Also updated test skips that were marked with "inline::vllm"; this
should be "remote::vllm". This causes some failing logprobs tests
to be skipped and should be revisited.

Signed-off-by: Derek Higgins <[email protected]>
Signed-off-by: Derek Higgins <[email protected]>
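
A hedged sketch of the corrected skip described in the commit message above; the helper name and skip message are illustrative, not the repo's exact code:

```python
import pytest


def maybe_skip_logprobs(provider_type: str) -> None:
    # "inline::vllm" never matched the CI provider, so the skip was dead code;
    # the remote provider registers as "remote::vllm", which makes it fire.
    if provider_type == "remote::vllm":
        pytest.skip("logprobs tests currently fail on remote::vllm; revisit")
```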
derekhiggins and others added 2 commits October 24, 2025 00:31
The vector_provider_wrapper was only limiting providers to faiss/sqlite-vec
for replay mode, but CI tests also run in record mode with the same limited
set of providers. This caused test failures when trying to test against
milvus, chromadb, pgvector, weaviate, and qdrant, which aren't configured
in the record job.
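
A hedged reconstruction of the fix this commit describes; the function and constant names are assumptions for illustration:

```python
ALLOWED_VECTOR_PROVIDERS = {"faiss", "sqlite-vec"}


def filter_vector_providers(providers: list[str], mode: str) -> list[str]:
    # Before the fix the filter applied only in "replay" mode; CI also runs
    # "record" jobs against the same limited set, so filter in both modes to
    # avoid selecting milvus, chromadb, pgvector, weaviate, or qdrant, which
    # aren't configured in the record job.
    if mode in ("record", "replay"):
        return [p for p in providers if p in ALLOWED_VECTOR_PROVIDERS]
    return providers
```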