

@prashantgupta24 (Collaborator) commented Jul 9, 2025

Description

🎨 Remove new_token_ids from warmup since new_token_ids are not used anymore.

github-actions bot commented Jul 9, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks; otherwise, your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. The linting requirements can be installed with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@prashantgupta24 changed the title from "🔥 remove new_token_ids from warmup decode" to "[WIP] Fix the compiler issue with the new changes" Jul 9, 2025
@prashantgupta24 (Collaborator, Author) commented:

bot:test
MARKERS="cb and spyre"

@prashantgupta24 changed the title from "[WIP] Fix the compiler issue with the new changes" to "🎨 Remove new_token_ids from warmup" Jul 9, 2025
@joerunde (Collaborator) commented:

@prashantgupta24 do we wanna get this merged?

If so, we should definitely test to triple-check that this works with the upcoming compiler changes for continuous batching.

@prashantgupta24 (Collaborator, Author) commented:

No hurry as such; I don't want to add anything before the release lol

@prashantgupta24 marked this pull request as ready for review July 30, 2025 17:44
@prashantgupta24 (Collaborator, Author) commented:

bot:test
MARKERS="spyre and not quantized"

@maxdebayser (Collaborator) left a comment

LGTM

@prashantgupta24 (Collaborator, Author) commented Jul 30, 2025

One of the tests failed on Spyre:

tests/e2e/test_spyre_cb_scheduler_steps.py::test_new_sequence_joins_during_decode[sendnn-ibm-ai-platform/micro-g3.3-8b-instruct-1b] FAILED

Will investigate; it seems flaky...

@prashantgupta24 (Collaborator, Author) commented Aug 1, 2025

bot:test
MARKERS="spyre and cb"

All CB tests passed! (170)

@prashantgupta24 (Collaborator, Author) commented:

bot:test
MARKERS="spyre and not quantized"

@prashantgupta24 (Collaborator, Author) commented:

Two different tests failed this time:

tests/e2e/test_spyre_cb_scheduler_steps.py::test_prompt_too_long_for_current_tkv[sendnn-ibm-ai-platform/micro-g3.3-8b-instruct-1b] FAILED
tests/e2e/test_spyre_cb_scheduler_steps.py::test_requested_tokens_not_fitting_remaining_space[sendnn-ibm-ai-platform/micro-g3.3-8b-instruct-1b] FAILED

Seems flaky, since running only the CB tests made them all pass 🤔

@prashantgupta24 marked this pull request as draft August 1, 2025 17:59
@prashantgupta24 (Collaborator, Author) commented:

Too much flakiness going on; converting this to a draft for now.

@prashantgupta24 (Collaborator, Author) commented:

bot:test
