Conversation

@acardace
Contributor

Summary

Updates the precise-prefix-cache-scorer to perform tokenization in the scheduler and pass pre-computed tokens to GetPodScores, rather than delegating tokenization to the kv-cache indexer.

Related to llm-d/llm-d-kv-cache#244
Related to #530

Note: This PR depends on llm-d/llm-d-kv-cache#266 and must be merged after it.

Changes

  • Build: Update Makefile PYTHONPATH to reference llm-d-kv-cache module
  • Scorer: Tokenize the prompt in the scheduler, then pass the tokens to GetPodScores (see the sketch after this list)
  • Tests: Adapt to updated signatures and reuse tokenizer's built-in chat templater
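
For illustration, here is a minimal Go sketch of the resulting two-step flow. All type and method names other than GetPodScores are assumptions for the sake of the example; the actual API is defined in llm-d/llm-d-kv-cache#266.

```go
package scorer

import (
	"context"
	"fmt"
)

// TokenProcessor and Indexer are illustrative stand-ins for the real
// llm-d-kv-cache types; only GetPodScores is named in this PR.
type TokenProcessor interface {
	Tokenize(ctx context.Context, prompt, model string) ([]uint32, error)
}

type Indexer interface {
	GetPodScores(ctx context.Context, tokens []uint32, model string, pods []string) (map[string]float64, error)
}

type Scorer struct {
	tokens  TokenProcessor
	indexer Indexer
}

// Score implements the two-step flow: tokenize in the scheduler first,
// then pass the pre-computed tokens to GetPodScores instead of the raw
// prompt, so the indexer no longer tokenizes internally.
func (s *Scorer) Score(ctx context.Context, prompt, model string, pods []string) (map[string]float64, error) {
	toks, err := s.tokens.Tokenize(ctx, prompt, model)
	if err != nil {
		return nil, fmt.Errorf("tokenize: %w", err)
	}
	return s.indexer.GetPodScores(ctx, toks, model, pods)
}
```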

@elevran
Collaborator

elevran commented Jan 21, 2026

/hold for post 0.5

@github-actions github-actions bot added the hold label Jan 21, 2026
@elevran elevran moved this to In progress in llm-d-inference-scheduler Jan 21, 2026
@acardace
Contributor Author

@elevran what's the release cadence for llm-d-kv-cache? Of course, the corresponding PR in kv-cache must be merged and tagged before this one can be merged.

@elevran
Collaborator

elevran commented Jan 21, 2026

I believe it's six weeks, give or take; @vMaroon can give you a more exact answer. From the inference scheduler's point of view, the hold can be removed once we cut the 0.5 RC in the next few days.

@acardace acardace force-pushed the feat/getpodscores-with-token branch 2 times, most recently from 5a30a16 to 2859a30 on January 22, 2026 at 10:17
The new API separates tokenization from scoring, requiring explicit
token processor initialization and a two-step flow: tokenize first,
then get pod scores.

Signed-off-by: Antonio Cardace <[email protected]>
Adapt tests to the new llm-d-kv-cache API

Signed-off-by: Antonio Cardace <[email protected]>
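
Continuing the sketch above, the "explicit token processor initialization" the first commit describes might be wired up as shown below; this constructor is hypothetical and assumes the illustrative TokenProcessor and Indexer types from the earlier sketch.

```go
// Hypothetical wiring for the scorer: the token processor becomes an
// explicit dependency of the scorer instead of being hidden inside the
// kv-cache indexer.
func newPrecisePrefixCacheScorer(tp TokenProcessor, idx Indexer) *Scorer {
	return &Scorer{tokens: tp, indexer: idx}
}
```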
@acardace acardace force-pushed the feat/getpodscores-with-token branch from 2859a30 to f6a5830 on January 22, 2026 at 11:34
@elevran elevran added this to the v0.6 milestone Jan 22, 2026
@elevran elevran removed the hold label Jan 26, 2026
@elevran
Collaborator

elevran commented Jan 26, 2026

@acardace @vMaroon @kfswain
Does it make sense to do tokenization as part of the scorer, or should this be more of an "infra" service (perhaps as part of an explicit data-preparation phase)?

@acardace
Contributor Author

@acardace @vMaroon @kfswain Does it make sense to do tokenization as part of the scorer, or should this be more of an "infra" service (perhaps as part of an explicit data-preparation phase)?

My take is that this is just groundwork for moving tokenization into a service, possibly inside GAIE. I'm actually working on an RFC to introduce tokenization as a service inside the IGW.
