feat: Add multi-modality support for content blocks in PrecisePrefixCacheScorer #565

Open
guygir wants to merge 4 commits into llm-d:main from guygir:multi-modality


@guygir commented Jan 14, 2026

IMPORTANT: Depends on llm-d/llm-d-kv-cache#255. This PR should not be merged until the kv-cache PR is merged first.

This PR extends PrecisePrefixCacheScorer to preserve structured multi-modal content blocks (OpenAI API format) through the pipeline.
This is the first stage of multi-modality support and establishes basic technical feasibility only. It covers images but not audio or video: GAIE already supports images, while audio and video support requires additional GAIE changes, which will be addressed in the next stage.

Changes:

  • Added convertContentToPreprocessingFormat() helper to convert GAIE's structured content blocks to OpenAI API format
  • Modified getScores() to preserve structured content instead of using raw text
  • Maintains backward compatibility with text-only content
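
To illustrate the kind of conversion described above, here is a minimal sketch. It is not the PR's actual convertContentToPreprocessingFormat() code. The ContentBlock type and the toOpenAIContent function are hypothetical stand-ins (the real GAIE types may differ), and collapsing text-only content to a plain string is just one assumed way to stay backward compatible.

```go
package main

import (
	"fmt"
	"strings"
)

// ContentBlock is a hypothetical stand-in for a parsed content block.
// The real GAIE types are not shown in this PR description and may differ.
type ContentBlock struct {
	Type     string // "text" or "image_url"
	Text     string // set when Type == "text"
	ImageURL string // set when Type == "image_url"
}

// toOpenAIContent converts blocks into the OpenAI chat "content" shape.
// Assumption: text-only content is collapsed to a plain string so that
// existing text-only callers see the same value as before.
func toOpenAIContent(blocks []ContentBlock) (any, error) {
	textOnly := true
	for _, b := range blocks {
		if b.Type != "text" {
			textOnly = false
			break
		}
	}
	if textOnly {
		parts := make([]string, 0, len(blocks))
		for _, b := range blocks {
			parts = append(parts, b.Text)
		}
		return strings.Join(parts, ""), nil
	}

	// Mixed content: keep the structure as a list of typed parts.
	out := make([]map[string]any, 0, len(blocks))
	for _, b := range blocks {
		switch b.Type {
		case "text":
			out = append(out, map[string]any{"type": "text", "text": b.Text})
		case "image_url":
			out = append(out, map[string]any{
				"type":      "image_url",
				"image_url": map[string]any{"url": b.ImageURL},
			})
		default:
			// Audio and video are out of scope for this stage.
			return nil, fmt.Errorf("unsupported content block type %q", b.Type)
		}
	}
	return out, nil
}
```

Calling toOpenAIContent with a text block followed by an image block yields a two-element list of typed parts. A single text block yields a plain string, and any other block type returns an error.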

What Works:

  • Multi-modal requests are correctly parsed and preserved from GAIE
  • Structured content blocks flow correctly through the scheduler
  • Chat template rendering works with structured content
  • Requests are forwarded correctly to vLLM
  • Backward compatible with text-only requests

Known Limitations (for current stage):

  • Tokenization may not match vLLM exactly: images are tokenized as text rather than as vision tokens. This affects only merged-preprocessor models such as Qwen2-VL.
  • Block hashes may not match vLLM's for multimodal blocks because mm_hash is not included. The hashes remain internally consistent, so this is noted for information only and is not a functional issue.
  • Both limitations will be addressed in the next stage.
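
The hash limitation can be illustrated with a toy chained block hash. This is a sketch under stated assumptions, not the hashing scheme used by vLLM or llm-d-kv-cache: hashBlock, hashChain, the SHA-256 choice, and the per-block extraKeys parameter (standing in for something like mm_hash) are all hypothetical. It shows only that omitting the extra key keeps hashes self-consistent while making them diverge from a hasher that includes it.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// hashBlock chains the parent hash, the block's token IDs, and an optional
// extra key (a hypothetical stand-in for a multimodal hash such as mm_hash).
func hashBlock(parent [32]byte, tokens []uint32, extraKey string) [32]byte {
	h := sha256.New()
	h.Write(parent[:])
	buf := make([]byte, 4)
	for _, t := range tokens {
		binary.LittleEndian.PutUint32(buf, t)
		h.Write(buf)
	}
	h.Write([]byte(extraKey))
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// hashChain hashes each full block of blockSize tokens, chaining every block
// to its predecessor. extraKeys[i], when present, is mixed into block i.
func hashChain(tokens []uint32, blockSize int, extraKeys []string) [][32]byte {
	if blockSize <= 0 {
		return nil
	}
	var hashes [][32]byte
	var parent [32]byte // zero value serves as the root parent
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		key := ""
		if idx := i / blockSize; idx < len(extraKeys) {
			key = extraKeys[idx]
		}
		parent = hashBlock(parent, tokens[i:i+blockSize], key)
		hashes = append(hashes, parent)
	}
	return hashes
}
```

Two runs without extra keys produce identical chains, which is the "consistent with ourselves" property. Supplying a non-empty key for the second block leaves the first block's hash unchanged and changes the second block's hash and every hash after it.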

Testing:

  • Multi-modality conversion logic test (test/multi_modality/test_multimodality_conversion.go)

…fixCacheScorer, and a basic test script for e2e multi-modality inputs

Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Remove redundant \n from fmt.Println string literal (fmt.Println already adds newline)

Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Rename test_e2e_multimodality.go to test_multimodality_conversion.go.
This test verifies conversion logic (GAIE parsing + format conversion),
not the full end-to-end pipeline.

Signed-off-by: Guy Girmonsky <guygir@gmail.com>
Update print statements to say 'conversion logic test' instead of 'end-to-end test'

Signed-off-by: Guy Girmonsky <guygir@gmail.com>
@github-actions github-actions bot requested review from kfswain and nilig January 14, 2026 22:17
@elevran elevran added this to the v0.6 milestone Jan 22, 2026