JunmooByun

What does the PR do?

Support per-model tokenizer override when using Triton + vLLM in OpenAI-compatible mode.

This PR introduces HF_MODEL_NAME_MAP to associate custom model names with their corresponding Hugging Face model identifiers. During model registration, if a mapping is found, the tokenizer is loaded accordingly; otherwise, the system falls back to the default tokenizer.

This enables multi-model serving in scenarios where each model requires a different tokenizer, which was not possible with the previous global --tokenizer option because it applied a single tokenizer to every served model.
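
As a rough illustration of the lookup-with-fallback behavior described above, here is a minimal sketch. It assumes HF_MODEL_NAME_MAP is a plain dict from served model name to Hugging Face identifier; the map entries, the function names `resolve_tokenizer_id` and `load_tokenizers`, and the injected `loader` callable are all hypothetical and are not taken from the PR's actual code in triton_engine.py.

```python
from typing import Any, Callable, Dict, Iterable, Optional

# Illustrative entries only; the real HF_MODEL_NAME_MAP format may differ.
HF_MODEL_NAME_MAP: Dict[str, str] = {
    "my-opt": "facebook/opt-125m",
}


def resolve_tokenizer_id(
    model_name: str,
    default_tokenizer: Optional[str],
    name_map: Dict[str, str],
) -> Optional[str]:
    """Return the mapped Hugging Face ID, or the global default if unmapped."""
    return name_map.get(model_name, default_tokenizer)


def load_tokenizers(
    model_names: Iterable[str],
    default_tokenizer: Optional[str],
    loader: Callable[[str], Any],
    name_map: Dict[str, str],
) -> Dict[str, Any]:
    """Build a per-model tokenizer table, loading each distinct ID once."""
    cache: Dict[str, Any] = {}
    tokenizers: Dict[str, Any] = {}
    for name in model_names:
        tokenizer_id = resolve_tokenizer_id(name, default_tokenizer, name_map)
        if tokenizer_id is None:
            # No mapping and no global default: leave the model without one.
            tokenizers[name] = None
            continue
        if tokenizer_id not in cache:
            cache[tokenizer_id] = loader(tokenizer_id)
        tokenizers[name] = cache[tokenizer_id]
    return tokenizers
```

In a real deployment the `loader` argument could plausibly be `transformers.AutoTokenizer.from_pretrained`; it is injected here so the sketch runs without downloading any model files.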


Checklist

  • I have read the Contribution guidelines and signed the Contributor License Agreement
  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • I ran pre-commit locally (pre-commit install, pre-commit run --all)
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

  • feat

Related PRs:


Where should the reviewer start?

  • python/openai/openai_frontend/engine/triton_engine.py: Tokenizer override logic introduced here.

Test plan:

  • Ran frontend with:
    python3 openai_frontend/main.py --model-repository tests/vllm_models

@JunmooByun JunmooByun marked this pull request as draft August 4, 2025 01:19
@JunmooByun JunmooByun marked this pull request as ready for review August 4, 2025 01:29
@JunmooByun
Author

This PR was created from a forked repository.

  • The branch has been updated to the latest main.
  • Currently, workflow approval and a code review are required.

Could you please:

  1. Approve and run the workflows
  2. Review and approve the PR

Thanks for your time!
