verify and support Kimi-K2.5 model #7612
liuyuhang-2025 wants to merge 1 commit into vllm-project:main
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request establishes automated end-to-end testing for the Kimi-K2.5 model, specifically its W4A8 quantized version, within the vLLM Ascend backend. By integrating a new test configuration, it ensures robust continuous integration and verification for this model on Ascend NPU hardware, enhancing reliability and maintainability without affecting user-facing APIs.
Code Review
This pull request adds an end-to-end test configuration for the Kimi-K2.5-W4A8 model. The configuration is consistent with the provided documentation and the changes appear correct. The pull request description is well-written and follows the repository's template. However, the pull request title does not conform to the repository's style guide, which requires a [Branch][Module][Action] prefix. This is a high-priority requirement. A compliant title would be, for example: [Test][Feature] Add E2E test configuration for Kimi-K2.5.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
What this PR does / why we need it?
This PR introduces the official E2E verification test configuration for the moonshotai/Kimi-K2.5 model (W4A8 quantized version) on the vLLM Ascend backend.
Changes proposed:
Add tests/e2e/models/configs/Kimi-K2.5-W4A8.yaml to automate the verification pipeline based on the existing deployment tutorial.
This is needed to ensure continuous integration (CI) and automated verification for the newly supported Kimi-K2.5 model on Ascend NPU environments (Atlas 800 A2/A3).
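For readers unfamiliar with config-driven accuracy checks, here is a minimal sketch of the general idea: a config lists expected metric values per task, and a checker compares measured results against them within a tolerance. The field names, the task name, the numbers, and the check_metrics helper are all hypothetical placeholders for illustration; they are not the contents of Kimi-K2.5-W4A8.yaml, nor the test harness this repository actually uses.

```python
import math

# Hypothetical config, written as a dict so the sketch has no third-party
# dependencies. Field names and values are placeholders, NOT the real contents
# of Kimi-K2.5-W4A8.yaml.
EXAMPLE_CONFIG = {
    "model_name": "moonshotai/Kimi-K2.5",
    "tasks": [
        {"name": "example_task", "metrics": [{"name": "accuracy", "value": 0.50}]},
    ],
}


def check_metrics(config, measured, rtol=0.05):
    """Return (task, metric, expected, got) tuples for metrics outside tolerance.

    `measured` maps (task_name, metric_name) to the observed value. A missing
    measurement is reported as a failure with got=None.
    """
    failures = []
    for task in config["tasks"]:
        for metric in task["metrics"]:
            expected = metric["value"]
            got = measured.get((task["name"], metric["name"]))
            if got is None or not math.isclose(got, expected, rel_tol=rtol):
                failures.append((task["name"], metric["name"], expected, got))
    return failures
```

An empty list from check_metrics means every expected metric was measured and fell within the relative tolerance; a CI job could fail the run whenever the list is non-empty.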
Fixes [Feature]: Verify / Support moonshotai/Kimi-K2.5 #6683
Does this PR introduce any user-facing change?
No user-facing APIs or existing documentation were modified. This PR solely adds internal E2E testing configurations for CI validation.
How was this patch tested?
Kimi-K2.5-W4A8.yaml. The test parameters were strictly aligned with the existing Kimi-K2.5.md documentation.