verify and support Kimi-K2.5 model #7612

Open
liuyuhang-2025 wants to merge 1 commit into vllm-project:main from liuyuhang-2025:support-kimi-k2.5
liuyuhang-2025 commented Mar 24, 2026

What this PR does / why we need it?

This PR introduces the official E2E verification test configuration for the moonshotai/Kimi-K2.5 model (W4A8 quantized version) on the vLLM Ascend backend.

Changes proposed:

  • Test Cases: Added tests/e2e/models/configs/Kimi-K2.5-W4A8.yaml to automate the verification pipeline based on the existing deployment tutorial.

This is needed to ensure continuous integration (CI) and automated verification for the newly supported Kimi-K2.5 model on Ascend NPU environments (Atlas 800 A2/A3).

Fixes #6683 ([Feature]: Verify / Support moonshotai/Kimi-K2.5)

Does this PR introduce any user-facing change?

No user-facing APIs or existing documentation were modified. This PR solely adds internal E2E testing configurations for CI validation.


How was this patch tested?

  • Automated Testing: CI validation via the newly added E2E test configuration Kimi-K2.5-W4A8.yaml. The test parameters were strictly aligned with the existing Kimi-K2.5.md documentation.
  • vLLM version: v0.18.0
  • vLLM main: vllm-project/vllm@ed359c4

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes automated end-to-end testing for the Kimi-K2.5 model, specifically its W4A8 quantized version, within the vLLM Ascend backend. By integrating a new test configuration, it ensures robust continuous integration and verification for this model on Ascend NPU hardware, enhancing reliability and maintainability without affecting user-facing APIs.

Highlights

  • New E2E Test Configuration: Introduced an official E2E verification test configuration for the moonshotai/Kimi-K2.5 model (W4A8 quantized version) on the vLLM Ascend backend.
  • Automated Verification: Added tests/e2e/models/configs/Kimi-K2.5-W4A8.yaml to automate the verification pipeline, ensuring continuous integration and automated verification for the Kimi-K2.5 model on Ascend NPU environments.

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request adds an end-to-end test configuration for the Kimi-K2.5-W4A8 model. The configuration is consistent with the provided documentation and the changes appear correct. The pull request description is well-written and follows the repository's template. However, the pull request title does not conform to the repository's style guide, which requires a [Branch][Module][Action] prefix. This is a high-priority requirement. A compliant title would be, for example: [Test][Feature] Add E2E test configuration for Kimi-K2.5.
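The title convention mentioned in the review could be checked mechanically. The following is only a minimal sketch, not the repository's actual lint rule: it assumes a compliant title starts with one or more bracketed tags (as in the reviewer's example, which uses two) followed by a space and a description. The function name `has_tag_prefix` and the pattern are hypothetical.

```python
import re

# Assumption: a compliant title begins with one or more bracketed tags,
# e.g. "[Test][Feature]", followed by a space and a non-empty description.
# The repository's real style guide may be stricter than this.
TITLE_PATTERN = re.compile(r"^(\[[^\[\]]+\])+ \S.*$")


def has_tag_prefix(title: str) -> bool:
    """Return True if the title starts with at least one [Tag] prefix."""
    return TITLE_PATTERN.match(title) is not None


# The current PR title has no bracketed prefix; the reviewer's suggestion does.
print(has_tag_prefix("verify and support Kimi-K2.5 model"))
print(has_tag_prefix("[Test][Feature] Add E2E test configuration for Kimi-K2.5"))
```

Under these assumptions the first call reports the current title as non-compliant and the second accepts the suggested one.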

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Development

Successfully merging this pull request may close these issues.

[Feature]: Verify / Support moonshotai/Kimi-K2.5