Add env variable VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K to disable FlashInfer concat_mla_k #35016

Open
maazmusameta wants to merge 1 commit into vllm-project:main from maazmusameta:export-D93967992

@maazmusameta

Summary:
Add an environment variable check to allow disabling the FlashInfer
concat_mla_k kernel optimization. Setting VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K=1
will bypass this optimization, which is useful for debugging or when replaying
components on CUDA where FlashInfer may not work correctly.

Test Plan: VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K=1 buck2 run //vllm:test_mla_attention

Differential Revision: D93967992
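
To make the switch's behavior concrete, here is a minimal sketch of the gate as it appears in the diff under review. It covers only the two conditions visible there. The function name `concat_mla_k_enabled` and the `flashinfer_available` parameter are hypothetical stand-ins (the real code calls `has_flashinfer()` and may apply further conditions), so treat this as an illustration rather than the actual implementation.

```python
import os

ENV_NAME = "VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K"


def concat_mla_k_enabled(flashinfer_available: bool, environ=os.environ) -> bool:
    """Return True when the concat_mla_k fast path would be used.

    `flashinfer_available` is a hypothetical stand-in for has_flashinfer().
    The comparison mirrors the diff: only the exact string "1" disables.
    """
    return flashinfer_available and environ.get(ENV_NAME, "0") != "1"


if __name__ == "__main__":
    # Show how a few candidate values are interpreted by the string comparison.
    for value in (None, "0", "1", "true"):
        env = {} if value is None else {ENV_NAME: value}
        print(f"{value!r}: enabled={concat_mla_k_enabled(True, env)}")
```

Because the check is a plain string comparison against "1", a value such as "true" leaves the optimization enabled; only `VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K=1` turns it off.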

@dosubot

dosubot bot commented Feb 21, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an environment variable VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K to disable the FlashInfer concat_mla_k kernel optimization, which is a useful addition for debugging purposes. The implementation is straightforward and correct. My only suggestion is to centralize the environment variable handling by using the vllm.envs module, which is the standard pattern in this codebase. This will improve code consistency and maintainability.

# num_heads=128, nope_dim=128, rope_dim=64
self._use_flashinfer_concat_mla_k = (
    has_flashinfer()
    and os.environ.get("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") != "1"
high

For consistency with how other vLLM-specific environment variables are handled, it would be better to manage VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K through the vllm.envs module. This centralizes environment variable management and makes the code cleaner.

You can add the new environment variable to vllm/envs.py like this:

# In vllm/envs.py
'VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K': lambda: os.getenv("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") == "1",

Then, you can use it here as suggested. With this change, the import os at the top of this file is no longer needed and can be removed.

Suggested change
- and os.environ.get("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") != "1"
+ and not envs.VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K
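
For readers unfamiliar with the centralized pattern, the following is a self-contained sketch of a lazily evaluated environment-variable registry. It is not a copy of vllm/envs.py: the `_Envs` class and the `use_concat_mla_k` helper are hypothetical, and the sketch only assumes that each variable maps to a zero-argument callable that is evaluated on attribute access.

```python
import os
from typing import Any, Callable

# Hypothetical registry: each name maps to a callable that reads the
# environment at lookup time, so later changes to os.environ are picked up.
environment_variables: dict[str, Callable[[], Any]] = {
    "VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K": lambda: os.getenv(
        "VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0"
    ) == "1",
}


class _Envs:
    """Hypothetical accessor that resolves registered names as attributes."""

    def __getattr__(self, name: str) -> Any:
        try:
            return environment_variables[name]()
        except KeyError:
            raise AttributeError(f"unknown environment variable: {name}") from None


envs = _Envs()


def use_concat_mla_k(flashinfer_available: bool) -> bool:
    # Call sites read a typed boolean instead of parsing strings themselves.
    return flashinfer_available and not envs.VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K
```

The benefit of this shape is that parsing and defaults live in one place, so the call site reduces to a boolean attribute read and misspelled names fail loudly with an AttributeError.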
