Add env variable VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K to disable FlashInfer concat_mla_k by maazmusameta · Pull Request #35016 · vllm-project/vllm

maazmusameta · 2026-02-21T09:29:19Z

Summary:
Add an environment variable check to allow disabling the FlashInfer
concat_mla_k kernel optimization. Setting VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K=1
will bypass this optimization, which is useful for debugging or when replaying
components on CUDA where FlashInfer may not work correctly.
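As a rough illustration of the gate described in the summary, the sketch below mirrors the check from the diff in a standalone function. The `has_flashinfer` stub, the function name `use_flashinfer_concat_mla_k`, and its `environ` parameter are hypothetical stand-ins and not part of vLLM; the real condition in `mla_attention.py` appears to include further checks that are not shown in this PR excerpt.

```python
import os


def has_flashinfer() -> bool:
    """Hypothetical stand-in for vLLM's FlashInfer availability check."""
    return True


def use_flashinfer_concat_mla_k(environ=None) -> bool:
    """Return True when the concat_mla_k fast path should be used.

    `environ` is a hypothetical parameter added here only so the logic
    can be exercised without mutating the real process environment.
    """
    env = os.environ if environ is None else environ
    # Only the exact string "1" disables the optimization; any other
    # value (or an unset variable) leaves it enabled.
    disabled = env.get("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") == "1"
    return has_flashinfer() and not disabled
```

Under this sketch, values such as `true` or `yes` would not disable the kernel, since the comparison is against the literal string `1`.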

Test Plan: VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K=1 buck2 run //vllm:test_mla_attention

Differential Revision: D93967992


dosubot · 2026-02-21T09:29:28Z

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


gemini-code-assist

Code Review

This pull request introduces an environment variable VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K to disable the FlashInfer concat_mla_k kernel optimization, which is a useful addition for debugging purposes. The implementation is straightforward and correct. My only suggestion is to centralize the environment variable handling by using the vllm.envs module, which is the standard pattern in this codebase. This will improve code consistency and maintainability.

gemini-code-assist · 2026-02-21T09:30:45Z

vllm/model_executor/layers/attention/mla_attention.py

        # num_heads=128, nope_dim=128, rope_dim=64
        self._use_flashinfer_concat_mla_k = (
            has_flashinfer()
+            and os.environ.get("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") != "1"


For consistency with how other vLLM-specific environment variables are handled, it would be better to manage VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K through the vllm.envs module. This centralizes environment variable management and makes the code cleaner.

You can add the new environment variable to vllm/envs.py like this:

    # In vllm/envs.py
    'VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K': lambda: os.getenv("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") == "1",

Then, you can use it here as suggested. With this change, the import os at the top of this file is no longer needed and can be removed.

Suggested change

-            and os.environ.get("VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0") != "1"
+            and not envs.VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K
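The centralized pattern the review refers to can be sketched as a registry of zero-argument callables that is consulted on attribute access. This is a minimal standalone approximation, assuming the registry is a dict keyed by variable name as in the reviewer's snippet; the `_Envs` class and the `envs` instance below are hypothetical and only imitate how `envs.VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K` would be read at the call site.

```python
import os
from typing import Any, Callable

# Registry of environment variables: each entry maps a name to a
# zero-argument callable that reads and parses the value on demand.
environment_variables: dict[str, Callable[[], Any]] = {
    "VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K": lambda: os.getenv(
        "VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K", "0"
    )
    == "1",
}


class _Envs:
    """Hypothetical accessor that resolves attributes via the registry."""

    def __getattr__(self, name: str) -> Any:
        # __getattr__ is only called when normal lookup fails, so every
        # registered variable is evaluated lazily at access time.
        if name in environment_variables:
            return environment_variables[name]()
        raise AttributeError(f"unknown environment variable {name!r}")


envs = _Envs()
```

With a registry like this, the call site needs only `not envs.VLLM_DISABLE_FLASHINFER_CONCAT_MLA_K`, and the parsing rule (only `"1"` means disabled) lives in one place.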

maazmusameta requested a review from LucasWilkinson as a code owner February 21, 2026 09:29

meta-codesync bot added the fb-exported and meta-exported labels Feb 21, 2026

gemini-code-assist bot reviewed Feb 21, 2026

maazmusameta wants to merge 1 commit into vllm-project:main from maazmusameta:export-D93967992