1 file changed, +2 −3 lines changed

@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
 Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
 Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
 (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
-enforcing eager mode and disabling prefix caching in V1.
+disabling prefix caching in V1.
 
 Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
-these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention
-backend in V1.
+these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
 
 #### Encoder-Decoder Models
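
For context, a minimal sketch of running one of these hybrid models under the constraints the updated text describes. The checkpoint name (`ibm-ai-platform/Bamba-9B`) is an illustrative assumption, as is setting the environment variables from Python before importing vLLM; the settings the docs call for are `VLLM_ATTENTION_BACKEND=FLASHINFER` and `enable_prefix_caching=False`, with `enforce_eager=True` no longer needed after this change.

```python
import os

# Assumption: set these before importing vLLM so the engine picks them up.
os.environ["VLLM_USE_V1"] = "1"                      # opt in to the V1 engine
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"  # hybrid Mamba-2/attention models need FlashInfer in V1

from vllm import LLM, SamplingParams

# Illustrative hybrid checkpoint; substitute any Mamba-2/attention hybrid
# (Bamba, Zamba2, NemotronH, FalconH1, GraniteMoeHybrid).
llm = LLM(
    model="ibm-ai-platform/Bamba-9B",
    enable_prefix_caching=False,  # prefix caching is still unsupported for these models
    # enforce_eager=True is no longer required as of this change.
)

outputs = llm.generate(
    ["An example prompt:"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```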