Commit 2b504eb

[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233)
Signed-off-by: Thomas Parnell <[email protected]>
1 parent 10eb24c commit 2b504eb

File tree

1 file changed: +2 additions, -3 deletions


docs/usage/v1_guide.md

Lines changed: 2 additions & 3 deletions
```diff
@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
 Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
 Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
 (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
-enforcing eager mode and disabling prefix caching in V1.
+disabling prefix caching in V1.

 Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
-these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention
-backend in V1.
+these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.

 #### Encoder-Decoder Models
```
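The updated guidance (prefix caching disabled, FlashInfer attention backend, no eager-mode requirement) translates into a server launch roughly like the following. This is a hedged sketch, not part of the commit: the model name is purely illustrative, and the flag and environment-variable names assume a recent vLLM CLI.

```shell
# Illustrative launch of a hybrid Mamba-2/attention model on the V1 engine.
# Assumptions: the model name is a placeholder; VLLM_ATTENTION_BACKEND and
# --no-enable-prefix-caching are the current vLLM knobs for these settings.
VLLM_ATTENTION_BACKEND=FLASHINFER \
  vllm serve ibm-ai-platform/Bamba-9B \
  --no-enable-prefix-caching
# Note: per this commit, --enforce-eager is no longer required for these models.
```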
