1 file changed: +2 -3 lines changed

@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
 Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
 Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
 (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
-enforcing eager mode and disabling prefix caching in V1.
+disabling prefix caching in V1.

 Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
-these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention
-backend in V1.
+these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.

 #### Encoder-Decoder Models
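After this change, prefix caching is the only remaining V1 restriction for pure Mamba-2 models (eager mode is no longer forced). A minimal sketch of how a user might apply that setting through vLLM's offline `LLM` API; the checkpoint name is illustrative, and `enable_prefix_caching` is the standard engine argument:

```python
from vllm import LLM, SamplingParams

# Illustrative Mamba-2 checkpoint (a Mamba2ForCausalLM architecture).
# Prefix caching must be disabled in V1; after this change,
# enforce_eager=True is no longer needed for these models.
llm = LLM(
    model="mistralai/Mamba-Codestral-7B-v0.1",
    enable_prefix_caching=False,
)

params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Briefly explain state-space models."], params)
print(outputs[0].outputs[0].text)
```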
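For the hybrid Mamba-2/attention models, the same flag applies plus the FlashInfer attention backend, which vLLM selects through the `VLLM_ATTENTION_BACKEND` environment variable read when the engine is built. A sketch under the same assumptions, with `ibm-ai-platform/Bamba-9B` as an illustrative hybrid checkpoint:

```python
import os

# Select the FlashInfer attention backend before constructing the engine.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

# Illustrative hybrid SSM/attention checkpoint (BambaForCausalLM); the Zamba2,
# NemotronH, FalconH1 and GraniteMoeHybrid models are configured the same way.
llm = LLM(
    model="ibm-ai-platform/Bamba-9B",
    enable_prefix_caching=False,  # still required in V1
)

outputs = llm.generate(
    ["What is a hybrid attention/SSM model?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```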