1 file changed, +2 −3 lines changed

@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
 Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
 Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
 (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
-enforcing eager mode and disabling prefix caching in V1.
+disabling prefix caching in V1.
 
 Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
-these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention
-backend in V1.
+these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
 
 #### Encoder-Decoder Models
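
For context, a minimal sketch of running one of these hybrid models under the constraints the updated text describes. The checkpoint name (`ibm-ai-platform/Bamba-9B`) is an illustrative assumption, as is setting the environment variables from Python before importing vLLM; the settings the docs call for are `VLLM_ATTENTION_BACKEND=FLASHINFER` and `enable_prefix_caching=False`, with `enforce_eager=True` no longer needed after this change.

```python
import os

# Assumption: set these before importing vLLM so the engine picks them up.
os.environ["VLLM_USE_V1"] = "1"                      # opt in to the V1 engine
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"  # hybrid Mamba-2/attention models need FlashInfer in V1

from vllm import LLM, SamplingParams

# Illustrative hybrid checkpoint; substitute any Mamba-2/attention hybrid
# (Bamba, Zamba2, NemotronH, FalconH1, GraniteMoeHybrid).
llm = LLM(
    model="ibm-ai-platform/Bamba-9B",
    enable_prefix_caching=False,  # prefix caching is still unsupported for these models
    # enforce_eager=True is no longer required as of this change.
)

outputs = llm.generate(
    ["An example prompt:"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```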