
Commit c1c58f2

tjohnson31415 authored and njhill committed
feat: support setting the attention impl in hf_transformers
Signed-off-by: Travis Johnson <[email protected]>
1 parent: cfa10e3


1 file changed: +4 −0


server/text_generation_server/inference_engine/hf_transformers.py

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,10 @@ def __init__(
             "trust_remote_code": TRUST_REMOTE_CODE,
         }
 
+        # TODO: consider if Flash Attention should be enabled based on FLASH_ATTENTION=True
+        if attn_impl := os.getenv("TRANSFORMERS_ATTN_IMPL"):
+            kwargs["attn_implementation"] = attn_impl
+
         if model_config.model_type == "mpt":
             model_config.init_device = str(self.device)
             kwargs["config"] = model_config
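For context, here is a minimal, self-contained sketch of what this change does at model-load time. It is an illustration, not this repository's code: the model name ("gpt2"), the simplified kwargs dict, and the surrounding setup are assumptions, and only the environment-variable check mirrors the committed lines. Hugging Face transformers does accept an attn_implementation keyword in from_pretrained, with values such as "eager", "sdpa", or "flash_attention_2" (the last requires the flash-attn package to be installed).

import os

from transformers import AutoModelForCausalLM

# Sketch only: mirror the committed env-var check. The walrus operator
# assigns and tests in one step, so the kwarg is set only when
# TRANSFORMERS_ATTN_IMPL is defined and non-empty; otherwise the
# library's default attention implementation is used.
kwargs = {}
if attn_impl := os.getenv("TRANSFORMERS_ATTN_IMPL"):
    kwargs["attn_implementation"] = attn_impl

# "gpt2" is a placeholder model name used here for illustration.
model = AutoModelForCausalLM.from_pretrained("gpt2", **kwargs)

With this in place, launching the server with, for example, TRANSFORMERS_ATTN_IMPL=flash_attention_2 selects that attention backend, while leaving the variable unset preserves the existing default behavior.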
