dynamo/docs/backends/trtllm/trtllm-gemma3-sliding-window-attention.md at main · drivenets/dynamo

title
Gemma3 Sliding Window

For general TensorRT-LLM features and configuration, see the Reference Guide.

This guide demonstrates how to deploy google/gemma-3-1b-it with Variable Sliding Window Attention (VSWA) using Dynamo. Since google/gemma-3-1b-it is a small model, each aggregated, decode, or prefill worker only requires one H100 GPU or one GB200 GPU. VSWA is a mechanism in which a model’s layers alternate between multiple sliding window sizes. An example of this is Gemma 3, which incorporates both global attention layers and sliding window layers.

Note

Ensure that required services such as nats and etcd are running before starting.
Request access to google/gemma-3-1b-it on Hugging Face and set your HF_TOKEN environment variable for authentication.

Aggregated Serving

cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export AGG_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_agg.yaml
./launch/agg.sh

Aggregated Serving with KV Routing

cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export AGG_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_agg.yaml
./launch/agg_router.sh

Disaggregated Serving

cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml
export DECODE_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml
./launch/disagg.sh

Disaggregated Serving with KV Routing

cd $DYNAMO_HOME/examples/backends/trtllm
export MODEL_PATH=google/gemma-3-1b-it
export SERVED_MODEL_NAME=$MODEL_PATH
export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml
export DECODE_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml
./launch/disagg_router.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Aggregated Serving

Aggregated Serving with KV Routing

Disaggregated Serving

Disaggregated Serving with KV Routing

FilesExpand file tree

trtllm-gemma3-sliding-window-attention.md

Latest commit

History

trtllm-gemma3-sliding-window-attention.md

File metadata and controls

Aggregated Serving

Aggregated Serving with KV Routing

Disaggregated Serving

Disaggregated Serving with KV Routing