vllm/config/__init__.py (5 changes: 3 additions & 2 deletions)

@@ -1165,8 +1165,9 @@ def _verify_quantization(self) -> None:
"non-quantized models.", self.quantization)

def _verify_cuda_graph(self) -> None:
self.max_seq_len_to_capture = min(self.max_seq_len_to_capture,
self.max_model_len)
if not self.is_encoder_decoder:
self.max_seq_len_to_capture = min(self.max_seq_len_to_capture,
self.max_model_len)
Contributor (severity: high):
While this change correctly addresses the issue for encoder-decoder models by not capping max_seq_len_to_capture, it relies on the default value of max_seq_len_to_capture being sufficiently large. This is fragile: a user may set a smaller max_seq_len_to_capture, and some models have very large encoder sequence lengths.

A more robust approach would be to explicitly determine the maximum length by considering both encoder and decoder sequence lengths, and then use that to cap max_seq_len_to_capture. This aligns better with the PR's goal to 'correctly determine the max sequence length'.

Consider this alternative implementation:

Suggested change:

-        if not self.is_encoder_decoder:
-            self.max_seq_len_to_capture = min(self.max_seq_len_to_capture,
-                                              self.max_model_len)
+        max_len = self.max_model_len
+        if self.is_encoder_decoder:
+            max_len = max(
+                max_len, getattr(self.hf_config, "max_source_positions", 0))
+        self.max_seq_len_to_capture = min(self.max_seq_len_to_capture, max_len)
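For illustration, here is a minimal, runnable sketch of the suggested capping logic against a stubbed config. The attribute names mirror the diff; the helper name cap_max_seq_len_to_capture and the concrete numbers (Whisper-style: 1500 encoder positions, a 448-token decoder limit) are assumptions for the example, not vLLM APIs or defaults.

    from types import SimpleNamespace

    def cap_max_seq_len_to_capture(cfg) -> int:
        """Sketch of the suggested cap; cfg stands in for vLLM's ModelConfig."""
        max_len = cfg.max_model_len
        if cfg.is_encoder_decoder:
            # Encoder inputs can exceed the decoder's max_model_len (e.g.
            # Whisper's max_source_positions), so widen the cap for enc-dec.
            max_len = max(max_len,
                          getattr(cfg.hf_config, "max_source_positions", 0))
        return min(cfg.max_seq_len_to_capture, max_len)

    # Illustrative Whisper-like numbers: 1500 encoder positions, 448 decoder.
    cfg = SimpleNamespace(
        max_model_len=448,
        max_seq_len_to_capture=8192,
        is_encoder_decoder=True,
        hf_config=SimpleNamespace(max_source_positions=1500),
    )
    print(cap_max_seq_len_to_capture(cfg))  # 1500: encoder length stays capturable

With the getattr default of 0, decoder-only models (and enc-dec configs lacking max_source_positions) fall back to the existing min(max_seq_len_to_capture, max_model_len) behavior.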

         # CUDAGraph capture not supported for enc-dec models and mllama on ROCm
         ROCM_UNSUPPORTED_MODELS = ['mllama']
         unsupported_rocm = (self.hf_config.model_type