Your current environment
The output of python collect_env.py
==============================
System Info
==============================
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version : Could not collect
CMake version : version 3.22.1
Libc version : glibc-2.35
==============================
PyTorch Info
==============================
PyTorch version : 2.7.1+cu126
Is debug build : False
CUDA used to build PyTorch : 12.6
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 16:05:46) [GCC 13.3.0] (64-bit runtime)
Python platform : Linux-6.8.0-1017-aws-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : 12.4.131
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration : GPU 0: NVIDIA A10G
Nvidia driver version : 550.127.05
cuDNN version : Could not collect
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7R32
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 32 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.1.3
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] onnxruntime==1.20.1
[pip3] pyzmq==27.0.0
[pip3] torch==2.7.1
[pip3] torchao==0.12.0
[pip3] torchaudio==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.55.3
[pip3] triton==3.3.1
[conda] numpy 2.1.3 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.6.4.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.6.80 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.6.77 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.5.1.17 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.3.0.4 pypi_0 pypi
[conda] nvidia-cufile-cu12 1.11.1.6 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.7.77 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.7.1.2 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.5.4.2 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.85 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.6.77 pypi_0 pypi
[conda] pyzmq 27.0.0 pypi_0 pypi
[conda] torch 2.7.1 pypi_0 pypi
[conda] torchao 0.12.0 pypi_0 pypi
[conda] torchaudio 2.7.1 pypi_0 pypi
[conda] torchvision 0.22.1 pypi_0 pypi
[conda] transformers 4.55.3 pypi_0 pypi
[conda] triton 3.3.1 pypi_0 pypi
==============================
vLLM Info
==============================
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.10.1rc2.dev213+ga406a0e36.d20250828 (git sha: a406a0e36, date: 20250828)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-15 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
LD_LIBRARY_PATH=/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
Description
I'm attempting to enable LoRA support for the Gemma3nForConditionalGeneration model by implementing the SupportsLoRA interface. While the model functions correctly with torch compilation disabled (level=0) and without CUDA graphs, it fails during torch compilation when using PIECEWISE mode due to dynamic shape constraint violations in the embedding and position encoding components.
Environment
- OS: Ubuntu 22.04.5 LTS (Kernel: 6.8.0-1017-aws)
- Python Version: 3.11.13
- vLLM Version: 0.10.1rc2.dev213+ga406a0e36.d20250828 (development build)
- PyTorch Version: 2.7.1+cu126
- CUDA Runtime: 12.6
- CUDA Compiler: 12.4 (V12.4.131)
- GPU: NVIDIA A10G (23GB VRAM)
- NVIDIA Driver: 550.127.05
- Key Dependencies:
- transformers: 4.55.4
- torchvision: 0.22.1
- torchaudio: 2.7.1
- numpy: 2.2.6
- safetensors: 0.6.2
- tokenizers: 0.21.4
- triton: 3.3.1
- scipy: 1.16.1
Base Commit
Changes were made on top of the main branch at commit: [Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models (#23824)
Expected Behavior
The Gemma3n model with LoRA support should compile successfully with PIECEWISE torch compilation mode, similar to other LoRA-enabled models in vLLM.
Current Behavior
The model compilation fails with a ConstraintViolationError during the torch.compile process, specifically related to dynamic shape inference for embedding tensors:
```
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['inputs_embeds'].size()[0], L['per_layer_inputs'].size()[0], L['positions'].size()[0])!
- Not all values of RelaxedUnspecConstraint(L['inputs_embeds'].size()[0]) are valid because L['inputs_embeds'].size()[0] was inferred to be a constant (2048).
- Not all values of RelaxedUnspecConstraint(L['per_layer_inputs'].size()[0]) are valid because L['per_layer_inputs'].size()[0] was inferred to be a constant (2048).
- Not all values of RelaxedUnspecConstraint(L['positions'].size()[0]) are valid because L['positions'].size()[0] was inferred to be a constant (2048).
```
Changes Made
Based on commit [Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models (#23824), I implemented the following modifications:

1. Added LoRA Support Interface

File: vllm/model_executor/models/gemma3n.py

```python
# Line 52: Added SupportsLoRA import
from .interfaces import SupportsQuant, SupportsLoRA

# Line 1054: Added SupportsLoRA inheritance
class Gemma3nForCausalLM(nn.Module, SupportsLoRA):
```

File: vllm/model_executor/models/gemma3n_mm.py

```python
# Line 43: Added SupportsLoRA import
from .interfaces import MultiModalEmbeddings, SupportsLoRA, SupportsMultiModal

# Line 413: Added SupportsLoRA inheritance and required attributes
class Gemma3nForConditionalGeneration(nn.Module, SupportsMultiModal, SupportsLoRA):
    # LoRA specific attributes - empty since we don't apply LoRA to embeddings
    embedding_modules: dict[str, str] = {}
    embedding_padding_modules: list[str] = []
```

2. Enhanced Multimodal Key Mapping

```python
@classmethod
def get_mm_mapping(cls) -> MultiModelKeys:
    return MultiModelKeys.from_string_field(
        language_model="language_model",
        connector=["multi_modal_projector", "embed_audio", "embed_vision"],
        tower_model=["vision_tower", "audio_tower"])
```

3. Defensive Reshaping for Vision Processing

Added defensive reshaping logic in _process_vision_input to handle tensor dimension changes that occur with LoRA:

```python
# Handle both 2D [batch*tokens, hidden] and 3D [batch*tokens, 1, hidden] cases
if embedded_vision.shape[0] == batch_size * tokens_per_image:
    # Tensor was flattened, reshape it back to [batch, tokens, hidden]
    embedded_vision = embedded_vision.view(batch_size, tokens_per_image, -1)
```

Root Cause Analysis
The compilation failure appears to be related to dynamic shape handling in the multimodal embedding pipeline. The torch.compile system is inferring fixed shapes (2048 tokens) for tensors that should maintain dynamic shapes, particularly:
- Input Embeddings
- Per-Layer Inputs
- Position Embeddings
Reproduction Steps
1. Apply the above changes to enable LoRA support for Gemma3n.
2. Configure vLLM with:
   - Torch compilation level: PIECEWISE
   - CUDA graphs enabled
   - Any LoRA adapter (even a dummy one)
3. Attempt to start the vLLM server.
4. Observe the compilation failure during model profiling.
Workaround
The model works correctly with:
- --compilation-level=0 (disables torch.compile)
- --disable-cuda-graph
Logs
Complete Error Logs
(EngineCore_0 pid=34269) [vllm]INFO 08-29 21:46:51 [backends.py:549] Dynamo bytecode transform time: 28.01 s
(EngineCore_0 pid=34269) [vllm]INFO 08-29 21:46:56 [backends.py:194] Cache the graph for dynamic shape for later use
(EngineCore_0 pid=34269) [vllm]DEBUG 08-29 21:46:56 [backends.py:200] Store the 0-th graph for dynamic shape from inductor via handle ('frz2a2z2ugx7fm7hcmringndzcr4nugqxdvncz5bgjfctug7ph63', '/home/ubuntu/.cache/vllm/torch_compile_cache/3104c0d8da/rank_0_0/inductor_cache/in/cin4zgruyhzptcnzq65mp3pwm7lvrsviculepwdc3r5iiexqmtfz.py')
(APIServer pid=33959) [vllm]DEBUG 08-29 21:46:57 [utils.py:777] Waiting for 1 local, 0 remote core engine proc(s) to start.
...
...
(EngineCore_0 pid=34269) [vllm]DEBUG 08-29 21:47:44 [backends.py:200] Store the 30-th graph for dynamic shape from inductor via handle ('fd54vujzgirhvqyupct3qpn4gaffifv3tfh3pgipd5kr2l7evtx7', '/home/ubuntu/.cache/vllm/torch_compile_cache/3104c0d8da/rank_0_0/inductor_cache/gv/cgvvmqk4t4x5mvvh6kizldgfr24uhaqe5bdxcwa6r7ujvqaji55y.py')
(EngineCore_0 pid=34269) [vllm]INFO 08-29 21:47:44 [backends.py:215] Compiling a graph for dynamic shape takes 51.15 s
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Error while creating guard:
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Name: ''
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Source: shape_env
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Create Function: SHAPE_ENV
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Guard Types: None
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Code List: None
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Object Weakref: None
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Guarded Class Weakref: None
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] Traceback (most recent call last):
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_guards.py", line 357, in create
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] return self.create_fn(builder, self)
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 1968, in SHAPE_ENV
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] python_code_parts, verbose_code_parts = _get_code_parts(
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] ^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 1951, in _get_code_parts
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] return output_graph.shape_env.produce_guards_verbose(
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/fx/experimental/symbolic_shapes.py", line 5409, in produce_guards_verbose
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] raise ConstraintViolationError(
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['inputs_embeds'].size()[0], L['per_layer_inputs'].size()[0], L['positions'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] - Not all values of RelaxedUnspecConstraint(L['inputs_embeds'].size()[0]) are valid because L['inputs_embeds'].size()[0] was inferred to be a constant (2048).
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] - Not all values of RelaxedUnspecConstraint(L['per_layer_inputs'].size()[0]) are valid because L['per_layer_inputs'].size()[0] was inferred to be a constant (2048).
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.745000 34269 site-packages/torch/_guards.py:359] [0/0] - Not all values of RelaxedUnspecConstraint(L['positions'].size()[0]) are valid because L['positions'].size()[0] was inferred to be a constant (2048).
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] Created at:
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 694, in transform
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] tracer = InstructionTranslator(
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3327, in __init__
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] output=OutputGraph(
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 358, in __init__
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] self.init_ambient_guards()
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 512, in init_ambient_guards
(EngineCore_0 pid=34269) [rank0]:E0829 21:47:46.747000 34269 site-packages/torch/_guards.py:361] [0/0] self.guards.add(ShapeEnvSource().make_guard(GuardBuilder.SHAPE_ENV))
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] EngineCore failed to start.
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] Traceback (most recent call last):
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 504, in __init__
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 90, in __init__
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] self._initialize_kv_caches(vllm_config)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 182, in _initialize_kv_caches
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] self.model_executor.determine_available_memory())
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/executor/abstract.py", line 84, in determine_available_memory
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return self.collective_rpc("determine_available_memory")
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 3036, in run_method
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return func(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return func(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_worker.py", line 245, in determine_available_memory
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] self.model_runner.profile_run()
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 2598, in profile_run
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] = self._dummy_run(self.max_num_tokens, is_profile=True)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return func(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 2375, in _dummy_run
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] outputs = self.model(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return self._call_impl(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return forward_call(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 674, in forward
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] hidden_states = self.language_model.model(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/home/ubuntu/yashpratap/exp/vllm/vllm/compilation/decorators.py", line 305, in __call__
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] output = self.compiled_callable(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return fn(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1432, in __call__
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return self._torchdynamo_orig_callable(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 598, in __call__
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return _compile(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1059, in _compile
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] guarded_code = compile_inner(code, one_graph, hooks, transform)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return function(*args, **kwargs)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 761, in compile_inner
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return _compile_inner(code, one_graph, hooks, transform)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 906, in _compile_inner
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] check_fn = CheckFunctionManager(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 2490, in __init__
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] guard.create(builder)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_guards.py", line 357, in create
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return self.create_fn(builder, self)
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 1968, in SHAPE_ENV
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] python_code_parts, verbose_code_parts = _get_code_parts(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 1951, in _get_code_parts
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] return output_graph.shape_env.produce_guards_verbose(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/fx/experimental/symbolic_shapes.py", line 5409, in produce_guards_verbose
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] raise ConstraintViolationError(
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['inputs_embeds'].size()[0], L['per_layer_inputs'].size()[0], L['positions'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] - Not all values of RelaxedUnspecConstraint(L['inputs_embeds'].size()[0]) are valid because L['inputs_embeds'].size()[0] was inferred to be a constant (2048).
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] - Not all values of RelaxedUnspecConstraint(L['per_layer_inputs'].size()[0]) are valid because L['per_layer_inputs'].size()[0] was inferred to be a constant (2048).
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712] - Not all values of RelaxedUnspecConstraint(L['positions'].size()[0]) are valid because L['positions'].size()[0] was inferred to be a constant (2048).
(EngineCore_0 pid=34269) [vllm]ERROR 08-29 21:47:46 [core.py:712]
(EngineCore_0 pid=34269) Process EngineCore_0:
(EngineCore_0 pid=34269) Traceback (most recent call last):
(EngineCore_0 pid=34269) File "/opt/conda/envs/vln/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_0 pid=34269) self.run()
(EngineCore_0 pid=34269) File "/opt/conda/envs/vln/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_0 pid=34269) self._target(*self._args, **self._kwargs)
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 716, in run_engine_core
(EngineCore_0 pid=34269) raise e
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_0 pid=34269) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_0 pid=34269) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 504, in __init__
(EngineCore_0 pid=34269) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 90, in __init__
(EngineCore_0 pid=34269) self._initialize_kv_caches(vllm_config)
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core.py", line 182, in _initialize_kv_caches
(EngineCore_0 pid=34269) self.model_executor.determine_available_memory())
(EngineCore_0 pid=34269) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/executor/abstract.py", line 84, in determine_available_memory
(EngineCore_0 pid=34269) return self.collective_rpc("determine_available_memory")
(EngineCore_0 pid=34269) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_0 pid=34269) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_0 pid=34269) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 3036, in run_method
(EngineCore_0 pid=34269) return func(*args, **kwargs)
(EngineCore_0 pid=34269) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=34269) File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_0 pid=34269) return func(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_worker.py", line 245, in determine_available_memory
�[1;36m(EngineCore_0 pid=34269)�[0;0m self.model_runner.profile_run()
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 2598, in profile_run
�[1;36m(EngineCore_0 pid=34269)�[0;0m = self._dummy_run(self.max_num_tokens, is_profile=True)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
�[1;36m(EngineCore_0 pid=34269)�[0;0m return func(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/worker/gpu_model_runner.py", line 2375, in _dummy_run
�[1;36m(EngineCore_0 pid=34269)�[0;0m outputs = self.model(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
�[1;36m(EngineCore_0 pid=34269)�[0;0m return self._call_impl(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
�[1;36m(EngineCore_0 pid=34269)�[0;0m return forward_call(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/home/ubuntu/yashpratap/exp/vllm/vllm/model_executor/models/gemma3n_mm.py", line 674, in forward
�[1;36m(EngineCore_0 pid=34269)�[0;0m hidden_states = self.language_model.model(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/home/ubuntu/yashpratap/exp/vllm/vllm/compilation/decorators.py", line 305, in __call__
�[1;36m(EngineCore_0 pid=34269)�[0;0m output = self.compiled_callable(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 655, in _fn
�[1;36m(EngineCore_0 pid=34269)�[0;0m return fn(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1432, in __call__
�[1;36m(EngineCore_0 pid=34269)�[0;0m return self._torchdynamo_orig_callable(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 598, in __call__
�[1;36m(EngineCore_0 pid=34269)�[0;0m return _compile(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1059, in _compile
�[1;36m(EngineCore_0 pid=34269)�[0;0m guarded_code = compile_inner(code, one_graph, hooks, transform)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
�[1;36m(EngineCore_0 pid=34269)�[0;0m return function(*args, **kwargs)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 761, in compile_inner
�[1;36m(EngineCore_0 pid=34269)�[0;0m return _compile_inner(code, one_graph, hooks, transform)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 906, in _compile_inner
�[1;36m(EngineCore_0 pid=34269)�[0;0m check_fn = CheckFunctionManager(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 2490, in __init__
�[1;36m(EngineCore_0 pid=34269)�[0;0m guard.create(builder)
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_guards.py", line 357, in create
�[1;36m(EngineCore_0 pid=34269)�[0;0m return self.create_fn(builder, self)
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 1968, in SHAPE_ENV
�[1;36m(EngineCore_0 pid=34269)�[0;0m python_code_parts, verbose_code_parts = _get_code_parts(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/_dynamo/guards.py", line 1951, in _get_code_parts
�[1;36m(EngineCore_0 pid=34269)�[0;0m return output_graph.shape_env.produce_guards_verbose(
�[1;36m(EngineCore_0 pid=34269)�[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
�[1;36m(EngineCore_0 pid=34269)�[0;0m File "/opt/conda/envs/vln/lib/python3.11/site-packages/torch/fx/experimental/symbolic_shapes.py", line 5409, in produce_guards_verbose
�[1;36m(EngineCore_0 pid=34269)�[0;0m raise ConstraintViolationError(
�[1;36m(EngineCore_0 pid=34269)�[0;0m torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['inputs_embeds'].size()[0], L['per_layer_inputs'].size()[0], L['positions'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
�[1;36m(EngineCore_0 pid=34269)�[0;0m - Not all values of RelaxedUnspecConstraint(L['inputs_embeds'].size()[0]) are valid because L['inputs_embeds'].size()[0] was inferred to be a constant (2048).
�[1;36m(EngineCore_0 pid=34269)�[0;0m - Not all values of RelaxedUnspecConstraint(L['per_layer_inputs'].size()[0]) are valid because L['per_layer_inputs'].size()[0] was inferred to be a constant (2048).
�[1;36m(EngineCore_0 pid=34269)�[0;0m - Not all values of RelaxedUnspecConstraint(L['positions'].size()[0]) are valid because L['positions'].size()[0] was inferred to be a constant (2048).
�[1;36m(EngineCore_0 pid=34269)�[0;0m
�[1;36m(APIServer pid=33959)�[0;0m [vllm]DEBUG 08-29 21:47:47 [utils.py:777] Waiting for 1 local, 0 remote core engine proc(s) to start.
[rank0]:[W829 21:47:47.936640409 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=33959) Traceback (most recent call last):
(APIServer pid=33959) File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=33959) File "<frozen runpy>", line 88, in _run_code
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/entrypoints/openai/api_server.py", line 1996, in <module>
(APIServer pid=33959) uvloop.run(run_server(args))
(APIServer pid=33959) File "/opt/conda/envs/vln/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
(APIServer pid=33959) return runner.run(wrapper())
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/opt/conda/envs/vln/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=33959) return self._loop.run_until_complete(task)
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=33959) File "/opt/conda/envs/vln/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=33959) return await main
(APIServer pid=33959) ^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/entrypoints/openai/api_server.py", line 1926, in run_server
(APIServer pid=33959) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/entrypoints/openai/api_server.py", line 1946, in run_server_worker
(APIServer pid=33959) async with build_async_engine_client(
(APIServer pid=33959) File "/opt/conda/envs/vln/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=33959) return await anext(self.gen)
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client
(APIServer pid=33959) async with build_async_engine_client_from_engine_args(
(APIServer pid=33959) File "/opt/conda/envs/vln/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=33959) return await anext(self.gen)
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/entrypoints/openai/api_server.py", line 220, in build_async_engine_client_from_engine_args
(APIServer pid=33959) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/utils/__init__.py", line 1581, in inner
(APIServer pid=33959) return fn(*args, **kwargs)
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/async_llm.py", line 198, in from_vllm_config
(APIServer pid=33959) return cls(
(APIServer pid=33959) ^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/async_llm.py", line 124, in __init__
(APIServer pid=33959) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=33959) return AsyncMPClient(*client_args)
(APIServer pid=33959) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core_client.py", line 767, in __init__
(APIServer pid=33959) super().__init__(
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/core_client.py", line 446, in __init__
(APIServer pid=33959) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=33959) File "/opt/conda/envs/vln/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=33959) next(self.gen)
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/utils.py", line 733, in launch_core_engines
(APIServer pid=33959) wait_for_engine_startup(
(APIServer pid=33959) File "/home/ubuntu/yashpratap/exp/vllm/vllm/v1/engine/utils.py", line 786, in wait_for_engine_startup
(APIServer pid=33959) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=33959) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}