
Can't run Qwen3-30B-A3B on H100, same config works fine on A100/L40s #213

@szulcmaciej

Description


Hi!
I've been benchmarking different GPUs for throughput with Qwen3-30B-A3B FP8.
First I tried A100 and L40s, and everything was fine there; I got my numbers.

Then I tried H100, and it fails on vLLM startup (`Error initializing vLLM engine`). I've tried restarting it, creating a new worker, etc., but it always ends up with the same error, and I'm not sure what the issue is or how to fix it.
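
For reference, these are the non-default engine settings, reconstructed from the `AsyncEngineArgs` dump in the logs below; how the worker actually builds them (env vars vs. code) is not shown anywhere, so treat this as a sketch for reproduction rather than the worker's real code. The same settings were used on all three GPUs:

```python
# Sketch: the non-default engine args visible in the log dump below,
# expressed directly as vLLM AsyncEngineArgs. This is a reconstruction,
# not the worker's actual configuration code.
from vllm import AsyncEngineArgs

engine_args = AsyncEngineArgs(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507-FP8",
    kv_cache_dtype="fp8",          # FP8 KV cache, per the config.py log line
    max_model_len=8192,
    gpu_memory_utilization=0.95,
    max_num_seqs=256,
)
```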

Here are the logs:

```text
2025-08-21T15:03:48.762848707Z INFO 08-21 15:03:48 [__init__.py:235] Automatically detected platform cuda.
2025-08-21T15:03:50.200115872Z engine.py           :27   2025-08-21 15:03:50,199 Engine args: AsyncEngineArgs(model='Qwen/Qwen3-30B-A3B-Instruct-2507-FP8', served_model_name=None, tokenizer=None, hf_config_path=None, task='auto', skip_tokenizer_init=False, enable_prompt_embeds=False, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path='', download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='fp8', seed=0, max_model_len=8192, cuda_graph_sizes=[], distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, data_parallel_rank=None, data_parallel_start_rank=None, data_parallel_size_local=None, data_parallel_address=None, data_parallel_rpc_port=None, data_parallel_hybrid_lb=False, data_parallel_backend='mp', enable_expert_parallel=False, enable_eplb=False, num_redundant_experts=0, eplb_window_size=1000, eplb_step_interval=3000, eplb_log_balancedness=False, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, prefix_caching_hash_algo='builtin', disable_sliding_window=False, disable_cascade_attn=False, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=256, max_logprobs=20, logprobs_mode='raw_logprobs', disable_log_stats=False, revision=None, code_revision=None, rope_scaling={}, rope_theta=None, hf_token=None, hf_overrides={}, tokenizer_revision=None, quantization=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, limit_mm_per_prompt={}, interleave_mm_strings=False, media_io_kwargs={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, default_mm_loras=None, fully_sharded_loras=False, max_cpu_loras=None, lora_dtype='auto', lora_extra_vocab_size=256, num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config={}, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, disable_chunked_mm_input=False, disable_hybrid_kv_cache_manager=False, guided_decoding_backend='outlines', guided_decoding_disable_fallback=False, guided_decoding_disable_any_whitespace=False, guided_decoding_disable_additional_properties=False, logits_processor_pattern=None, speculative_config=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config={}, override_pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":null,"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":null,"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":null,"local_cache_dir":null}, worker_cls='auto', worker_extension_cls='', kv_transfer_config=None, kv_events_config=None, generation_config='auto', enable_sleep_mode=False, override_generation_config={}, model_impl='auto', override_attention_dtype=None, calculate_kv_scales=False, additional_config={}, reasoning_parser='', use_tqdm_on_load=True, 
pt_load_map_location='cpu', enable_multimodal_encoder_data_parallel=False, async_scheduling=False, enable_prompt_adapter=False, disable_log_requests=False)
2025-08-21T15:03:55.869296691Z INFO 08-21 15:03:55 [config.py:1604] Using max model len 8192
2025-08-21T15:03:56.284128956Z INFO 08-21 15:03:56 [config.py:1733] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
2025-08-21T15:03:56.429006608Z INFO 08-21 15:03:56 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-08-21T15:04:04.812408188Z INFO 08-21 15:04:04 [__init__.py:235] Automatically detected platform cuda.
2025-08-21T15:04:06.050142092Z engine.py           :27   2025-08-21 15:04:06,049 Engine args: AsyncEngineArgs(model='Qwen/Qwen3-30B-A3B-Instruct-2507-FP8', served_model_name=None, tokenizer=None, hf_config_path=None, task='auto', skip_tokenizer_init=False, enable_prompt_embeds=False, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path='', download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='fp8', seed=0, max_model_len=8192, cuda_graph_sizes=[], distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, data_parallel_rank=None, data_parallel_start_rank=None, data_parallel_size_local=None, data_parallel_address=None, data_parallel_rpc_port=None, data_parallel_hybrid_lb=False, data_parallel_backend='mp', enable_expert_parallel=False, enable_eplb=False, num_redundant_experts=0, eplb_window_size=1000, eplb_step_interval=3000, eplb_log_balancedness=False, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, prefix_caching_hash_algo='builtin', disable_sliding_window=False, disable_cascade_attn=False, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=256, max_logprobs=20, logprobs_mode='raw_logprobs', disable_log_stats=False, revision=None, code_revision=None, rope_scaling={}, rope_theta=None, hf_token=None, hf_overrides={}, tokenizer_revision=None, quantization=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, limit_mm_per_prompt={}, interleave_mm_strings=False, media_io_kwargs={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, default_mm_loras=None, fully_sharded_loras=False, max_cpu_loras=None, lora_dtype='auto', lora_extra_vocab_size=256, num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config={}, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, disable_chunked_mm_input=False, disable_hybrid_kv_cache_manager=False, guided_decoding_backend='outlines', guided_decoding_disable_fallback=False, guided_decoding_disable_any_whitespace=False, guided_decoding_disable_additional_properties=False, logits_processor_pattern=None, speculative_config=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config={}, override_pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":null,"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":null,"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":null,"local_cache_dir":null}, worker_cls='auto', worker_extension_cls='', kv_transfer_config=None, kv_events_config=None, generation_config='auto', enable_sleep_mode=False, override_generation_config={}, model_impl='auto', override_attention_dtype=None, calculate_kv_scales=False, additional_config={}, reasoning_parser='', use_tqdm_on_load=True, 
pt_load_map_location='cpu', enable_multimodal_encoder_data_parallel=False, async_scheduling=False, enable_prompt_adapter=False, disable_log_requests=False)
2025-08-21T15:04:11.457497375Z INFO 08-21 15:04:11 [config.py:1604] Using max model len 8192
2025-08-21T15:04:11.513592686Z INFO 08-21 15:04:11 [config.py:1733] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
2025-08-21T15:04:11.610994571Z INFO 08-21 15:04:11 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-08-21T15:04:12.535464541Z engine.py           :170  2025-08-21 15:04:12,534 Error initializing vLLM engine:
2025-08-21T15:04:12.535490158Z         An attempt has been made to start a new process before the
2025-08-21T15:04:12.535492124Z         current process has finished its bootstrapping phase.
2025-08-21T15:04:12.535494995Z         This probably means that you are not using fork to start your
2025-08-21T15:04:12.535496221Z         child processes and you have forgotten to use the proper idiom
2025-08-21T15:04:12.535497789Z         in the main module:
2025-08-21T15:04:12.535500407Z             if __name__ == '__main__':
2025-08-21T15:04:12.535502128Z                 freeze_support()
2025-08-21T15:04:12.535503123Z                 ...
2025-08-21T15:04:12.535505680Z         The "freeze_support()" line can be omitted if the program
2025-08-21T15:04:12.535506883Z         is not going to be frozen to produce an executable.
2025-08-21T15:04:12.538381491Z Traceback (most recent call last):
2025-08-21T15:04:12.538402698Z   File "<string>", line 1, in <module>
2025-08-21T15:04:12.538403987Z   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
2025-08-21T15:04:12.538405747Z     exitcode = _main(fd, parent_sentinel)
2025-08-21T15:04:12.538407086Z   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
2025-08-21T15:04:12.538408240Z     prepare(preparation_data)
2025-08-21T15:04:12.538409789Z   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
2025-08-21T15:04:12.538411223Z     _fixup_main_from_path(data['init_main_from_path'])
2025-08-21T15:04:12.538412513Z   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
2025-08-21T15:04:12.538414114Z     main_content = runpy.run_path(main_path,
2025-08-21T15:04:12.538415032Z   File "/usr/lib/python3.10/runpy.py", line 289, in run_path
2025-08-21T15:04:12.538416090Z     return _run_module_code(code, init_globals, run_name,
2025-08-21T15:04:12.538417078Z   File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
2025-08-21T15:04:12.538418403Z     _run_code(code, mod_globals, init_globals,
2025-08-21T15:04:12.538419499Z   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2025-08-21T15:04:12.538420498Z     exec(code, run_globals)
2025-08-21T15:04:12.538421571Z   File "/src/handler.py", line 6, in <module>
2025-08-21T15:04:12.538422496Z     vllm_engine = vLLMEngine()
2025-08-21T15:04:12.538423268Z   File "/src/engine.py", line 30, in __init__
2025-08-21T15:04:12.538424303Z     self.llm = self._initialize_llm() if engine is None else engine.llm
2025-08-21T15:04:12.538425251Z   File "/src/engine.py", line 171, in _initialize_llm
2025-08-21T15:04:12.538426036Z     raise e
2025-08-21T15:04:12.538427510Z   File "/src/engine.py", line 165, in _initialize_llm
2025-08-21T15:04:12.538428407Z     engine = AsyncLLMEngine.from_engine_args(self.engine_args)
2025-08-21T15:04:12.538429306Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 190, in from_engine_args
2025-08-21T15:04:12.538430941Z     return cls(
2025-08-21T15:04:12.538431743Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 117, in __init__
2025-08-21T15:04:12.538432846Z     self.engine_core = EngineCoreClient.make_async_mp_client(
2025-08-21T15:04:12.538433623Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
2025-08-21T15:04:12.538434524Z     return AsyncMPClient(*client_args)
2025-08-21T15:04:12.538435311Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 677, in __init__
2025-08-21T15:04:12.538436237Z     super().__init__(
2025-08-21T15:04:12.538437417Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 408, in __init__
2025-08-21T15:04:12.538438380Z     with launch_core_engines(vllm_config, executor_class,
2025-08-21T15:04:12.538448595Z   File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
2025-08-21T15:04:12.538449617Z     return next(self.gen)
2025-08-21T15:04:12.538450466Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 680, in launch_core_engines
2025-08-21T15:04:12.538453547Z     local_engine_manager = CoreEngineProcManager(
2025-08-21T15:04:12.538454690Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 133, in __init__
2025-08-21T15:04:12.538455796Z     proc.start()
2025-08-21T15:04:12.538456678Z   File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
2025-08-21T15:04:12.538457621Z     self._popen = self._Popen(self)
2025-08-21T15:04:12.538458594Z   File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
2025-08-21T15:04:12.538460962Z     return Popen(process_obj)
2025-08-21T15:04:12.538461720Z   File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
2025-08-21T15:04:12.538462628Z     super().__init__(process_obj)
2025-08-21T15:04:12.538463465Z   File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
2025-08-21T15:04:12.538464370Z     self._launch(process_obj)
2025-08-21T15:04:12.538465248Z   File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
2025-08-21T15:04:12.538466454Z     prep_data = spawn.get_preparation_data(process_obj._name)
2025-08-21T15:04:12.538467392Z   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
2025-08-21T15:04:12.538468298Z     _check_not_importing_main()
2025-08-21T15:04:12.538469048Z   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
2025-08-21T15:04:12.538470161Z     raise RuntimeError('''
2025-08-21T15:04:12.538471053Z RuntimeError:
2025-08-21T15:04:12.538472138Z         An attempt has been made to start a new process before the
2025-08-21T15:04:12.538473003Z         current process has finished its bootstrapping phase.
2025-08-21T15:04:12.538474696Z         This probably means that you are not using fork to start your
2025-08-21T15:04:12.538475518Z         child processes and you have forgotten to use the proper idiom
2025-08-21T15:04:12.538476419Z         in the main module:
2025-08-21T15:04:12.538477958Z             if __name__ == '__main__':
2025-08-21T15:04:12.538478808Z                 freeze_support()
2025-08-21T15:04:12.538479711Z                 ...
2025-08-21T15:04:12.538481175Z         The "freeze_support()" line can be omitted if the program
2025-08-21T15:04:12.538482022Z         is not going to be frozen to produce an executable.
2025-08-21T15:04:13.468176690Z engine.py           :170  2025-08-21 15:04:13,467 Error initializing vLLM engine: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-08-21T15:04:13.469869348Z Traceback (most recent call last):
2025-08-21T15:04:13.469874242Z   File "/src/handler.py", line 6, in <module>
2025-08-21T15:04:13.469875445Z     vllm_engine = vLLMEngine()
2025-08-21T15:04:13.469877633Z   File "/src/engine.py", line 30, in __init__
2025-08-21T15:04:13.469879003Z     self.llm = self._initialize_llm() if engine is None else engine.llm
2025-08-21T15:04:13.469881501Z   File "/src/engine.py", line 171, in _initialize_llm
2025-08-21T15:04:13.469883456Z     raise e
2025-08-21T15:04:13.469885468Z   File "/src/engine.py", line 165, in _initialize_llm
2025-08-21T15:04:13.469886718Z     engine = AsyncLLMEngine.from_engine_args(self.engine_args)
2025-08-21T15:04:13.469888363Z   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 653, in from_engine_args
2025-08-21T15:04:13.469890531Z     return async_engine_cls.from_vllm_config(
2025-08-21T15:04:13.469891700Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
2025-08-21T15:04:13.469893429Z     return cls(
2025-08-21T15:04:13.469894677Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 117, in __init__
2025-08-21T15:04:13.469904371Z     self.engine_core = EngineCoreClient.make_async_mp_client(
2025-08-21T15:04:13.469905665Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
2025-08-21T15:04:13.469906807Z     return AsyncMPClient(*client_args)
2025-08-21T15:04:13.469908131Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 677, in __init__
2025-08-21T15:04:13.469909266Z     super().__init__(
2025-08-21T15:04:13.469911346Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 408, in __init__
2025-08-21T15:04:13.469912333Z     with launch_core_engines(vllm_config, executor_class,
2025-08-21T15:04:13.469913543Z   File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
2025-08-21T15:04:13.469914639Z     next(self.gen)
2025-08-21T15:04:13.469915795Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines
2025-08-21T15:04:13.469916789Z     wait_for_engine_startup(
2025-08-21T15:04:13.469917895Z   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
2025-08-21T15:04:13.469918891Z     raise RuntimeError("Engine core initialization failed. "
2025-08-21T15:04:13.469920299Z RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
```

The worker then keeps restarting the engine, and every attempt fails with exactly the same error.
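
For what it's worth, the traceback itself points at the standard Python `spawn` start-method pitfall rather than anything model- or GPU-memory-related: `/src/handler.py` constructs the engine at module import time (`vllm_engine = vLLMEngine()` on line 6), vLLM's v1 engine launches its core process with multiprocessing's `spawn` method, and the spawned child re-imports `handler.py`, which then tries to build a second engine before bootstrapping has finished; that is exactly the situation the `freeze_support()` message describes. A spawn-safe layout would defer engine construction behind the usual main-module guard. A minimal sketch follows; only the single line shown in the traceback is known, so the rest of the file is assumed:

```python
# handler.py -- spawn-safe layout (a sketch: only the vLLMEngine() call on
# line 6 is visible in the traceback above; the rest of this file is assumed).
from engine import vLLMEngine  # /src/engine.py from the traceback


def main() -> None:
    # Construct the engine only when this module runs as the main program.
    # When vLLM's spawned engine-core child re-imports handler.py,
    # __name__ != "__main__", so no second engine is created and the
    # "bootstrapping phase" RuntimeError above cannot fire.
    vllm_engine = vLLMEngine()
    # ... start serving requests with vllm_engine here ...


if __name__ == "__main__":
    main()
```

Why the same image behaves differently on A100/L40s is not obvious from these logs; one possibility (an assumption, not verified) is that the engine-core process is started with `fork` there and with `spawn` on H100, since only `spawn` re-imports the main module. vLLM's `VLLM_WORKER_MULTIPROC_METHOD` environment variable influences that choice and may be worth experimenting with.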
