Hi!
I've been benchmarking different GPUs for throughput with Qwen3 30B A3B FP8.
First I tried A100 and L40S, and everything was fine; I got my numbers.
Then I tried H100, and it fails on vLLM startup ("Error initializing vLLM engine"). I've restarted it, created a new worker, etc., but it always ends with the same error. I'm not sure what the issue is or how to fix it. Here are the logs:

2025-08-21T15:03:48.762848707Z INFO 08-21 15:03:48 [__init__.py:235] Automatically detected platform cuda.
2025-08-21T15:03:50.200115872Z engine.py :27 2025-08-21 15:03:50,199 Engine args: AsyncEngineArgs(model='Qwen/Qwen3-30B-A3B-Instruct-2507-FP8', served_model_name=None, tokenizer=None, hf_config_path=None, task='auto', skip_tokenizer_init=False, enable_prompt_embeds=False, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path='', download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='fp8', seed=0, max_model_len=8192, cuda_graph_sizes=[], distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, data_parallel_rank=None, data_parallel_start_rank=None, data_parallel_size_local=None, data_parallel_address=None, data_parallel_rpc_port=None, data_parallel_hybrid_lb=False, data_parallel_backend='mp', enable_expert_parallel=False, enable_eplb=False, num_redundant_experts=0, eplb_window_size=1000, eplb_step_interval=3000, eplb_log_balancedness=False, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, prefix_caching_hash_algo='builtin', disable_sliding_window=False, disable_cascade_attn=False, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=256, max_logprobs=20, logprobs_mode='raw_logprobs', disable_log_stats=False, revision=None, code_revision=None, rope_scaling={}, rope_theta=None, hf_token=None, hf_overrides={}, tokenizer_revision=None, quantization=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, limit_mm_per_prompt={}, interleave_mm_strings=False, media_io_kwargs={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, default_mm_loras=None, fully_sharded_loras=False, max_cpu_loras=None, lora_dtype='auto', lora_extra_vocab_size=256, num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config={}, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, disable_chunked_mm_input=False, disable_hybrid_kv_cache_manager=False, guided_decoding_backend='outlines', guided_decoding_disable_fallback=False, guided_decoding_disable_any_whitespace=False, guided_decoding_disable_additional_properties=False, logits_processor_pattern=None, speculative_config=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config={}, override_pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":null,"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":null,"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":null,"local_cache_dir":null}, worker_cls='auto', worker_extension_cls='', kv_transfer_config=None, kv_events_config=None, generation_config='auto', enable_sleep_mode=False, override_generation_config={}, model_impl='auto', override_attention_dtype=None, calculate_kv_scales=False, additional_config={}, reasoning_parser='', use_tqdm_on_load=True, pt_load_map_location='cpu', 
enable_multimodal_encoder_data_parallel=False, async_scheduling=False, enable_prompt_adapter=False, disable_log_requests=False)
2025-08-21T15:03:55.869296691Z INFO 08-21 15:03:55 [config.py:1604] Using max model len 8192
2025-08-21T15:03:56.284128956Z INFO 08-21 15:03:56 [config.py:1733] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
2025-08-21T15:03:56.429006608Z INFO 08-21 15:03:56 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-08-21T15:04:04.812408188Z INFO 08-21 15:04:04 [__init__.py:235] Automatically detected platform cuda.
2025-08-21T15:04:06.050142092Z engine.py :27 2025-08-21 15:04:06,049 Engine args: AsyncEngineArgs(model='Qwen/Qwen3-30B-A3B-Instruct-2507-FP8', served_model_name=None, tokenizer=None, hf_config_path=None, task='auto', skip_tokenizer_init=False, enable_prompt_embeds=False, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path='', download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='fp8', seed=0, max_model_len=8192, cuda_graph_sizes=[], distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, data_parallel_size=1, data_parallel_rank=None, data_parallel_start_rank=None, data_parallel_size_local=None, data_parallel_address=None, data_parallel_rpc_port=None, data_parallel_hybrid_lb=False, data_parallel_backend='mp', enable_expert_parallel=False, enable_eplb=False, num_redundant_experts=0, eplb_window_size=1000, eplb_step_interval=3000, eplb_log_balancedness=False, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, prefix_caching_hash_algo='builtin', disable_sliding_window=False, disable_cascade_attn=False, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=256, max_logprobs=20, logprobs_mode='raw_logprobs', disable_log_stats=False, revision=None, code_revision=None, rope_scaling={}, rope_theta=None, hf_token=None, hf_overrides={}, tokenizer_revision=None, quantization=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, limit_mm_per_prompt={}, interleave_mm_strings=False, media_io_kwargs={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, default_mm_loras=None, fully_sharded_loras=False, max_cpu_loras=None, lora_dtype='auto', lora_extra_vocab_size=256, num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config={}, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, disable_chunked_mm_input=False, disable_hybrid_kv_cache_manager=False, guided_decoding_backend='outlines', guided_decoding_disable_fallback=False, guided_decoding_disable_any_whitespace=False, guided_decoding_disable_additional_properties=False, logits_processor_pattern=None, speculative_config=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config={}, override_pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":null,"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":null,"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":null,"local_cache_dir":null}, worker_cls='auto', worker_extension_cls='', kv_transfer_config=None, kv_events_config=None, generation_config='auto', enable_sleep_mode=False, override_generation_config={}, model_impl='auto', override_attention_dtype=None, calculate_kv_scales=False, additional_config={}, reasoning_parser='', use_tqdm_on_load=True, pt_load_map_location='cpu', 
enable_multimodal_encoder_data_parallel=False, async_scheduling=False, enable_prompt_adapter=False, disable_log_requests=False)
2025-08-21T15:04:11.457497375Z INFO 08-21 15:04:11 [config.py:1604] Using max model len 8192
2025-08-21T15:04:11.513592686Z INFO 08-21 15:04:11 [config.py:1733] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
2025-08-21T15:04:11.610994571Z INFO 08-21 15:04:11 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=2048.
2025-08-21T15:04:12.535464541Z engine.py :170 2025-08-21 15:04:12,534 Error initializing vLLM engine:
2025-08-21T15:04:12.535490158Z An attempt has been made to start a new process before the
2025-08-21T15:04:12.535492124Z current process has finished its bootstrapping phase.
2025-08-21T15:04:12.535494995Z This probably means that you are not using fork to start your
2025-08-21T15:04:12.535496221Z child processes and you have forgotten to use the proper idiom
2025-08-21T15:04:12.535497789Z in the main module:
2025-08-21T15:04:12.535500407Z if __name__ == '__main__':
2025-08-21T15:04:12.535502128Z freeze_support()
2025-08-21T15:04:12.535503123Z ...
2025-08-21T15:04:12.535505680Z The "freeze_support()" line can be omitted if the program
2025-08-21T15:04:12.535506883Z is not going to be frozen to produce an executable.
2025-08-21T15:04:12.538381491Z Traceback (most recent call last):
2025-08-21T15:04:12.538402698Z File "<string>", line 1, in <module>
2025-08-21T15:04:12.538403987Z File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
2025-08-21T15:04:12.538405747Z exitcode = _main(fd, parent_sentinel)
2025-08-21T15:04:12.538407086Z File "/usr/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
2025-08-21T15:04:12.538408240Z prepare(preparation_data)
2025-08-21T15:04:12.538409789Z File "/usr/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
2025-08-21T15:04:12.538411223Z _fixup_main_from_path(data['init_main_from_path'])
2025-08-21T15:04:12.538412513Z File "/usr/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
2025-08-21T15:04:12.538414114Z main_content = runpy.run_path(main_path,
2025-08-21T15:04:12.538415032Z File "/usr/lib/python3.10/runpy.py", line 289, in run_path
2025-08-21T15:04:12.538416090Z return _run_module_code(code, init_globals, run_name,
2025-08-21T15:04:12.538417078Z File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
2025-08-21T15:04:12.538418403Z _run_code(code, mod_globals, init_globals,
2025-08-21T15:04:12.538419499Z File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
2025-08-21T15:04:12.538420498Z exec(code, run_globals)
2025-08-21T15:04:12.538421571Z File "/src/handler.py", line 6, in <module>
2025-08-21T15:04:12.538422496Z vllm_engine = vLLMEngine()
2025-08-21T15:04:12.538423268Z File "/src/engine.py", line 30, in __init__
2025-08-21T15:04:12.538424303Z self.llm = self._initialize_llm() if engine is None else engine.llm
2025-08-21T15:04:12.538425251Z File "/src/engine.py", line 171, in _initialize_llm
2025-08-21T15:04:12.538426036Z raise e
2025-08-21T15:04:12.538427510Z File "/src/engine.py", line 165, in _initialize_llm
2025-08-21T15:04:12.538428407Z engine = AsyncLLMEngine.from_engine_args(self.engine_args)
2025-08-21T15:04:12.538429306Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 190, in from_engine_args
2025-08-21T15:04:12.538430941Z return cls(
2025-08-21T15:04:12.538431743Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 117, in __init__
2025-08-21T15:04:12.538432846Z self.engine_core = EngineCoreClient.make_async_mp_client(
2025-08-21T15:04:12.538433623Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
2025-08-21T15:04:12.538434524Z return AsyncMPClient(*client_args)
2025-08-21T15:04:12.538435311Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 677, in __init__
2025-08-21T15:04:12.538436237Z super().__init__(
2025-08-21T15:04:12.538437417Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 408, in __init__
2025-08-21T15:04:12.538438380Z with launch_core_engines(vllm_config, executor_class,
2025-08-21T15:04:12.538448595Z File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
2025-08-21T15:04:12.538449617Z return next(self.gen)
2025-08-21T15:04:12.538450466Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 680, in launch_core_engines
2025-08-21T15:04:12.538453547Z local_engine_manager = CoreEngineProcManager(
2025-08-21T15:04:12.538454690Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 133, in __init__
2025-08-21T15:04:12.538455796Z proc.start()
2025-08-21T15:04:12.538456678Z File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
2025-08-21T15:04:12.538457621Z self._popen = self._Popen(self)
2025-08-21T15:04:12.538458594Z File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
2025-08-21T15:04:12.538460962Z return Popen(process_obj)
2025-08-21T15:04:12.538461720Z File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
2025-08-21T15:04:12.538462628Z super().__init__(process_obj)
2025-08-21T15:04:12.538463465Z File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
2025-08-21T15:04:12.538464370Z self._launch(process_obj)
2025-08-21T15:04:12.538465248Z File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
2025-08-21T15:04:12.538466454Z prep_data = spawn.get_preparation_data(process_obj._name)
2025-08-21T15:04:12.538467392Z File "/usr/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
2025-08-21T15:04:12.538468298Z _check_not_importing_main()
2025-08-21T15:04:12.538469048Z File "/usr/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
2025-08-21T15:04:12.538470161Z raise RuntimeError('''
2025-08-21T15:04:12.538471053Z RuntimeError:
2025-08-21T15:04:12.538472138Z An attempt has been made to start a new process before the
2025-08-21T15:04:12.538473003Z current process has finished its bootstrapping phase.
2025-08-21T15:04:12.538474696Z This probably means that you are not using fork to start your
2025-08-21T15:04:12.538475518Z child processes and you have forgotten to use the proper idiom
2025-08-21T15:04:12.538476419Z in the main module:
2025-08-21T15:04:12.538477958Z if __name__ == '__main__':
2025-08-21T15:04:12.538478808Z freeze_support()
2025-08-21T15:04:12.538479711Z ...
2025-08-21T15:04:12.538481175Z The "freeze_support()" line can be omitted if the program
2025-08-21T15:04:12.538482022Z is not going to be frozen to produce an executable.
2025-08-21T15:04:13.468176690Z engine.py :170 2025-08-21 15:04:13,467 Error initializing vLLM engine: Engine core initialization failed. See root cause above. Failed core proc(s): {}
2025-08-21T15:04:13.469869348Z Traceback (most recent call last):
2025-08-21T15:04:13.469874242Z File "/src/handler.py", line 6, in <module>
2025-08-21T15:04:13.469875445Z vllm_engine = vLLMEngine()
2025-08-21T15:04:13.469877633Z File "/src/engine.py", line 30, in __init__
2025-08-21T15:04:13.469879003Z self.llm = self._initialize_llm() if engine is None else engine.llm
2025-08-21T15:04:13.469881501Z File "/src/engine.py", line 171, in _initialize_llm
2025-08-21T15:04:13.469883456Z raise e
2025-08-21T15:04:13.469885468Z File "/src/engine.py", line 165, in _initialize_llm
2025-08-21T15:04:13.469886718Z engine = AsyncLLMEngine.from_engine_args(self.engine_args)
2025-08-21T15:04:13.469888363Z File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 653, in from_engine_args
2025-08-21T15:04:13.469890531Z return async_engine_cls.from_vllm_config(
2025-08-21T15:04:13.469891700Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
2025-08-21T15:04:13.469893429Z return cls(
2025-08-21T15:04:13.469894677Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/async_llm.py", line 117, in __init__
2025-08-21T15:04:13.469904371Z self.engine_core = EngineCoreClient.make_async_mp_client(
2025-08-21T15:04:13.469905665Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
2025-08-21T15:04:13.469906807Z return AsyncMPClient(*client_args)
2025-08-21T15:04:13.469908131Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 677, in __init__
2025-08-21T15:04:13.469909266Z super().__init__(
2025-08-21T15:04:13.469911346Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core_client.py", line 408, in __init__
2025-08-21T15:04:13.469912333Z with launch_core_engines(vllm_config, executor_class,
2025-08-21T15:04:13.469913543Z File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
2025-08-21T15:04:13.469914639Z next(self.gen)
2025-08-21T15:04:13.469915795Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines
2025-08-21T15:04:13.469916789Z wait_for_engine_startup(
2025-08-21T15:04:13.469917895Z File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
2025-08-21T15:04:13.469918891Z raise RuntimeError("Engine core initialization failed. "
2025-08-21T15:04:13.469920299Z RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
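From the traceback, my understanding is that /src/handler.py builds the engine at import time (line 6 of the traceback), so when vLLM's v1 engine launches its core process with the spawn start method (popen_spawn_posix in the trace), the child re-imports handler.py and tries to build a second engine before the parent has finished bootstrapping. The guard idiom the error message asks for would look roughly like this; this is a sketch reconstructed from the traceback, not the worker's actual code:

    # handler.py (sketch): guard engine creation so it only runs in the
    # parent process, never when a spawned child re-imports this module.
    from engine import vLLMEngine

    def main():
        vllm_engine = vLLMEngine()  # spawns the vLLM engine-core process
        ...  # hand vllm_engine to the request-handling loop here

    if __name__ == "__main__":
        main()

If I read the vLLM environment variables right, VLLM_WORKER_MULTIPROC_METHOD can also switch the multiprocessing start method, but I haven't verified whether it applies to the v1 engine-core launch, so the guard above seems like the safer direction. Since the same worker image ran fine on A100 and L40S, I'd still like to understand why only H100 triggers this.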