Problem running AngelSlim/Qwen3-4B_eagle3 with sglang #284

Description

@cicijohn1983

Hello, I am running the model with the lmsysorg/sglang:latest image, using the following command:
python3 -m sglang.launch_server --model /models/Qwen3-4B --speculative-algorithm EAGLE3 --speculative-draft-model-path /models/Qwen3-4B_eagle3 --speculative-num-steps 5 --speculative-eagle-topk 8 --speculative-num-draft-tokens 32 --mem-fraction 0.5 --served-model-name codeqwen --cuda-graph-max-bs 2 --dtype float16
It eventually fails with the following error:
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2555, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 329, in init
self.draft_worker = EAGLEWorker(
^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/speculative/eagle_worker.py", line 125, in init
super().init(
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 84, in init
self.model_runner = ModelRunner(
^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 242, in init
self.initialize(min_per_gpu_memory)
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 288, in initialize
self.load_model()
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 679, in load_model
self.model = get_model(
^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/init.py", line 22, in get_model
return loader.load_model(
^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 444, in load_model
model = _initialize_model(
^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 186, in _initialize_model
return model_class(
^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama_eagle3.py", line 183, in init
self.model = LlamaModel(
^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama_eagle3.py", line 130, in init
self.midlayer = LlamaDecoderLayer(config, 0, quant_config, prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama_eagle3.py", line 50, in init
super().init(config, layer_id, quant_config, prefix)
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 227, in init
self.self_attn = LlamaAttention(
^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/models/llama.py", line 170, in init
self.rotary_emb = get_rope(
^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/layers/rotary_embedding.py", line 1716, in get_rope
rotary_emb = RotaryEmbedding(
^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/layers/rotary_embedding.py", line 107, in init
from vllm._custom_ops import rotary_embedding
ModuleNotFoundError: No module named 'vllm'

Could you tell me which sglang version can run the AngelSlim/Qwen3-4B_eagle3 model? Does it require installing vllm, and if so, which vllm version? Is there a ready-to-use image that works?
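
For context, the last frame of the traceback shows sglang's RotaryEmbedding falling back to "from vllm._custom_ops import rotary_embedding", so the failure is simply that the image has no vllm package importable. A minimal pre-flight sketch (plain Python stdlib, not part of sglang; the helper name below is hypothetical) to check whether that import would succeed inside a given container before launching the server:

import importlib.util

def vllm_rotary_op_available() -> bool:
    # The ModuleNotFoundError above means the 'vllm' package itself is absent,
    # so this spec lookup already returns None in the failing image.
    if importlib.util.find_spec("vllm") is None:
        return False
    try:
        # Same import that sglang's RotaryEmbedding falls back to.
        from vllm._custom_ops import rotary_embedding  # noqa: F401
        return True
    except ImportError:
        return False

if __name__ == "__main__":
    print("vllm rotary_embedding op importable:", vllm_rotary_op_available())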
