[Bugfix][NPU][XPU] Use platform-aware profiler activities for trace generation#1542

Open
lishunyang12 wants to merge 3 commits intovllm-project:mainfrom
lishunyang12:fix/npu-profiler-activity

Conversation

@lishunyang12
Contributor

Summary

  • The diffusion TorchProfiler hardcodes ProfilerActivity.CUDA, which fails on NPU (Ascend) devices since CUDA activity is not available there.
  • This extracts activity selection into a helper that checks current_omni_platform.device_type and uses ProfilerActivity.NPU (provided by torch_npu) on NPU devices, falling back to ProfilerActivity.CUDA otherwise.
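The selection logic described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's actual helper: `StandInActivity` and `select_activities` are hypothetical names, the enum is a stand-in so the snippet runs without torch or torch_npu, and the device type is passed in as a plain string rather than read from `current_omni_platform.device_type`.

```python
from enum import Enum


class StandInActivity(Enum):
    """Stand-in for a profiler activity enum (hypothetical, not the torch type)."""

    CPU = 0
    CUDA = 1
    NPU = 2


def select_activities(activity_enum, device_type):
    """Return the CPU activity plus one device activity for the given platform."""
    activities = [activity_enum.CPU]
    if device_type == "npu":
        # Ascend NPU: use the NPU activity instead of CUDA.
        activities.append(activity_enum.NPU)
    else:
        # Every other platform falls back to the CUDA activity.
        activities.append(activity_enum.CUDA)
    return activities


print(select_activities(StandInActivity, "npu"))
print(select_activities(StandInActivity, "cuda"))
```

Passing the enum class in as an argument is only a convenience for this sketch; it keeps the branch logic testable without an accelerator.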

Fixes #1484

Test plan

  • On NPU: profiler should now start without error and export_chrome_trace should produce a valid trace file.
  • On CUDA: no behavior change — ProfilerActivity.CUDA is still used.
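One way to make the "valid trace file" check in the test plan concrete is a small structural test on the exported file. The sketch below assumes the export follows the Chrome trace event format (a JSON array of events, or a JSON object with a `traceEvents` array); `looks_like_chrome_trace` is a hypothetical helper, and the sample file is hand-written rather than real profiler output.

```python
import json
import os
import tempfile


def looks_like_chrome_trace(path):
    """Return True if the file parses as JSON and carries a list of trace events."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        # Missing file, unreadable file, or malformed JSON.
        return False
    if isinstance(data, list):
        return True
    return isinstance(data, dict) and isinstance(data.get("traceEvents"), list)


# Self-check against a tiny hand-written trace.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample_trace.json")
    with open(sample, "w", encoding="utf-8") as fh:
        json.dump({"traceEvents": [{"name": "op", "ph": "X", "ts": 0, "dur": 1}]}, fh)
    print(looks_like_chrome_trace(sample))
```

This only checks the file's shape; it does not confirm that device-side events were actually recorded.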

cc @gcanlin

@gcanlin
Collaborator

gcanlin commented Feb 27, 2026

Thanks! I will test it on NPU.

@lishunyang12 lishunyang12 force-pushed the fix/npu-profiler-activity branch from 352bb61 to 3b5b321 Compare February 27, 2026 15:03

activities = [ProfilerActivity.CPU]
device_type = current_omni_platform.device_type
if device_type == "npu":
Collaborator

Does it support other platforms (ROCm, XPU)? Do they require adaptation?

Contributor Author

ROCm uses ProfilerActivity.CUDA in PyTorch's profiler API, so the else branch already covers it. XPU support can be added when needed; this PR is scoped to fix #1484.

Contributor Author

@lishunyang12 lishunyang12 Feb 28, 2026

Good point — I'll add XPU support too. ROCm already works with the CUDA fallback since PyTorch maps it to ProfilerActivity.CUDA.

@lishunyang12
Contributor Author

@xuechendi PTAL

@lishunyang12 lishunyang12 changed the title [Bugfix][NPU] Use platform-aware profiler activities for trace generation [Bugfix][NPU][XPU] Use platform-aware profiler activities for trace generation Feb 28, 2026
Contributor

@xuechendi xuechendi left a comment

LGTM

    activities.append(getattr(ProfilerActivity, "NPU"))
elif device_type == "xpu":
    # Intel XPU support
    activities.append(getattr(ProfilerActivity, "XPU"))
Contributor

This is how we did it in the vLLM main repo:

TorchProfilerActivity = Literal["CPU", "CUDA", "XPU"]
TorchProfilerActivityMap = {
    "CPU": torch.profiler.ProfilerActivity.CPU,
    "CUDA": torch.profiler.ProfilerActivity.CUDA,
    "XPU": torch.profiler.ProfilerActivity.XPU,
}

The current code with getattr also works. Thanks for adding XPU.
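A name-to-member map like the one quoted above also lends itself to validating user-supplied activity names before the profiler starts. The following is a sketch under assumptions, not vLLM's code: `StandInActivity` and `resolve_activities` are hypothetical, and the lookup relies on the enum exposing `__members__`, which Python `Enum` classes do.

```python
from enum import Enum


class StandInActivity(Enum):
    """Stand-in for a profiler activity enum (hypothetical, not the torch type)."""

    CPU = 0
    CUDA = 1
    XPU = 2


def resolve_activities(activity_enum, names):
    """Map activity names to enum members, rejecting unknown names up front."""
    members = activity_enum.__members__
    resolved = []
    for name in names:
        if name not in members:
            supported = ", ".join(sorted(members))
            raise ValueError(
                f"unsupported profiler activity {name!r}; supported: {supported}"
            )
        resolved.append(members[name])
    return resolved


print(resolve_activities(StandInActivity, ["CPU", "XPU"]))
```

Building the lookup from whatever members the enum actually has means a missing member surfaces as a clear `ValueError` rather than an `AttributeError` deep inside the worker.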

@david6666666
Collaborator

Please fix the DCO check.

@gcanlin gcanlin mentioned this pull request Feb 28, 2026
5 tasks
@JustQJ
Contributor

JustQJ commented Feb 28, 2026

Hi, I still encounter an error:

[Stage-0] INFO 02-28 08:59:07 [diffusion_engine.py:227] Starting diffusion profiling → /mnt/deepseek/cloudide/tpcode/profile/stage_0_diffusion_1772269147*.json
[Stage-0] INFO 02-28 08:59:07 [torch_profiler.py:66] [Rank 0] Starting End-to-End Torch profiler

Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684] Error executing method 'start_profile'. This might cause issues in distributed execution.
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684] Traceback (most recent call last):
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 680, in execute_method
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]     return func(*args, **kwargs)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 172, in start_profile
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]     return CurrentProfiler.start(trace_path_template)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/profiler/torch_profiler.py", line 90, in start
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]     activities=_get_profiler_activities(),
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]                ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/profiler/torch_profiler.py", line 25, in _get_profiler_activities
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]     activities.append(getattr(ProfilerActivity, "NPU"))
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:684] AttributeError: type object 'torch._C._profiler.ProfilerActivity' has no attribute 'NPU'
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401] Error executing RPC: type object 'torch._C._profiler.ProfilerActivity' has no attribute 'NPU'
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401] Traceback (most recent call last):
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 398, in execute_rpc
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]     result = self.worker.execute_method(method, *args, **kwargs)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 685, in execute_method
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]     raise e
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 680, in execute_method
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]     return func(*args, **kwargs)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 172, in start_profile
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]     return CurrentProfiler.start(trace_path_template)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/profiler/torch_profiler.py", line 90, in start
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]     activities=_get_profiler_activities(),
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]                ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/profiler/torch_profiler.py", line 25, in _get_profiler_activities
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]     activities.append(getattr(ProfilerActivity, "NPU"))
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:401] AttributeError: type object 'torch._C._profiler.ProfilerActivity' has no attribute 'NPU'
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430] Error processing RPC: type object 'torch._C._profiler.ProfilerActivity' has no attribute 'NPU'
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430] Traceback (most recent call last):
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 426, in worker_busy_loop
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     result, should_reply = self.execute_rpc(msg)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]                            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 402, in execute_rpc
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     raise e
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 398, in execute_rpc
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     result = self.worker.execute_method(method, *args, **kwargs)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 685, in execute_method
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     raise e
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 680, in execute_method
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     return func(*args, **kwargs)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]            ^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/worker/diffusion_worker.py", line 172, in start_profile
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     return CurrentProfiler.start(trace_path_template)
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/profiler/torch_profiler.py", line 90, in start
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     activities=_get_profiler_activities(),
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]                ^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]   File "/mnt/deepseek/cloudide/tpcode/omni-qwen2512/vllm_omni/diffusion/profiler/torch_profiler.py", line 25, in _get_profiler_activities
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]     activities.append(getattr(ProfilerActivity, "NPU"))
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Stage-0] ERROR 02-28 08:59:07 [diffusion_worker.py:430] AttributeError: type object 'torch._C._profiler.ProfilerActivity' has no attribute 'NPU'
[Stage-0] INFO 02-28 08:59:07 [omni_stage.py:805] [Stage-0] Diffusion Torch profiler started
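The log above shows the unguarded `getattr(ProfilerActivity, "NPU")` raising `AttributeError` inside `start_profile`. One possible mitigation is to look the member up with a default and fall back to CPU-only profiling when it is absent. The sketch below is an assumption about how that could look, not the fix that was pushed: `StockLikeActivity`, `_DEVICE_TO_ACTIVITY_NAME`, and `get_profiler_activities` are hypothetical names, and the stand-in enum deliberately lacks `NPU` to mirror the failing environment.

```python
import logging
from enum import Enum

logger = logging.getLogger("profiler_activity_sketch")


class StockLikeActivity(Enum):
    """Stand-in enum that, like the one in the log, has no NPU member."""

    CPU = 0
    CUDA = 2


# Hypothetical mapping from platform string to activity member name.
_DEVICE_TO_ACTIVITY_NAME = {"npu": "NPU", "xpu": "XPU"}


def get_profiler_activities(activity_enum, device_type):
    """Return CPU plus the device activity if the enum actually defines it."""
    activities = [activity_enum.CPU]
    name = _DEVICE_TO_ACTIVITY_NAME.get(device_type, "CUDA")
    member = getattr(activity_enum, name, None)
    if member is None:
        # Degrade to CPU-only profiling instead of raising AttributeError.
        logger.warning(
            "Profiler activity %s is unavailable for device %r; using CPU only",
            name,
            device_type,
        )
    else:
        activities.append(member)
    return activities


print(get_profiler_activities(StockLikeActivity, "npu"))
print(get_profiler_activities(StockLikeActivity, "cuda"))
```

A CPU-only trace would still miss device kernels, so this fallback trades completeness for not aborting the run.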

Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
@lishunyang12 lishunyang12 force-pushed the fix/npu-profiler-activity branch from ac733e2 to c9961c9 Compare February 28, 2026 12:50
@lishunyang12
Contributor Author

@JustQJ Please try it again. I made some changes based on your bug report. :)

@JustQJ
Contributor

JustQJ commented Mar 2, 2026

@JustQJ Please try it again. I made some changes based on your bug report. :)

Hi, in my test, hasattr(ProfilerActivity, "NPU") is still False after importing torch_npu.

>>> import torch
/usr/local/python3.11.14/lib/python3.11/site-packages/torch_npu/__init__.py:309: UserWarning: On the interactive interface, the value of TASK_QUEUE_ENABLE is set to 0 by default.                      Do not set it to 1 to prevent some unknown errors
  warnings.warn("On the interactive interface, the value of TASK_QUEUE_ENABLE is set to 0 by default. \
>>> import torch_npu
>>> from torch.profiler import ProfilerActivity
>>> ProfilerActivity.CPU
<ProfilerActivity.CPU: 0>
>>> ProfilerActivity.CUDA
<ProfilerActivity.CUDA: 2>
>>> ProfilerActivity.NPU
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'torch._C._profiler.ProfilerActivity' has no attribute 'NPU'. Did you mean: 'CPU'?
>>> hasattr(ProfilerActivity, "NPU")
False
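When debugging a session like the one above, it can help to list every member the enum defines rather than probing names one at a time. This sketch assumes the enum exposes `__members__` (Python `Enum` classes do, and pybind11-generated enums generally do as well); `StockLikeActivity` is a stand-in and `available_activity_names` is a hypothetical helper.

```python
from enum import Enum


class StockLikeActivity(Enum):
    """Stand-in mirroring the two members visible in the session above."""

    CPU = 0
    CUDA = 2


def available_activity_names(activity_enum):
    """Return the sorted member names defined on the given enum class."""
    return sorted(activity_enum.__members__)


print(available_activity_names(StockLikeActivity))
```

Running the same helper against the real `ProfilerActivity` class in the affected environment would show at a glance which device activities that build offers.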

My environment:

accelerate                        1.12.0
aenum                             3.1.16
aiofiles                          24.1.0
aiohappyeyeballs                  2.6.1
aiohttp                           3.13.3
aiosignal                         1.4.0
annotated-doc                     0.0.4
annotated-types                   0.7.0
anthropic                         0.71.0
antlr4-python3-runtime            4.9.3
anyio                             4.12.1
arctic_inference                  0.1.1
asc_op_compile_base               0.1.0
asc_opc_tool                      0.1.0
astor                             0.8.1
attrs                             25.4.0
audioread                         3.1.0
auto_tune                         0.1.0
blake3                            1.0.8
blinker                           1.9.0
brotli                            1.2.0
cache_dit                         1.2.0
cachetools                        6.2.6
cbor2                             5.8.0
certifi                           2026.1.4
cffi                              2.0.0
charset-normalizer                3.4.4
click                             8.3.1
cloudpickle                       3.1.2
cmake                             4.2.1
coloredlogs                       15.0.1
compressed-tensors                0.13.0
cryptography                      46.0.4
dataflow                          0.0.1
decorator                         5.2.1
depyf                             0.20.0
diffusers                         0.36.0
dill                              0.4.1
diskcache                         5.6.3
distro                            1.9.0
dnspython                         2.8.0
docstring_parser                  0.17.0
einops                            0.8.2
email-validator                   2.3.0
es_math                           1.0.0
fastapi                           0.123.10
fastapi-cli                       0.0.20
fastapi-cloud-cli                 0.11.0
fastar                            0.8.0
ffmpy                             1.0.0
filelock                          3.20.3
Flask                             3.1.2
flatbuffers                       25.12.19
frozenlist                        1.8.0
fsspec                            2026.1.0
ge-py                             0.0.1
gguf                              0.17.1
gradio                            5.50.0
gradio_client                     1.14.0
groovy                            0.1.2
grpcio                            1.76.0
grpcio-reflection                 1.76.0
h11                               0.16.0
h2                                4.3.0
hccl                              0.1.0
hf-xet                            1.2.0
hpack                             4.1.0
httpcore                          1.0.9
httptools                         0.7.1
httpx                             0.28.1
httpx-sse                         0.4.3
huggingface-hub                   0.36.0
humanfriendly                     10.0
Hypercorn                         0.18.0
hyperframe                        6.1.0
idna                              3.11
ijson                             3.4.0.post0
ImageIO                           2.37.2
imageio-ffmpeg                    0.6.0
importlib_metadata                8.7.1
interegular                       0.3.3
itsdangerous                      2.2.0
Jinja2                            3.1.6
jiter                             0.12.0
jmespath                          1.1.0
joblib                            1.5.3
jsonschema                        4.26.0
jsonschema-specifications         2025.9.1
lark                              1.2.2
lazy_loader                       0.4
librosa                           0.11.0
llguidance                        1.3.0
llm_datadist                      0.0.1
llm_datadist_v1                   0.0.1
llvmlite                          0.46.0
lm-format-enforcer                0.11.3
loguru                            0.7.3
markdown-it-py                    4.0.0
MarkupSafe                        3.0.3
mcp                               1.26.0
mdurl                             0.1.2
mindiesd                          2.3.0
mistral_common                    1.9.0
model-hosting-container-standards 0.1.13
modelscope                        1.34.0
more-itertools                    10.8.0
mpmath                            1.3.0
msgpack                           1.1.2
msgspec                           0.20.0
msobjdump                         0.1.0
mspti                             0.0.1
multidict                         6.7.1
networkx                          3.6.1
ninja                             1.13.0
numba                             0.63.1
numpy                             2.3.5
omegaconf                         2.3.0
onnxruntime-cann                  1.23.2
op_compile_tool                   0.1.0
op_gen                            0.1
op_test_frame                     0.1
opc_tool                          0.1.0
openai                            2.16.0
openai-harmony                    0.0.8
openai-whisper                    20250625
opencv-python-headless            4.13.0.92
orjson                            3.11.7
outlines_core                     0.2.11
packaging                         26.0
pandas                            2.3.3
pandas-stubs                      2.3.3.260113
partial-json-parser               0.2.1.1.post7
pillow                            11.3.0
pip                               25.3
platformdirs                      4.5.1
pooch                             1.8.2
prettytable                       3.17.0
priority                          2.0.0
prometheus_client                 0.24.1
prometheus-fastapi-instrumentator 7.1.0
propcache                         0.4.1
protobuf                          6.33.5
psutil                            7.2.2
py-cpuinfo                        9.0.0
pybase64                          1.4.3
pybind11                          3.0.1
pycountry                         24.6.1
pycparser                         3.0
pydantic                          2.12.3
pydantic_core                     2.41.4
pydantic-extra-types              2.11.0
pydantic-settings                 2.12.0
pydub                             0.25.1
Pygments                          2.19.2
PyJWT                             2.10.1
python-dateutil                   2.9.0.post0
python-dotenv                     1.2.1
python-json-logger                4.0.0
python-multipart                  0.0.22
pytz                              2025.2
PyYAML                            6.0.3
pyzmq                             27.1.0
Quart                             0.20.0
ray                               2.48.0
referencing                       0.37.0
regex                             2026.1.15
requests                          2.32.5
resampy                           0.4.3
rich                              14.3.1
rich-toolkit                      0.17.1
rignore                           0.7.6
rpds-py                           0.30.0
ruff                              0.15.4
safehttpx                         0.1.7
safetensors                       0.7.0
schedule_search                   0.0.1
scikit-learn                      1.8.0
scipy                             1.17.0
semantic-version                  2.10.0
sentencepiece                     0.2.1
sentry-sdk                        2.51.0
setproctitle                      1.3.7
setuptools                        79.0.1
setuptools-scm                    9.2.2
shellingham                       1.5.4
show_kernel_debug_data            0.1.0
six                               1.17.0
sniffio                           1.3.1
soundfile                         0.13.1
sox                               1.5.0
soxr                              1.0.0
sse-starlette                     3.2.0
starlette                         0.50.0
superkernel                       0.1.0
supervisor                        4.3.0
sympy                             1.14.0
te                                0.4.0
threadpoolctl                     3.6.0
tiktoken                          0.12.0
tokenizers                        0.22.2
tomlkit                           0.13.3
torch                             2.9.0+cpu
torch_npu                         2.9.0
torchaudio                        2.9.0
torchsde                          0.2.6
torchvision                       0.24.0+cpu
tqdm                              4.67.1
trampoline                        0.1.2
transformers                      4.57.6
triton                            3.6.0
triton-ascend                     3.2.0
typer                             0.21.1
types-pytz                        2025.2.0.20251108
typing_extensions                 4.15.0
typing-inspection                 0.4.2
tzdata                            2025.3
urllib3                           2.6.3
uvicorn                           0.40.0
uvloop                            0.22.1
watchfiles                        1.1.1
wcwidth                           0.6.0
websockets                        15.0.1
Werkzeug                          3.1.5
wheel                             0.46.3
wsproto                           1.3.2
xgrammar                          0.1.29
yarl                              1.22.0
zipp                              3.23.0

Development

Successfully merging this pull request may close these issues.

[Bug][NPU]: When I use an offline script for profile analysis, I am unable to generate a trace file.

6 participants