How to understand trace for onnx2xla.py example #20412

jchia · 2024-03-24T10:47:58Z


I modified the following onnx2xla.py example to generate a trace for a single inference after some warm-up inference runs:
https://github.com/google/jax/blob/ba557d5e1beb480851117a003ebf76c0ed2249e0/examples/onnx2xla.py

In the trace, blocks of GPU activity are interleaved and overlapping with blocks of host (CPU) activity. Is this good or normal? Does it suggest that a lot of computation is still being done on the CPU instead of on the GPU? Performance-wise, isn't it better to have a single block of GPU activity? If so, what's wrong and how can it be fixed?

onnx2xla-perfetto_trace.json.gz
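
One way to put a number on the interleaving is to measure what fraction of the traced window the GPU is actually busy. The following is a minimal sketch, not part of the example itself. It assumes the attached file is a gzip-compressed trace in the Chrome/Perfetto JSON trace event format, and that GPU processes can be recognized by the substring "GPU" in their `process_name` metadata (an assumption about how the profiler names them). The helper names `load_trace_events`, `process_names`, `busy_time`, and `device_utilization` are hypothetical.

```python
import gzip
import json


def load_trace_events(path):
    """Read a gzip-compressed JSON trace and return its list of events."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The JSON trace format allows {"traceEvents": [...]} or a bare list.
    return data["traceEvents"] if isinstance(data, dict) else data


def process_names(events):
    """Map pid -> process name using 'process_name' metadata events."""
    return {
        e.get("pid"): e.get("args", {}).get("name", "")
        for e in events
        if e.get("ph") == "M" and e.get("name") == "process_name"
    }


def busy_time(intervals):
    """Total length of the union of (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def device_utilization(events, device_marker="GPU"):
    """Fraction of the traced time span during which the device is busy.

    Assumption: device processes contain `device_marker` in their name.
    Only complete events (ph == "X") with 'ts' and 'dur' are considered.
    """
    names = process_names(events)
    complete = [
        e for e in events
        if e.get("ph") == "X" and "ts" in e and "dur" in e
    ]
    if not complete:
        return 0.0
    device = [
        (e["ts"], e["ts"] + e["dur"])
        for e in complete
        if device_marker in names.get(e.get("pid"), "")
    ]
    span_start = min(e["ts"] for e in complete)
    span_end = max(e["ts"] + e["dur"] for e in complete)
    span = span_end - span_start
    return busy_time(device) / span if span > 0 else 0.0
```

It could be applied to the attachment with `device_utilization(load_trace_events("onnx2xla-perfetto_trace.json.gz"))`. A low fraction would indicate that the GPU spends much of the window idle between kernels. That alone does not show that the computation itself runs on the CPU, since host-side dispatch and launch overhead also appear as host activity.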

Replies: 0 comments
