
Conversation

@xuzhao9 (Contributor) commented Jan 13, 2026

We found that benchmark results on the DGX B200 runner are unstable when the process is allowed to migrate across CPU cores. We pin the process to a single CPU core to mitigate this.

Mitigates #130

Test plan:

https://github.com/pytorch/pytorch-integration-testing/actions/runs/20979471728

Manual validation on the DGX host:

Before:

$ docker run -e CONDA_ENV=triton-main --gpus all --privileged -it ghcr.io/meta-pytorch/tritonbench:latest bash -c '. /workspace/setup_instance.sh; cd /workspace/tritonbench; python run.py --op embedding --only torch_embedding,liger_embedding --bwd'
           (B, T, D, V)    torch_embedding-latency    liger_embedding-latency
-----------------------  -------------------------  -------------------------
   (32, 512, 768, 1024)         0.532576 (±62.89%)         0.501024 (±73.63%)
   (32, 512, 768, 2048)         0.434016 (±74.22%)        0.089152 (±243.36%)
   (32, 512, 768, 4096)        0.204800 (±214.33%)         0.430048 (±78.37%)
   (32, 512, 768, 8192)         0.501600 (±50.62%)         0.496928 (±52.07%)
  (32, 512, 768, 16384)         0.410432 (±70.73%)         0.500512 (±60.92%)
  (32, 512, 768, 32768)         0.522272 (±27.65%)         0.351200 (±54.89%)
  (32, 512, 768, 65536)          0.453696 (±0.92%)          0.222272 (±1.84%)
 (32, 512, 768, 131072)          0.575424 (±0.86%)          0.337888 (±0.95%)
  (8, 2048, 4096, 1024)         0.563136 (±43.71%)          0.355264 (±0.61%)
  (8, 2048, 4096, 2048)         0.522272 (±30.48%)         0.514912 (±36.42%)
  (8, 2048, 4096, 4096)          0.538624 (±0.78%)         0.432160 (±64.47%)
  (8, 2048, 4096, 8192)          0.824416 (±0.62%)          0.502656 (±0.61%)
 (8, 2048, 4096, 16384)          1.240096 (±0.49%)          0.603008 (±0.66%)
 (8, 2048, 4096, 32768)          1.506272 (±0.48%)          0.770944 (±2.40%)
 (8, 2048, 4096, 65536)          1.874880 (±0.43%)          1.076320 (±0.19%)
(8, 2048, 4096, 131072)          2.512864 (±0.32%)          1.680320 (±0.25%)
                average         0.8260860005393624          0.554038003552705

After:

$ docker run --cpuset-cpus 10 -e CONDA_ENV=triton-main --gpus all --privileged -it ghcr.io/meta-pytorch/tritonbench:latest bash -c '. /workspace/setup_instance.sh; cd /workspace/tritonbench; python run.py --op embedding --only torch_embedding,liger_embedding --bwd'


           (B, T, D, V)    torch_embedding-latency    liger_embedding-latency
-----------------------  -------------------------  -------------------------
   (32, 512, 768, 1024)          0.176096 (±1.25%)          0.085056 (±4.82%)
   (32, 512, 768, 2048)          0.186368 (±0.12%)          0.089056 (±2.23%)
   (32, 512, 768, 4096)          0.202784 (±2.08%)          0.094240 (±3.46%)
   (32, 512, 768, 8192)          0.247904 (±1.65%)          0.101344 (±0.22%)
  (32, 512, 768, 16384)          0.325600 (±0.99%)          0.119872 (±1.84%)
  (32, 512, 768, 32768)          0.381088 (±1.04%)          0.160640 (±1.91%)
  (32, 512, 768, 65536)          0.453792 (±0.85%)          0.222240 (±0.98%)
 (32, 512, 768, 131072)          0.575328 (±0.85%)          0.338016 (±1.20%)
  (8, 2048, 4096, 1024)          0.317440 (±0.06%)          0.355168 (±0.85%)
  (8, 2048, 4096, 2048)          0.378976 (±1.32%)          0.383904 (±1.06%)
  (8, 2048, 4096, 4096)          0.536608 (±0.78%)          0.431104 (±0.51%)
  (8, 2048, 4096, 8192)          0.823200 (±0.64%)          0.501984 (±0.60%)
 (8, 2048, 4096, 16384)          1.240224 (±0.51%)          0.603072 (±0.70%)
 (8, 2048, 4096, 32768)          1.505280 (±0.47%)          0.769088 (±0.54%)
 (8, 2048, 4096, 65536)          1.872960 (±0.39%)          1.076320 (±0.28%)
(8, 2048, 4096, 131072)          2.513056 (±0.30%)          1.680384 (±0.37%)
                average         0.7335439994931221        0.43821800500154495

@nWEIdia commented Jan 13, 2026

Quick FYI in case you missed it:

docker run --runtime=nvidia <the rest stays the same>

can be used to completely replace --gpus all --privileged.
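For example, the "Before" command above could be launched like this (a sketch based on that suggestion; whether NVIDIA_VISIBLE_DEVICES is needed depends on the image, so it is included here as an assumption):

# Sketch only: launch the same benchmark via the NVIDIA container runtime
# instead of --gpus all --privileged. NVIDIA_VISIBLE_DEVICES=all is an
# assumption in case the image does not already set it.
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e CONDA_ENV=triton-main -it \
  ghcr.io/meta-pytorch/tritonbench:latest \
  bash -c '. /workspace/setup_instance.sh; cd /workspace/tritonbench; python run.py --op embedding --only torch_embedding,liger_embedding --bwd'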

@nWEIdia mentioned this pull request Jan 13, 2026
@xuzhao9 (Contributor, Author) commented Jan 13, 2026

@nWEIdia I get an error: "docker: Error response from daemon: Requested CPUs are not available - requested 10, available: 140-167" (https://github.com/pytorch/pytorch-integration-testing/actions/runs/20970063138/job/60270801988). Is this set of CPU cores fixed? How do I pin to a single CPU core, e.g., can I use --cpuset-cpus 140?

@nWEIdia commented Jan 13, 2026

It becomes tricky, as Meta's provisioning scripts have their own way of dividing CPU cores among the 8 runners (users Alice/Bob through Henry), and each runner is confined to its assigned CPU cores.
cc @huydhn for ideas.

@huydhn (Contributor) commented Jan 14, 2026

In the multi-tenancy setup, the CPUs are sliced so that each user gets an equivalent, non-overlapping share: https://github.com/meta-pytorch/pytorch-gha-infra/blob/main/multi-tenant/playbooks/setup-host.yml#L206. This is under the assumption that all CPU cores are the same.

Pasting the snippet here for @nWEIdia's visibility:

[Slice]
AllowedCPUs={{ (cpu_cores.stdout | int // ansible_loop.length | int) * ansible_loop.index0 }}-{{ ((cpu_cores.stdout | int // ansible_loop.length | int) * ansible_loop.index) - 1 }}
MemoryMax={{ memory_per_user }}
TasksMax=10000
DevicePolicy=closed

That is the reason why, in an 8-user setup, AllowedCPUs is different for each user. Let me see if there is a bash command to find out which CPU cores are allowed. In the above example, 140-167 is the set of CPU cores assigned to the runner (user) that picked up the job, so we can pin to any CPU in that list.
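As a concrete illustration of the arithmetic in that template (a sketch that assumes a 224-core host divided among 8 users; the real values come from cpu_cores.stdout and ansible_loop above):

# Sketch: reproduce the AllowedCPUs ranges from the template, assuming
# 224 CPU cores and 8 users. These numbers are illustrative assumptions.
CPU_CORES=224
NUM_USERS=8
PER_USER=$((CPU_CORES / NUM_USERS))              # 28 cores per user
for i in $(seq 0 $((NUM_USERS - 1))); do
  START=$((PER_USER * i))
  END=$((PER_USER * (i + 1) - 1))
  echo "user $i: AllowedCPUs=${START}-${END}"    # user 5 gets 140-167
done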

@xuzhao9 (Contributor, Author) commented Jan 14, 2026

@huydhn @nWEIdia I verified that the allowed core list can be extracted with the command taskset -pc $$.
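A sketch of how a job could act on that (the variable names are illustrative; taskset prints something like "pid 1234's current affinity list: 140-167"):

# Sketch: read this shell's CPU affinity list and pick the first allowed core.
# ALLOWED and FIRST_CORE are illustrative names, not from the PR.
ALLOWED=$(taskset -pc $$ | awk '{print $NF}')                  # e.g. "140-167"
FIRST_CORE=$(echo "$ALLOWED" | cut -d',' -f1 | cut -d'-' -f1)  # e.g. "140"
echo "Pinning benchmark to CPU core ${FIRST_CORE}"
# The core could then be passed to docker run via --cpuset-cpus "${FIRST_CORE}",
# or used with taskset -c "${FIRST_CORE}" for a non-Docker run.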

@huydhn (Contributor) left a comment

LGTM!

@nWEIdia left a comment

Shipit!

@xuzhao9 (Contributor, Author) commented Jan 14, 2026

@huydhn it seems I still cannot run docker pinned to a single CPU core. The error is at https://github.com/pytorch/pytorch-integration-testing/actions/runs/20979471728/job/60301236292: "docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process:"

Meanwhile, I will try the non-Docker version in #118.

@xuzhao9 (Contributor, Author) commented Jan 14, 2026

This is replaced by #118

@xuzhao9 closed this Jan 14, 2026
@xuzhao9 deleted the xz9/tritonbench-fix branch January 15, 2026