Add SM->GPC mapping query and ubench tool #87
Open
William-An wants to merge 2 commits into
Conversation
Refactor queryGrInfo() in hw_def/common/gpuConfig.h to share the RM ioctl scaffold (rmSubdeviceControl) so additional NV2080 control queries reuse the alloc/control/free chain. Add NV2080_CTRL_CMD_GR_GET_SM_TO_GPC_TPC_MAPPINGS and querySmToGpcMapping(), plus NUM_GPCS exposed in GpuConfig via NV2080_CTRL_GR_INFO_INDEX_LITTER_NUM_GPCS. Add ubench/system/sm_gpc_mapping which dumps the per-SM (gpcId, tpcId) table, validates against runtime %smid captures, and sweeps thread-block cluster shapes (1x1x1..8x1x1 and 1x8x1) with cudaLaunchKernelEx + cudaLaunchAttributeClusterDimension to expose cluster->GPC rasterization order on Hopper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add visualize.py post-processor (stdlib Python 3) that renders the sm_gpc_mapping cluster-shape sweep as nested ASCII boxes (GPC > TPC > SM), with each SM cell labeled by every (cluster_id, rank_in_cluster) that landed on it in dispatch order. Supports --shape filter, --style boxed|compact, optional --color, and reads from --input or stdin. Also report `active SMs (touched by >=1 block) / SM_NUMBER` per shape in sm_gpc_mapping.cu so the C++ output and Python rendering both surface the H100 CPC-exclusion behavior (sizes >=4 only touch 120/132 SMs because GPC 0's high CPC and GPC 6/7's 9th TPC are excluded from cluster placement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
William-An (Author) commented:
FYI for reviewers — pushed an additional commit (8b6d4a07) with a small post-processor. Also added a Usage section:

```shell
# Pipe live
bin/sm_gpc_mapping | ubench/system/sm_gpc_mapping/visualize.py

# Or post-process a saved log
visualize.py --input run.log [--shape 4x2x1] [--style boxed|compact] [--color]
```
Summary

- Refactor `queryGrInfo()` in `hw_def/common/gpuConfig.h` so the RM ioctl scaffold (`/dev/nvidiactl` open, `NV01_ROOT_CLIENT → NV01_DEVICE_0 → NV20_SUBDEVICE_0` alloc chain, control call, teardown) is extracted into a single `rmSubdeviceControl()` helper. Both the existing `LITTER_NUM_*` query and any new RM control share it.
- Add `querySmToGpcMapping()` exposing `NV2080_CTRL_CMD_GR_GET_SM_TO_GPC_TPC_MAPPINGS` (control id `0x2080120f`, defined in `open-gpu-kernel-modules/.../ctrl2080gr.h:752-776`). Returns a `std::vector<SmGpcTpcEntry>` indexed by physical SM id (matches PTX `%smid`).
- Add `NUM_GPCS` to `GpuConfig`, populated via `NV2080_CTRL_GR_INFO_INDEX_LITTER_NUM_GPCS` (`0x14`).
- Add `ubench/system/sm_gpc_mapping/`, which dumps the per-SM (GPC, TPC) table, captures `%smid` from a kernel for cross-validation, and sweeps thread-block cluster shapes (`1x1x1`, `1x2x1`, `2x1x1`, `2x2x1`, `1x4x1`, `4x1x1`, `2x4x1`, `4x2x1`, `8x1x1`, `1x8x1`) using `cudaLaunchKernelEx` + `cudaLaunchAttributeClusterDimension`. Verifies the single-GPC-per-cluster invariant and prints both a per-cluster summary and a stable CSV.

Why

Need an SM→GPC table to translate `%smid` captures from NVBit-instrumented kernels into GPC ids that match the hardware's exact SM scheduling. The RM ioctl path this repo already uses for `FBP_COUNT`/`L2_BANKS` is the right place to extend.

Tested on

- Full `%smid` coverage; single-GPC-per-cluster holds for all 10 cluster shapes.

Known caveats

- `%cluster_ctarank` reads need sm_90+; the new ubench's Makefile sets `NVCC_FLAGS += -arch=sm_90`.
- Fixed RM handles (`0xCAFE0001..3`) inherited from the existing pattern; could be hardened to kernel-generated handles.

Test plan

- Run `bin/sm_gpc_mapping` on additional SM_90+ GPUs (H100 SXM4, GH200, B100/B200) and confirm the `# CHECK` lines pass.
- Run `bin/system_config` on each and confirm `FBP_COUNT`/`L2_BANKS` outputs are unchanged.

🤖 Generated with Claude Code