Commit da36475

Docs: Naming temporary compilation results
1 parent 7f63ef2 commit da36475

5 files changed: +277 -191 lines changed

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -7,6 +7,8 @@ build_release/

  # Temporary binaries
  /tmp/
- less_slow_from_ptx.cubin
  less_slow_from_cu.cubin
  less_slow_from_cu.ptx
+ less_slow_sm70_from_ptx.cubin
+ less_slow_sm80_from_ptx.cubin
+ less_slow_sm90a_from_ptx.cubin

less_slow.cu

Lines changed: 8 additions & 0 deletions
@@ -23,6 +23,14 @@
  * $ nvcc -arch=sm_90a -Xptxas -v -lineinfo -cubin -o less_slow_from_cu.cubin less_slow.cu
  * $ cuobjdump -sass less_slow_from_cu.cubin | grep -i mma
  *
+ * Assuming how aggressively NVCC unrolls loops and the number of kernels in
+ * this file, you may want to deduplicate them:
+ *
+ * $ cuobjdump -sass less_slow_from_cu.cubin | grep -i mma | \
+ * $ sed -r 's/\/\*[^*]+\*\///g' | \
+ * $ sed -r 's/^[[:space:]]+//; s/[[:space:]]+$//' | \
+ * $ sort -u
+ *
  * Keep in mind the following TC generations:
  *
  * - Volta SM70: 1st generation of TCs, server V100 cards.
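
The deduplication pipeline added in this hunk can be sketched as a reusable shell function. This is a minimal sketch, not part of the commit: the function name `dedup_mma` is hypothetical, and `sed -E` is used in place of the `sed -r` shown in the comment (GNU sed treats the two the same way). The sketch also drops the `$` prompt markers from the continuation lines, since a shell would otherwise read `$` as a command name after the `| \` line break.

```shell
# Hypothetical helper: reads `cuobjdump -sass` output on stdin and prints
# each distinct line that mentions an MMA instruction exactly once.
#   1. keep only lines containing "mma" (case-insensitive)
#   2. strip the /*...*/ address and encoding comments
#   3. trim leading and trailing whitespace
#   4. sort and drop duplicates
dedup_mma() {
    grep -i mma |
        sed -E 's/\/\*[^*]+\*\///g' |
        sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' |
        sort -u
}
```

With the function defined, the invocation from the comment becomes `cuobjdump -sass less_slow_from_cu.cubin | dedup_mma`.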

less_slow_sm70.ptx

Lines changed: 10 additions & 2 deletions
@@ -18,8 +18,16 @@
  * You can validate this file by asking the Nvidia PTX Assembler to compile it
  * to `.cubin` for some target architecture:
  *
- * $ ptxas -o less_slow_from_ptx.cubin -arch=sm_70 less_slow_sm70.ptx
- * $ cuobjdump -sass less_slow_from_ptx.cubin | grep -i mma
+ * $ ptxas -o less_slow_sm70_from_ptx.cubin -arch=sm_70 less_slow_sm70.ptx
+ * $ cuobjdump -sass less_slow_sm70_from_ptx.cubin | grep -i mma
+ *
+ * Assuming how aggressively NVCC unrolls loops and the number of kernels in
+ * this file, you may want to deduplicate them:
+ *
+ * $ cuobjdump -sass less_slow_sm70_from_ptx.cubin | grep -i mma | \
+ * $ sed -r 's/\/\*[^*]+\*\///g' | \
+ * $ sed -r 's/^[[:space:]]+//; s/[[:space:]]+$//' | \
+ * $ sort -u
  *
  * @section Register File
  *

less_slow_sm80.ptx

Lines changed: 3 additions & 3 deletions
@@ -13,13 +13,13 @@
  * You can validate this file by asking the Nvidia PTX Assembler to compile it
  * to `.cubin` for some target architecture:
  *
- * $ ptxas -o less_slow_from_ptx.cubin -arch=sm_80 less_slow_sm80.ptx
- * $ cuobjdump -sass less_slow_from_ptx.cubin | grep -i mma
+ * $ ptxas -o less_slow_sm80_from_ptx.cubin -arch=sm_80 less_slow_sm80.ptx
+ * $ cuobjdump -sass less_slow_sm80_from_ptx.cubin | grep -i mma
  *
  * Assuming how aggressively NVCC unrolls loops and the number of kernels in
  * this file, you may want to deduplicate them:
  *
- * $ cuobjdump -sass less_slow_from_ptx.cubin | grep -i mma | \
+ * $ cuobjdump -sass less_slow_sm80_from_ptx.cubin | grep -i mma | \
  * $ sed -r 's/\/\*[^*]+\*\///g' | \
  * $ sed -r 's/^[[:space:]]+//; s/[[:space:]]+$//' | \
  * $ sort -u
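
The renaming in these hunks follows one pattern: the architecture tag is embedded in the output name, so cubins built for different targets no longer overwrite each other. A rough sketch of that convention is below. Both function names are hypothetical and not part of the commit. The second function only prints the `ptxas` command rather than running it. For `sm_90a` the input name `less_slow_sm90a.ptx` is an assumption, since only the matching `.cubin` entry appears in `.gitignore` here.

```shell
# Hypothetical helper: map an architecture flag such as sm_70 to the
# per-architecture cubin name used in this commit.
cubin_name_for() {
    arch_tag="$(printf '%s' "$1" | tr -d '_')"   # sm_70 -> sm70
    printf 'less_slow_%s_from_ptx.cubin\n' "$arch_tag"
}

# Hypothetical helper: print (do not run) the matching ptxas command.
# Assumes the PTX source is named less_slow_<tag>.ptx, as in the hunks above.
ptxas_command_for() {
    arch_tag="$(printf '%s' "$1" | tr -d '_')"
    printf 'ptxas -o %s -arch=%s less_slow_%s.ptx\n' \
        "$(cubin_name_for "$1")" "$1" "$arch_tag"
}
```

Printing the command first makes it easy to check the names before any file is written, for example with `ptxas_command_for sm_80`.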
