@@ -29,48 +29,54 @@ Module | Self CPU total | CPU total | CUDA total | Occurrences
  ---------------|----------------|-----------|------------|------------
  AlexNet | | | |
  ├── features | | | |
- │├── 0 | 1.671ms | 6.589ms | 6.701ms | 1
- │├── 1 | 62.430us | 62.430us | 63.264us | 1
- │├── 2 | 62.909us | 109.948us | 112.640us | 1
- │├── 3 | 225.389us | 858.376us | 1.814ms | 1
- │├── 4 | 18.999us | 18.999us | 19.456us | 1
- │├── 5 | 29.560us | 52.720us | 54.272us | 1
- │├── 6 | 136.959us | 511.216us | 707.360us | 1
- │├── 7 | 18.480us | 18.480us | 18.624us | 1
- │├── 8 | 84.380us | 300.700us | 590.688us | 1
- │├── 9 | 18.249us | 18.249us | 17.632us | 1
- │├── 10 | 81.289us | 289.946us | 470.016us | 1
- │├── 11 | 17.850us | 17.850us | 18.432us | 1
- │└── 12 | 29.350us | 52.260us | 52.288us | 1
- ├── avgpool | 41.840us | 70.840us | 76.832us | 1
+ │├── 0 | 1.636ms | 6.466ms | 6.447ms | 1
+ │├── 1 | 61.320us | 92.700us | 94.016us | 1
+ │├── 2 | 87.680us | 177.270us | 163.744us | 1
+ │├── 3 | 291.539us | 1.225ms | 1.966ms | 1
+ │├── 4 | 34.550us | 48.850us | 50.112us | 1
+ │├── 5 | 63.220us | 131.670us | 121.888us | 1
+ │├── 6 | 202.009us | 768.135us | 846.048us | 1
+ │├── 7 | 40.440us | 58.130us | 59.264us | 1
+ │├── 8 | 183.129us | 690.816us | 854.016us | 1
+ │├── 9 | 35.580us | 50.360us | 51.200us | 1
+ │├── 10 | 167.769us | 631.019us | 701.088us | 1
+ │├── 11 | 34.450us | 48.730us | 50.048us | 1
+ │└── 12 | 64.509us | 134.508us | 123.040us | 1
+ ├── avgpool | 67.200us | 131.190us | 122.880us | 1
  └── classifier | | | |
- ├── 0 | 66.400us | 122.110us | 125.920us | 1
- ├── 1 | 293.658us | 293.658us | 664.704us | 1
- ├── 2 | 17.600us | 17.600us | 18.432us | 1
- ├── 3 | 27.920us | 49.030us | 51.168us | 1
- ├── 4 | 40.590us | 40.590us | 208.672us | 1
- ├── 5 | 17.570us | 17.570us | 18.432us | 1
- └── 6 | 40.489us | 40.489us | 81.920us | 1
+ ├── 0 | 82.110us | 172.480us | 150.848us | 1
+ ├── 1 | 470.078us | 490.848us | 815.104us | 1
+ ├── 2 | 44.269us | 68.289us | 59.424us | 1
+ ├── 3 | 59.339us | 125.977us | 109.568us | 1
+ ├── 4 | 72.319us | 86.819us | 219.136us | 1
+ ├── 5 | 34.780us | 49.340us | 49.152us | 1
+ └── 6 | 70.070us | 85.290us | 95.232us | 1
  ```

  To see the low-level operations that occur within each layer, print the contents of `prof.display(show_events=True)`.

  ``` text
- Module | Self CPU total | CPU total | CUDA total | Occurrences
- ------------------------------|----------------|-----------|------------|------------
- AlexNet | | | |
- ├── features | | | |
- │├── 0 | | | |
- ││├── conv2d | 13.370us | 1.671ms | 1.698ms | 1
- ││├── convolution | 12.730us | 1.658ms | 1.685ms | 1
- ││├── _convolution | 30.660us | 1.645ms | 1.673ms | 1
- ││├── contiguous | 6.970us | 6.970us | 7.136us | 1
- ││└── cudnn_convolution | 1.608ms | 1.608ms | 1.638ms | 1
- │├── 1 | | | |
- ││└── relu_ | 62.430us | 62.430us | 63.264us | 1
- │├── 2 | | | |
- ││├── max_pool2d | 15.870us | 62.909us | 63.488us | 1
- ││└── max_pool2d_with_indices | 47.039us | 47.039us | 49.152us | 1
+ Module | Self CPU total | CPU total | CUDA total | Occurrences
+ ------------------------------------|----------------|-----------|------------|------------
+ AlexNet | | | |
+ ├── features | | | |
+ │├── 0 | | | |
+ ││├── aten::conv2d | 16.320us | 1.636ms | 1.636ms | 1
+ ││├── aten::convolution | 11.710us | 1.619ms | 1.620ms | 1
+ ││├── aten::_convolution | 40.950us | 1.607ms | 1.608ms | 1
+ ││├── aten::contiguous | 2.920us | 2.920us | 2.720us | 1
+ ││├── aten::cudnn_convolution | 1.467ms | 1.493ms | 1.554ms | 1
+ ││├── aten::empty | 6.160us | 6.160us | 0.000us | 1
+ ││├── aten::resize_ | 0.490us | 0.490us | 0.000us | 1
+ ││├── aten::stride | 2.380us | 2.380us | 0.000us | 4
+ ││├── aten::reshape | 6.820us | 18.640us | 2.048us | 1
+ ││├── aten::view | 11.820us | 11.820us | 0.000us | 1
+ ││└── aten::add_ | 51.060us | 51.060us | 18.432us | 1
+ │├── 1 | | | |
+ ││├── aten::relu_ | 29.940us | 61.320us | 61.408us | 1
+ ││└── aten::threshold_ | 31.380us | 31.380us | 32.608us | 1
+ │├── 2 | | | |
+ ││├── aten::max_pool2d | 14.680us | 87.680us | 86.016us | 1
  ...
  ```

@@ -85,17 +91,30 @@ print(trace[2])
  print(event_lists_dict[trace[2].path][0])
  ```
  ``` text
- --------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
- --------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
- conv2d 0.80% 13.370us 100.00% 1.671ms 1.671ms 25.34% 1.698ms 1.698ms 1 []
- convolution 0.76% 12.730us 99.20% 1.658ms 1.658ms 25.15% 1.685ms 1.685ms 1 []
- _convolution 1.83% 30.660us 98.44% 1.645ms 1.645ms 24.97% 1.673ms 1.673ms 1 []
- contiguous 0.42% 6.970us 0.42% 6.970us 6.970us 0.11% 7.136us 7.136us 1 []
- cudnn_convolution 96.19% 1.608ms 96.19% 1.608ms 1.608ms 24.44% 1.638ms 1.638ms 1 []
- --------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
- Self CPU time total: 1.671ms
- CUDA time total: 6.701ms
+ --------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
+ Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
+ --------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
+ aten::conv2d 1.00% 16.320us 100.00% 1.636ms 1.636ms 16.032us 0.98% 1.636ms 1.636ms 1
+ aten::convolution 0.72% 11.710us 99.00% 1.619ms 1.619ms 12.064us 0.74% 1.620ms 1.620ms 1
+ aten::_convolution 2.50% 40.950us 98.29% 1.607ms 1.607ms 29.088us 1.78% 1.608ms 1.608ms 1
+ aten::contiguous 0.25% 4.090us 0.25% 4.090us 4.090us 4.032us 0.25% 4.032us 4.032us 1
+ aten::cudnn_convolution 89.71% 1.467ms 91.27% 1.493ms 1.493ms 1.548ms 94.64% 1.554ms 1.554ms 1
+ aten::empty 0.28% 4.590us 0.28% 4.590us 4.590us 0.000us 0.00% 0.000us 0.000us 1
+ aten::contiguous 0.22% 3.530us 0.22% 3.530us 3.530us 3.200us 0.20% 3.200us 3.200us 1
+ aten::resize_ 0.33% 5.390us 0.33% 5.390us 5.390us 0.000us 0.00% 0.000us 0.000us 1
+ aten::contiguous 0.18% 2.920us 0.18% 2.920us 2.920us 2.720us 0.17% 2.720us 2.720us 1
+ aten::resize_ 0.03% 0.490us 0.03% 0.490us 0.490us 0.000us 0.00% 0.000us 0.000us 1
+ aten::stride 0.09% 1.460us 0.09% 1.460us 1.460us 0.000us 0.00% 0.000us 0.000us 1
+ aten::stride 0.02% 0.320us 0.02% 0.320us 0.320us 0.000us 0.00% 0.000us 0.000us 1
+ aten::stride 0.02% 0.300us 0.02% 0.300us 0.300us 0.000us 0.00% 0.000us 0.000us 1
+ aten::stride 0.02% 0.300us 0.02% 0.300us 0.300us 0.000us 0.00% 0.000us 0.000us 1
+ aten::empty 0.38% 6.160us 0.38% 6.160us 6.160us 0.000us 0.00% 0.000us 0.000us 1
+ aten::reshape 0.42% 6.820us 1.14% 18.640us 18.640us 2.048us 0.13% 2.048us 2.048us 1
+ aten::view 0.72% 11.820us 0.72% 11.820us 11.820us 0.000us 0.00% 0.000us 0.000us 1
+ aten::add_ 3.12% 51.060us 3.12% 51.060us 51.060us 18.432us 1.13% 18.432us 18.432us 1
+ --------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
+ Self CPU time total: 1.636ms
+ CUDA time total: 1.636ms

  ```

@@ -122,7 +141,7 @@ AlexNet | | | |
  │├── 0 | | | |
  │├── 1 | | | |
  │├── 2 | | | |
- │├── 3 | 3.189ms | 12.717ms | 0.000us | 1
+ │├── 3 | 2.908ms | 11.604ms | 0.000us | 1
  │├── 4 | | | |
  │├── 5 | | | |
  │├── 6 | | | |
@@ -133,7 +152,7 @@ AlexNet | | | |
  │├── 11 | | | |
  │└── 12 | | | |
  ├── avgpool | | | |
- └── classifier | 13.403ms | 14.011ms | 0.000us | 1
+ └── classifier | 12.311ms | 13.077ms | 0.000us | 1
   ├── 0 | | | |
   ├── 1 | | | |
   ├── 2 | | | |
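
As a reading aid for the per-event table in this diff, the `Self CPU %` column appears to be each row's `Self CPU` value divided by the reported `Self CPU time total`. The sketch below reproduces three of the added rows under that assumption. The helpers `to_us` and `percent_of_total` are hypothetical names introduced here for illustration; they are not part of torchprof or PyTorch.

```python
# Hypothetical helpers for illustration only; not part of torchprof or PyTorch.
UNIT_TO_US = {"us": 1.0, "ms": 1_000.0, "s": 1_000_000.0}


def to_us(text: str) -> float:
    """Convert a duration such as '16.320us' or '1.636ms' to microseconds."""
    text = text.strip()
    # Check the two-letter suffixes before the bare 's' suffix.
    for unit in ("us", "ms", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_TO_US[unit]
    raise ValueError(f"unrecognized duration: {text!r}")


def percent_of_total(value: str, total: str) -> float:
    """Return `value` as a percentage of `total`, rounded to two decimals."""
    return round(100.0 * to_us(value) / to_us(total), 2)


# Values copied from the added rows of the per-event table.
total = "1.636ms"  # "Self CPU time total"
for name, self_cpu in [
    ("aten::conv2d", "16.320us"),
    ("aten::_convolution", "40.950us"),
    ("aten::add_", "51.060us"),
]:
    print(f"{name}: {percent_of_total(self_cpu, total):.2f}%")
```

Rows printed in milliseconds are already rounded to three decimals, so recomputing their percentages this way can differ from the table in the last digit (for example `aten::cudnn_convolution`).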