This repository was archived by the owner on May 24, 2021. It is now read-only.

Commit 7b274ef

v1.2.0
1 parent 8b52d15 commit 7b274ef

2 files changed: +69 -50 lines changed

README.md

Lines changed: 68 additions & 49 deletions
@@ -29,48 +29,54 @@ Module | Self CPU total | CPU total | CUDA total | Occurrences
 ---------------|----------------|-----------|------------|------------
 AlexNet | | | |
 ├── features | | | |
-│├── 0 | 1.671ms | 6.589ms | 6.701ms | 1
-│├── 1 | 62.430us | 62.430us | 63.264us | 1
-│├── 2 | 62.909us | 109.948us | 112.640us | 1
-│├── 3 | 225.389us | 858.376us | 1.814ms | 1
-│├── 4 | 18.999us | 18.999us | 19.456us | 1
-│├── 5 | 29.560us | 52.720us | 54.272us | 1
-│├── 6 | 136.959us | 511.216us | 707.360us | 1
-│├── 7 | 18.480us | 18.480us | 18.624us | 1
-│├── 8 | 84.380us | 300.700us | 590.688us | 1
-│├── 9 | 18.249us | 18.249us | 17.632us | 1
-│├── 10 | 81.289us | 289.946us | 470.016us | 1
-│├── 11 | 17.850us | 17.850us | 18.432us | 1
-│└── 12 | 29.350us | 52.260us | 52.288us | 1
-├── avgpool | 41.840us | 70.840us | 76.832us | 1
+│├── 0 | 1.636ms | 6.466ms | 6.447ms | 1
+│├── 1 | 61.320us | 92.700us | 94.016us | 1
+│├── 2 | 87.680us | 177.270us | 163.744us | 1
+│├── 3 | 291.539us | 1.225ms | 1.966ms | 1
+│├── 4 | 34.550us | 48.850us | 50.112us | 1
+│├── 5 | 63.220us | 131.670us | 121.888us | 1
+│├── 6 | 202.009us | 768.135us | 846.048us | 1
+│├── 7 | 40.440us | 58.130us | 59.264us | 1
+│├── 8 | 183.129us | 690.816us | 854.016us | 1
+│├── 9 | 35.580us | 50.360us | 51.200us | 1
+│├── 10 | 167.769us | 631.019us | 701.088us | 1
+│├── 11 | 34.450us | 48.730us | 50.048us | 1
+│└── 12 | 64.509us | 134.508us | 123.040us | 1
+├── avgpool | 67.200us | 131.190us | 122.880us | 1
 └── classifier | | | |
-├── 0 | 66.400us | 122.110us | 125.920us | 1
-├── 1 | 293.658us | 293.658us | 664.704us | 1
-├── 2 | 17.600us | 17.600us | 18.432us | 1
-├── 3 | 27.920us | 49.030us | 51.168us | 1
-├── 4 | 40.590us | 40.590us | 208.672us | 1
-├── 5 | 17.570us | 17.570us | 18.432us | 1
-└── 6 | 40.489us | 40.489us | 81.920us | 1
+├── 0 | 82.110us | 172.480us | 150.848us | 1
+├── 1 | 470.078us | 490.848us | 815.104us | 1
+├── 2 | 44.269us | 68.289us | 59.424us | 1
+├── 3 | 59.339us | 125.977us | 109.568us | 1
+├── 4 | 72.319us | 86.819us | 219.136us | 1
+├── 5 | 34.780us | 49.340us | 49.152us | 1
+└── 6 | 70.070us | 85.290us | 95.232us | 1
 ```
 
 To see the low level operations that occur within each layer, print the contents of `prof.display(show_events=True)`.
 
 ```text
-Module | Self CPU total | CPU total | CUDA total | Occurrences
-------------------------------|----------------|-----------|------------|------------
-AlexNet | | | |
-├── features | | | |
-│├── 0 | | | |
-││├── conv2d | 13.370us | 1.671ms | 1.698ms | 1
-││├── convolution | 12.730us | 1.658ms | 1.685ms | 1
-││├── _convolution | 30.660us | 1.645ms | 1.673ms | 1
-││├── contiguous | 6.970us | 6.970us | 7.136us | 1
-││└── cudnn_convolution | 1.608ms | 1.608ms | 1.638ms | 1
-│├── 1 | | | |
-││└── relu_ | 62.430us | 62.430us | 63.264us | 1
-│├── 2 | | | |
-││├── max_pool2d | 15.870us | 62.909us | 63.488us | 1
-││└── max_pool2d_with_indices | 47.039us | 47.039us | 49.152us | 1
+Module | Self CPU total | CPU total | CUDA total | Occurrences
+------------------------------------|----------------|-----------|------------|------------
+AlexNet | | | |
+├── features | | | |
+│├── 0 | | | |
+││├── aten::conv2d | 16.320us | 1.636ms | 1.636ms | 1
+││├── aten::convolution | 11.710us | 1.619ms | 1.620ms | 1
+││├── aten::_convolution | 40.950us | 1.607ms | 1.608ms | 1
+││├── aten::contiguous | 2.920us | 2.920us | 2.720us | 1
+││├── aten::cudnn_convolution | 1.467ms | 1.493ms | 1.554ms | 1
+││├── aten::empty | 6.160us | 6.160us | 0.000us | 1
+││├── aten::resize_ | 0.490us | 0.490us | 0.000us | 1
+││├── aten::stride | 2.380us | 2.380us | 0.000us | 4
+││├── aten::reshape | 6.820us | 18.640us | 2.048us | 1
+││├── aten::view | 11.820us | 11.820us | 0.000us | 1
+││└── aten::add_ | 51.060us | 51.060us | 18.432us | 1
+│├── 1 | | | |
+││├── aten::relu_ | 29.940us | 61.320us | 61.408us | 1
+││└── aten::threshold_ | 31.380us | 31.380us | 32.608us | 1
+│├── 2 | | | |
+││├── aten::max_pool2d | 14.680us | 87.680us | 86.016us | 1
 ...
 ```
 
@@ -85,17 +91,30 @@ print(trace[2])
 print(event_lists_dict[trace[2].path][0])
 ```
 ```text
---------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
-Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls Input Shapes
---------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
-conv2d 0.80% 13.370us 100.00% 1.671ms 1.671ms 25.34% 1.698ms 1.698ms 1 []
-convolution 0.76% 12.730us 99.20% 1.658ms 1.658ms 25.15% 1.685ms 1.685ms 1 []
-_convolution 1.83% 30.660us 98.44% 1.645ms 1.645ms 24.97% 1.673ms 1.673ms 1 []
-contiguous 0.42% 6.970us 0.42% 6.970us 6.970us 0.11% 7.136us 7.136us 1 []
-cudnn_convolution 96.19% 1.608ms 96.19% 1.608ms 1.608ms 24.44% 1.638ms 1.638ms 1 []
---------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- -----------------------------------
-Self CPU time total: 1.671ms
-CUDA time total: 6.701ms
+--------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
+Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
+--------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
+aten::conv2d 1.00% 16.320us 100.00% 1.636ms 1.636ms 16.032us 0.98% 1.636ms 1.636ms 1
+aten::convolution 0.72% 11.710us 99.00% 1.619ms 1.619ms 12.064us 0.74% 1.620ms 1.620ms 1
+aten::_convolution 2.50% 40.950us 98.29% 1.607ms 1.607ms 29.088us 1.78% 1.608ms 1.608ms 1
+aten::contiguous 0.25% 4.090us 0.25% 4.090us 4.090us 4.032us 0.25% 4.032us 4.032us 1
+aten::cudnn_convolution 89.71% 1.467ms 91.27% 1.493ms 1.493ms 1.548ms 94.64% 1.554ms 1.554ms 1
+aten::empty 0.28% 4.590us 0.28% 4.590us 4.590us 0.000us 0.00% 0.000us 0.000us 1
+aten::contiguous 0.22% 3.530us 0.22% 3.530us 3.530us 3.200us 0.20% 3.200us 3.200us 1
+aten::resize_ 0.33% 5.390us 0.33% 5.390us 5.390us 0.000us 0.00% 0.000us 0.000us 1
+aten::contiguous 0.18% 2.920us 0.18% 2.920us 2.920us 2.720us 0.17% 2.720us 2.720us 1
+aten::resize_ 0.03% 0.490us 0.03% 0.490us 0.490us 0.000us 0.00% 0.000us 0.000us 1
+aten::stride 0.09% 1.460us 0.09% 1.460us 1.460us 0.000us 0.00% 0.000us 0.000us 1
+aten::stride 0.02% 0.320us 0.02% 0.320us 0.320us 0.000us 0.00% 0.000us 0.000us 1
+aten::stride 0.02% 0.300us 0.02% 0.300us 0.300us 0.000us 0.00% 0.000us 0.000us 1
+aten::stride 0.02% 0.300us 0.02% 0.300us 0.300us 0.000us 0.00% 0.000us 0.000us 1
+aten::empty 0.38% 6.160us 0.38% 6.160us 6.160us 0.000us 0.00% 0.000us 0.000us 1
+aten::reshape 0.42% 6.820us 1.14% 18.640us 18.640us 2.048us 0.13% 2.048us 2.048us 1
+aten::view 0.72% 11.820us 0.72% 11.820us 11.820us 0.000us 0.00% 0.000us 0.000us 1
+aten::add_ 3.12% 51.060us 3.12% 51.060us 51.060us 18.432us 1.13% 18.432us 18.432us 1
+--------------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
+Self CPU time total: 1.636ms
+CUDA time total: 1.636ms
 
 ```
 
@@ -122,7 +141,7 @@ AlexNet | | | |
 │├── 0 | | | |
 │├── 1 | | | |
 │├── 2 | | | |
-│├── 3 | 3.189ms | 12.717ms | 0.000us | 1
+│├── 3 | 2.908ms | 11.604ms | 0.000us | 1
 │├── 4 | | | |
 │├── 5 | | | |
 │├── 6 | | | |
@@ -133,7 +152,7 @@ AlexNet | | | |
 │├── 11 | | | |
 │└── 12 | | | |
 ├── avgpool | | | |
-└── classifier | 13.403ms | 14.011ms | 0.000us | 1
+└── classifier | 12.311ms | 13.077ms | 0.000us | 1
 ├── 0 | | | |
 ├── 1 | | | |
 ├── 2 | | | |
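
As a cross-check on the updated tables, the per-layer row for `features.0` in the new README output (`1.636ms` self CPU, `6.466ms` CPU total) appears to equal the column sums of the new per-event listing for that layer. The sketch below re-adds those event rows in plain Python. The `to_us` helper and the `EVENTS` list are illustrative names, not part of torchprof, and the aggregation rule is inferred from the published numbers rather than confirmed against the library's source.

```python
import re

# (name, self CPU, CPU total) for features.0, copied from the new event table.
EVENTS = [
    ("aten::conv2d", "16.320us", "1.636ms"),
    ("aten::convolution", "11.710us", "1.619ms"),
    ("aten::_convolution", "40.950us", "1.607ms"),
    ("aten::contiguous", "4.090us", "4.090us"),
    ("aten::cudnn_convolution", "1.467ms", "1.493ms"),
    ("aten::empty", "4.590us", "4.590us"),
    ("aten::contiguous", "3.530us", "3.530us"),
    ("aten::resize_", "5.390us", "5.390us"),
    ("aten::contiguous", "2.920us", "2.920us"),
    ("aten::resize_", "0.490us", "0.490us"),
    ("aten::stride", "1.460us", "1.460us"),
    ("aten::stride", "0.320us", "0.320us"),
    ("aten::stride", "0.300us", "0.300us"),
    ("aten::stride", "0.300us", "0.300us"),
    ("aten::empty", "6.160us", "6.160us"),
    ("aten::reshape", "6.820us", "18.640us"),
    ("aten::view", "11.820us", "11.820us"),
    ("aten::add_", "51.060us", "51.060us"),
]

# Microseconds per unit for the suffixes that appear in the tables.
_UNIT_TO_US = {"us": 1.0, "ms": 1_000.0, "s": 1_000_000.0}


def to_us(text):
    """Convert a profiler time string such as '1.636ms' to microseconds."""
    match = re.fullmatch(r"([0-9.]+)(us|ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time string: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_US[unit]


# Sum each column over every event recorded for the layer.
self_cpu_us = sum(to_us(self_cpu) for _, self_cpu, _ in EVENTS)
cpu_total_us = sum(to_us(cpu_total) for _, _, cpu_total in EVENTS)

print(f"self CPU sum:  {self_cpu_us / 1000:.3f} ms (layer row reports 1.636ms)")
print(f"CPU total sum: {cpu_total_us / 1000:.3f} ms (layer row reports 6.466ms)")
```

Both sums agree with the layer row to within the rounding of the millisecond entries. The CPU total comes out several times larger than the self CPU total because each nested operator (for example `aten::convolution` inside `aten::conv2d`) is counted again in its caller's CPU total.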

torchprof/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -3,5 +3,5 @@
 name = "torchprof"
 
 __all__ = ["Profile"]
-__version__ = "1.1.1"
+__version__ = "1.2.0"
 