- This tutorial introduces techniques we used to profile and tune the
+ This tutorial introduces techniques we use to profile and tune the
CPU performance of PaddlePaddle. We will use Python packages
- `cProfile` and `yep`, and Google `perftools`.
+ `cProfile` and `yep`, and Google's `perftools`.
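As a minimal sketch of the first of these tools (the function `train_one_pass` and the output file `profile.out` below are made up for illustration, not taken from this tutorial), `cProfile` ships with Python and can record a profile and save it to disk for later inspection:

```python
import cProfile


def train_one_pass():
    """Stand-in for a real PaddlePaddle training iteration."""
    return sum(i * i for i in range(1_000_000))


# Save the raw profile to a file so that a viewer (for example the
# cprofilev GUI mentioned later in this tutorial) can load it.
cProfile.run("train_one_pass()", filename="profile.out")
```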
- Profiling is the process that reveals the performance bottlenecks,
+ Profiling is the process that reveals performance bottlenecks,
which could be very different from what's in the developers' minds.
- Performance tuning is to fix the bottlenecks. Performance optimization
+ Performance tuning is done to fix these bottlenecks. Performance optimization
repeats the steps of profiling and tuning alternately.

- PaddlePaddle users program AI by calling the Python API, which calls
+ PaddlePaddle users program AI applications by calling the Python API, which calls
into `libpaddle.so`, written in C++. In this tutorial, we focus on
the profiling and tuning of
@@ -82,7 +82,7 @@ focus on. We can sort above profiling file by tottime:
We can see that the most time-consuming function is the `built-in
method run`, which is a C++ function in `libpaddle.so`. We will
- explain how to profile C++ code in the next section. At the right
+ explain how to profile C++ code in the next section. At this
moment, let's look into the third function `sync_with_cpp`, which is a
Python function. We can click it to understand more about it:
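If you prefer the command line to `cprofilev`'s web view, roughly the same drill-down can be done with Python's standard `pstats` module. This is only a sketch: `profile.out` is a hypothetical file name for a profile saved earlier with `cProfile`.

```python
import pstats

# Load the saved cProfile output.
stats = pstats.Stats("profile.out")

# Sort by tottime (time spent in each function body itself) and show
# the ten most expensive entries, as in the listing above.
stats.sort_stats("tottime").print_stats(10)

# A rough command-line counterpart of clicking a function in cprofilev:
# list what sync_with_cpp calls and who calls it.
stats.print_callees("sync_with_cpp")
stats.print_callers("sync_with_cpp")
```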
@@ -135,8 +135,8 @@ to generate the profiling file. The default filename is
`main.py.prof`.

Please be aware of the `-v` command line option, which prints the
- analysis results after generating the profiling file. By taking a
- glance at the print result, we'd know that if we stripped debug
+ analysis results after generating the profiling file. By examining
+ the print result, we'd know whether we stripped debug
information from `libpaddle.so` at build time. The following hints
help make sure that the analysis results are readable:
@@ -155,9 +155,9 @@ help make sure that the analysis results are readable:
variable `OMP_NUM_THREADS=1` to prevent OpenMP from automatically
starting multiple threads.
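Putting these hints together, one way to drive the profiler from Python rather than the shell is sketched below. The `yep.start()` / `yep.stop()` calls, the stand-in `train()` function, and the output file name are assumptions for illustration, not commands taken from this tutorial.

```python
import os

# Keep OpenMP from starting extra threads, so the profile stays
# readable (the same effect as `export OMP_NUM_THREADS=1` in the shell).
os.environ["OMP_NUM_THREADS"] = "1"

import yep  # assumed API: yep.start(path) / yep.stop()


def train():
    """Stand-in for a real training loop."""
    return sum(i * i for i in range(1_000_000))


yep.start("main.py.prof")  # begin CPU profiling; write to this file
train()
yep.stop()                 # stop profiling and flush the output file
```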
- ### Look into the Profiling File
+ ### Examining the Profiling File

- The tool we used to look into the profiling file generated by
+ The tool we used to examine the profiling file generated by
`perftools` is [`pprof`](https://github.com/google/pprof), which
provides a Web-based GUI like `cprofilev`.
@@ -194,4 +194,4 @@ time, and `MomentumOp` takes about 17%. Obviously, we'd want to
optimize `MomentumOp`.

`pprof` would mark performance critical parts of the program in
- red. It's a good idea to follow the hint.
+ red. It's a good idea to follow the hints.