@@ -656,6 +656,62 @@ scaledcount(counter::Counter) = counter.value * (counter.enabled / counter.runni
     @pstats [options] expr
 
 Run `expr` and gather its performance statistics.
+
+This macro counts occurrences of events such as CPU cycles, branch
+mispredictions, and page faults. The list of supported events can be shown
+by calling the `LinuxPerf.list` function.
+
+Because the performance monitoring units (PMUs) of a CPU core provide only a
+limited number of hardware counters, not all events can be measured at once;
+instead, several event groups are multiplexed (they take turns on the
+counters) within a single measurement. If `expr` runs for an extremely short
+time, some event groups may not be measured at all.
+
+The result is shown as a table. Each row has four columns: an event group
+indicator, an event name, a scaled count, and a running rate. A comment with
+a derived metric may follow these columns after a hash (`#`) character.
+1. The event group indicator is a bracket (`┌`, `│`, `└`, or `╶` for a single
+   event) marking events that are measured simultaneously, so that their
+   counts can be meaningfully compared.
+2. The event name is the conventional name of the measured event.
+3. The scaled count is the raw number of occurrences divided by the running
+   rate, i.e., the count extrapolated to the whole time the event was enabled.
+4. The running rate is the ratio of the time running to the time enabled.
+
+The macro accepts options before `expr`. If a string is passed, it is read as
+a comma-separated list of event names to measure; events enclosed in a pair
+of parentheses form one event group.
+
+# Examples
+
+```
+julia> xs = randn(1_000_000);
+
+julia> sort(xs[1:9]); # compile
+
+julia> @pstats sort(xs)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+┌ cpu-cycles               2.57e+08   48.6%  #  3.8 cycles per ns
+│ stalled-cycles-frontend  1.10e+07   48.6%  #  4.3% of cycles
+└ stalled-cycles-backend   2.48e+06   48.6%  #  1.0% of cycles
+┌ instructions             1.84e+08   51.4%  #  0.7 insns per cycle
+│ branch-instructions      3.73e+07   51.4%  # 20.2% of instructions
+└ branch-misses            7.92e+06   51.4%  # 21.2% of branch instructions
+┌ task-clock               6.75e+07  100.0%
+│ context-switches         0.00e+00  100.0%
+│ cpu-migrations           0.00e+00  100.0%
+└ page-faults              1.95e+03  100.0%
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+julia> @pstats "(cpu-cycles,instructions,branch-instructions,branch-misses),page-faults" sort(xs)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+┌ cpu-cycles               2.61e+08  100.0%  #  3.9 cycles per ns
+│ instructions             1.80e+08  100.0%  #  0.7 insns per cycle
+│ branch-instructions      3.64e+07  100.0%  # 20.2% of instructions
+└ branch-misses            8.32e+06  100.0%  # 22.8% of branch instructions
+╶ page-faults              0.00e+00  100.0%
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
 """
 macro pstats(args...)
     if isempty(args)
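The scaling described in items 3 and 4 of the docstring can be sketched in plain Julia. This is a minimal illustration under stated assumptions, not LinuxPerf's implementation: the `Counts` struct, its fields, and the functions `runningrate` and `scaled` are hypothetical names, chosen to mirror the formula visible in the hunk header (`value * (enabled / running)`). The last lines recompute one of the derived metrics shown in the `#` comments (instructions per cycle) from the scaled counts in the first example.

```julia
# Hypothetical stand-in for one event's raw readings; LinuxPerf's own
# `Counter` type may be defined differently.
struct Counts
    value::UInt64    # occurrences counted while the event was running
    enabled::UInt64  # time (ns) the event was enabled
    running::UInt64  # time (ns) the event was actually counting on the PMU
end

# Running rate: fraction of the enabled time during which the event was counted.
runningrate(c::Counts) = c.running / c.enabled

# Scaled count: raw count divided by the running rate, i.e. extrapolated to
# the whole enabled time. A group that never ran has no meaningful estimate.
function scaled(c::Counts)
    c.running == 0 && return NaN
    return c.value * (c.enabled / c.running)
end

# An event that was counted for 48.6% of the time it was enabled.
c = Counts(125_000_000, 100_000_000, 48_600_000)
println("running rate: ", round(100 * runningrate(c); digits = 1), "%")
println("scaled count: ", scaled(c))

# Derived metric as in the `#` comments: instructions per cycle, computed
# from the scaled counts in the first example's table.
cycles, insns = 2.57e8, 1.84e8
println("insns per cycle: ", round(insns / cycles; digits = 1))
```

Dividing by the running rate assumes the event occurs at a roughly uniform rate while `expr` runs; for bursty code the extrapolated counts of different groups can be skewed, which is why events that should be compared belong in the same group.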
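The option string syntax (a comma-separated list of event names, with parentheses marking a group) can be illustrated with a small parser. This is a hypothetical sketch, not the parser LinuxPerf uses: `parse_event_groups` is an illustrative name, it treats each event outside parentheses as a group of its own (which appears consistent with `page-faults` being marked `╶` in the second example), and it rejects nested or unbalanced parentheses.

```julia
# Hypothetical parser for an event spec such as
# "(cpu-cycles,instructions),page-faults". Returns one vector of event
# names per group. Illustrative only; not LinuxPerf's own parser.
function parse_event_groups(spec::AbstractString)
    groups = Vector{Vector{String}}()  # result: one vector of names per group
    current = String[]                 # events of the parenthesized group being read
    ingroup = false                    # true between '(' and the matching ')'
    buf = IOBuffer()                   # characters of the event name being read

    # Move the name accumulated in `buf` (if any) into the current group,
    # or into a new single-event group when outside parentheses.
    function flushname!()
        name = String(strip(String(take!(buf))))
        isempty(name) && return nothing
        if ingroup
            push!(current, name)
        else
            push!(groups, [name])
        end
        return nothing
    end

    for ch in spec
        if ch == '('
            ingroup && error("nested parentheses are not supported")
            flushname!()
            ingroup = true
        elseif ch == ')'
            ingroup || error("unbalanced ')' in event spec")
            flushname!()
            isempty(current) || push!(groups, current)
            current = String[]
            ingroup = false
        elseif ch == ','
            flushname!()
        else
            write(buf, ch)
        end
    end
    ingroup && error("missing ')' in event spec")
    flushname!()
    return groups
end

# Two groups: the four events in parentheses, then page-faults on its own.
println(parse_event_groups("(cpu-cycles,instructions,branch-instructions,branch-misses),page-faults"))
```

Whitespace around names is stripped, so `"cpu-cycles, page-faults"` yields two single-event groups.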