Skip to content

Commit 5b19c75

Browse files
authored
gh-138122: Make the tachyon profiler opcode-aware (#142394)
1 parent fa44845 commit 5b19c75

36 files changed

+3983
-507
lines changed

Doc/library/profiling.sampling.rst

Lines changed: 90 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,8 @@
1313

1414
**Source code:** :source:`Lib/profiling/sampling/`
1515

16+
.. program:: profiling.sampling
17+
1618
--------------
1719

1820
.. image:: tachyon-logo.png
@@ -146,6 +148,10 @@ Generate a line-by-line heatmap::
146148

147149
python -m profiling.sampling run --heatmap script.py
148150

151+
Enable opcode-level profiling to see which bytecode instructions are executing::
152+
153+
python -m profiling.sampling run --opcodes --flamegraph script.py
154+
149155

150156
Commands
151157
========
@@ -308,7 +314,7 @@ The two most fundamental parameters are the sampling interval and duration.
308314
Together, these determine how many samples will be collected during a profiling
309315
session.
310316

311-
The ``--interval`` option (``-i``) sets the time between samples in
317+
The :option:`--interval` option (:option:`-i`) sets the time between samples in
312318
microseconds. The default is 100 microseconds, which produces approximately
313319
10,000 samples per second::
314320

@@ -319,7 +325,7 @@ cost of slightly higher profiler CPU usage. Higher intervals reduce profiler
319325
overhead but may miss short-lived functions. For most applications, the
320326
default interval provides a good balance between accuracy and overhead.
321327

322-
The ``--duration`` option (``-d``) sets how long to profile in seconds. The
328+
The :option:`--duration` option (:option:`-d`) sets how long to profile in seconds. The
323329
default is 10 seconds::
324330

325331
python -m profiling.sampling run -d 60 script.py
@@ -337,8 +343,8 @@ Python programs often use multiple threads, whether explicitly through the
337343
:mod:`threading` module or implicitly through libraries that manage thread
338344
pools.
339345

340-
By default, the profiler samples only the main thread. The ``--all-threads``
341-
option (``-a``) enables sampling of all threads in the process::
346+
By default, the profiler samples only the main thread. The :option:`--all-threads`
347+
option (:option:`-a`) enables sampling of all threads in the process::
342348

343349
python -m profiling.sampling run -a script.py
344350

@@ -357,7 +363,7 @@ additional context about what the interpreter is doing at the moment each
357363
sample is taken. These synthetic frames help distinguish different types of
358364
execution that would otherwise be invisible.
359365

360-
The ``--native`` option adds ``<native>`` frames to indicate when Python has
366+
The :option:`--native` option adds ``<native>`` frames to indicate when Python has
361367
called into C code (extension modules, built-in functions, or the interpreter
362368
itself)::
363369

@@ -369,7 +375,7 @@ in the Python function that made the call. This is useful when optimizing
369375
code that makes heavy use of C extensions like NumPy or database drivers.
370376

371377
By default, the profiler includes ``<GC>`` frames when garbage collection is
372-
active. The ``--no-gc`` option suppresses these frames::
378+
active. The :option:`--no-gc` option suppresses these frames::
373379

374380
python -m profiling.sampling run --no-gc script.py
375381

@@ -379,10 +385,48 @@ see substantial time in ``<GC>`` frames, consider investigating object
379385
allocation rates or using object pooling.
380386

381387

388+
Opcode-aware profiling
389+
----------------------
390+
391+
The :option:`--opcodes` option enables instruction-level profiling that captures
392+
which Python bytecode instructions are executing at each sample::
393+
394+
python -m profiling.sampling run --opcodes --flamegraph script.py
395+
396+
This feature provides visibility into Python's bytecode execution, including
397+
adaptive specialization optimizations. When a generic instruction like
398+
``LOAD_ATTR`` is specialized at runtime into a more efficient variant like
399+
``LOAD_ATTR_INSTANCE_VALUE``, the profiler shows both the specialized name
400+
and the base instruction.
401+
402+
Opcode information appears in several output formats:
403+
404+
- **Live mode**: An opcode panel shows instruction-level statistics for the
405+
selected function, accessible via keyboard navigation
406+
- **Flame graphs**: Nodes display opcode information when available, helping
407+
identify which instructions consume the most time
408+
- **Heatmap**: Expandable bytecode panels per source line show instruction
409+
breakdown with specialization percentages
410+
- **Gecko format**: Opcode transitions are emitted as interval markers in the
411+
Firefox Profiler timeline
412+
413+
This level of detail is particularly useful for:
414+
415+
- Understanding the performance impact of Python's adaptive specialization
416+
- Identifying hot bytecode instructions that might benefit from optimization
417+
- Analyzing the effectiveness of different code patterns at the instruction level
418+
- Debugging performance issues that occur at the bytecode level
419+
420+
The :option:`--opcodes` option is compatible with :option:`--live`, :option:`--flamegraph`,
421+
:option:`--heatmap`, and :option:`--gecko` formats. It requires additional memory to store
422+
opcode information and may slightly reduce sampling performance, but provides
423+
unprecedented visibility into Python's execution model.
424+
425+
382426
Real-time statistics
383427
--------------------
384428

385-
The ``--realtime-stats`` option displays sampling rate statistics during
429+
The :option:`--realtime-stats` option displays sampling rate statistics during
386430
profiling::
387431

388432
python -m profiling.sampling run --realtime-stats script.py
@@ -434,7 +478,7 @@ CPU execution time, or time spent holding the global interpreter lock.
434478
Wall-clock mode
435479
---------------
436480

437-
Wall-clock mode (``--mode=wall``) captures all samples regardless of what the
481+
Wall-clock mode (:option:`--mode`\ ``=wall``) captures all samples regardless of what the
438482
thread is doing. This is the default mode and provides a complete picture of
439483
where time passes during program execution::
440484

@@ -454,7 +498,7 @@ latency.
454498
CPU mode
455499
--------
456500

457-
CPU mode (``--mode=cpu``) records samples only when the thread is actually
501+
CPU mode (:option:`--mode`\ ``=cpu``) records samples only when the thread is actually
458502
executing on a CPU core::
459503

460504
python -m profiling.sampling run --mode=cpu script.py
@@ -488,7 +532,7 @@ connection pooling, or reducing wait time instead.
488532
GIL mode
489533
--------
490534

491-
GIL mode (``--mode=gil``) records samples only when the thread holds Python's
535+
GIL mode (:option:`--mode`\ ``=gil``) records samples only when the thread holds Python's
492536
global interpreter lock::
493537

494538
python -m profiling.sampling run --mode=gil script.py
@@ -520,7 +564,7 @@ output goes to stdout, a file, or a directory depending on the format.
520564
pstats format
521565
-------------
522566

523-
The pstats format (``--pstats``) produces a text table similar to what
567+
The pstats format (:option:`--pstats`) produces a text table similar to what
524568
deterministic profilers generate. This is the default output format::
525569

526570
python -m profiling.sampling run script.py
@@ -567,31 +611,31 @@ interesting functions that highlights:
567611
samples (high cumulative/direct multiplier). These are frequently-nested
568612
functions that appear deep in many call chains.
569613

570-
Use ``--no-summary`` to suppress both the legend and summary sections.
614+
Use :option:`--no-summary` to suppress both the legend and summary sections.
571615

572616
To save pstats output to a file instead of stdout::
573617

574618
python -m profiling.sampling run -o profile.txt script.py
575619

576620
The pstats format supports several options for controlling the display.
577-
The ``--sort`` option determines the column used for ordering results::
621+
The :option:`--sort` option determines the column used for ordering results::
578622

579623
python -m profiling.sampling run --sort=tottime script.py
580624
python -m profiling.sampling run --sort=cumtime script.py
581625
python -m profiling.sampling run --sort=nsamples script.py
582626

583-
The ``--limit`` option restricts output to the top N entries::
627+
The :option:`--limit` option restricts output to the top N entries::
584628

585629
python -m profiling.sampling run --limit=30 script.py
586630

587-
The ``--no-summary`` option suppresses the header summary that precedes the
631+
The :option:`--no-summary` option suppresses the header summary that precedes the
588632
statistics table.
589633

590634

591635
Collapsed stacks format
592636
-----------------------
593637

594-
Collapsed stacks format (``--collapsed``) produces one line per unique call
638+
Collapsed stacks format (:option:`--collapsed`) produces one line per unique call
595639
stack, with a count of how many times that stack was sampled::
596640

597641
python -m profiling.sampling run --collapsed script.py
@@ -621,7 +665,7 @@ visualization where you can click to zoom into specific call paths.
621665
Flame graph format
622666
------------------
623667

624-
Flame graph format (``--flamegraph``) produces a self-contained HTML file with
668+
Flame graph format (:option:`--flamegraph`) produces a self-contained HTML file with
625669
an interactive flame graph visualization::
626670

627671
python -m profiling.sampling run --flamegraph script.py
@@ -667,7 +711,7 @@ or through their callees.
667711
Gecko format
668712
------------
669713

670-
Gecko format (``--gecko``) produces JSON output compatible with the Firefox
714+
Gecko format (:option:`--gecko`) produces JSON output compatible with the Firefox
671715
Profiler::
672716

673717
python -m profiling.sampling run --gecko script.py
@@ -694,14 +738,14 @@ Firefox Profiler timeline:
694738
- **Code type markers**: distinguish Python code from native (C extension) code
695739
- **GC markers**: indicate garbage collection activity
696740

697-
For this reason, the ``--mode`` option is not available with Gecko format;
741+
For this reason, the :option:`--mode` option is not available with Gecko format;
698742
all relevant data is captured automatically.
699743

700744

701745
Heatmap format
702746
--------------
703747

704-
Heatmap format (``--heatmap``) generates an interactive HTML visualization
748+
Heatmap format (:option:`--heatmap`) generates an interactive HTML visualization
705749
showing sample counts at the source line level::
706750

707751
python -m profiling.sampling run --heatmap script.py
@@ -744,7 +788,7 @@ interpretation of hierarchical visualizations.
744788
Live mode
745789
=========
746790

747-
Live mode (``--live``) provides a terminal-based real-time view of profiling
791+
Live mode (:option:`--live`) provides a terminal-based real-time view of profiling
748792
data, similar to the ``top`` command for system processes::
749793

750794
python -m profiling.sampling run --live script.py
@@ -760,6 +804,11 @@ and thread status statistics (GIL held percentage, CPU usage, GC time). The
760804
main table shows function statistics with the currently sorted column indicated
761805
by an arrow (▼).
762806

807+
When :option:`--opcodes` is enabled, an additional opcode panel appears below the
808+
main table, showing instruction-level statistics for the currently selected
809+
function. This panel displays which bytecode instructions are executing most
810+
frequently, including specialized variants and their base opcodes.
811+
763812

764813
Keyboard commands
765814
-----------------
@@ -813,20 +862,25 @@ Within live mode, keyboard commands control the display:
813862
:kbd:`h` or :kbd:`?`
814863
Show the help screen with all available commands.
815864

865+
:kbd:`j` / :kbd:`k` (or :kbd:`Up` / :kbd:`Down`)
866+
Navigate through opcode entries in the opcode panel (when ``--opcodes`` is
867+
enabled). These keys scroll through the instruction-level statistics for the
868+
currently selected function.
869+
816870
When profiling finishes (duration expires or target process exits), the display
817871
shows a "PROFILING COMPLETE" banner and freezes the final results. You can
818872
still navigate, sort, and filter the results before pressing :kbd:`q` to exit.
819873

820-
Live mode is incompatible with output format options (``--collapsed``,
821-
``--flamegraph``, and so on) because it uses an interactive terminal
874+
Live mode is incompatible with output format options (:option:`--collapsed`,
875+
:option:`--flamegraph`, and so on) because it uses an interactive terminal
822876
interface rather than producing file output.
823877

824878

825879
Async-aware profiling
826880
=====================
827881

828882
For programs using :mod:`asyncio`, the profiler offers async-aware mode
829-
(``--async-aware``) that reconstructs call stacks based on the task structure
883+
(:option:`--async-aware`) that reconstructs call stacks based on the task structure
830884
rather than the raw Python frames::
831885

832886
python -m profiling.sampling run --async-aware async_script.py
@@ -846,16 +900,16 @@ and presenting stacks that reflect the ``await`` chain.
846900
Async modes
847901
-----------
848902

849-
The ``--async-mode`` option controls which tasks appear in the profile::
903+
The :option:`--async-mode` option controls which tasks appear in the profile::
850904

851905
python -m profiling.sampling run --async-aware --async-mode=running async_script.py
852906
python -m profiling.sampling run --async-aware --async-mode=all async_script.py
853907

854-
With ``--async-mode=running`` (the default), only the task currently executing
908+
With :option:`--async-mode`\ ``=running`` (the default), only the task currently executing
855909
on the CPU is profiled. This shows where your program is actively spending time
856910
and is the typical choice for performance analysis.
857911

858-
With ``--async-mode=all``, tasks that are suspended (awaiting I/O, locks, or
912+
With :option:`--async-mode`\ ``=all``, tasks that are suspended (awaiting I/O, locks, or
859913
other tasks) are also included. This mode is useful for understanding what your
860914
program is waiting on, but produces larger profiles since every suspended task
861915
appears in each sample.
@@ -884,8 +938,8 @@ Option restrictions
884938
-------------------
885939

886940
Async-aware mode uses a different stack reconstruction mechanism and is
887-
incompatible with: ``--native``, ``--no-gc``, ``--all-threads``, and
888-
``--mode=cpu`` or ``--mode=gil``.
941+
incompatible with: :option:`--native`, :option:`--no-gc`, :option:`--all-threads`, and
942+
:option:`--mode`\ ``=cpu`` or :option:`--mode`\ ``=gil``.
889943

890944

891945
Command-line interface
@@ -939,6 +993,13 @@ Sampling options
939993

940994
Enable async-aware profiling for asyncio programs.
941995

996+
.. option:: --opcodes
997+
998+
Gather bytecode opcode information for instruction-level profiling. Shows
999+
which bytecode instructions are executing, including specializations.
1000+
Compatible with ``--live``, ``--flamegraph``, ``--heatmap``, and ``--gecko``
1001+
formats only.
1002+
9421003

9431004
Mode options
9441005
------------

0 commit comments

Comments
 (0)