Tracing tools for ROS 2.
ros2_tracing provides tracing instrumentation for the core ROS 2 packages.
It also provides tools to configure tracing through a launch action and a ros2 CLI command.
For more information about tracing, see the What is tracing? section.
ros2_tracing currently only supports the LTTng tracer.
Consequently, it currently only supports Linux.
Note
Make sure to use the right branch, depending on the ROS 2 distro: use rolling for Rolling, galactic for Galactic, etc.
Read the ros2_tracing paper!
If you use or refer to ros2_tracing, please cite:
- C. Bédard, I. Lütkebohle, and M. Dagenais, "ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2," IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 6511–6518, 2022.
BibTeX
@article{bedard2022ros2tracing, title={ros2\_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2}, author={B{\'e}dard, Christophe and L{\"u}tkebohle, Ingo and Dagenais, Michel}, journal={IEEE Robotics and Automation Letters}, year={2022}, volume={7}, number={3}, pages={6511--6518}, doi={10.1109/LRA.2022.3174346} }
This other paper leverages ros2_tracing to analyze and visualize the flow of messages across distributed ROS 2 systems:
- C. Bédard, P.-Y. Lajoie, G. Beltrame, and M. Dagenais, "Message Flow Analysis with Complex Causal Links for Distributed ROS 2 Systems," Robotics and Autonomous Systems, vol. 161, p. 104361, 2023.
BibTeX
@article{bedard2023messageflow, title={Message flow analysis with complex causal links for distributed {ROS} 2 systems}, author={B{\'e}dard, Christophe and Lajoie, Pierre-Yves and Beltrame, Giovanni and Dagenais, Michel}, journal={Robotics and Autonomous Systems}, year={2023}, volume={161}, pages={104361}, doi={10.1016/j.robot.2022.104361} }
Finally, check out the following presentations:
- ROSCon 2023: "Improving Your Application's Algorithms and Optimizing Performance Using Trace Data" (video, slides)
- ROS World 2021: "Tracing ROS 2 with ros2_tracing" (video, slides)
- ROS 2 documentation:
- ROS World 2021 demo: github.com/christophebedard/ros-world-2021-demo
Starting from ROS 2 Iron Irwini, the LTTng tracer is a ROS 2 dependency.
Therefore, ROS 2 can be traced out-of-the-box on Linux; this package does not need to be re-built.
The following rmw implementations are supported:
- rmw_connextdds
- rmw_cyclonedds_cpp
- rmw_fastrtps_cpp
- rmw_fastrtps_dynamic_cpp
- rmw_zenoh_cpp
To make sure that the instrumentation and tracepoints are available:
$ source /opt/ros/rolling/setup.bash # With a binary install
$ source ./install/setup.bash # When building from source
$ ros2 run tracetools status
Tracing enabled
A ROS 2 installation only includes the LTTng userspace tracer (LTTng-UST), which is all that is needed to trace ROS 2. To trace the Linux kernel, the LTTng kernel tracer must be installed separately:
$ sudo apt-get update
$ sudo apt-get install lttng-modules-dkms
For more information about LTTng, refer to its documentation.
To avoid loading the tracer at runtime (and therefore disable all instrumentation), set the TRACETOOLS_RUNTIME_DISABLE environment variable to 1:
$ export TRACETOOLS_RUNTIME_DISABLE=1
$ ros2 run tracetools status
Tracing disabled
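When the variable should apply to a single process rather than to the whole shell, it can be passed through that process's environment instead. The following is a minimal Python sketch of that idea; the helper name `env_with_tracing_disabled` is hypothetical and not part of ros2_tracing.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_tracing_disabled(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the tracer disabled at runtime (hypothetical helper)."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base_env is None else base_env)
    env["TRACETOOLS_RUNTIME_DISABLE"] = "1"
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run(["ros2", "run", "tracetools", "status"], env=env_with_tracing_disabled())`.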
To build with all instrumentation removed, use the TRACETOOLS_DISABLED CMake option:
$ colcon build --cmake-args -DTRACETOOLS_DISABLED=ON
This will remove all instrumentation from the core ROS 2 packages, and thus they will not depend on or link against the shared library provided by the tracetools package.
This also means that LTTng is not required at build-time or at runtime.
Alternatively, to only exclude the actual tracepoints, use TRACETOOLS_TRACEPOINTS_EXCLUDED:
$ colcon build --packages-select tracetools --cmake-clean-cache --cmake-args -DTRACETOOLS_TRACEPOINTS_EXCLUDED=ON
This will keep the instrumentation but remove all tracepoints.
This also means that LTTng is not required at build-time or at runtime.
This option can be useful, since tracepoints can be added back in or removed by simply replacing/re-building the shared library provided by the tracetools package.
Software tracing is a method of collecting low-level runtime data to understand a system's execution. This is achieved by instrumenting the code with tracepoints, for example in ROS 2, in the Linux kernel, or in any other application. When a tracepoint is executed, it generates information that is collected by a tracer into a trace. Tracers are usually low-overhead to avoid affecting the execution. Traces can then be analyzed to help understand the execution, fix bugs, improve performance, etc. While logs are typically high-level enough for a user to read and understand, trace data is low-level and high-rate, and therefore usually needs to be processed to be useful.
For more information, see the following introductions on tracing:
- LTTng tracer documentation
- Tracing tutorial
- Eclipse Trace Compass (trace analysis tool) documentation
By default, trace data will not be generated, and thus these packages will have virtually no impact on execution. LTTng has to be configured for tracing. The packages in this repo provide two options: a command and a launch file action.
Note
Tracing must be started before the application is launched. Metadata is recorded during the initialization phase of the application. This metadata is needed to understand the rest of the trace data, so if tracing is started after the application started executing, then the trace data might be unusable. For more information, refer to the design document. The launch file action is designed to automatically start tracing before the application launches.
Tip
Configuring a tracing session in snapshot mode or dual session mode keeps trace data in memory and only writes it to disk on request. This makes it possible to start recording traces as needed after the application has launched, without losing initialization data. There is then no need to start recording all trace data before the application launches, and unwanted trace files do not accumulate when you are not actively analyzing.
The tracing directory can be configured using command/launch action parameters, or through environment variables with the following logic:
- Use $ROS_TRACE_DIR if ROS_TRACE_DIR is set and not empty.
- Otherwise, use $ROS_HOME/tracing, using ~/.ros for ROS_HOME if not set or if empty.
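The environment variable lookup above can be sketched in Python as follows. This is a minimal illustration and not the actual ros2_tracing implementation: the function name `resolve_trace_dir` is hypothetical, expanding `~` in the values is an assumption, and command or launch action parameters are not modeled.

```python
import os
from typing import Mapping, Optional


def resolve_trace_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the tracing directory implied by the environment (hypothetical helper)."""
    env = os.environ if env is None else env

    # 1. Use $ROS_TRACE_DIR if it is set and not empty.
    trace_dir = env.get("ROS_TRACE_DIR", "")
    if trace_dir:
        return os.path.normpath(os.path.expanduser(trace_dir))

    # 2. Otherwise use $ROS_HOME/tracing, with ~/.ros standing in
    #    for ROS_HOME when it is not set or empty.
    ros_home = env.get("ROS_HOME", "") or os.path.join("~", ".ros")
    return os.path.normpath(os.path.join(os.path.expanduser(ros_home), "tracing"))
```

Passing an explicit mapping instead of relying on `os.environ` keeps the function easy to check with different combinations of set, unset, and empty variables.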
Additionally, if you're using kernel tracing with a non-root user, make sure that the tracing group exists and that your user is added to it.
# Create group if it doesn't exist
$ sudo groupadd -r tracing
# Add user to the group
$ sudo usermod -aG tracing $USER
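To confirm the result of the commands above from a script, group membership can be checked with Python's standard `grp` and `pwd` modules. This is a small sketch; the helper name `user_in_group` is hypothetical. It reads the system group database, so it reflects the configured membership, which a session that is already running typically only picks up after logging in again.

```python
import grp
import pwd


def user_in_group(username: str, group_name: str = "tracing") -> bool:
    """Return True if the group exists and the user belongs to it (hypothetical helper)."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return False  # The group does not exist on this system.

    # Supplementary membership is listed in the group database entry.
    if username in group.gr_mem:
        return True

    # The user's primary group is recorded in the password database instead.
    try:
        return pwd.getpwnam(username).pw_gid == group.gr_gid
    except KeyError:
        return False  # Unknown user.
```

For the current user, this could be called as `user_in_group(getpass.getuser())` after importing `getpass`.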
The first option is to use the ros2 trace command.
$ ros2 trace
By default, it will enable all ROS 2 tracepoints.
The trace will be written to ~/.ros/tracing/session-YYYYMMDDHHMMSS.
Run the command with -h for more information.
The ros2 trace command requires user interaction to start and then stop tracing.
To trace without user interaction (e.g., in scripts), or for finer-grained tracing control, the following sub-commands can be used:
$ ros2 trace start session_name # Configure tracing session and start tracing
$ ros2 trace pause session_name # Pause tracing after starting
$ ros2 trace resume session_name # Resume tracing after pausing
$ ros2 trace stop session_name # Stop tracing after starting or resuming
Run each command with -h for more information.
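As an illustration of scripted use, the start and stop sub-commands can be wrapped in a Python context manager so that tracing is stopped even if the traced workload raises an exception. This is a sketch under stated assumptions: the names `trace_command` and `tracing` are hypothetical, only the `ros2 trace start` and `ros2 trace stop` invocations shown above are used, and the `run` parameter exists only so the command runner can be replaced when testing.

```python
import contextlib
import subprocess
from typing import Callable, Iterator, List


def trace_command(subcommand: str, session_name: str) -> List[str]:
    """Build the argument list for one ros2 trace sub-command."""
    return ["ros2", "trace", subcommand, session_name]


@contextlib.contextmanager
def tracing(session_name: str, run: Callable = subprocess.run) -> Iterator[None]:
    """Start tracing on entry and stop it on exit (hypothetical helper)."""
    # Tracing has to be started before the application is launched.
    run(trace_command("start", session_name), check=True)
    try:
        yield
    finally:
        # Always stop the session, even if the workload failed.
        run(trace_command("stop", session_name), check=True)
```

The application would then be launched inside the `with tracing("session_name"):` block, so that tracing is already active when its initialization runs.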
You must install the kernel tracer if you want to enable kernel events (using the -k/--kernel-events option) or syscalls (using the --syscalls option).
If you have installed the kernel tracer, use kernel tracing, and still encounter an error here, make sure to add your user to the tracing group.
Another option is to use the Trace action in a Python, XML, or YAML launch file along with your Node action(s).
This way, tracing automatically starts when launching the launch file and ends when it exits or when terminated.
$ ros2 launch tracetools_launch example.launch.py
The Trace action will also set the LD_PRELOAD environment variable to preload LTTng's userspace tracing helper(s) if the corresponding event(s) are enabled.
For more information, see this example launch file and the Trace action.
You must install the kernel tracer if you want to enable kernel events (events_kernel in Python, events-kernel in XML or YAML) or syscalls (syscalls in Python, XML, or YAML).
If you have installed the kernel tracer, use kernel tracing, and still encounter an error here, make sure to add your user to the tracing group.
By default, tracing sessions write trace data continuously to disk. Tracing sessions in snapshot mode store trace data in memory and only write to disk when a snapshot is taken. When memory buffers fill up, the oldest data is discarded, maintaining a rolling history whose size can be controlled by configuring sub-buffer size. This "flight recorder" mode is useful for capturing trace data only when something interesting occurs, avoiding continuous disk writes and thus lowering the runtime performance impact even more.
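The rolling-history behavior can be illustrated with a toy model in Python. This is only a conceptual sketch, not how LTTng is implemented: the class `FlightRecorder` is hypothetical, events are plain strings, and data is discarded one event at a time here rather than in sub-buffer-sized chunks.

```python
from collections import deque
from typing import Deque, List


class FlightRecorder:
    """Toy in-memory trace buffer that keeps only the most recent events."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item once the buffer is full.
        self._events: Deque[str] = deque(maxlen=capacity)
        # Stands in for the trace data written to disk.
        self.snapshots: List[List[str]] = []

    def record(self, event: str) -> None:
        """Keep the event in memory only."""
        self._events.append(event)

    def record_snapshot(self) -> List[str]:
        """Copy the current in-memory history to 'disk'."""
        snapshot = list(self._events)
        self.snapshots.append(snapshot)
        return snapshot


recorder = FlightRecorder(capacity=3)
for name in ["init", "callback_1", "callback_2", "callback_3", "callback_4"]:
    recorder.record(name)

print(recorder.record_snapshot())  # ['callback_2', 'callback_3', 'callback_4']
```

Note that the `init` event has already been discarded by the time the snapshot is taken, which is the kind of loss that a larger buffer makes less likely.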
Use the --snapshot-mode option while starting a tracing session to configure it in snapshot mode:
$ ros2 trace --snapshot-mode # requires user interaction
$ ros2 trace start session_name --snapshot-mode
By default, all ROS 2 tracepoints are enabled.
Run the commands with -h for more information.
Use record_snapshot to take a snapshot and write the trace data to disk:
$ ros2 trace record_snapshot session_name
All other session control commands work as usual:
$ ros2 trace pause session_name
$ ros2 trace resume session_name
$ ros2 trace stop session_name
Set snapshot_mode=True in the Trace action to configure the tracing session in snapshot mode using launch files.
The session starts when launching the launch file and ends when it exits or when terminated.
$ ros2 launch tracetools_launch example_snapshot_mode.launch.py
Use record_snapshot to take a snapshot and write the trace data to disk, as stated earlier.
Note
In snapshot mode, high-frequency runtime events may overwrite initialization trace data in memory before a snapshot is taken, potentially making the trace data unusable. This is more likely to occur if sub-buffer sizes are too small. Support for configuring separate channels for initialization and runtime tracepoints is planned to address this limitation (see #199).
Dual session mode solves the problem of losing initialization trace data by using two separate tracing sessions: one for initialization events in snapshot mode, and another normal tracing session for runtime events. This allows starting to actively record trace data at any point without losing initialization data.
Use the Trace action to start the initialization session in snapshot mode:
$ ros2 launch tracetools_launch example_dual_session.launch.py
Then use the trace commands with the --dual-session option to start and control the sessions.
To take a snapshot of the initialization session and start the runtime session:
$ ros2 trace -s session_name --dual-session # requires user interaction
$ ros2 trace start session_name --dual-session
By default, tracepoints corresponding to initialization ROS 2 events are enabled for the initialization (snapshot) session, and all ROS 2 tracepoints are enabled for the runtime session.
The snapshot trace will be written to ~/.ros/tracing/session_name/snapshot and the runtime trace will be written to ~/.ros/tracing/session_name/runtime.
Run the commands with -h for more information.
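For scripts that post-process both traces, the two output locations described above can be derived from the session name. This is a minimal sketch: the helper name `dual_session_paths` is hypothetical, and the default base directory assumes that neither ROS_TRACE_DIR nor ROS_HOME changes the tracing directory.

```python
import os
from typing import Dict


def dual_session_paths(session_name: str, base_dir: str = "~/.ros/tracing") -> Dict[str, str]:
    """Return the snapshot and runtime trace directories for a dual session (hypothetical helper)."""
    session_dir = os.path.join(os.path.expanduser(base_dir), session_name)
    return {
        "snapshot": os.path.join(session_dir, "snapshot"),  # initialization session
        "runtime": os.path.join(session_dir, "runtime"),    # runtime session
    }
```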
Other session control commands work as follows:
$ ros2 trace pause session_name --dual-session # Pause runtime session after starting
$ ros2 trace resume session_name --dual-session # Take a snapshot of initialization session and resume runtime session after pausing
$ ros2 trace stop session_name --dual-session # Stop runtime session
Note
In dual session mode, the session name used in both the Trace action and the trace commands must be the same.
See the design document.
LTTng-UST, the current default userspace tracer used for tracing ROS 2, was designed for real-time production applications. It is a low-overhead tracer with many important real-time compatible features:
- userspace tracer completely implemented in userspace, independent from the kernel
- reentrant, thread-safe, signal-safe, non-blocking
- no system calls in the fast path
- no copies of the trace data
However, some settings need to be tuned for it to be fully real-time safe and for performance to be optimal for your use-case:
- timers [1]: use read timer to avoid a write(2) call
- sub-buffer [1] count and size:
  - see documentation for sub-buffer count and size tuning tips based on your use-case
  - minimize sub-buffer count to minimize sub-buffer switching overhead
- one-time memory allocation/lock/syscall per thread:
  - usually done the first time a tracepoint is executed within a thread for URCU thread registration, but registration can be manually performed to force it to be done during your application's initialization
  - see this LTTng mailing list message
For further reading:
- LTTng documentation
- Combined Tracing of the Kernel and Applications with LTTng: LTTng-UST architecture and design goals (section 3)
- Survey and Analysis of Kernel and Userspace Tracers on Linux: Design, Implementation, and Overhead: LTTng-UST overhead and design compared to other kernel and userspace tracers (table 6: average latency overhead per tracepoint of 158 ns)
The LTTng kernel tracer has a similar implementation, but is separate from the userspace tracer.
Package containing liblttng-ctl Python bindings.
Package containing a ros2cli extension to enable tracing.
Library to support instrumenting ROS packages, including core packages.
This package claims to be in the Quality Level 1 category, see the Quality Declaration for more details.
See the API documentation.
Package containing tools to enable tracing through launch files.
Package containing tools to read traces.
Package containing tools for tracing-related tests.
Package containing tools to enable tracing.
Package containing system tests for ros2trace.
Package containing unit and system tests for tracetools.
Package containing system tests for tracetools_launch.
See tracetools_analysis.
Footnotes
1. This setting cannot currently be set through the Trace launch file action or the ros2 trace command; see #20.