Skip to content

Does it work and how does it work

Eric Flumerfelt edited this page Jun 17, 2022 · 1 revision

Does it work and how does it work

OS and compiler features used

Shared memory buffer

TRACE uses UNIX mmap to allow multiple processes to access a shared memory area.
After the file is created, the first portion of the memory is mapped read-only to protect
all processes from memory buffer corruption which may cause a program to access beyond the
limits of the memory buffer.

Atomics, TLS

Atomic access (across threads and processes) is part of the C/C standard since C/C++11.
TRACE uses this for when a particular TRACE call get it’s “slot” in the memory buffer.
TLS, also part of the standard, is used to store the “Trace ID” associated with each TRACE.
Also, for the “slow-destination message limiting” feature, the current timespan start, state, and
count are in TLS.

Variadic Macros

ariadic Macros are used to manage possible variable number of arguments.
There is a limit to the number of arguments supported. This is arbitrary and there is
no increased runtime overhead for supporting more.
There are macros to separate the format from the rest of the optional arguments.
There is a macro which counts the number of arguments.

OS Virtual System Call.

An important aspect to memory tracing is that no (real) system call occur. Besides significant overhead,
the possibility of interrupt program flow can happen when a system call occurs. Modern UNIX OSes use
virtual system call for getting time and process/thread ID information.

Delayed formatting

Delayed formatting of integers and floats has been seen to be a significant feature.
There is an architecture-specific component to this. Knowledge of the specific architects
ABI is used to implement this feature. Basically, after a certain number of integer and floating-point arguments are passed in registers, the remaining arguments go in a contiguous block of
stack memory. The macro system is used to inject unused floating-point parameters into the trace
function calls forcing the user-supplied parameters to be written to the contiguous stack memory.
The macro system is also used to count parameters. This allows for the important case where
the number of arguments is 0 and therefore no formatting need be done. This is important for when
the message is likely to include embedded format specifiers*.

* For example, when using the TLOG macro to include string variables which may include ‘%’; the LOG macro can be given an optional parameter to turn off delayed formatting to avoid this issue.

asm for tsc

C variable length array

This is used to provide “strings helper” macro for sprintf-ing strings into the message format in C.
Efficiency aside, this is not needed for C as strings can easily to included in the message using
basic std::string append/concatenate functionality.

Linux Kernel module

The TRACE kernel module can create a virtual file that can be memory mapped by user space
processes. The Kernel (starting somewhere in the 2.6 kernels) allow for “os hook” functions
to be registers — i.e. a hook for process/thread scheduling, a hook for system calls and a hook
for hw/sw interrupts can be registered which do TRACEs.

TRACE is meant to be used multi-process and multi-threaded

First, multi-threaded (w/default 4 threads)

setup TRACE                   # set TRACE; specify a version if "current" not defined
rm -f /tmp/trace_buffer_$USER # clear any existing buffer (and config (buffer will automatically be created)

export TRACE_MSGMAX=0 TRACE_LVLM=0x3f # enable circular memory and make sure trace levels 0-5 for memory are enabled
treset; tcntl -x2 test-threads >|messages.out; : -x2 causes a TRACE "name" to be assigned to each thread;\
cat messages.out | wc -l;\
tshow | wc -l;\
tlvls

The last four lines (one continued command) should produce the following:

18
44
mode:                         M=1                S=1
TID       NAME              maskM              maskS              maskT
--- ---------- ------------------ ------------------ ------------------
  0 TEST_THR_1 0x000000000000003f 0x0000000000000007 0x0000000000000000
  1 TEST_THR_3 0x000000000000003f 0x0000000000000007 0x0000000000000000
  2 TEST_THR_0 0x000000000000003f 0x0000000000000007 0x0000000000000000
  3 TEST_THR_2 0x000000000000003f 0x0000000000000007 0x0000000000000000
125      TRACE 0x000000000000003f 0x0000000000000007 0x0000000000000000
126    _TRACE_ 0x000000000000003f 0x0000000000000007 0x0000000000000000

There should be 18 lines in the messages.out file and 42+2 lines in the memory buffer.
18 = 4 threads * (2 loops * 2 TRACEs each) + 2 TRACEs from the main program. 4*2*2 + 2
44 = 4 threads * (2 loops * 5 TRACEs each) + 2 TRACEs from the main program + 2 tshow header lines. 4*2*5 + 2 +2

There are fewer lines in the memory buffer than in the output file because there are fewer bits set
in the maskM column than in the maskS column.

More TRACE lines (messages) can be generated by increasing the thread and loop counts:

$ treset; tcntl -x2 test-threads 80 -l100 >|messages.out;\
> cat messages.out | wc -l;\
> tshow | wc -l
16004
40004

40,002 (plus 2 tshow header) lines were delivered to the default 50,000 line circular buffer.

Multi-thread + Multi-process

3 * 1,600 and 3 * 40,002 lines will occur:

$ treset;\
> ( tcntl -x2 test-threads 80 -l100 &\
>   tcntl -x2 test-threads 80 -l100 &\
>   tcntl -x2 test-threads 80 -l100 ) >|messages.out;\
> cat messages.out | wc -l;\
> tshow | wc -l;\
> tinfo | grep used
48012
50002
writeIdxCount     =0x0001d4c6 entries used: 50000

The hex value 0x0001d4c6 = 120006 (from “tinfo”), which is 3 * 40,002,
indicates that the circular memory buffer wrapped around a couple of times.

Clone this wiki locally