-
Notifications
You must be signed in to change notification settings - Fork 106
Description
This is probably not an issue in this library, but rather a gcc bug, but I am nevertheless filing this issue for visibility:
Consider the following files:
app.c:
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int foo, bar;
void *handle = dlopen("./lib.so", RTLD_LAZY);
int64_t (*hdr_test)(int *, int *) = dlsym(handle, "hdr_test");
int64_t count = hdr_test(&foo, &bar);
printf("%ld\n", count);
}lib.c:
#include <stdint.h>
#include "hdr/hdr_histogram.h"
int64_t hdr_test(int *dummy, int *dummy2)
{
struct hdr_histogram *hdr;
int res = hdr_init(1, 60000, 2, &hdr);
hdr->counts[100] = 5;
hdr->counts[121] = 1;
hdr_reset_internal_counters(hdr);
return hdr->total_count;
}When compiling HdrHistogram_c as follows with gcc 12.2 (e.g. within a Debian 12 Docker Container):
env CFLAGS="-O2 -ggdb" cmake .
make
make install
And then compiling lib.c and app.c as follows:
gcc -shared -g -O0 lib.c -lhdr_histogram_static -lm -o lib.so
gcc app.c -o app
Then executing ./app will return some arbitrary value, instead of the expected value of 6:
$ ./app
129089121466008
$ ./app
126604316296856
$ ./app
131478882424472
I was able to track this down to the ipa-ra optimization of gcc, which is defined as:
Use caller save registers for allocation if those registers are not used by any called function. In that case it is not necessary to save and restore them around calls. This is only possible if called functions are part of same compilation unit as current function and they are compiled before it.
Enabled at levels -O2, -O3, -Os, however the option is disabled if generated code will be instrumented for profiling (-p, or -pg) or if callee’s register usage cannot be known exactly (this happens on targets that do not expose prologues and epilogues in RTL).
and it's enabled by default for -O2.
CFLAGS="-O2 -fno-ipa-ra -ggdb"prevents the issue.CFLAGS="-O1 -fipa-ra -ggdb"causes the issue.
The following GitHub Action workflow tests against both gcc and clang and different optimization flags:
name: Test
on:
push:
pull_request:
permissions:
contents: read
jobs:
Test:
name: Test
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v4
with:
repository: HdrHistogram/HdrHistogram_c
path: HdrHistogram_c
- run: |
sudo apt-get update
sudo apt-get install -y clang
- run: |
for CC in clang gcc; do
for optimization in "-O0" "-O1" "-O2" "-O1 -fipa-ra" "-O2 -fno-ipa-ra"; do
pushd HdrHistogram_c > /dev/null
git clean -fdx > /dev/null
if ! env CFLAGS="$optimization -ggdb" "CC=$CC" cmake . > /dev/null 2> /dev/null; then
popd > /dev/null
continue
fi
make > /dev/null
sudo make install > /dev/null
popd > /dev/null
gcc -shared -g -O0 lib.c -lhdr_histogram_static -lm -o lib.so
gcc app.c -o app
printf "CC=%s O=%s -> " "$CC" "$optimization"
./app
done
doneAn example run can be found here: https://github.com/TimWolla/hdrhistogram-gcc/actions/runs/8643179941/job/23695657623. For posterity, the output is as follows:
CC=clang O=-O0 -> 6
CC=clang O=-O1 -> 6
CC=clang O=-O2 -> 6
CC=gcc O=-O0 -> 6
CC=gcc O=-O1 -> 6
CC=gcc O=-O2 -> 140067486107456
CC=gcc O=-O1 -fipa-ra -> 140009274693666
CC=gcc O=-O2 -fno-ipa-ra -> 6
For me observed_total_count is stored in the R10 register. Watching it in gdb exposes what is happening:
(gdb) break hdr_reset_internal_counters
Function "hdr_reset_internal_counters" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (hdr_reset_internal_counters) pending.
(gdb) run
Starting program: /pwd/app
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:281
281 for (i = 0; i < h->counts_len; i++)
(gdb) watch $r10
Watchpoint 2: $r10
(gdb) cont
Continuing.
Watchpoint 2: $r10
Old value = 137376174487648
New value = 0
0x00007cf16214c8b5 in hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:277
277 int max_index = -1;
(gdb) cont
Continuing.
Watchpoint 2: $r10
Old value = 0
New value = 5
hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:289
289 if (min_non_zero_index == -1 && i != 0)
(gdb) cont
Continuing.
Watchpoint 2: $r10
Old value = 5
New value = 6
hdr_reset_internal_counters (h=0x576dd996e1d0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:289
289 if (min_non_zero_index == -1 && i != 0)
(gdb) cont
Continuing.
Watchpoint 2: $r10
Old value = 6
New value = 137376174486472
_dl_fixup (l=0x576dd996a2c0, reloc_arg=14) at ../sysdeps/generic/ldsodefs.h:81
81 ../sysdeps/generic/ldsodefs.h: No such file or directory.
(gdb) bt
#0 _dl_fixup (l=0x576dd996a2c0, reloc_arg=14) at ../sysdeps/generic/ldsodefs.h:81
#1 0x00007cf16234e02a in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:130
#2 0x00007cf16214c91c in highest_equivalent_value (value=<optimized out>, h=0x576dd996a2c0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:256
#3 hdr_reset_internal_counters (h=0x576dd996a2c0) at /pwd/HdrHistogram_c/src/hdr_histogram.c:303
#4 0x00007cf16214c27d in hdr_test (dummy=0x7ffe295437d4, dummy2=0x7ffe295437d0) at lib.c:12
#5 0x0000576dd95551a7 in main ()
The dl-trampoline changes the value and gcc did not save the register on the stack due to the optimization and it not realizing that the call to hdr_next_non_equivalent_value in highest_equivalent_value will clobber the register due to the trampoline.