Orchagent crash due to segment fault in sairedis::Recorder #1646

@Stephenxf

We observed an orchagent crash caused by a segmentation fault inside the sairedis library.

Aug  9 15:10:02.971911 sonic INFO kernel: [64367534.989279] orchagent[9422]: segfault at 2967013e33cd ip 00007f0a667b45d0 sp 00007f0a65a8a598 error 4 in libc-2.28.so[7f0a6667a000+147000]
Aug  9 15:10:02.971939 sonic INFO kernel: [64367534.989288] Code: 7f 07 c5 fe 7f 4f 20 c5 fe 7f 54 17 e0 c5 fe 7f 5c 17 c0 c5 f8 77 c3 48 39 f7 0f 87 ab 00 00 00 0f 84 e5 fe ff ff c5 fe 6f 26 <c5> fe 6f 6c 16 e0 c5 fe 6f 74 16 c0 c5 fe 6f 7c 16 a0 c5 7e 6f 44

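From the stack trace, it looks like a race inside sairedis::Recorder between record writing (thread 1, in Recorder::recordLine) and log rotation/reopen (thread 2, in Recorder::recordingFileReopen): one thread writes to the stream while the other reopens the same file.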

(gdb) thread apply all bt

Thread 3 (Thread 0x7f0a6628c700 (LWP 57)):
#0  0x00007f0a6675138f in epoll_wait (epfd=6, events=0x7f0a60002260, maxevents=1, timeout=1000) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00007f0a66c6841f in swss::Select::poll_descriptors(swss::Selectable**, unsigned int) () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#2  0x00007f0a66c6864b in swss::Select::select(swss::Selectable**, int) () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#3  0x00007f0a66c32b53 in swss::Logger::settingThread() () from /usr/lib/x86_64-linux-gnu/libswsscommon.so.0
#4  0x00007f0a66a72b2f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f0a66facfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f0a6675106f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f0a6628fbc0 (LWP 42)):
#0  __lseek64 (fd=22, offset=0, whence=2) at ../sysdeps/unix/sysv/linux/lseek64.c:36
#1  0x00007f0a666d3ff6 in __GI__IO_file_open (fp=fp@entry=0x55a3674ded30, filename=<optimized out>, posix_mode=<optimized out>, prot=prot@entry=438, read_write=<optimized out>, is32not64=is32not64@entry=1) at libioP.h:839
#2  0x00007f0a666d413b in _IO_new_file_fopen (fp=fp@entry=0x55a3674ded30, filename=filename@entry=0x55a36770a520 "/var/log/swss/sairedis.rec", mode=<optimized out>, mode@entry=0x7f0a66aed581 "a", is32not64=is32not64@entry=1) at fileops.c:281
#3  0x00007f0a666c8289 in __fopen_internal (filename=0x55a36770a520 "/var/log/swss/sairedis.rec", mode=0x7f0a66aed581 "a", is32=1) at iofopen.c:75
#4  0x00007f0a66a69ee0 in std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f0a66aa680a in std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f0a66f48c02 in sairedis::Recorder::recordingFileReopen() () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#7  0x00007f0a66f48dac in sairedis::Recorder::requestLogRotate() () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#8  0x00007f0a66f5504a in sairedis::RedisRemoteSaiInterface::setRedisExtensionAttribute(_sai_object_type_t, unsigned long, _sai_attribute_t const*) () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#9  0x00007f0a66f55610 in sairedis::RedisRemoteSaiInterface::set(_sai_object_type_t, unsigned long, _sai_attribute_t const*) () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#10 0x00007f0a66f379fa in sairedis::Sai::set(_sai_object_type_t, unsigned long, _sai_attribute_t const*) () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#11 0x00007f0a66f32b0d in ?? () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#12 0x000055a3649efe31 in ?? ()
#13 0x000055a3649f0528 in ?? ()
#14 0x000055a3649cd31d in ?? ()
#15 0x00007f0a6667c09b in __libc_start_main (main=0x55a3649cc6c0, argc=8, argv=0x7fff1cde6c58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff1cde6c48) at ../csu/libc-start.c:308
#16 0x000055a3649e649a in ?? ()

Thread 1 (Thread 0x7f0a65a8b700 (LWP 59)):
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
#1  0x00007f0a66ad3998 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007f0a66ac4b34 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f0a66f421e6 in sairedis::Recorder::recordLine(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#4  0x00007f0a66f467f7 in sairedis::Recorder::recordNotification(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#5  0x00007f0a66f4c801 in sairedis::RedisRemoteSaiInterface::handleNotification(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#6  0x00007f0a66f6b7b1 in sairedis::RedisChannel::notificationThreadFunction() () from /usr/lib/x86_64-linux-gnu/libsairedis.so.0
#7  0x00007f0a66a72b2f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007f0a66facfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007f0a6675106f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This is with the 202012 codebase. I can probably dig further into the core and get additional information, but are there any known issues related to this? Have any enhancements been introduced in later releases to, say, guard access to the filebuf with the same mutex on both the write and the reopen paths? Thanks.
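To make the question concrete, here is a minimal sketch of the kind of guard I have in mind. It is not the sairedis code: GuardedRecorder and stressRecorder are hypothetical names, and it assumes the recorder owns a single std::ofstream shared by the writing and rotating threads. Both paths take the same mutex, so the filebuf is never reopened while a write is in progress.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a recorder; NOT the actual sairedis::Recorder.
class GuardedRecorder
{
public:
    explicit GuardedRecorder(const std::string& path) : m_path(path)
    {
        m_ofstream.open(m_path, std::ofstream::out | std::ofstream::app);
    }

    // Writer path (the notification thread in the backtrace).
    void recordLine(const std::string& line)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_ofstream.is_open())
        {
            m_ofstream << line << std::endl;
        }
    }

    // Rotation path (the main thread in the backtrace): close and reopen
    // under the same mutex that protects the writes.
    void requestLogRotate()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_ofstream.close();
        m_ofstream.open(m_path, std::ofstream::out | std::ofstream::app);
    }

private:
    std::string m_path;
    std::ofstream m_ofstream;
    std::mutex m_mutex;
};

// Hypothetical stress helper: one thread writes while another reopens the
// file, then the number of lines that reached the file is returned.
int stressRecorder(const std::string& path, int lines, int rotations)
{
    std::remove(path.c_str());
    {
        GuardedRecorder rec(path);
        std::thread writer([&] {
            for (int i = 0; i < lines; ++i)
                rec.recordLine("n|line " + std::to_string(i));
        });
        std::thread rotator([&] {
            for (int i = 0; i < rotations; ++i)
                rec.requestLogRotate();
        });
        writer.join();
        rotator.join();
    }
    std::ifstream in(path);
    std::string s;
    int count = 0;
    while (std::getline(in, s))
        ++count;
    return count;
}
```

Calling stressRecorder with a scratch path, a line count, and a rotation count exercises both paths concurrently. Without the lock in requestLogRotate, the same test would have two threads touching one filebuf at once, which is the pattern the backtrace suggests.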
