Conversation

@Garfield96
Collaborator

No description provided.

@Garfield96 Garfield96 self-assigned this Mar 4, 2024
alexanderstephan pushed a commit that referenced this pull request Mar 14, 2025
Due to QUIC packet reordering, a stream may be opened via a
RESET_STREAM or STOP_SENDING frame. This causes either the Tx or the
Rx channel to be immediately closed.

This can cause an issue with the current QUIC MUX implementation of
QCS purging. A QCS is inserted into the QCC purge list when its
transfer can be considered complete. In most cases, this happens after
a full request/response exchange. However, it can also happen right
after request reception if RESET_STREAM/STOP_SENDING are received
first.

A BUG_ON() crash will occur if a STREAM frame is received afterwards.
In this case, a streamdesc instance is attached via qcs_attach_sc() to
handle the new request, even though the QCS is already considered
eligible to purging. The QCS could thus be released while its
streamdesc instance remains; a BUG_ON() in qcc_purge_streams() detects
this problem.

To fix this, extend qcc_decode_qcs() to skip the app proto rcv_buf
invocation if the QCS is considered completed. A similar condition was
already implemented for the case where the read had previously been
aborted after a STOP_SENDING emission by the QUIC MUX.
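
As a rough sketch of that guard (only qcs_is_completed() and
qcc_decode_qcs() are real names from src/mux_quic.c; the reduced types
and logic below are invented for illustration):

    #include <stdbool.h>

    /* toy model: a QCS is "completed" once both of its channels are
     * closed, here reduced to two booleans */
    struct toy_qcs { bool tx_closed, rx_closed; };

    static bool toy_qcs_is_completed(const struct toy_qcs *qcs)
    {
        return qcs->tx_closed && qcs->rx_closed;
    }

    /* the fix: skip the app proto rcv_buf call when the QCS is
     * already completed, so no streamdesc can be attached to a QCS
     * already eligible to purging */
    static int toy_qcc_decode_qcs(struct toy_qcs *qcs)
    {
        if (toy_qcs_is_completed(qcs))
            return 0; /* silently ignore the late STREAM frame */
        /* ... otherwise forward the data to the app proto rcv_buf ... */
        return 1;
    }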

This crash was reproduced on haproxy.org. Here is the output of the
backtrace:
Core was generated by `./haproxy-dev -db -f /etc/haproxy/haproxy-current.cfg -sf 16495'.
Program terminated with signal SIGILL, Illegal instruction.
 #0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
2661                    BUG_ON_HOT(!qcs_is_completed(qcs));
[Current thread is 1 (LWP 1457)]
[ ## gdb ## ] bt
 #0  0x00000000004e442b in qcc_purge_streams (qcc=0x774cca0) at src/mux_quic.c:2661
 #1  0x00000000004e4db7 in qcc_io_process (qcc=0x774cca0) at src/mux_quic.c:2744
 #2  0x00000000004e5a54 in qcc_io_cb (t=0x7f71193940c0, ctx=0x774cca0, status=573504) at src/mux_quic.c:2886
 #3  0x0000000000b4f792 in run_tasks_from_lists (budgets=0x7ffdcea1e670) at src/task.c:603
 #4  0x0000000000b5012f in process_runnable_tasks () at src/task.c:883
 #5  0x00000000007de4a3 in run_poll_loop () at src/haproxy.c:2771
 #6  0x00000000007deb9f in run_thread_poll_loop (data=0x1335a00 <ha_thread_info>) at src/haproxy.c:2985
 #7  0x00000000007dfd8d in main (argc=6, argv=0x7ffdcea1e958) at src/haproxy.c:3570

This BUG_ON() crash can only happen since the 3.1 refactoring, as the
purge list was only introduced in that version. As such, please
backport it to 3.1 immediately. However, a logic issue remains for
older versions, as a stream could be attached to a fully closed QCS.
Thus, it should then be backported up to 2.8, this time after a period
of observation.
@Garfield96
Collaborator Author

Was merged to HAProxy master via haproxy@ffbb3cc.

@Garfield96 Garfield96 closed this Aug 22, 2025
Garfield96 pushed a commit that referenced this pull request Aug 22, 2025
…ll()

In the following trace, taken while trying to abuse the watchdog with
the CLI's "debug dev loop" command running in parallel to "show
threads" loops, it's clear that some re-entrance may happen in
ha_thread_dump_fill().

A first minimal fix consists in using a test-and-set on the flag
indicating that the function is currently dumping threads, so that the
re-entrant call from the signal handler just returns. However, the
caller should be made more reliable so as to serialize all of this;
that's left for future work.
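
A minimal, self-contained sketch of that test-and-set idea (the real
flag and function live in src/debug.c; all names below are
illustrative):

    #include <stdatomic.h>

    static atomic_flag thread_dump_active = ATOMIC_FLAG_INIT;

    static void toy_thread_dump(void)
    {
        /* a re-entrant call (e.g. from the watchdog signal handler)
         * finds the flag already set and simply returns instead of
         * waiting for itself */
        if (atomic_flag_test_and_set(&thread_dump_active))
            return;
        /* ... fill the dump buffer for each thread ... */
        atomic_flag_clear(&thread_dump_active);
    }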

Here's an example capture of 7 threads stuck waiting for each other:
  (gdb) bt
  #0  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #1  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #2  0x00000000005ba4f5 in ha_thread_dump_fill (thr=2, buf=0x7ffdd8e08ab0) at src/debug.c:402
  #3  ha_thread_dump_fill (buf=0x7ffdd8e08ab0, thr=<optimized out>) at src/debug.c:384
  #4  0x00000000005baac4 in ha_stuck_warning (thr=thr@entry=2) at src/debug.c:840
  #5  0x00000000006a360d in wdt_handler (sig=<optimized out>, si=<optimized out>, arg=<optimized out>) at src/wdt.c:156
  #6  <signal handler called>
  #7  0x00007fe78d78e147 in sched_yield () from /lib64/libc.so.6
  #8  0x0000000000674a05 in ha_thread_relax () at src/thread.c:356
  #9  0x00000000005ba4c2 in ha_thread_dump_fill (thr=2, buf=0x7fe78f2d6420) at src/debug.c:426
  #10 ha_thread_dump_fill (buf=0x7fe78f2d6420, thr=2) at src/debug.c:384
  #11 0x00000000005ba7c6 in cli_io_handler_show_threads (appctx=0x2a89ab0) at src/debug.c:548
  #12 0x000000000057ea43 in cli_io_handler (appctx=0x2a89ab0) at src/cli.c:1176
  #13 0x00000000005d7885 in task_process_applet (t=0x2a82730, context=0x2a89ab0, state=<optimized out>) at src/applet.c:920
  #14 0x0000000000659002 in run_tasks_from_lists (budgets=budgets@entry=0x7ffdd8e0a5c0) at src/task.c:644
  #15 0x0000000000659bd7 in process_runnable_tasks () at src/task.c:886
  #16 0x00000000005cdcc9 in run_poll_loop () at src/haproxy.c:2858
  #17 0x00000000005ce457 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3075
  #18 0x0000000000430628 in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3665
Garfield96 pushed a commit that referenced this pull request Aug 22, 2025
Stephen Farrell reported in issue haproxy#2942 that recent haproxy versions
crash if there's no resolv.conf. A quick bisect with his reproducer
showed that it started with commit 4194f75 ("MEDIUM: tree-wide: avoid
manually initializing proxies"), which slightly reorders the proxy
initialization sequence. The crash shows a corrupted tree, typically indicating a
use-after-free. With the help of ASAN it was possible to find that a
resolver proxy had been destroyed and freed before the name insertion
that causes the crash, very likely caused by the absence of the needed
resolv.conf:

    #0 0x7ffff72a82f7 in free (/usr/local/lib64/libasan.so.5+0x1062f7)
    #1 0x94c1fd in free_proxy src/proxy.c:436
    #2 0x9355d1 in resolvers_destroy src/resolvers.c:2604
    #3 0x93e899 in resolvers_create_default src/resolvers.c:3892
    #4 0xc6ed29 in httpclient_resolve_init src/http_client.c:1170
    #5 0xc6fbcf in httpclient_create_proxy src/http_client.c:1310
    #6 0x4ae9da in ssl_ocsp_update_precheck src/ssl_ocsp.c:1452
    #7 0xa1b03f in step_init_2 src/haproxy.c:2050

But free_proxy() doesn't delete the ebpt_node that carries the name,
which perfectly explains the situation. This patch simply deletes the
name node, and Stephen confirmed that it fixed the problem for him as
well. Let's also free the key, since it points to p->id, which is
never freed in this function either!
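
The shape of the fix, as a hedged fragment (ebpt_delete() is the real
ebtree primitive; the exact field of struct proxy holding the name
node is abbreviated here as "by_name" for illustration):

    /* in free_proxy(): unlink the name node from the proxies name
     * tree before the proxy is released ... */
    ebpt_delete(&p->conf.by_name);
    /* ... and free the key, which points to p->id and is never
     * freed elsewhere in this function */
    free(p->conf.by_name.key);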

No backport is needed since the patch above was first merged into
3.2-dev10.
Garfield96 pushed a commit that referenced this pull request Aug 22, 2025
On the frontend side, a quic_conn is only released if the MUX wasn't
allocated, either due to a handshake abort, in which case the upper
layer is never allocated, or after transfer completion, when the full
conn + MUX layers are already released.

On the backend side, initialization is not performed in the same
order. Indeed, in this case, the connection is instantiated first,
then the quic_conn is created to execute the handshake, while the MUX
is still only allocated on handshake completion. As such, it is no
longer possible to free the quic_conn immediately on handshake
failure. Otherwise, this can cause a crash if the connection tries to
access its transport layer again after the quic_conn release.

Such a crash can easily be reproduced in case of a connection error to
the QUIC server. Here is an example of an observed backtrace:

Thread 1 "haproxy" received signal SIGSEGV, Segmentation fault.
  0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28
  28              qc->conn = NULL;
  [ ## gdb ## ] bt
  #0  0x0000555555739733 in quic_close (conn=0x55555734c0d0, xprt_ctx=0x5555573a6e50) at src/xprt_quic.c:28
  #1  0x00005555559c9708 in conn_xprt_close (conn=0x55555734c0d0) at include/haproxy/connection.h:162
  #2  0x00005555559c97d2 in conn_full_close (conn=0x55555734c0d0) at include/haproxy/connection.h:206
  #3  0x00005555559d01a9 in sc_detach_endp (scp=0x7fffffffd648) at src/stconn.c:451
  #4  0x00005555559d05b9 in sc_reset_endp (sc=0x55555734bf00) at src/stconn.c:533
  #5  0x000055555598281d in back_handle_st_cer (s=0x55555734adb0) at src/backend.c:2754
  #6  0x000055555588158a in process_stream (t=0x55555734be10, context=0x55555734adb0, state=516) at src/stream.c:1907
  #7  0x0000555555dc31d9 in run_tasks_from_lists (budgets=0x7fffffffdb30) at src/task.c:655
  #8  0x0000555555dc3dd3 in process_runnable_tasks () at src/task.c:889
  #9  0x0000555555a1daae in run_poll_loop () at src/haproxy.c:2865
  #10 0x0000555555a1e20c in run_thread_poll_loop (data=0x5555569d1c00 <ha_thread_info>) at src/haproxy.c:3081
  #11 0x0000555555a1f66b in main (argc=5, argv=0x7fffffffde18) at src/haproxy.c:3671

To fix this, change the condition prior to calling the quic_conn
release. If the <conn> member is not NULL, delay the release,
similarly to the case where the MUX is allocated. This allows the
connection to be freed first, and to detach from the quic_conn layer
through the xprt close operation.
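
A toy model of the resulting release decision (all names below are
illustrative, not HAProxy's actual fields):

    #include <stdbool.h>

    struct toy_qc {
        void *mux;   /* non-NULL once the MUX is allocated */
        void *conn;  /* non-NULL once a backend connection is attached */
    };

    /* the quic_conn may only be freed immediately when no upper
     * layer can still reference it; otherwise the release is
     * delayed and the upper layer detaches first through the xprt
     * close operation */
    static bool toy_qc_may_release_now(const struct toy_qc *qc)
    {
        return !qc->mux && !qc->conn;
    }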

No need to backport.
Garfield96 pushed a commit that referenced this pull request Aug 22, 2025
Following c24de07 ("OPTIM: stats: store fast sharded counters pointers
at session and stream level") some crashes were observed in
connect_server():

  #0  0x00000000007ba39c in connect_server (s=0x65117b0) at src/backend.c:2101
  2101                            _HA_ATOMIC_INC(&s->sv_tgcounters->connect);
  Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 nss-softokn-freebl-3.67.0-3.el7_9.x86_64 pcre-8.32-17.el7.x86_64
  (gdb) bt
  #0  0x00000000007ba39c in connect_server (s=0x65117b0) at src/backend.c:2101
  #1  0x00000000007baff8 in back_try_conn_req (s=0x65117b0) at src/backend.c:2378
  #2  0x00000000006c0e9f in process_stream (t=0x650f180, context=0x65117b0, state=8196) at src/stream.c:2366
  #3  0x0000000000bd3e51 in run_tasks_from_lists (budgets=0x7ffd592752e0) at src/task.c:655
  #4  0x0000000000bd49ef in process_runnable_tasks () at src/task.c:889
  #5  0x0000000000851169 in run_poll_loop () at src/haproxy.c:2834
  #6  0x0000000000851865 in run_thread_poll_loop (data=0x1a03580 <ha_thread_info>) at src/haproxy.c:3050
  #7  0x0000000000852a53 in main (argc=7, argv=0x7ffd592755f8) at src/haproxy.c:3637

Here the crash occurs during the atomic inc of a sv_tgcounters metric from
the stream pointer, which tells us the pointer is likely garbage.

In fact, we assign s->sv_tgcounters each time the stream target is set
to a valid server. For that we use the stream_set_srv_target() helper,
which does the assignment for us. By reviewing the code, it turns out
we forgot to call stream_set_srv_target() in pendconn_dequeue(), where
the stream target is set to the server that picked the pendconn.

Let's fix the bug by using stream_set_srv_target() there.
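
As a one-line, hedged sketch of that change (variable names are
illustrative):

    /* in pendconn_dequeue(): assign the stream target through the
     * helper so that s->sv_tgcounters is updated at the same time,
     * instead of setting the target directly */
    stream_set_srv_target(strm, srv);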

No backport needed unless c24de07 is.
Garfield96 pushed a commit that referenced this pull request Nov 5, 2025
When an SNI is set on a QUIC server line, ssl_sock_set_servername() is called
from connect_server() (backend.c). This leads some BUG_ON() statements to be
triggered because the CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS flags were not
set. This must be done in the ->init() xprt callback. This patch moves the
flag settings from the ->start() to the ->init() callback.

Indeed, connect_server() calls these functions in this order:

   ->init(),
   ssl_sock_set_servername() # => crash if CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS not set
   ->start()

Furthermore, ssl_sock_set_servername() has the side effect of resetting the
SSL_SESSION object (attached to the SSL object) by calling SSL_set_session(),
leading to crashes as follows:

 [Thread debugging using libthread_db enabled]
 Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
 Core was generated by `./haproxy -f quic_srv.cfg'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  tls_process_server_hello (s=0x560c259733b0, pkt=0x7fffac239f20)
     at ssl/statem/statem_clnt.c:1624
 1624            if (s->session->session_id_length > 0) {
 [Current thread is 1 (Thread 0x7fc364e53dc0 (LWP 35514))]
 (gdb) bt
 #0  tls_process_server_hello (s=0x560c259733b0, pkt=0x7fffac239f20)
     at ssl/statem/statem_clnt.c:1624
 #1  0x00007fc36540fba4 in ossl_statem_client_process_message (s=0x560c259733b0,
     pkt=0x7fffac239f20) at ssl/statem/statem_clnt.c:1042
 #2  0x00007fc36540d028 in read_state_machine (s=0x560c259733b0) at ssl/statem/statem.c:646
 #3  0x00007fc36540ca70 in state_machine (s=0x560c259733b0, server=0)
     at ssl/statem/statem.c:439
 #4  0x00007fc36540c576 in ossl_statem_connect (s=0x560c259733b0) at ssl/statem/statem.c:250
 #5  0x00007fc3653f1698 in SSL_do_handshake (s=0x560c259733b0) at ssl/ssl_lib.c:3835
 #6  0x0000560c22620327 in qc_ssl_do_hanshake (qc=qc@entry=0x560c25961f60,
     ctx=ctx@entry=0x560c25963020) at src/quic_ssl.c:863
 #7  0x0000560c226210be in qc_ssl_provide_quic_data (len=90, data=<optimized out>,
     ctx=0x560c25963020, level=ssl_encryption_initial, ncbuf=0x560c2588bb18)
     at src/quic_ssl.c:1071
 #8  qc_ssl_provide_all_quic_data (qc=qc@entry=0x560c25961f60, ctx=0x560c25963020)
     at src/quic_ssl.c:1123
 #9  0x0000560c2260ca5f in quic_conn_io_cb (t=0x560c25962f80, context=0x560c25961f60,
     state=<optimized out>) at src/quic_conn.c:791
 #10 0x0000560c228255ed in run_tasks_from_lists (budgets=<optimized out>) at src/task.c:648
 #11 0x0000560c22825f7a in process_runnable_tasks () at src/task.c:889
 #12 0x0000560c22793dc7 in run_poll_loop () at src/haproxy.c:2836
 #13 0x0000560c22794481 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3056
 #14 0x0000560c2259082d in main (argc=<optimized out>, argv=<optimized out>)
     at src/haproxy.c:3667

<s> is the SSL object, and <s->session> is the SSL_SESSION object.

For the client, it is the first call to SSL_do_handshake(), made from
the ->init() xprt callback, which initializes this SSL_SESSION object.
It is then reset by ssl_sock_set_servername(), and the TLS stack's
tls_process_server_hello() is later called with a NULL value for
s->session when receiving the ServerHello TLS message.

To fix this, simply move the first call to SSL_do_handshake() to the
->start() xprt callback (qc_xprt_start()).
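
As a sketch, the resulting call order on the backend side becomes:

   ->init()                   # sets CO_FL_WAIT_L6_CONN | CO_FL_SSL_WAIT_HS only
   ssl_sock_set_servername()  # safe: no SSL_SESSION exists yet to reset
   ->start()                  # qc_xprt_start(): first SSL_do_handshake() call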

No need to backport.
alexanderstephan pushed a commit that referenced this pull request Dec 11, 2025
This bug arrived with this commit:
     MINOR: quic: implement cc-algo server keyword
where a <srv> keyword list missing its NULL array termination was
introduced to parse the QUIC backend CC algorithms.
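
The shape of the problem and of the fix, as an illustrative fragment
(the real list lives in src/cfgparse-quic.c; the structure layout is
simplified here):

    /* srv_find_kw() walks the keyword array until it meets a NULL
     * keyword; without the terminator it reads past the end of
     * srv_kws into the neighbouring bind_kws object */
    static struct toy_kw { const char *kw; } toy_srv_kws[] = {
        { "quic-cc-algo" },   /* illustrative entry */
        { NULL },             /* the missing NULL termination */
    };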

Detected by ASAN during ssl/add_ssl_crt-list.vtc execution as follows:

***  h1    debug|==4066081==ERROR: AddressSanitizer: global-buffer-overflow on address 0x5562e31dedb8 at pc 0x5562e298951f bp 0x7ffe9f9f2b40 sp 0x7ffe9f9f2b38
***  h1    debug|READ of size 8 at 0x5562e31dedb8 thread T0
**** dT    0.173
***  h1    debug|    #0 0x5562e298951e in srv_find_kw src/server.c:789
***  h1    debug|    #1 0x5562e2989630 in _srv_parse_kw src/server.c:3847
***  h1    debug|    #2 0x5562e299db1f in parse_server src/server.c:4024
***  h1    debug|    #3 0x5562e2c86ea4 in cfg_parse_listen src/cfgparse-listen.c:593
***  h1    debug|    #4 0x5562e2b0ede9 in parse_cfg src/cfgparse.c:2708
***  h1    debug|    #5 0x5562e2c47d48 in read_cfg src/haproxy.c:1077
***  h1    debug|    #6 0x5562e2682055 in main src/haproxy.c:3366
***  h1    debug|    #7 0x7ff3ff867249 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
***  h1    debug|    #8 0x7ff3ff867304 in __libc_start_main_impl ../csu/libc-start.c:360
***  h1    debug|    #9 0x5562e26858d0 in _start (/home/flecaille/src/haproxy/haproxy+0x2638d0)
***  h1    debug|
***  h1    debug|0x5562e31dedb8 is located 40 bytes to the left of global variable 'bind_kws' defined in 'src/cfgparse-quic.c:255:28' (0x5562e31dede0) of size 120
***  h1    debug|0x5562e31dedb8 is located 0 bytes to the right of global variable 'srv_kws' defined in 'src/cfgparse-quic.c:264:27' (0x5562e31ded80) of size 56
***  h1    debug|SUMMARY: AddressSanitizer: global-buffer-overflow src/server.c:789 in srv_find_kw
***  h1    debug|Shadow bytes around the buggy address:
***  h1    debug|  0x0aacdc633d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633da0: 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9
***  h1    debug|=>0x0aacdc633db0: 00 00 00 00 00 00 00[f9]f9 f9 f9 f9 00 00 00 00
***  h1    debug|  0x0aacdc633dc0: 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9
***  h1    debug|  0x0aacdc633dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
***  h1    debug|  0x0aacdc633de0: 00 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 f9
***  h1    debug|  0x0aacdc633df0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
***  h1    debug|  0x0aacdc633e00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
***  h1    debug|Shadow byte legend (one shadow byte represents 8 application bytes):

This should be backported wherever the commit above is backported.