Skip to content

Commit 51a011b

Browse files
Christopher Friedtcfriedt
authored andcommitted
tests: logging: add log_blocking testsuite to demonstrate bugfix
Add a testsuite to demonstrate proper functionality of blocking in the log processing thread. This serves to demonstrate the previous bugfix, to provide missing test coverage for the option, and also to document what pitfalls to avoid when blocking in the log processing thread. Signed-off-by: Christopher Friedt <[email protected]>
1 parent ae8c328 commit 51a011b

File tree

6 files changed

+423
-0
lines changed

6 files changed

+423
-0
lines changed
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
# SPDX-License-Identifier: Apache-2.0
2+
3+
cmake_minimum_required(VERSION 3.20.0)
4+
find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
5+
project(log_blocking)
6+
7+
target_sources(app PRIVATE src/main.c)
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# Copyright (c) 2023 Meta
2+
# SPDX-License-Identifier: Apache-2.0
3+
4+
source "Kconfig.zephyr"
5+
6+
config TEST_NUM_LOGS
7+
int "Number of log messages per test"
8+
default 5000
9+
10+
config TEST_INPUT_LOG_RATE
11+
int "Maximal input log rate for the test (in msg/s)"
12+
default 1000
13+
help
14+
Specify the maximum rate at which messages will be logged from the
15+
application.
16+
17+
config TEST_OUTPUT_LOG_RATE
18+
int "Maximal output log rate for the test (in msg/s)"
19+
default 1000
20+
help
21+
Specify the maximum rate at which log messages will be handled by
22+
the log backend.
Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
1+
# Blocking the Log Processing Thread
2+
3+
## Overview
4+
5+
When the core log buffer becomes full, the logging subsystem can be configured to
6+
7+
* Drop older log messages with `CONFIG_LOG_MODE_OVERFLOW=y` (**default**)
8+
* Drop newer log messages with `CONFIG_LOG_MODE_OVERFLOW=n`, or
9+
* Drop no log messages at all with `CONFIG_LOG_BLOCK_IN_THREAD=y`, `CONFIG_LOG_BLOCK_IN_THREAD_TIMEOUT_MS=-1`.
10+
11+
In the last configuration, the log processing thread will block until space
12+
becomes available again in the core log buffer.
13+
14+
> Warning ⚠️: Blocking the log processing thread is generally not recommended
15+
> and should only be attempted in advanced use cases.
16+
17+
## Logging and Flow Rates
18+
19+
There are roughly 4 scenarios we care about testing with
20+
`CONFIG_LOG_BLOCK_IN_THREAD`, and they can all be characterized by comparing
21+
log message flow rates. Typically, one would describe log message flow rates
22+
with units such as `[msg/s]`. However, in the table below, we are mainly
23+
concerned with the ratio of the Output Rate to the Input Rate, and in that
24+
case, the units themselves cancel-out. In the table we assume there exists an
25+
`N` such that `N > 1`.
26+
27+
| Name | Input Rate | Output Rate | Rate |
28+
|----------------|------------|-------------|------|
29+
| Input-Limited | 1 | N | 1 |
30+
| Matched | 1 | 1 | 1 |
31+
| Output-Limited | 1 | 1/N | 1/N |
32+
| Stalled | 0 | 0 | 0 |
33+
34+
The resultant _Rate_ is always `Rate = MIN(Input Rate, Output Rate)`.
35+
36+
Rate-limiting of any kind can be described approximately as _back pressure_.
37+
Back pressure is fine in short bursts but it can cause delays in application
38+
and driver code if the pressure is not relieved promptly.
39+
40+
## Physical Sources of Backpressure
41+
42+
Many log backends, such as UARTs, have a built-in hardware FIFO that
43+
inherently provides back-pressure; output log processing is rate-limited
44+
based on the baud rate of the UART. Other backends, such as UDP sockets or
45+
DMA, can provide significantly higher throughput but are still inherently
46+
rate-limited by the physical layer over which they operate, be it Gigabit
47+
Ethernet or PCI express.
48+
49+
Even a trivial _message source_ or _message sink_ is still rate-limited by
50+
memory or the CPU. From that perspective, we can infer that there is a finite
51+
limit in the log processing rate for practical systems. That may be
52+
comforting to know, even if it is something astronomical like 1G `[msg/s]`.
53+
54+
## Input-Limited Log Rate
55+
56+
The ideal scenario is when the output "bandwidth" exceeds the input rate. If
57+
so configured, we minimize the liklihood that the log processing thread will
58+
stall. We can also be sure that the output will be able to relieve
59+
backpressure (i.e. the core log buffer usage will tend to zero over time).
60+
61+
## Rate-Matched Input and Output
62+
63+
When the input rate and output rates are equal, one might think this is the
64+
ideal scenario. In reality, it is not. The rates could be matched, but a
65+
sustained increase (or several small increases) in the input log rate, could
66+
cause the core log buffer to approach 100% capacity. Since the output log rate
67+
is still only matched with the input log rate, the core log buffer capacity
68+
would not decrease from 100%, and it would remain saturated.
69+
70+
Logging has a tendency to be bursty, so it is definitely preferable to
71+
operate in the _Input-limited Log Rate_ regime.
72+
73+
## Output-Limited Log Rate
74+
75+
If the rate of output processing is less than the rate of input processing,
76+
the core log buffer will approach 100% capacity and, eventually, stall the
77+
log processing thread.
78+
79+
## Stalling the Log Processing Thread
80+
81+
When any log backend is unable to process logs for whatever reason,
82+
the output rate approaches 0 `[msg/s]`. If application or
83+
driver code continue to submit logs, the core log buffer approaches 100%
84+
capacity. Once the core log buffer is full, the log processing thread is
85+
unable to allocate new log messages and it will be stalled.
86+
87+
Stalling a real-time application produces unexpected behaviour, so it is
88+
advised to avoid this for any non-negligible amount of time.
89+
90+
It is absolutely critical that the log backend is capable of operating
91+
correctly _even when the log processing thread is blocking_ in order to
92+
automatically recover from a stall.
93+
94+
On a live system, it may be necessary to manually perform remediation of log
95+
backends that are unable to recover from stalling the log processing thread.
96+
Remediation could involve disabling the log backend and freeing any in-use
97+
buffers.
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
CONFIG_ZTEST=y
2+
3+
CONFIG_TEST_LOGGING_DEFAULTS=n
4+
CONFIG_LOG=y
5+
CONFIG_LOG_PRINTK=n
6+
CONFIG_LOG_BUFFER_SIZE=512
7+
CONFIG_ASSERT=y
8+
CONFIG_MAIN_STACK_SIZE=2048
9+
10+
# Disable any logs that could interfere.
11+
CONFIG_KERNEL_LOG_LEVEL_OFF=y
12+
CONFIG_SOC_LOG_LEVEL_OFF=y
13+
CONFIG_ARCH_LOG_LEVEL_OFF=y
14+
CONFIG_LOG_FUNC_NAME_PREFIX_DBG=n
15+
CONFIG_LOG_PROCESS_THREAD=y
16+
17+
# Disable all potential default backends
18+
CONFIG_LOG_BACKEND_UART=n
19+
CONFIG_LOG_BACKEND_NATIVE_POSIX=n
20+
CONFIG_LOG_BACKEND_RTT=n
21+
CONFIG_LOG_BACKEND_XTENSA_SIM=n
22+
23+
# Enable blocking in thread
24+
CONFIG_LOG_MODE_DEFERRED=y
25+
CONFIG_LOG_BLOCK_IN_THREAD=y
26+
27+
# Block in thread indefinitely
28+
CONFIG_LOG_BLOCK_IN_THREAD_TIMEOUT_MS=-1
Lines changed: 241 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,241 @@
1+
/*
2+
* Copyright (c) 2022 Nordic Semiconductor ASA
3+
* Copyright (c) 2023 Meta
4+
*
5+
* SPDX-License-Identifier: Apache-2.0
6+
*/
7+
8+
#include <stdbool.h>
9+
#include <stdint.h>
10+
#include <string.h>
11+
12+
#include <zephyr/kernel.h>
13+
#include <zephyr/logging/log.h>
14+
#include <zephyr/logging/log_ctrl.h>
15+
#include <zephyr/logging/log_backend.h>
16+
#include <zephyr/random/random.h>
17+
#include <zephyr/sys/util.h>
18+
#include <zephyr/ztest.h>
19+
20+
/* maximum time to wait before aborting thread in case of 0 log messages / second */
21+
#define MAX_STALL_TIMEOUT_S 3
22+
/* maximum time (seconds) to wait for logging thread to finish */
23+
#define MAX_JOIN_TIMEOUT_S 1
24+
25+
#define WILL_STALL (CONFIG_TEST_INPUT_LOG_RATE == 0 || CONFIG_TEST_OUTPUT_LOG_RATE == 0)
26+
27+
#define MODULE_NAME test
28+
29+
LOG_MODULE_REGISTER(MODULE_NAME);
30+
31+
struct mock_log_backend {
32+
uint32_t dropped;
33+
uint32_t handled;
34+
};
35+
36+
static uint32_t end_ms;
37+
static uint32_t start_ms;
38+
static uint32_t test_source_id;
39+
static struct mock_log_backend mock_backend;
40+
41+
static inline uint32_t then(void)
42+
{
43+
return start_ms;
44+
}
45+
46+
static inline uint32_t now(void)
47+
{
48+
/* some platforms currently _not_ starting uptime at 0!! */
49+
return k_uptime_get_32();
50+
}
51+
52+
static inline uint32_t end(void)
53+
{
54+
return end_ms;
55+
}
56+
57+
static inline void create_start_end(void)
58+
{
59+
start_ms = k_uptime_get_32();
60+
end_ms = start_ms;
61+
/* some "fuzz" in ms to account for odd variances */
62+
end_ms += MAX_STALL_TIMEOUT_S * MSEC_PER_SEC;
63+
64+
#if WILL_STALL
65+
end_ms += MAX_STALL_TIMEOUT_S * MSEC_PER_SEC;
66+
#elif CONFIG_TEST_INPUT_LOG_RATE > 0 && CONFIG_TEST_INPUT_LOG_RATE <= CONFIG_TEST_OUTPUT_LOG_RATE
67+
end_ms += MSEC_PER_SEC * DIV_ROUND_UP(CONFIG_TEST_NUM_LOGS, CONFIG_TEST_INPUT_LOG_RATE);
68+
#elif CONFIG_TEST_OUTPUT_LOG_RATE > 0 && CONFIG_TEST_INPUT_LOG_RATE > CONFIG_TEST_OUTPUT_LOG_RATE
69+
end_ms += MSEC_PER_SEC * DIV_ROUND_UP(CONFIG_TEST_NUM_LOGS, CONFIG_TEST_OUTPUT_LOG_RATE);
70+
#else
71+
#error "Impossible scenario"
72+
#endif
73+
74+
TC_PRINT("Start time: %u ms\n", start_ms);
75+
TC_PRINT("End time: %u ms\n", end_ms);
76+
}
77+
78+
static void handle_output(uint32_t i)
79+
{
80+
while (true) {
81+
if (i + 1 <= (CONFIG_TEST_OUTPUT_LOG_RATE * (now() - then())) / MSEC_PER_SEC) {
82+
break;
83+
}
84+
k_msleep(1);
85+
}
86+
87+
++mock_backend.handled;
88+
}
89+
90+
static void handle_input(void)
91+
{
92+
for (int i = 0; i < CONFIG_TEST_NUM_LOGS; i++) {
93+
while (true) {
94+
if (i + 1 <= CONFIG_TEST_INPUT_LOG_RATE * (now() - then()) / MSEC_PER_SEC) {
95+
break;
96+
}
97+
zassert_true(now() <= end());
98+
k_msleep(1);
99+
}
100+
101+
LOG_INF("%u", i);
102+
}
103+
}
104+
105+
static void process(const struct log_backend *const backend, union log_msg_generic *msg)
106+
{
107+
size_t len;
108+
uint8_t *package = log_msg_get_package(&msg->log, &len);
109+
110+
package += 2 * sizeof(void *);
111+
112+
handle_output(*(uint32_t *)package);
113+
}
114+
115+
static void mock_init(struct log_backend const *const backend)
116+
{
117+
}
118+
119+
static void panic(struct log_backend const *const backend)
120+
{
121+
#if WILL_STALL
122+
/* Don't panic! */
123+
return;
124+
#endif
125+
126+
zassert_true(false);
127+
}
128+
129+
static void dropped(const struct log_backend *const backend, uint32_t cnt)
130+
{
131+
mock_backend.dropped += cnt;
132+
}
133+
134+
static const struct log_backend_api log_blocking_api = {
135+
.process = process,
136+
.panic = panic,
137+
.init = mock_init,
138+
.dropped = dropped,
139+
};
140+
141+
LOG_BACKEND_DEFINE(blocking_log_backend, log_blocking_api, true, NULL);
142+
143+
BUILD_ASSERT(CONFIG_TEST_INPUT_LOG_RATE >= 0);
144+
BUILD_ASSERT(CONFIG_TEST_OUTPUT_LOG_RATE >= 0);
145+
146+
static void print_input(void)
147+
{
148+
TC_PRINT("CONFIG_TEST_NUM_LOGS: %d\n", CONFIG_TEST_NUM_LOGS);
149+
TC_PRINT("CONFIG_TEST_INPUT_LOG_RATE: %d\n", CONFIG_TEST_INPUT_LOG_RATE);
150+
TC_PRINT("CONFIG_TEST_OUTPUT_LOG_RATE: %d\n", CONFIG_TEST_OUTPUT_LOG_RATE);
151+
}
152+
153+
static void print_output(void)
154+
{
155+
TC_PRINT("Log backend dropped %u messages\n", mock_backend.dropped);
156+
TC_PRINT("Log backend handled %u messages\n", mock_backend.handled);
157+
}
158+
159+
static void test_blocking_thread_entry(void *p1, void *p2, void *p3)
160+
{
161+
ARG_UNUSED(p1);
162+
ARG_UNUSED(p2);
163+
ARG_UNUSED(p3);
164+
165+
handle_input();
166+
}
167+
K_THREAD_DEFINE(test_blocking_thread, 4096, test_blocking_thread_entry, NULL, NULL, NULL,
168+
K_HIGHEST_THREAD_PRIO, 0, UINT32_MAX);
169+
170+
#if WILL_STALL
171+
ZTEST_EXPECT_FAIL(log_blocking, test_blocking);
172+
#endif
173+
ZTEST(log_blocking, test_blocking)
174+
{
175+
#if WILL_STALL
176+
/*
177+
* This is a workaround for a possible bug in the testing subsys:
178+
* - comment-out ztest_test_fail() below
179+
* - run with:
180+
* west build -p auto -b qemu_riscv64 -t run \
181+
* -T tests/subsys/logging/log_blocking/logging.blocking.rate.stalled
182+
* - observe "Assertion failed at..."
183+
* - technically, testsuite should pass. Since ZTEST_EXPECT_FAIL() is set. Never gets there.
184+
* - run with:
185+
* twister -i -p qemu_riscv64 -T tests/subsys/logging/log_blocking/
186+
* - observe "..FAILED : Timeout"
187+
* - possible conclusions:
188+
* - test thread has not properly longjumped?
189+
* - twister not detecting assertion failures?
190+
* - twister expecting some other string and never sees it?
191+
*/
192+
ztest_test_fail();
193+
#endif
194+
195+
create_start_end();
196+
k_thread_start(test_blocking_thread);
197+
k_msleep(end() - now());
198+
199+
#if WILL_STALL
200+
k_thread_abort(test_blocking_thread);
201+
#endif
202+
zassert_ok(k_thread_join(test_blocking_thread, K_SECONDS(MAX_JOIN_TIMEOUT_S)));
203+
204+
print_output();
205+
206+
zassert_equal(mock_backend.dropped, 0, "dropped %u / %u logs", mock_backend.dropped,
207+
CONFIG_TEST_NUM_LOGS);
208+
209+
zassert_equal(mock_backend.handled, CONFIG_TEST_NUM_LOGS, "handled %u / %u logs",
210+
mock_backend.handled, CONFIG_TEST_NUM_LOGS);
211+
}
212+
213+
static void *setup(void)
214+
{
215+
/*
216+
* This testsuite was added mainly to address a regression caused
217+
* by this subtle, but very different interpretation.
218+
*/
219+
__ASSERT(K_TIMEOUT_EQ(K_NO_WAIT, K_MSEC(-1)), "K_NO_WAIT should be equal to K_MSEC(-1)");
220+
__ASSERT(!K_TIMEOUT_EQ(K_FOREVER, K_MSEC(-1)),
221+
"K_FOREVER should not be equal to K_MSEC(-1)");
222+
223+
test_source_id = log_source_id_get(STRINGIFY(MODULE_NAME));
224+
225+
print_input();
226+
227+
return NULL;
228+
}
229+
230+
static void before(void *arg)
231+
{
232+
memset(&mock_backend, 0, sizeof(mock_backend));
233+
}
234+
235+
static void teardown(void *data)
236+
{
237+
ARG_UNUSED(data);
238+
log_backend_disable(&blocking_log_backend);
239+
}
240+
241+
ZTEST_SUITE(log_blocking, NULL, setup, before, NULL, teardown);

0 commit comments

Comments
 (0)