Commit 17894c2

Merge tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Snapshot buffer issues:

   1. When instances started allowing latency tracers, a snapshot buffer
      was used (another buffer that is not written to but is swapped with
      the main buffer that is). The snapshot buffer needs to be the same
      size as the main buffer. But when the snapshot buffers were added
      to instances, the code to keep the snapshot buffer the same size as
      the main buffer was still only doing it for the top-level buffer
      and not for the instances.

   2. The current tracer needs to be stopped when resizing the buffers.
      Otherwise there can be a race if the tracer decides to take a
      snapshot between resizing the main buffer and the snapshot buffer.

   3. When a tracer is "stopped", it disables both the main buffer and
      the snapshot buffer. This needs to be done for instances and not
      only for the main buffer, now that instances also have a snapshot
      buffer.

 - Buffered event for filtering issues:

   When filtering is enabled, because events can be dropped often, it is
   quicker to copy the event into a temp buffer and write that into the
   main buffer if it is not filtered, or just drop the event if it is,
   than to write the event into the ring buffer and then try to discard
   it. This temp buffer is allocated dynamically and needs special
   synchronization to do so. But there were some issues with that:

   1. When disabling the filter and freeing the buffer, a call to all
      CPUs is required to stop each per_cpu usage. But the code called
      smp_call_function_many(), which does not include the current CPU.
      If the task is migrated to another CPU while it enables the CPUs
      via smp_call_function_many(), it will not enable the one it is
      currently on, and this causes issues later on. Use
      on_each_cpu_mask() instead, which includes the current CPU.

   2. When the allocation of the buffered event fails, it can give a
      warning. But the buffered event is just an optimization (it's
      still OK to write to the ring buffer and skip the optimization).
      Do not WARN in this case.

   3. The freeing of the buffered event requires synchronization. First
      a counter is decremented to zero so that no new uses of it will
      happen. Then the buffered event pointer is set to NULL, and
      finally the buffered event is freed. There's a synchronize_rcu()
      between the counter decrement and the setting of the variable to
      NULL, but only an smp_wmb() between that and the freeing of the
      buffer. It is theoretically possible that a user misses seeing the
      decrement but uses the buffer after it is freed. Another
      synchronize_rcu() is needed in place of that smp_wmb().

 - Ring buffer timestamps on 32-bit machines:

   The ring buffer timestamp on 32-bit machines has to break the 64-bit
   number into multiple values, as cmpxchg is required on it, and a
   64-bit cmpxchg on 32-bit architectures is very slow. The code used to
   just use two 32-bit values and make a 60-bit timestamp, where the
   other 4 bits were used as counters for synchronization. It later
   became known that the timestamp on 32-bit machines still needs all 64
   bits in some cases. So three words were created to handle the full 64
   bits. But issues arose with this:

   1. The synchronization logic still only compared the counter with the
      first two words, but not with the third one, so the
      synchronization could fail unknowingly.

   2. A check on discard of an event could race if an event happened
      between the discard and the updating of one of the counters. The
      counter needs to be updated (forcing an absolute timestamp and not
      a delta) before the actual discard happens.

* tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Test last update in 32bit version of __rb_time_read()
  ring-buffer: Force absolute timestamp on discard of event
  tracing: Fix a possible race when disabling buffered events
  tracing: Fix a warning when allocating buffered events fails
  tracing: Fix incomplete locking when disabling buffered events
  tracing: Disable snapshot buffer when stopping instance tracers
  tracing: Stop current tracer when resizing buffer
  tracing: Always update snapshot buffer size
2 parents 8e819a7 + f458a14 commit 17894c2

File tree: 2 files changed (+69, -112 lines)


kernel/trace/ring_buffer.c (10 additions, 13 deletions)
@@ -644,8 +644,8 @@ static inline bool __rb_time_read(rb_time_t *t, u64 *ret, unsigned long *cnt)
 
 	*cnt = rb_time_cnt(top);
 
-	/* If top and bottom counts don't match, this interrupted a write */
-	if (*cnt != rb_time_cnt(bottom))
+	/* If top and msb counts don't match, this interrupted a write */
+	if (*cnt != rb_time_cnt(msb))
 		return false;
 
 	/* The shift to msb will lose its cnt bits */
@@ -3030,22 +3030,19 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
 		local_read(&bpage->write) & ~RB_WRITE_MASK;
 	unsigned long event_length = rb_event_length(event);
 
+	/*
+	 * For the before_stamp to be different than the write_stamp
+	 * to make sure that the next event adds an absolute
+	 * value and does not rely on the saved write stamp, which
+	 * is now going to be bogus.
+	 */
+	rb_time_set(&cpu_buffer->before_stamp, 0);
+
 	/* Something came in, can't discard */
 	if (!rb_time_cmpxchg(&cpu_buffer->write_stamp,
 			     write_stamp, write_stamp - delta))
 		return false;
 
-	/*
-	 * It's possible that the event time delta is zero
-	 * (has the same time stamp as the previous event)
-	 * in which case write_stamp and before_stamp could
-	 * be the same. In such a case, force before_stamp
-	 * to be different than write_stamp. It doesn't
-	 * matter what it is, as long as its different.
-	 */
-	if (!delta)
-		rb_time_set(&cpu_buffer->before_stamp, 0);
-
 	/*
 	 * If an event were to come in now, it would see that the
 	 * write_stamp and the before_stamp are different, and assume

kernel/trace/trace.c (59 additions, 99 deletions)
@@ -2360,133 +2360,91 @@ int is_tracing_stopped(void)
 	return global_trace.stop_count;
 }
 
-/**
- * tracing_start - quick start of the tracer
- *
- * If tracing is enabled but was stopped by tracing_stop,
- * this will start the tracer back up.
- */
-void tracing_start(void)
+static void tracing_start_tr(struct trace_array *tr)
 {
 	struct trace_buffer *buffer;
 	unsigned long flags;
 
 	if (tracing_disabled)
 		return;
 
-	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
-	if (--global_trace.stop_count) {
-		if (global_trace.stop_count < 0) {
+	raw_spin_lock_irqsave(&tr->start_lock, flags);
+	if (--tr->stop_count) {
+		if (WARN_ON_ONCE(tr->stop_count < 0)) {
 			/* Someone screwed up their debugging */
-			WARN_ON_ONCE(1);
-			global_trace.stop_count = 0;
+			tr->stop_count = 0;
 		}
 		goto out;
 	}
 
 	/* Prevent the buffers from switching */
-	arch_spin_lock(&global_trace.max_lock);
+	arch_spin_lock(&tr->max_lock);
 
-	buffer = global_trace.array_buffer.buffer;
+	buffer = tr->array_buffer.buffer;
 	if (buffer)
 		ring_buffer_record_enable(buffer);
 
 #ifdef CONFIG_TRACER_MAX_TRACE
-	buffer = global_trace.max_buffer.buffer;
+	buffer = tr->max_buffer.buffer;
 	if (buffer)
 		ring_buffer_record_enable(buffer);
 #endif
 
-	arch_spin_unlock(&global_trace.max_lock);
-
- out:
-	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
-}
-
-static void tracing_start_tr(struct trace_array *tr)
-{
-	struct trace_buffer *buffer;
-	unsigned long flags;
-
-	if (tracing_disabled)
-		return;
-
-	/* If global, we need to also start the max tracer */
-	if (tr->flags & TRACE_ARRAY_FL_GLOBAL)
-		return tracing_start();
-
-	raw_spin_lock_irqsave(&tr->start_lock, flags);
-
-	if (--tr->stop_count) {
-		if (tr->stop_count < 0) {
-			/* Someone screwed up their debugging */
-			WARN_ON_ONCE(1);
-			tr->stop_count = 0;
-		}
-		goto out;
-	}
-
-	buffer = tr->array_buffer.buffer;
-	if (buffer)
-		ring_buffer_record_enable(buffer);
+	arch_spin_unlock(&tr->max_lock);
 
  out:
 	raw_spin_unlock_irqrestore(&tr->start_lock, flags);
 }
 
 /**
- * tracing_stop - quick stop of the tracer
+ * tracing_start - quick start of the tracer
  *
- * Light weight way to stop tracing. Use in conjunction with
- * tracing_start.
+ * If tracing is enabled but was stopped by tracing_stop,
+ * this will start the tracer back up.
  */
-void tracing_stop(void)
+void tracing_start(void)
+
+{
+	return tracing_start_tr(&global_trace);
+}
+
+static void tracing_stop_tr(struct trace_array *tr)
 {
 	struct trace_buffer *buffer;
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
-	if (global_trace.stop_count++)
+	raw_spin_lock_irqsave(&tr->start_lock, flags);
+	if (tr->stop_count++)
 		goto out;
 
 	/* Prevent the buffers from switching */
-	arch_spin_lock(&global_trace.max_lock);
+	arch_spin_lock(&tr->max_lock);
 
-	buffer = global_trace.array_buffer.buffer;
+	buffer = tr->array_buffer.buffer;
 	if (buffer)
 		ring_buffer_record_disable(buffer);
 
 #ifdef CONFIG_TRACER_MAX_TRACE
-	buffer = global_trace.max_buffer.buffer;
+	buffer = tr->max_buffer.buffer;
 	if (buffer)
 		ring_buffer_record_disable(buffer);
 #endif
 
-	arch_spin_unlock(&global_trace.max_lock);
+	arch_spin_unlock(&tr->max_lock);
 
  out:
-	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
+	raw_spin_unlock_irqrestore(&tr->start_lock, flags);
 }
 
-static void tracing_stop_tr(struct trace_array *tr)
+/**
+ * tracing_stop - quick stop of the tracer
+ *
+ * Light weight way to stop tracing. Use in conjunction with
+ * tracing_start.
+ */
+void tracing_stop(void)
 {
-	struct trace_buffer *buffer;
-	unsigned long flags;
-
-	/* If global, we need to also stop the max tracer */
-	if (tr->flags & TRACE_ARRAY_FL_GLOBAL)
-		return tracing_stop();
-
-	raw_spin_lock_irqsave(&tr->start_lock, flags);
-	if (tr->stop_count++)
-		goto out;
-
-	buffer = tr->array_buffer.buffer;
-	if (buffer)
-		ring_buffer_record_disable(buffer);
-
- out:
-	raw_spin_unlock_irqrestore(&tr->start_lock, flags);
+	return tracing_stop_tr(&global_trace);
 }
 
 static int trace_save_cmdline(struct task_struct *tsk)
@@ -2770,8 +2728,11 @@ void trace_buffered_event_enable(void)
 	for_each_tracing_cpu(cpu) {
 		page = alloc_pages_node(cpu_to_node(cpu),
 					GFP_KERNEL | __GFP_NORETRY, 0);
-		if (!page)
-			goto failed;
+		/* This is just an optimization and can handle failures */
+		if (!page) {
+			pr_err("Failed to allocate event buffer\n");
+			break;
+		}
 
 		event = page_address(page);
 		memset(event, 0, sizeof(*event));
@@ -2785,10 +2746,6 @@ void trace_buffered_event_enable(void)
 			WARN_ON_ONCE(1);
 		preempt_enable();
 	}
-
-	return;
- failed:
-	trace_buffered_event_disable();
 }
 
 static void enable_trace_buffered_event(void *data)
@@ -2823,11 +2780,9 @@ void trace_buffered_event_disable(void)
 	if (--trace_buffered_event_ref)
 		return;
 
-	preempt_disable();
 	/* For each CPU, set the buffer as used. */
-	smp_call_function_many(tracing_buffer_mask,
-			       disable_trace_buffered_event, NULL, 1);
-	preempt_enable();
+	on_each_cpu_mask(tracing_buffer_mask, disable_trace_buffered_event,
+			 NULL, true);
 
 	/* Wait for all current users to finish */
 	synchronize_rcu();
@@ -2836,17 +2791,19 @@ void trace_buffered_event_disable(void)
 		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
 		per_cpu(trace_buffered_event, cpu) = NULL;
 	}
+
 	/*
-	 * Make sure trace_buffered_event is NULL before clearing
-	 * trace_buffered_event_cnt.
+	 * Wait for all CPUs that potentially started checking if they can use
+	 * their event buffer only after the previous synchronize_rcu() call and
+	 * they still read a valid pointer from trace_buffered_event. It must be
+	 * ensured they don't see cleared trace_buffered_event_cnt else they
+	 * could wrongly decide to use the pointed-to buffer which is now freed.
 	 */
-	smp_wmb();
+	synchronize_rcu();
 
-	preempt_disable();
-	/* Do the work on each cpu */
-	smp_call_function_many(tracing_buffer_mask,
-			       enable_trace_buffered_event, NULL, 1);
-	preempt_enable();
+	/* For each CPU, relinquish the buffer */
+	on_each_cpu_mask(tracing_buffer_mask, enable_trace_buffered_event, NULL,
+			 true);
 }
 
 static struct trace_buffer *temp_buffer;
@@ -6387,13 +6344,15 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr,
 	if (!tr->array_buffer.buffer)
 		return 0;
 
+	/* Do not allow tracing while resizng ring buffer */
+	tracing_stop_tr(tr);
+
 	ret = ring_buffer_resize(tr->array_buffer.buffer, size, cpu);
 	if (ret < 0)
-		return ret;
+		goto out_start;
 
 #ifdef CONFIG_TRACER_MAX_TRACE
-	if (!(tr->flags & TRACE_ARRAY_FL_GLOBAL) ||
-	    !tr->current_trace->use_max_tr)
+	if (!tr->current_trace->use_max_tr)
 		goto out;
 
 	ret = ring_buffer_resize(tr->max_buffer.buffer, size, cpu);
@@ -6418,7 +6377,7 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr,
 			WARN_ON(1);
 			tracing_disabled = 1;
 		}
-		return ret;
+		goto out_start;
 	}
 
 	update_buffer_entries(&tr->max_buffer, cpu);
@@ -6427,7 +6386,8 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr,
 #endif /* CONFIG_TRACER_MAX_TRACE */
 
 	update_buffer_entries(&tr->array_buffer, cpu);
-
+ out_start:
+	tracing_start_tr(tr);
 	return ret;
 }
