
Commit 0d24f65

a-darwish authored and Peter Zijlstra committed
Documentation: locking: Describe seqlock design and usage
Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation
as follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
    description for both mechanisms was intermingled, which is incorrect
    since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
    rationale for, sequence counters.

  - Document the seqcount_t writer non-preemptibility requirement, which
    was not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization and
    reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
    serialization and non-preemptibility requirements of writers.

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
    kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
    gcc version is 4.9.

References: 0f6ed63 ("no need to keep brlock macros anymore...")
References: 6ec4476 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
1 parent f05d671 commit 0d24f65

File tree

3 files changed: +211, -45 lines changed


Documentation/locking/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ locking
     mutex-design
     rt-mutex-design
     rt-mutex
+    seqlock
     spinlocks
     ww-mutex-design
     preempt-locking

Documentation/locking/seqlock.rst

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+
+.. _seqcount_t:
+
+Sequence counters (``seqcount_t``)
+==================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers. Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use
+:ref:`seqlock_t` instead.
+
+Initialization::
+
+    /* dynamic */
+    seqcount_t foo_seqcount;
+    seqcount_init(&foo_seqcount);
+
+    /* static */
+    static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+    /* C99 struct init */
+    struct {
+        .seq = SEQCNT_ZERO(foo.seq),
+    } foo;
+
+Write path::
+
+    /* Serialized context with disabled preemption */
+
+    write_seqcount_begin(&foo_seqcount);
+
+    /* ... [[write-side critical section]] ... */
+
+    write_seqcount_end(&foo_seqcount);
+
+Read path::
+
+    do {
+        seq = read_seqcount_begin(&foo_seqcount);
+
+        /* ... [[read-side critical section]] ... */
+
+    } while (read_seqcount_retry(&foo_seqcount, seq));
+
+
+.. _seqlock_t:
+
+Sequential locks (``seqlock_t``)
+================================
+
+This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
+embedded spinlock for writer serialization and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which disable interrupts or bottom
+halves respectively.
+
+Initialization::
+
+    /* dynamic */
+    seqlock_t foo_seqlock;
+    seqlock_init(&foo_seqlock);
+
+    /* static */
+    static DEFINE_SEQLOCK(foo_seqlock);
+
+    /* C99 struct init */
+    struct {
+        .seql = __SEQLOCK_UNLOCKED(foo.seql)
+    } foo;
+
+Write path::
+
+    write_seqlock(&foo_seqlock);
+
+    /* ... [[write-side critical section]] ... */
+
+    write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal Sequence readers which never block a writer but they must
+   retry if a writer is in progress by detecting change in the sequence
+   number. Writers do not wait for a sequence reader::
+
+    do {
+        seq = read_seqbegin(&foo_seqlock);
+
+        /* ... [[read-side critical section]] ... */
+
+    } while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+   is in progress. A locking reader in progress will also block a writer
+   from entering its critical section. This read lock is
+   exclusive. Unlike rwlock_t, only one locking reader can acquire it::
+
+    read_seqlock_excl(&foo_seqlock);
+
+    /* ... [[read-side critical section]] ... */
+
+    read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+   according to a passed marker. This is used to avoid lockless readers
+   starvation (too many retry loops) in case of a sharp spike in write
+   activity. First, a lockless read is tried (even marker passed). If
+   that trial fails (odd sequence counter is returned, which is used as
+   the next iteration marker), the lockless read is transformed to a
+   full locking read and no retry loop is necessary::
+
+    /* marker; even initialization */
+    int seq = 0;
+    do {
+        read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+        /* ... [[read-side critical section]] ... */
+
+    } while (need_seqretry(&foo_seqlock, seq));
+    done_seqretry(&foo_seqlock, seq);
+
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
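
For illustration, the seqcount_t templates above can be combined into one
complete sketch. Everything named foo_* below is hypothetical and not part
of this commit; only the seqlock.h and spinlock.h APIs shown in the
templates are used. The external spin_lock() serializes writers and also
disables preemption, which satisfies the write side requirements described
in the introduction::

    #include <linux/types.h>
    #include <linux/spinlock.h>
    #include <linux/seqlock.h>

    /* Hypothetical, rarely-written data set (e.g. a time snapshot). */
    static struct {
        u64 sec;
        u32 nsec;
    } foo_time;

    static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
    static DEFINE_SPINLOCK(foo_lock);   /* external writer serialization */

    /*
     * Writer: foo_lock serializes writers; taking it also disables
     * preemption, as seqcount_t write side sections require.
     */
    static void foo_time_set(u64 sec, u32 nsec)
    {
        spin_lock(&foo_lock);
        write_seqcount_begin(&foo_seqcount);

        foo_time.sec  = sec;
        foo_time.nsec = nsec;

        write_seqcount_end(&foo_seqcount);
        spin_unlock(&foo_lock);
    }

    /* Reader: copy the data set out; retry if a writer interleaved. */
    static void foo_time_get(u64 *sec, u32 *nsec)
    {
        unsigned seq;

        do {
            seq = read_seqcount_begin(&foo_seqcount);

            *sec  = foo_time.sec;
            *nsec = foo_time.nsec;

        } while (read_seqcount_retry(&foo_seqcount, seq));
    }

As the document itself recommends, a seqlock_t would fold foo_lock and
foo_seqcount into a single object and handle both requirements implicitly.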

include/linux/seqlock.h

Lines changed: 40 additions & 45 deletions
@@ -1,36 +1,15 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __LINUX_SEQLOCK_H
 #define __LINUX_SEQLOCK_H
+
 /*
- * Reader/writer consistent mechanism without starving writers. This type of
- * lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. There are two types
- * of readers:
- * 1. Sequence readers which never block a writer but they may have to retry
- *    if a writer is in progress by detecting change in sequence number.
- *    Writers do not wait for a sequence reader.
- * 2. Locking readers which will wait if a writer or another locking reader
- *    is in progress. A locking reader in progress will also block a writer
- *    from going forward. Unlike the regular rwlock, the read lock here is
- *    exclusive so that only one locking reader can get it.
- *
- * This is not as cache friendly as brlock. Also, this may not work well
- * for data that contains pointers, because any writer could
- * invalidate a pointer that a reader was following.
- *
- * Expected non-blocking reader usage:
- *    do {
- *        seq = read_seqbegin(&foo);
- *    ...
- *    } while (read_seqretry(&foo, seq));
- *
- *
- * On non-SMP the spin locks disappear but the writer still needs
- * to increment the sequence variables because an interrupt routine could
- * change the state of the data.
- *
- * Based on x86_64 vsyscall gettimeofday
- * by Keith Owens and Andrea Arcangeli
+ * seqcount_t / seqlock_t - a reader-writer consistency mechanism with
+ * lockless readers (read-only retry loops), and no writer starvation.
+ *
+ * See Documentation/locking/seqlock.rst
+ *
+ * Copyrights:
+ * - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
  */

 #include <linux/spinlock.h>

@@ -41,25 +20,39 @@
 #include <asm/processor.h>

 /*
- * The seqlock interface does not prescribe a precise sequence of read
- * begin/retry/end. For readers, typically there is a call to
+ * The seqlock seqcount_t interface does not prescribe a precise sequence of
+ * read begin/retry/end. For readers, typically there is a call to
  * read_seqcount_begin() and read_seqcount_retry(), however, there are more
  * esoteric cases which do not follow this pattern.
  *
  * As a consequence, we take the following best-effort approach for raw usage
  * via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
  * pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
  * atomics; if there is a matching read_seqcount_retry() call, no following
- * memory operations are considered atomic. Usage of seqlocks via seqlock_t
- * interface is not affected.
+ * memory operations are considered atomic. Usage of the seqlock_t interface
+ * is not affected.
  */
 #define KCSAN_SEQLOCK_REGION_MAX 1000

 /*
- * Version using sequence counter only.
- * This can be used when code has its own mutex protecting the
- * updating starting before the write_seqcountbeqin() and ending
- * after the write_seqcount_end().
+ * Sequence counters (seqcount_t)
+ *
+ * This is the raw counting mechanism, without any writer protection.
+ *
+ * Write side critical sections must be serialized and non-preemptible.
+ *
+ * If readers can be invoked from hardirq or softirq contexts,
+ * interrupts or bottom halves must also be respectively disabled before
+ * entering the write section.
+ *
+ * This mechanism can't be used if the protected data contains pointers,
+ * as the writer can invalidate a pointer that a reader is following.
+ *
+ * If it's desired to automatically handle the sequence counter writer
+ * serialization and non-preemptibility requirements, use a sequential
+ * lock (seqlock_t) instead.
+ *
+ * See Documentation/locking/seqlock.rst
  */
 typedef struct seqcount {
     unsigned sequence;

@@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
     smp_wmb();    /* increment "sequence" before following stores */
 }

-/*
- * Sequence counter only version assumes that callers are using their
- * own mutexing.
- */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
     raw_write_seqcount_begin(s);

@@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
     kcsan_nestable_atomic_end();
 }

+/*
+ * Sequential locks (seqlock_t)
+ *
+ * Sequence counters with an embedded spinlock for writer serialization
+ * and non-preemptibility.
+ *
+ * For more info, see:
+ *    - Comments on top of seqcount_t
+ *    - Documentation/locking/seqlock.rst
+ */
 typedef struct {
     struct seqcount seqcount;
     spinlock_t lock;
 } seqlock_t;

-/*
- * These macros triggered gcc-3.x compile-time problems. We think these are
- * OK now. Be cautious.
- */
 #define __SEQLOCK_UNLOCKED(lockname)            \
     {                                           \
         .seqcount = SEQCNT_ZERO(lockname),      \