
Commit c122052

Dave Chinner authored and Chandan Babu R committed
xfs: grant heads track byte counts, not LSNs
The grant heads in the log track the space reserved in the log for running transactions. They do this by tracking how far ahead of the tail the reservation has reached, and the units for doing this are {cycle,bytes} for the reserve head rather than {cycle,blocks} which are normally used by LSNs.

This is annoyingly complex because we have to split, crack and combine these tuples for any calculation we do to determine log space and targets. This is computationally expensive as well as difficult to do atomically and locklessly, as well as limiting the size of the log to 2^32 bytes.

Really, though, all the grant heads are tracking is how much space is currently available for use in the log. We can track this as a simple byte count - we just don't care what the actual physical location in the log the head and tail are at, just how much space we have remaining before the head and tail overlap.

So, convert the grant heads to track the byte reservations that are active rather than the current (cycle, offset) tuples. This means an empty log has zero bytes consumed, and a full log is when the reservations reach the size of the log minus the space consumed by the AIL.

This greatly simplifies the accounting and checks for whether there is space available. We no longer need to crack or combine LSNs to determine how much space the log has left, nor do we need to look at the head or tail of the log to determine how close to full we are.

There is, however, a complexity that needs to be handled. We know how much space is being tracked in the AIL now via log->l_tail_space and the log tickets track active reservations and return the unused portions to the grant heads when ungranted. Unfortunately, we don't track the used portion of the grant, so when we transfer log items from the CIL to the AIL, the space accounted to the grant heads is transferred to the log tail space. Hence when we move the AIL head forwards on item insert, we have to remove that space from the grant heads.

We also remove the xlog_verify_grant_tail() debug function as it is no longer useful. The check it performs has been racy since delayed logging was introduced, but now it is clearly only detecting false positives so remove it.

The result of this substantially simpler accounting algorithm is an increase in sustained transaction rate from ~1.3 million transactions/s to ~1.9 million transactions/s with no increase in CPU usage. We also remove the 32 bit space limitation on the grant heads, which will allow us to increase the journal size beyond 2GB in future.

Note that this renames the sysfs files exposing the log grant space now that the values are exported in bytes. This allows xfstests to auto-detect the old or new ABI.

[hch: move xlog_grant_sub_space out of line, update the xlog_grant_{add,sub}_space prototypes, rename the sysfs files to allow auto-detection in xfstests]

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
1 parent de302ce commit c122052
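
To make the "split, crack and combine" cost in the commit message concrete, here is a minimal userspace sketch, not the kernel code. It assumes the old grant head packed the cycle into the upper 32 bits and the byte offset into the lower 32 bits of a single 64-bit value; the names `pack_grant`, `crack_grant`, `old_grant_add` and `new_grant_add` are hypothetical and exist only for this illustration. The wrap logic follows the removed `xlog_grant_add_space()` loop, without the cmpxchg retry.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed old layout: cycle in the high 32 bits, byte offset in the low 32. */
static int64_t pack_grant(int cycle, int space)
{
	return ((int64_t)cycle << 32) | (uint32_t)space;
}

static void crack_grant(int64_t val, int *cycle, int *space)
{
	*cycle = (int)(val >> 32);
	*space = (int)(val & 0xffffffff);
}

/* Old style: crack the tuple, add with wrap at the log size, recombine. */
static int64_t old_grant_add(int64_t head, int logsize, int bytes)
{
	int cycle, space, room;

	crack_grant(head, &cycle, &space);
	room = logsize - space;		/* bytes left before the offset wraps */
	if (room > bytes) {
		space += bytes;
	} else {
		space = bytes - room;
		cycle++;
	}
	return pack_grant(cycle, space);
}

/* New style: the head is simply the number of bytes currently reserved. */
static int64_t new_grant_add(int64_t head, int64_t bytes)
{
	return head + bytes;
}
```

With a 1000-byte log, adding 300 bytes to {cycle 1, offset 900} wraps to {cycle 2, offset 200} in the old scheme, while the byte counter simply moves from 900 to 1200. Because the offset only had 32 bits under the assumed layout, the tuple form also capped the log size, which is the limitation the commit message says is removed.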

7 files changed: +138 -238 lines changed


Documentation/ABI/testing/sysfs-fs-xfs

Lines changed: 8 additions & 10 deletions
@@ -15,25 +15,23 @@ Description:
 		The log sequence number (LSN) of the current tail of the
 		log. The LSN is exported in "cycle:basic block" format.
 
-What:		/sys/fs/xfs/<disk>/log/reserve_grant_head
-Date:		July 2014
-KernelVersion:	3.17
+What:		/sys/fs/xfs/<disk>/log/reserve_grant_head_bytes
+Date:		June 2024
+KernelVersion:	6.11
 Description:
 		The current state of the log reserve grant head. It
 		represents the total log reservation of all currently
-		outstanding transactions. The grant head is exported in
-		"cycle:bytes" format.
+		outstanding transactions in bytes.
 Users:		xfstests
 
-What:		/sys/fs/xfs/<disk>/log/write_grant_head
-Date:		July 2014
-KernelVersion:	3.17
+What:		/sys/fs/xfs/<disk>/log/write_grant_head_bytes
+Date:		June 2024
+KernelVersion:	6.11
 Description:
 		The current state of the log write grant head. It
 		represents the total log reservation of all currently
 		outstanding transactions, including regrants due to
-		rolling transactions. The grant head is exported in
-		"cycle:bytes" format.
+		rolling transactions in bytes.
 Users:		xfstests
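
The rename is what lets a test tell the two ABIs apart by file name alone. The sketch below is one way such a probe could look in C; it is an illustration under stated assumptions, not how xfstests actually does it. The function `detect_grant_abi` and the `grant_abi` enum are hypothetical; only the two file names come from the ABI document above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

enum grant_abi { GRANT_ABI_NONE, GRANT_ABI_CYCLE_BYTES, GRANT_ABI_BYTES };

/*
 * log_dir is the per-filesystem sysfs log directory, i.e. the
 * /sys/fs/xfs/<disk>/log directory named in the ABI document.
 */
static enum grant_abi detect_grant_abi(const char *log_dir)
{
	char path[4096];

	/* New ABI: plain byte counts in the *_bytes files. */
	snprintf(path, sizeof(path), "%s/reserve_grant_head_bytes", log_dir);
	if (access(path, R_OK) == 0)
		return GRANT_ABI_BYTES;

	/* Old ABI: "cycle:bytes" values in the unsuffixed files. */
	snprintf(path, sizeof(path), "%s/reserve_grant_head", log_dir);
	if (access(path, R_OK) == 0)
		return GRANT_ABI_CYCLE_BYTES;

	return GRANT_ABI_NONE;
}
```

A caller would then parse the file contents as a single integer for `GRANT_ABI_BYTES` and as a colon-separated pair for `GRANT_ABI_CYCLE_BYTES`.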

fs/xfs/xfs_log.c

Lines changed: 88 additions & 158 deletions
@@ -53,9 +53,6 @@ xlog_sync(
 	struct xlog_ticket *ticket);
 #if defined(DEBUG)
 STATIC void
-xlog_verify_grant_tail(
-	struct xlog *log);
-STATIC void
 xlog_verify_iclog(
 	struct xlog *log,
 	struct xlog_in_core *iclog,
@@ -65,7 +62,6 @@ xlog_verify_tail_lsn(
 	struct xlog *log,
 	struct xlog_in_core *iclog);
 #else
-#define xlog_verify_grant_tail(a)
 #define xlog_verify_iclog(a,b,c)
 #define xlog_verify_tail_lsn(a,b)
 #endif
@@ -133,125 +129,64 @@ xlog_prepare_iovec(
 	return buf;
 }
 
-static void
+static inline void
 xlog_grant_sub_space(
-	struct xlog *log,
 	struct xlog_grant_head *head,
-	int bytes)
+	int64_t bytes)
 {
-	int64_t head_val = atomic64_read(&head->grant);
-	int64_t new, old;
-
-	do {
-		int cycle, space;
-
-		xlog_crack_grant_head_val(head_val, &cycle, &space);
-
-		space -= bytes;
-		if (space < 0) {
-			space += log->l_logsize;
-			cycle--;
-		}
-
-		old = head_val;
-		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(&head->grant, old, new);
-	} while (head_val != old);
+	atomic64_sub(bytes, &head->grant);
 }
 
-static void
+static inline void
 xlog_grant_add_space(
-	struct xlog *log,
 	struct xlog_grant_head *head,
-	int bytes)
+	int64_t bytes)
 {
-	int64_t head_val = atomic64_read(&head->grant);
-	int64_t new, old;
-
-	do {
-		int tmp;
-		int cycle, space;
-
-		xlog_crack_grant_head_val(head_val, &cycle, &space);
-
-		tmp = log->l_logsize - space;
-		if (tmp > bytes)
-			space += bytes;
-		else {
-			space = bytes - tmp;
-			cycle++;
-		}
-
-		old = head_val;
-		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(&head->grant, old, new);
-	} while (head_val != old);
+	atomic64_add(bytes, &head->grant);
 }
 
-STATIC void
+static void
 xlog_grant_head_init(
 	struct xlog_grant_head *head)
 {
-	xlog_assign_grant_head(&head->grant, 1, 0);
+	atomic64_set(&head->grant, 0);
 	INIT_LIST_HEAD(&head->waiters);
 	spin_lock_init(&head->lock);
 }
 
+void
+xlog_grant_return_space(
+	struct xlog *log,
+	xfs_lsn_t old_head,
+	xfs_lsn_t new_head)
+{
+	int64_t diff = xlog_lsn_sub(log, new_head, old_head);
+
+	xlog_grant_sub_space(&log->l_reserve_head, diff);
+	xlog_grant_sub_space(&log->l_write_head, diff);
+}
+
 /*
- * Return the space in the log between the tail and the head. The head
- * is passed in the cycle/bytes formal parms. In the special case where
- * the reserve head has wrapped passed the tail, this calculation is no
- * longer valid. In this case, just return 0 which means there is no space
- * in the log. This works for all places where this function is called
- * with the reserve head. Of course, if the write head were to ever
- * wrap the tail, we should blow up. Rather than catch this case here,
- * we depend on other ASSERTions in other parts of the code. XXXmiken
- *
- * If reservation head is behind the tail, we have a problem. Warn about it,
- * but then treat it as if the log is empty.
- *
- * If the log is shut down, the head and tail may be invalid or out of whack, so
- * shortcut invalidity asserts in this case so that we don't trigger them
- * falsely.
+ * Return the space in the log between the tail and the head. In the case where
+ * we have overrun available reservation space, return 0. The memory barrier
+ * pairs with the smp_wmb() in xlog_cil_ail_insert() to ensure that grant head
+ * vs tail space updates are seen in the correct order and hence avoid
+ * transients as space is transferred from the grant heads to the AIL on commit
+ * completion.
  */
-static int
+static uint64_t
 xlog_grant_space_left(
 	struct xlog *log,
 	struct xlog_grant_head *head)
 {
-	int tail_bytes;
-	int tail_cycle;
-	int head_cycle;
-	int head_bytes;
-
-	xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
-	tail_bytes = BBTOB(tail_bytes);
-	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
-		return log->l_logsize - (head_bytes - tail_bytes);
-	if (tail_cycle + 1 < head_cycle)
-		return 0;
-
-	/* Ignore potential inconsistency when shutdown. */
-	if (xlog_is_shutdown(log))
-		return log->l_logsize;
-
-	if (tail_cycle < head_cycle) {
-		ASSERT(tail_cycle == (head_cycle - 1));
-		return tail_bytes - head_bytes;
-	}
+	int64_t free_bytes;
 
-	/*
-	 * The reservation head is behind the tail. In this case we just want to
-	 * return the size of the log as the amount of space left.
-	 */
-	xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail");
-	xfs_alert(log->l_mp, " tail_cycle = %d, tail_bytes = %d",
-		tail_cycle, tail_bytes);
-	xfs_alert(log->l_mp, " GH cycle = %d, GH bytes = %d",
-		head_cycle, head_bytes);
-	ASSERT(0);
-	return log->l_logsize;
+	smp_rmb();	/* paired with smp_wmb in xlog_cil_ail_insert() */
+	free_bytes = log->l_logsize - READ_ONCE(log->l_tail_space) -
+			atomic64_read(&head->grant);
+	if (free_bytes > 0)
+		return free_bytes;
+	return 0;
 }
 
 STATIC void
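
The accounting in this hunk can be followed end to end with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `struct model_log` and the `model_*` functions are hypothetical stand-ins, a single grant head is modelled instead of separate reserve and write heads, and C11 sequentially consistent atomics replace the kernel's `atomic64_*` calls. The `smp_wmb()`/`smp_rmb()` pairing described in the comment above is deliberately not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the log state used by the byte-count scheme. */
struct model_log {
	int64_t		logsize;	/* total log size in bytes */
	_Atomic int64_t	tail_space;	/* bytes held by items in the AIL */
	_Atomic int64_t	grant;		/* bytes reserved by active tickets */
};

/* Free space is whatever neither the AIL nor active reservations hold. */
static uint64_t model_space_left(struct model_log *log)
{
	int64_t tail = atomic_load(&log->tail_space);
	int64_t grant = atomic_load(&log->grant);
	int64_t free_bytes = log->logsize - tail - grant;

	/* An overrun shows up as a negative value; report it as no space. */
	return free_bytes > 0 ? (uint64_t)free_bytes : 0;
}

/* A transaction reserves space up front. */
static void model_reserve(struct model_log *log, int64_t bytes)
{
	atomic_fetch_add(&log->grant, bytes);
}

/* The unused part of a reservation goes back when the ticket is ungranted. */
static void model_ungrant(struct model_log *log, int64_t unused)
{
	atomic_fetch_sub(&log->grant, unused);
}

/*
 * Commit completion: the used part of a reservation becomes AIL space,
 * so it is added to the tail space and removed from the grant head.
 */
static void model_commit_to_ail(struct model_log *log, int64_t used)
{
	atomic_fetch_add(&log->tail_space, used);
	atomic_fetch_sub(&log->grant, used);
}
```

Moving space from the grant head to the AIL leaves the free space unchanged, which is why `xlog_grant_return_space()` has to subtract from the grant heads when the AIL head moves forward. Only ungranting, or the tail moving and shrinking the tail space, makes space available again.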
@@ -453,9 +388,8 @@ xfs_log_regrant(
 	if (error)
 		goto out_error;
 
-	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
+	xlog_grant_add_space(&log->l_write_head, need_bytes);
 	trace_xfs_log_regrant_exit(log, tic);
-	xlog_verify_grant_tail(log);
 	return 0;
 
 out_error:
@@ -504,10 +438,9 @@ xfs_log_reserve(
 	if (error)
 		goto out_error;
 
-	xlog_grant_add_space(log, &log->l_reserve_head, need_bytes);
-	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
+	xlog_grant_add_space(&log->l_reserve_head, need_bytes);
+	xlog_grant_add_space(&log->l_write_head, need_bytes);
 	trace_xfs_log_reserve_exit(log, tic);
-	xlog_verify_grant_tail(log);
 	return 0;
 
 out_error:
@@ -1880,8 +1813,8 @@ xlog_sync(
 	if (ticket) {
 		ticket->t_curr_res -= roundoff;
 	} else {
-		xlog_grant_add_space(log, &log->l_reserve_head, roundoff);
-		xlog_grant_add_space(log, &log->l_write_head, roundoff);
+		xlog_grant_add_space(&log->l_reserve_head, roundoff);
+		xlog_grant_add_space(&log->l_write_head, roundoff);
 	}
 
 	/* put cycle number in every block */
@@ -2801,16 +2734,15 @@ xfs_log_ticket_regrant(
 	if (ticket->t_cnt > 0)
 		ticket->t_cnt--;
 
-	xlog_grant_sub_space(log, &log->l_reserve_head, ticket->t_curr_res);
-	xlog_grant_sub_space(log, &log->l_write_head, ticket->t_curr_res);
+	xlog_grant_sub_space(&log->l_reserve_head, ticket->t_curr_res);
+	xlog_grant_sub_space(&log->l_write_head, ticket->t_curr_res);
 	ticket->t_curr_res = ticket->t_unit_res;
 
 	trace_xfs_log_ticket_regrant_sub(log, ticket);
 
 	/* just return if we still have some of the pre-reserved space */
 	if (!ticket->t_cnt) {
-		xlog_grant_add_space(log, &log->l_reserve_head,
-				ticket->t_unit_res);
+		xlog_grant_add_space(&log->l_reserve_head, ticket->t_unit_res);
 		trace_xfs_log_ticket_regrant_exit(log, ticket);
 
 		ticket->t_curr_res = ticket->t_unit_res;
@@ -2857,8 +2789,8 @@ xfs_log_ticket_ungrant(
 		bytes += ticket->t_unit_res*ticket->t_cnt;
 	}
 
-	xlog_grant_sub_space(log, &log->l_reserve_head, bytes);
-	xlog_grant_sub_space(log, &log->l_write_head, bytes);
+	xlog_grant_sub_space(&log->l_reserve_head, bytes);
+	xlog_grant_sub_space(&log->l_write_head, bytes);
 
 	trace_xfs_log_ticket_ungrant_exit(log, ticket);
 
@@ -3331,42 +3263,27 @@ xlog_ticket_alloc(
 }
 
 #if defined(DEBUG)
-/*
- * Check to make sure the grant write head didn't just over lap the tail. If
- * the cycles are the same, we can't be overlapping. Otherwise, make sure that
- * the cycles differ by exactly one and check the byte count.
- *
- * This check is run unlocked, so can give false positives. Rather than assert
- * on failures, use a warn-once flag and a panic tag to allow the admin to
- * determine if they want to panic the machine when such an error occurs. For
- * debug kernels this will have the same effect as using an assert but, unlinke
- * an assert, it can be turned off at runtime.
- */
-STATIC void
-xlog_verify_grant_tail(
-	struct xlog *log)
+static void
+xlog_verify_dump_tail(
+	struct xlog *log,
+	struct xlog_in_core *iclog)
 {
-	int tail_cycle, tail_blocks;
-	int cycle, space;
-
-	xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &space);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks);
-	if (tail_cycle != cycle) {
-		if (cycle - 1 != tail_cycle &&
-		    !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) {
-			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
-				"%s: cycle - 1 != tail_cycle", __func__);
-		}
-
-		if (space > BBTOB(tail_blocks) &&
-		    !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) {
-			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
-				"%s: space > BBTOB(tail_blocks)", __func__);
-		}
-	}
-}
-
-/* check if it will fit */
+	xfs_alert(log->l_mp,
+"ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x",
+			iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1,
+			atomic64_read(&log->l_tail_lsn),
+			log->l_ailp->ail_head_lsn,
+			log->l_curr_cycle, log->l_curr_block,
+			log->l_prev_cycle, log->l_prev_block);
+	xfs_alert(log->l_mp,
+"write grant 0x%llx, reserve grant 0x%llx, tail_space 0x%llx, size 0x%x, iclog flags 0x%x",
+			atomic64_read(&log->l_write_head.grant),
+			atomic64_read(&log->l_reserve_head.grant),
+			log->l_tail_space, log->l_logsize,
+			iclog ? iclog->ic_flags : -1);
+}
+
+/* Check if the new iclog will fit in the log. */
 STATIC void
 xlog_verify_tail_lsn(
 	struct xlog *log,
@@ -3375,21 +3292,34 @@ xlog_verify_tail_lsn(
 	xfs_lsn_t tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn);
 	int blocks;
 
-    if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
-	blocks =
-	    log->l_logBBsize - (log->l_prev_block - BLOCK_LSN(tail_lsn));
-	if (blocks < BTOBB(iclog->ic_offset)+BTOBB(log->l_iclog_hsize))
-		xfs_emerg(log->l_mp, "%s: ran out of log space", __func__);
-    } else {
-	ASSERT(CYCLE_LSN(tail_lsn)+1 == log->l_prev_cycle);
+	if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
+		blocks = log->l_logBBsize -
+				(log->l_prev_block - BLOCK_LSN(tail_lsn));
+		if (blocks < BTOBB(iclog->ic_offset) +
+					BTOBB(log->l_iclog_hsize)) {
+			xfs_emerg(log->l_mp,
+					"%s: ran out of log space", __func__);
+			xlog_verify_dump_tail(log, iclog);
+		}
+		return;
+	}
 
-	if (BLOCK_LSN(tail_lsn) == log->l_prev_block)
+	if (CYCLE_LSN(tail_lsn) + 1 != log->l_prev_cycle) {
+		xfs_emerg(log->l_mp, "%s: head has wrapped tail.", __func__);
+		xlog_verify_dump_tail(log, iclog);
+		return;
+	}
+	if (BLOCK_LSN(tail_lsn) == log->l_prev_block) {
 		xfs_emerg(log->l_mp, "%s: tail wrapped", __func__);
+		xlog_verify_dump_tail(log, iclog);
+		return;
+	}
 
 	blocks = BLOCK_LSN(tail_lsn) - log->l_prev_block;
-	if (blocks < BTOBB(iclog->ic_offset) + 1)
-		xfs_emerg(log->l_mp, "%s: ran out of log space", __func__);
-    }
+	if (blocks < BTOBB(iclog->ic_offset) + 1) {
+		xfs_emerg(log->l_mp, "%s: ran out of iclog space", __func__);
+		xlog_verify_dump_tail(log, iclog);
+	}
 }
 
 /*
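
The restructured `xlog_verify_tail_lsn()` is easier to follow as a pure function over the tail LSN and the previous head position. The following is a simplified sketch, not the kernel code. It assumes an LSN carries the cycle in its upper 32 bits and the basic block in its lower 32 bits; `make_lsn`, `lsn_cycle`, `lsn_block`, `check_tail` and the `tail_check` enum are hypothetical names. It also uses one `needed` size for both branches, whereas the real function compares against different amounts in each case.

```c
#include <assert.h>
#include <stdint.h>

enum tail_check { TAIL_OK, TAIL_NO_SPACE, TAIL_HEAD_WRAPPED, TAIL_WRAPPED };

/* Assumed LSN layout: cycle in the high 32 bits, basic block in the low 32. */
static int64_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (int64_t)(((uint64_t)cycle << 32) | block);
}

static uint32_t lsn_cycle(int64_t lsn)
{
	return (uint32_t)((uint64_t)lsn >> 32);
}

static uint32_t lsn_block(int64_t lsn)
{
	return (uint32_t)lsn;
}

/*
 * head_cycle/head_block describe where the previous write ended, log_blocks
 * is the log size and needed is the size of the pending write, all in
 * basic blocks.
 */
static enum tail_check check_tail(int64_t tail_lsn, uint32_t head_cycle,
				  uint32_t head_block, int64_t log_blocks,
				  int64_t needed)
{
	int64_t tail_block = lsn_block(tail_lsn);
	int64_t free_blocks;

	if (lsn_cycle(tail_lsn) == head_cycle) {
		/* Same cycle: free space wraps around the end of the log. */
		free_blocks = log_blocks - ((int64_t)head_block - tail_block);
		return free_blocks < needed ? TAIL_NO_SPACE : TAIL_OK;
	}
	if (lsn_cycle(tail_lsn) + 1 != head_cycle)
		return TAIL_HEAD_WRAPPED;
	if (tail_block == head_block)
		return TAIL_WRAPPED;

	/* Head is one cycle ahead: free space is the gap up to the tail. */
	free_blocks = tail_block - (int64_t)head_block;
	return free_blocks < needed ? TAIL_NO_SPACE : TAIL_OK;
}
```

Each early return in the sketch corresponds to one of the early `return` statements added by the hunk above, which replaced the nested `if`/`else` and the `ASSERT` with explicit error paths that dump the log state.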
