Commit 0dcd5a1

Authored by Dave Chinner, committed by Chandan Babu R
xfs: l_last_sync_lsn is really AIL state
The current implementation of xlog_assign_tail_lsn() assumes that when the
AIL is empty, the log tail matches the LSN of the last written commit
record. This is recorded in xlog_state_set_callback() as
log->l_last_sync_lsn when the iclog state changes to XLOG_STATE_CALLBACK.
This change is then immediately followed by running the callbacks on the
iclog, which then insert the log items into the AIL at the "commit lsn" of
that checkpoint.

The AIL tracks log items via the start record LSN of the checkpoint, not
the commit record LSN. This is because we can pipeline multiple
checkpoints, and so the start record of checkpoint N+1 can be written
before the commit record of checkpoint N. i.e.:

      start N                    commit N
        +-------------+------------+----------------+
                  start N+1                  commit N+1

The tail of the log cannot be moved to the LSN of commit N when all the
items of that checkpoint are written back, because then the start record
for N+1 is no longer in the active portion of the log and recovery will
fail/corrupt the filesystem.

Hence when all the log items in checkpoint N are written back, the tail of
the log must now only move as far forwards as the start LSN of checkpoint
N+1.

Hence we cannot use the maximum start record LSN the AIL sees as a
replacement for the pointer to the current head of the on-disk log
records. However, we currently only use l_last_sync_lsn when the AIL is
empty - when there is no start LSN remaining, the tail of the log moves to
the LSN of the last commit record, as this is where recovery needs to
start searching for recoverable records. The next checkpoint will have a
start record LSN that is higher than l_last_sync_lsn, and so everything
still works correctly when new checkpoints are written to an otherwise
empty log.

l_last_sync_lsn is an atomic variable because it is currently updated when
an iclog with callbacks attached moves to the CALLBACK state. While we
hold the icloglock at this point, we don't hold the AIL lock. When we
assign the log tail, we hold the AIL lock, not the icloglock, because we
have to look up the AIL. Hence l_last_sync_lsn is an atomic variable so
that it is not bound to a specific lock context.

However, the iclog callbacks are only used for CIL checkpoints. We don't
use callbacks with unmount record writes, so the l_last_sync_lsn variable
only gets updated when we are processing CIL checkpoint callbacks. And
those callbacks run under AIL lock contexts, not icloglock context. The
CIL checkpoint already knows the LSN of the iclog the commit record was
written to (obtained when it was written into the iclog before
submission), and so we can update l_last_sync_lsn under the AIL lock in
this callback. No other iclog callbacks will run until the currently
executing one completes, and hence we can update l_last_sync_lsn under the
AIL lock safely.

This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn" and
it can be used to replace the atomic l_last_sync_lsn in the iclog code.
This makes tracking the log tail belong entirely to the AIL, rather than
being smeared across log, iclog and AIL state and locking.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
Parent: a07776a
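
For readers following the commit message: below is a minimal sketch, in kernel-C style, of the tail-LSN selection that this commit moves into the AIL. The helper name sketch_ail_tail_lsn is hypothetical; the committed implementation is __xfs_ail_assign_tail_lsn() in the fs/xfs/xfs_trans_ail.c hunk further down, which additionally checks for log shutdown and stores the result in log->l_tail_lsn.

	/*
	 * Illustrative sketch only (hypothetical helper, not part of this
	 * commit): how the log tail is derived purely from AIL state after
	 * this change.
	 */
	static xfs_lsn_t
	sketch_ail_tail_lsn(
		struct xfs_ail		*ailp)
	{
		xfs_lsn_t		tail_lsn;

		/* Oldest start-record LSN of any log item still in the AIL. */
		tail_lsn = __xfs_ail_min_lsn(ailp);

		/*
		 * If the AIL is empty, fall back to the commit record LSN of
		 * the last completed checkpoint - the new ailp->ail_head_lsn,
		 * which replaces the old atomic log->l_last_sync_lsn.
		 */
		if (!tail_lsn)
			tail_lsn = ailp->ail_head_lsn;

		return tail_lsn;
	}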

File tree: 8 files changed (+102 −109 lines)

fs/xfs/xfs_log.c

Lines changed: 9 additions & 72 deletions
@@ -1230,47 +1230,6 @@ xfs_log_cover(
 	return error;
 }
 
-/*
- * We may be holding the log iclog lock upon entering this routine.
- */
-xfs_lsn_t
-xlog_assign_tail_lsn_locked(
-	struct xfs_mount	*mp)
-{
-	struct xlog		*log = mp->m_log;
-	struct xfs_log_item	*lip;
-	xfs_lsn_t		tail_lsn;
-
-	assert_spin_locked(&mp->m_ail->ail_lock);
-
-	/*
-	 * To make sure we always have a valid LSN for the log tail we keep
-	 * track of the last LSN which was committed in log->l_last_sync_lsn,
-	 * and use that when the AIL was empty.
-	 */
-	lip = xfs_ail_min(mp->m_ail);
-	if (lip)
-		tail_lsn = lip->li_lsn;
-	else
-		tail_lsn = atomic64_read(&log->l_last_sync_lsn);
-	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
-	atomic64_set(&log->l_tail_lsn, tail_lsn);
-	return tail_lsn;
-}
-
-xfs_lsn_t
-xlog_assign_tail_lsn(
-	struct xfs_mount	*mp)
-{
-	xfs_lsn_t		tail_lsn;
-
-	spin_lock(&mp->m_ail->ail_lock);
-	tail_lsn = xlog_assign_tail_lsn_locked(mp);
-	spin_unlock(&mp->m_ail->ail_lock);
-
-	return tail_lsn;
-}
-
 /*
  * Return the space in the log between the tail and the head. The head
  * is passed in the cycle/bytes formal parms. In the special case where

@@ -1501,7 +1460,6 @@ xlog_alloc_log(
 	log->l_prev_block = -1;
 	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
 	xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0);
-	xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0);
 	log->l_curr_cycle = 1;	    /* 0 is bad since this is initial value */
 
 	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)

@@ -2549,44 +2507,23 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
-/*
- * Completion of a iclog IO does not imply that a transaction has completed, as
- * transactions can be large enough to span many iclogs. We cannot change the
- * tail of the log half way through a transaction as this may be the only
- * transaction in the log and moving the tail to point to the middle of it
- * will prevent recovery from finding the start of the transaction. Hence we
- * should only update the last_sync_lsn if this iclog contains transaction
- * completion callbacks on it.
- *
- * We have to do this before we drop the icloglock to ensure we are the only one
- * that can update it.
- *
- * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
- * the reservation grant head pushing. This is due to the fact that the push
- * target is bound by the current last_sync_lsn value. Hence if we have a large
- * amount of log space bound up in this committing transaction then the
- * last_sync_lsn value may be the limiting factor preventing tail pushing from
- * freeing space in the log. Hence once we've updated the last_sync_lsn we
- * should push the AIL to ensure the push target (and hence the grant head) is
- * no longer bound by the old log head location and can move forwards and make
- * progress again.
- */
 static void
 xlog_state_set_callback(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
 	xfs_lsn_t		header_lsn)
 {
+	/*
+	 * If there are no callbacks on this iclog, we can mark it clean
+	 * immediately and return. Otherwise we need to run the
+	 * callbacks.
+	 */
+	if (list_empty(&iclog->ic_callbacks)) {
+		xlog_state_clean_iclog(log, iclog);
+		return;
+	}
 	trace_xlog_iclog_callback(iclog, _RET_IP_);
 	iclog->ic_state = XLOG_STATE_CALLBACK;
-
-	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
-			   header_lsn) <= 0);
-
-	if (list_empty_careful(&iclog->ic_callbacks))
-		return;
-
-	atomic64_set(&log->l_last_sync_lsn, header_lsn);
 }
 
 /*

fs/xfs/xfs_log_cil.c

Lines changed: 41 additions & 13 deletions
@@ -721,6 +721,24 @@ xlog_cil_ail_insert_batch(
  * items into the AIL. This uses bulk insertion techniques to minimise AIL lock
  * traffic.
  *
+ * The AIL tracks log items via the start record LSN of the checkpoint,
+ * not the commit record LSN. This is because we can pipeline multiple
+ * checkpoints, and so the start record of checkpoint N+1 can be
+ * written before the commit record of checkpoint N. i.e:
+ *
+ *	start N			commit N
+ *	+-------------+------------+----------------+
+ *		  start N+1			commit N+1
+ *
+ * The tail of the log cannot be moved to the LSN of commit N when all
+ * the items of that checkpoint are written back, because then the
+ * start record for N+1 is no longer in the active portion of the log
+ * and recovery will fail/corrupt the filesystem.
+ *
+ * Hence when all the log items in checkpoint N are written back, the
+ * tail of the log most now only move as far forwards as the start LSN
+ * of checkpoint N+1.
+ *
  * If we are called with the aborted flag set, it is because a log write during
  * a CIL checkpoint commit has failed. In this case, all the items in the
  * checkpoint have already gone through iop_committed and iop_committing, which

@@ -738,24 +756,33 @@ xlog_cil_ail_insert_batch(
  */
 static void
 xlog_cil_ail_insert(
-	struct xlog		*log,
-	struct list_head	*lv_chain,
-	xfs_lsn_t		commit_lsn,
+	struct xfs_cil_ctx	*ctx,
 	bool			aborted)
 {
 #define LOG_ITEM_BATCH_SIZE	32
-	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_ail		*ailp = ctx->cil->xc_log->l_ailp;
 	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
 	struct xfs_log_vec	*lv;
 	struct xfs_ail_cursor	cur;
 	int			i = 0;
 
+	/*
+	 * Update the AIL head LSN with the commit record LSN of this
+	 * checkpoint. As iclogs are always completed in order, this should
+	 * always be the same (as iclogs can contain multiple commit records) or
+	 * higher LSN than the current head. We do this before insertion of the
+	 * items so that log space checks during insertion will reflect the
+	 * space that this checkpoint has already consumed.
+	 */
+	ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 ||
+			aborted);
 	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
+	ailp->ail_head_lsn = ctx->commit_lsn;
+	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
 	spin_unlock(&ailp->ail_lock);
 
 	/* unpin all the log items */
-	list_for_each_entry(lv, lv_chain, lv_list) {
+	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
 		struct xfs_log_item	*lip = lv->lv_item;
 		xfs_lsn_t		item_lsn;
 

@@ -768,9 +795,10 @@
 		}
 
 		if (lip->li_ops->iop_committed)
-			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
+			item_lsn = lip->li_ops->iop_committed(lip,
+					ctx->start_lsn);
 		else
-			item_lsn = commit_lsn;
+			item_lsn = ctx->start_lsn;
 
 		/* item_lsn of -1 means the item needs no further processing */
 		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)

@@ -787,7 +815,7 @@
 			continue;
 		}
 
-		if (item_lsn != commit_lsn) {
+		if (item_lsn != ctx->start_lsn) {
 
 			/*
 			 * Not a bulk update option due to unusual item_lsn.

@@ -810,14 +838,15 @@
 		log_items[i++] = lv->lv_item;
 		if (i >= LOG_ITEM_BATCH_SIZE) {
 			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
-					LOG_ITEM_BATCH_SIZE, commit_lsn);
+					LOG_ITEM_BATCH_SIZE, ctx->start_lsn);
 			i = 0;
 		}
 	}
 
 	/* make sure we insert the remainder! */
 	if (i)
-		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
+		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i,
+				ctx->start_lsn);
 
 	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_cursor_done(&cur);

@@ -863,8 +892,7 @@ xlog_cil_committed(
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
 
-	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
-			ctx->start_lsn, abort);
+	xlog_cil_ail_insert(ctx, abort);
 
 	xfs_extent_busy_sort(&ctx->busy_extents.extent_list);
 	xfs_extent_busy_clear(mp, &ctx->busy_extents.extent_list,
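
As a reading aid for the xlog_cil_ail_insert() changes above, here is a hedged sketch of the LSN ordering the new code relies on. The helper is hypothetical and not part of the commit; it only restates, as assertions, the invariants described in the comment added in the hunk.

	/*
	 * Hypothetical illustration, not committed code: the LSN ordering
	 * that makes it safe to set ail_head_lsn to the commit record LSN
	 * while inserting the checkpoint's items at its start record LSN.
	 */
	static void
	sketch_checkpoint_lsn_order(
		struct xfs_cil_ctx	*ctx,
		struct xfs_ail		*ailp)
	{
		/* A checkpoint's start record is written at or before its commit record. */
		ASSERT(XFS_LSN_CMP(ctx->start_lsn, ctx->commit_lsn) <= 0);

		/* Completed checkpoints only ever move the AIL head forwards. */
		ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0);
	}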

fs/xfs/xfs_log_priv.h

Lines changed: 3 additions & 6 deletions
@@ -431,13 +431,10 @@ struct xlog {
 	int			l_prev_block;   /* previous logical log block */
 
 	/*
-	 * l_last_sync_lsn and l_tail_lsn are atomics so they can be set and
-	 * read without needing to hold specific locks. To avoid operations
-	 * contending with other hot objects, place each of them on a separate
-	 * cacheline.
+	 * l_tail_lsn is atomic so it can be set and read without needing to
+	 * hold specific locks. To avoid operations contending with other hot
+	 * objects, place it on a separate cacheline.
 	 */
-	/* lsn of last LR on disk */
-	atomic64_t		l_last_sync_lsn ____cacheline_aligned_in_smp;
 	/* lsn of 1st LR with unflushed * buffers */
 	atomic64_t		l_tail_lsn ____cacheline_aligned_in_smp;

fs/xfs/xfs_log_recover.c

Lines changed: 9 additions & 10 deletions
@@ -1177,8 +1177,8 @@ xlog_check_unmount_rec(
 		 */
 		xlog_assign_atomic_lsn(&log->l_tail_lsn,
 				log->l_curr_cycle, after_umount_blk);
-		xlog_assign_atomic_lsn(&log->l_last_sync_lsn,
-				log->l_curr_cycle, after_umount_blk);
+		log->l_ailp->ail_head_lsn =
+				atomic64_read(&log->l_tail_lsn);
 		*tail_blk = after_umount_blk;
 
 		*clean = true;

@@ -1212,7 +1212,7 @@ xlog_set_state(
 	if (bump_cycle)
 		log->l_curr_cycle++;
 	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
-	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn));
+	log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn);
 	xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle,
 					BBTOB(log->l_curr_block));
 	xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle,

@@ -3366,14 +3366,13 @@ xlog_do_recover(
 
 	/*
 	 * We now update the tail_lsn since much of the recovery has completed
-	 * and there may be space available to use. If there were no extent
-	 * or iunlinks, we can free up the entire log and set the tail_lsn to
-	 * be the last_sync_lsn. This was set in xlog_find_tail to be the
-	 * lsn of the last known good LR on disk. If there are extent frees
-	 * or iunlinks they will have some entries in the AIL; so we look at
-	 * the AIL to determine how to set the tail_lsn.
+	 * and there may be space available to use. If there were no extent or
+	 * iunlinks, we can free up the entire log. This was set in
+	 * xlog_find_tail to be the lsn of the last known good LR on disk. If
+	 * there are extent frees or iunlinks they will have some entries in the
+	 * AIL; so we look at the AIL to determine how to set the tail_lsn.
 	 */
-	xlog_assign_tail_lsn(mp);
+	xfs_ail_assign_tail_lsn(log->l_ailp);
 
 	/*
 	 * Now that we've finished replaying all buffer and inode updates,
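
The xlog_do_recover() hunk above also shows the call-site migration pattern for code that previously assigned the tail via the mount. A before/after sketch (illustrative, using the names from this commit):

	/* Before this commit: tail assignment went through the xlog/mount. */
	xlog_assign_tail_lsn(mp);

	/* After this commit: the AIL owns the tail; callers pass the AIL. */
	xfs_ail_assign_tail_lsn(log->l_ailp);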

fs/xfs/xfs_trace.c

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@
 #include "xfs_trans.h"
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
+#include "xfs_trans_priv.h"
 #include "xfs_buf_item.h"
 #include "xfs_quota.h"
 #include "xfs_dquot_item.h"

fs/xfs/xfs_trace.h

Lines changed: 4 additions & 4 deletions
@@ -1407,19 +1407,19 @@ TRACE_EVENT(xfs_log_assign_tail_lsn,
 		__field(dev_t, dev)
 		__field(xfs_lsn_t, new_lsn)
 		__field(xfs_lsn_t, old_lsn)
-		__field(xfs_lsn_t, last_sync_lsn)
+		__field(xfs_lsn_t, head_lsn)
 	),
 	TP_fast_assign(
 		__entry->dev = log->l_mp->m_super->s_dev;
 		__entry->new_lsn = new_lsn;
 		__entry->old_lsn = atomic64_read(&log->l_tail_lsn);
-		__entry->last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
+		__entry->head_lsn = log->l_ailp->ail_head_lsn;
 	),
-	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, last sync %d/%d",
+	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, head lsn %d/%d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn),
 		  CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn),
-		  CYCLE_LSN(__entry->last_sync_lsn), BLOCK_LSN(__entry->last_sync_lsn))
+		  CYCLE_LSN(__entry->head_lsn), BLOCK_LSN(__entry->head_lsn))
 )
 
 DECLARE_EVENT_CLASS(xfs_file_class,

fs/xfs/xfs_trans_ail.c

Lines changed: 22 additions & 4 deletions
@@ -720,6 +720,26 @@ xfs_ail_push_all_sync(
 	finish_wait(&ailp->ail_empty, &wait);
 }
 
+void
+__xfs_ail_assign_tail_lsn(
+	struct xfs_ail		*ailp)
+{
+	struct xlog		*log = ailp->ail_log;
+	xfs_lsn_t		tail_lsn;
+
+	assert_spin_locked(&ailp->ail_lock);
+
+	if (xlog_is_shutdown(log))
+		return;
+
+	tail_lsn = __xfs_ail_min_lsn(ailp);
+	if (!tail_lsn)
+		tail_lsn = ailp->ail_head_lsn;
+
+	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
+	atomic64_set(&log->l_tail_lsn, tail_lsn);
+}
+
 /*
  * Callers should pass the original tail lsn so that we can detect if the tail
  * has moved as a result of the operation that was performed. If the caller

@@ -734,15 +754,13 @@ xfs_ail_update_finish(
 {
 	struct xlog		*log = ailp->ail_log;
 
-	/* if the tail lsn hasn't changed, don't do updates or wakeups. */
+	/* If the tail lsn hasn't changed, don't do updates or wakeups. */
 	if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) {
 		spin_unlock(&ailp->ail_lock);
 		return;
 	}
 
-	if (!xlog_is_shutdown(log))
-		xlog_assign_tail_lsn_locked(log->l_mp);
-
+	__xfs_ail_assign_tail_lsn(ailp);
 	if (list_empty(&ailp->ail_head))
 		wake_up_all(&ailp->ail_empty);
 	spin_unlock(&ailp->ail_lock);

fs/xfs/xfs_trans_priv.h

Lines changed: 13 additions & 0 deletions
@@ -55,6 +55,7 @@ struct xfs_ail {
 	struct list_head	ail_cursors;
 	spinlock_t		ail_lock;
 	xfs_lsn_t		ail_last_pushed_lsn;
+	xfs_lsn_t		ail_head_lsn;
 	int			ail_log_flush;
 	unsigned long		ail_opstate;
 	struct list_head	ail_buf_list;

@@ -130,6 +131,18 @@ struct xfs_log_item * xfs_trans_ail_cursor_next(struct xfs_ail *ailp,
 			struct xfs_ail_cursor *cur);
 void			xfs_trans_ail_cursor_done(struct xfs_ail_cursor *cur);
 
+void			__xfs_ail_assign_tail_lsn(struct xfs_ail *ailp);
+
+static inline void
+xfs_ail_assign_tail_lsn(
+	struct xfs_ail		*ailp)
+{
+
+	spin_lock(&ailp->ail_lock);
+	__xfs_ail_assign_tail_lsn(ailp);
+	spin_unlock(&ailp->ail_lock);
+}
+
 #if BITS_PER_LONG != 64
 static inline void
 xfs_trans_ail_copy_lsn(
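
Taken together, the two entry points above define the locking contract for tail assignment. An illustrative usage sketch (hypothetical call sites, assuming the post-commit API):

	/* From a context holding no locks: takes and drops ail_lock. */
	xfs_ail_assign_tail_lsn(ailp);

	/* From within an existing ail_lock critical section: */
	spin_lock(&ailp->ail_lock);
	__xfs_ail_assign_tail_lsn(ailp);	/* asserts ail_lock is held */
	spin_unlock(&ailp->ail_lock);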
