Skip to content

Conversation

robn
Copy link
Member

@robn robn commented Aug 20, 2025

[Sponsors: Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

Fixes longstanding issues around tracking and checking for dnode dirtiness. Made famous by #15526, came to our attention again in a recent CI failure (see #17652).

Fixes: #17652
Fixes: #16297
Fixes: #15526
Fixes: #13143
Fixes: #11900
Fixes: #11824
Fixes: #9104
Fixes: #9068
Fixes: #8048
Fixes: #7997
Fixes: #7933
Fixes: #7733
Fixes: #7147
Closes: #15615

(Yes, that's a ridiculously long list. Might not even be all of them; those were the ones that seemed to match by description, crash style and call trace).

Description

Here we add a counter to dnode_t, dn_dirtycnt, that counts the number of txgs this dnode is dirty on. This is incremented the first time a dnode is made dirty on a txg (dnode_setdirty()), and decremented it has been synced to disk (dnode_rele_task(), userquota_updates_task()).

Full credit to @rrevans for the analysis and suggestion, see #15615 (comment) and #17652 (comment).

After that, dnode_is_dirty() becomes a simple check under lock.

Finally, we remove the other efforts at dirtyness tracking: dn_dirtyctx, dn_dirtyctx_firstset, dn_dirty_txg, DNODE_IS_DIRTY(), dnode_set_dirtyctx(). These were either unused, insufficient, or made redundant by the new counter, and have all made the situtation just a little more complicated each time.

How Has This Been Tested?

1000 runs of seekflood 2000 6 completed without issue. Previously attempts had triggered the crash in #17652 at ~400 and ~80.

Full ZTS run completed on 6.12.38.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

robn added 5 commits August 20, 2025 10:21
Bumped when we take the dirty hold in dnode_setdirty(), dropped when the
dnode is finally cleaned up after sync in dnode_rele_task() or
userquota_updates_task().

This gives us a way to check if the dnode is dirty on any txg without
having to rely on outside information (eg presence on a dirty list),
which has been a rich source of bugs in the past.

Suggested-by: Robert Evans <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
dn_dirty_txg only existed for DNODE_IS_DIRTY(). In turn, that only
existed to ensure that a dnode was clean before making it eligible for
removal from the array of cached dnodes attached to the object 0 L0
dbuf.

dn_dirtycnt is enough to check that now, so use it directly and remove
the rest.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
Old debug param, not used for anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
Only used for a couple of debug assertions which had very little value.

Setting it required taking certain locks, so we can remove all that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
@rrevans
Copy link
Contributor

rrevans commented Aug 20, 2025

That's a lot of fixes. Muckraking indeed.

One thought- I think dn_dirtyctx and firstset are worth keeping as they enforce the invariant that dnodes only get dirtied from open or sync context but never both.

Or I think that's an invariant? I don't remember clearly, but it's worth understanding before ripping that part out. At least to write down why it's obsolete if it is.

@robn
Copy link
Member Author

robn commented Aug 20, 2025

Muckraking indeed.

As time goes on, I find myself more and more delighted by stupid wordplays. Not yet sure if its just general getting old or if its a symptom of very little adult human contact (100% home office), that is, no immediate feedback that no, please, stop.

One thought- I think dn_dirtyctx and firstset are worth keeping as they enforce the invariant that dnodes only get dirtied from open or sync context but never both.

dn_dirtyctx_firstset is never used; it seems to be set to the source tag of the first call to dnode_set_dirtyctx(), which ends up being a pointer to the first dbuf dirtied via dbuf_dirty(). Which isn't bad as a debugging tool perhaps, but I've never used it or though to use it, and I reckon that's a lot easier to find these days with a probe.

dn_dirtyctx is used to enforce "dirtied from open or sync, but never both" via this assert in dbuf_dirty():

	ASSERT(dn->dn_object == DMU_META_DNODE_OBJECT ||
	    dn->dn_dirtyctx == DN_UNDIRTIED || dn->dn_dirtyctx ==
	    (dmu_tx_is_syncing(tx) ? DN_DIRTY_SYNC : DN_DIRTY_OPEN));

The assertions are debug-only, but the setup wasn't, so the call to dn_set_dirtyctx() needed to take dn_mtx and possibly ds_bp_rwlock. So this particular construction is not zero-cost even in production builds, though its pretty negligible.

The invariant is still true I expect; it intuitively makes sense, but I don't really have this locking deeply internalised.

If we want to keep it, I think the way I would try to do it is make dn_dirtycnt a bitfield (maybe with a different name). Six bits, two per txg, one for "dirty on open context", one for "dirty on syncing context". Then we can still up/down it in the same places, >0 to test overall dirtiness still holds, but no extra overhead. Does that seem right?

@rrevans
Copy link
Contributor

rrevans commented Aug 20, 2025

If we want to keep it, I think the way I would try to do it is make dn_dirtycnt a bitfield

Clever, but maybe too clever for a debug facility?

I think it's much clearer to have the plain counter because it's so much more obvious.

Copy link
Contributor

@adamdmoss adamdmoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fix that's also a simplification, hooray!
LGTM.

@robn
Copy link
Member Author

robn commented Aug 20, 2025

I think it's much clearer to have the plain counter because it's so much more obvious.

I agree, that's why I just ripped it all out.

@rrevans
Copy link
Contributor

rrevans commented Aug 20, 2025

@robn Looking closely, I don't think dn_dirtyctx is ever reset? I think it then devolves into enforcing that mos objects are dirtied in sync context and normal dataset objects are dirtied in open context in an unintuitive way?

If we want to really protect these invariants, we'd prefer to teach dnode_dirty the full set of invariants based on object properties and check directly. This already happens for normal objects, but mos objects are looser for reasons I don't understand. (I also can't see how the same object could ever be dirtied in both contexts despite the dubious comments to the contrary given that dn_dirtyctx would trip in exactly that case.)

Anyway, TL;DR- I agree dn_dirtyctx is low value and not worth keeping given the other checks.

Copy link
Contributor

@rrevans rrevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for seeing this through! It's all quite elegant and tidy. LGTM

@rrevans
Copy link
Contributor

rrevans commented Aug 20, 2025

If we are keeping score of the number of times this bug has shown up in the past, add https://github.com/openzfs/zfs/pull/16019/files#r1536719525 to the list?

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 20, 2025
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! @rrevans @robn thanks for digging in to this and wrapping up this long standing bit of racy code. This is so much simpler than the various other partial attempts at fixes.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Aug 20, 2025
behlendorf pushed a commit that referenced this pull request Aug 21, 2025
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16297
Closes #17652
Closes #17658
behlendorf pushed a commit that referenced this pull request Aug 21, 2025
dn_dirty_txg only existed for DNODE_IS_DIRTY(). In turn, that only
existed to ensure that a dnode was clean before making it eligible for
removal from the array of cached dnodes attached to the object 0 L0
dbuf.

dn_dirtycnt is enough to check that now, so use it directly and remove
the rest.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16297
Closes #17652
Closes #17658
behlendorf pushed a commit that referenced this pull request Aug 21, 2025
Old debug param, not used for anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16297
Closes #17652
Closes #17658
behlendorf pushed a commit that referenced this pull request Aug 21, 2025
Only used for a couple of debug assertions which had very little value.

Setting it required taking certain locks, so we can remove all that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #16297
Closes #17652
Closes #17658
Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be good otherwise. Thanks.

spauka pushed a commit to spauka/zfs that referenced this pull request Aug 30, 2025
Bumped when we take the dirty hold in dnode_setdirty(), dropped when the
dnode is finally cleaned up after sync in dnode_rele_task() or
userquota_updates_task().

This gives us a way to check if the dnode is dirty on any txg without
having to rely on outside information (eg presence on a dirty list),
which has been a rich source of bugs in the past.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Suggested-by: Robert Evans <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16297
Closes openzfs#17652
Closes openzfs#17658
spauka pushed a commit to spauka/zfs that referenced this pull request Aug 30, 2025
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16297
Closes openzfs#17652
Closes openzfs#17658
spauka pushed a commit to spauka/zfs that referenced this pull request Aug 30, 2025
dn_dirty_txg only existed for DNODE_IS_DIRTY(). In turn, that only
existed to ensure that a dnode was clean before making it eligible for
removal from the array of cached dnodes attached to the object 0 L0
dbuf.

dn_dirtycnt is enough to check that now, so use it directly and remove
the rest.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16297
Closes openzfs#17652
Closes openzfs#17658
spauka pushed a commit to spauka/zfs that referenced this pull request Aug 30, 2025
Old debug param, not used for anything.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16297
Closes openzfs#17652
Closes openzfs#17658
spauka pushed a commit to spauka/zfs that referenced this pull request Aug 30, 2025
Only used for a couple of debug assertions which had very little value.

Setting it required taking certain locks, so we can remove all that too.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Adam Moss <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#16297
Closes openzfs#17652
Closes openzfs#17658
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Accepted Ready to integrate (reviewed, tested)

Projects

None yet

5 participants