I reproduced the crash first noticed here: #17625 (comment). It took over 400 rounds of seekflood, so it's tricky to hit.
[Mon Aug 18 20:22:33 2025] VERIFY3B(node->next == ((void *) 0x100 + (0xdead000000000000UL)), ==, node->prev == ((void *) 0x122 + (0xdead000000000000UL))) failed (0 == 1)
[Mon Aug 18 20:22:33 2025] PANIC at list.h:188:list_link_active()
[Mon Aug 18 20:22:33 2025] Showing stack for process 1895973
[Mon Aug 18 20:22:33 2025] CPU: 0 UID: 0 PID: 1895973 Comm: seekflood Tainted: P OE 6.17.0-rc2 #1 PREEMPT(voluntary)
[Mon Aug 18 20:22:33 2025] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[Mon Aug 18 20:22:33 2025] Hardware name: FreeBSD BHYVE/BHYVE, BIOS 14.0 10/17/2021
[Mon Aug 18 20:22:33 2025] Call Trace:
[Mon Aug 18 20:22:33 2025] <TASK>
[Mon Aug 18 20:22:33 2025] dump_stack_lvl+0x5d/0x80
[Mon Aug 18 20:22:33 2025] spl_panic+0xf3/0x118 [spl]
[Mon Aug 18 20:22:33 2025] ? dnode_hold_impl+0x8eb/0x1080 [zfs]
[Mon Aug 18 20:22:33 2025] list_link_active+0x69/0x70 [zfs]
[Mon Aug 18 20:22:33 2025] dnode_is_dirty+0x62/0x190 [zfs]
[Mon Aug 18 20:22:33 2025] dmu_offset_next+0xc4/0x260 [zfs]
[Mon Aug 18 20:22:33 2025] zfs_holey_common+0xa0/0x190 [zfs]
[Mon Aug 18 20:22:33 2025] zfs_holey+0x51/0x80 [zfs]
[Mon Aug 18 20:22:33 2025] zpl_llseek+0x89/0xd0 [zfs]
[Mon Aug 18 20:22:33 2025] ksys_lseek+0x3f/0xb0
[Mon Aug 18 20:22:33 2025] do_syscall_64+0x84/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? zpl_iter_write+0x134/0x160 [zfs]
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? vfs_write+0x25d/0x450
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? ksys_write+0x6b/0xe0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? __task_pid_nr_ns+0xa0/0xb0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? __task_pid_nr_ns+0xa0/0xb0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Mon Aug 18 20:22:33 2025] RIP: 0033:0x7fc65bb24637
[Mon Aug 18 20:22:33 2025] Code: 8b 05 c5 37 0e 00 64 c7 00 0d 00 00 00 eb b2 e8 7f 95 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 91 37 0e 00 f7 d8 64 89 02 48
[Mon Aug 18 20:22:33 2025] RSP: 002b:00007fffcbd45058 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
[Mon Aug 18 20:22:33 2025] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc65bb24637
[Mon Aug 18 20:22:33 2025] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000003
[Mon Aug 18 20:22:33 2025] RBP: 00007fffcbd45100 R08: 0000000000000000 R09: 0000000000000000
[Mon Aug 18 20:22:33 2025] R10: 0000000000000180 R11: 0000000000000246 R12: 00000000000001e6
[Mon Aug 18 20:22:33 2025] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000000
[Mon Aug 18 20:22:33 2025] </TASK>
I don't have the brain left today to try to understand it fully and patch it. Here's what I've learned, in case someone wants to get to it before I do.
The crash in question:
static inline int
list_link_active(list_node_t *node)
{
	EQUIV(node->next == LIST_POISON1, node->prev == LIST_POISON2);
	return (node->next != LIST_POISON1);
}
So it's just testing that both pointers are poisoned, or neither is.
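For illustration only (this is not the SPL macro source, just a standalone sketch using the same poison values that appear in the panic and in the sdb dump further down), here's how that EQUIV() relates to the VERIFY3B in the panic message: it fires exactly when one pointer is poisoned and the other isn't.

#include <stdio.h>

/* Poison values as seen in the panic: 0x100/0x122 + the poison delta. */
#define POISON_DELTA	0xdead000000000000UL
#define LIST_POISON1	((void *)(0x100 + POISON_DELTA))
#define LIST_POISON2	((void *)(0x122 + POISON_DELTA))

struct list_node { void *next, *prev; };

static int
link_active(struct list_node *node)
{
	int next_poisoned = (node->next == LIST_POISON1);
	int prev_poisoned = (node->prev == LIST_POISON2);

	/* EQUIV(): both poisoned or neither; a mixed state is what panics. */
	if (next_poisoned != prev_poisoned) {
		fprintf(stderr, "torn state: next=%p prev=%p\n",
		    node->next, node->prev);
		return (-1);
	}
	return (!next_poisoned);
}

int
main(void)
{
	/* A node caught halfway through removal: next poisoned, prev not. */
	struct list_node torn = {
		.next = LIST_POISON1,
		.prev = (void *)0x1,
	};
	printf("active = %d\n", link_active(&torn));
	return (0);
}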
Calling function:
boolean_t
dnode_is_dirty(dnode_t *dn)
{
	mutex_enter(&dn->dn_mtx);
	for (int i = 0; i < TXG_SIZE; i++) {
		if (multilist_link_active(&dn->dn_dirty_link[i]) ||
		    !list_is_empty(&dn->dn_dirty_records[i])) {
			mutex_exit(&dn->dn_mtx);
			return (B_TRUE);
		}
	}
	mutex_exit(&dn->dn_mtx);
	return (B_FALSE);
}
multilist_link_active() is a thin wrapper around list_link_active():
int
multilist_link_active(multilist_node_t *link)
{
	return (list_link_active(link));
}
In crash debugger:
sdb> find_task 1895973 | frame 6 dn | member dn_dirty_link
(multilist_node_t [4]){
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
}
So most likely we caught this in a transition from "active" to "inactive", that is, while the node was being removed from the list. That's list_del() in the kernel, which is list_remove*() for us:
static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}
This is wrapped by multilist_sublist_remove() and variants:
void
multilist_sublist_remove(multilist_sublist_t *mls, void *obj)
{
	ASSERT(MUTEX_HELD(&mls->mls_lock));
	list_remove(&mls->mls_list, obj);
}
dn_dirty_link is the list linkage node for os_dirty_dnodes and os_synced_dnodes:
int
dmu_objset_open_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
    objset_t **osp)
{
	...
	for (i = 0; i < TXG_SIZE; i++) {
		multilist_create(&os->os_dirty_dnodes[i], sizeof (dnode_t),
		    offsetof(dnode_t, dn_dirty_link[i]),
		    dnode_multilist_index_func);

void
dmu_objset_sync(objset_t *os, zio_t *pio, dmu_tx_t *tx)
{
	...
	multilist_create(&os->os_synced_dnodes, sizeof (dnode_t),
	    offsetof(dnode_t, dn_dirty_link[txgoff]),
	    dnode_multilist_index_func);
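Since multilist_create() is given the offset of the embedded link node, both lists end up threading through the same dn_dirty_link[txg] node inside the dnode. A generic sketch of that offsetof()-style intrusive linkage (not ZFS code; all names here are made up for illustration):

#include <stddef.h>
#include <stdio.h>

struct link { struct link *next, *prev; };

/* Stand-in for dnode_t: an object with one embedded link per txg slot. */
struct object {
	int		id;
	struct link	dirty_link[4];
};

/* Recover the containing object from an embedded link node, given the
 * offset the list was created with. */
static struct object *
obj_from_link(struct link *l, size_t link_offset)
{
	return ((struct object *)((char *)l - link_offset));
}

int
main(void)
{
	struct object o = { .id = 42 };
	size_t off = offsetof(struct object, dirty_link[2]);

	printf("recovered id = %d\n",
	    obj_from_link(&o.dirty_link[2], off)->id);
	return (0);
}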
There appear to be three calls to multilist_sublist_remove() for one of these lists, in:
dmu_objset_sync_dnodes()
userquota_updates_task()
dnode_rele_task()
None of them hold dn_mtx while removing the dnode from the list, which is the lock dnode_is_dirty() uses to protect access to the node, so there's nothing stopping it from observing the removal in flight, tripping the assert and crashing.
The "fixes" here seem to be to either take the sublist lock while checking dirtiness, or cache the dirty state on the dnode. @rrevans had some good thoughts about this in #15615 (comment); now is probably the time to dust that off and have a go at it.