I reproduced the crash first noticed here: #17625 (comment). It took over 400 rounds of seekflood, so it's tricky to hit.
[Mon Aug 18 20:22:33 2025] VERIFY3B(node->next == ((void *) 0x100 + (0xdead000000000000UL)), ==, node->prev == ((void *) 0x122 + (0xdead000000000000UL))) failed (0 == 1)
[Mon Aug 18 20:22:33 2025] PANIC at list.h:188:list_link_active()
[Mon Aug 18 20:22:33 2025] Showing stack for process 1895973
[Mon Aug 18 20:22:33 2025] CPU: 0 UID: 0 PID: 1895973 Comm: seekflood Tainted: P OE 6.17.0-rc2 #1 PREEMPT(voluntary)
[Mon Aug 18 20:22:33 2025] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[Mon Aug 18 20:22:33 2025] Hardware name: FreeBSD BHYVE/BHYVE, BIOS 14.0 10/17/2021
[Mon Aug 18 20:22:33 2025] Call Trace:
[Mon Aug 18 20:22:33 2025] <TASK>
[Mon Aug 18 20:22:33 2025] dump_stack_lvl+0x5d/0x80
[Mon Aug 18 20:22:33 2025] spl_panic+0xf3/0x118 [spl]
[Mon Aug 18 20:22:33 2025] ? dnode_hold_impl+0x8eb/0x1080 [zfs]
[Mon Aug 18 20:22:33 2025] list_link_active+0x69/0x70 [zfs]
[Mon Aug 18 20:22:33 2025] dnode_is_dirty+0x62/0x190 [zfs]
[Mon Aug 18 20:22:33 2025] dmu_offset_next+0xc4/0x260 [zfs]
[Mon Aug 18 20:22:33 2025] zfs_holey_common+0xa0/0x190 [zfs]
[Mon Aug 18 20:22:33 2025] zfs_holey+0x51/0x80 [zfs]
[Mon Aug 18 20:22:33 2025] zpl_llseek+0x89/0xd0 [zfs]
[Mon Aug 18 20:22:33 2025] ksys_lseek+0x3f/0xb0
[Mon Aug 18 20:22:33 2025] do_syscall_64+0x84/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? zpl_iter_write+0x134/0x160 [zfs]
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? vfs_write+0x25d/0x450
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? ksys_write+0x6b/0xe0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? __task_pid_nr_ns+0xa0/0xb0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? __task_pid_nr_ns+0xa0/0xb0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] ? do_syscall_64+0xbc/0x2f0
[Mon Aug 18 20:22:33 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Mon Aug 18 20:22:33 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Mon Aug 18 20:22:33 2025] RIP: 0033:0x7fc65bb24637
[Mon Aug 18 20:22:33 2025] Code: 8b 05 c5 37 0e 00 64 c7 00 0d 00 00 00 eb b2 e8 7f 95 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 08 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 91 37 0e 00 f7 d8 64 89 02 48
[Mon Aug 18 20:22:33 2025] RSP: 002b:00007fffcbd45058 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
[Mon Aug 18 20:22:33 2025] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc65bb24637
[Mon Aug 18 20:22:33 2025] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000003
[Mon Aug 18 20:22:33 2025] RBP: 00007fffcbd45100 R08: 0000000000000000 R09: 0000000000000000
[Mon Aug 18 20:22:33 2025] R10: 0000000000000180 R11: 0000000000000246 R12: 00000000000001e6
[Mon Aug 18 20:22:33 2025] R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000000
[Mon Aug 18 20:22:33 2025] </TASK>
I don't have the brain left today to try to understand it fully and patch it. Here's what I've learned, in case someone wants to get to it before I do.
The crash in question:
static inline int
list_link_active(list_node_t *node)
{
	EQUIV(node->next == LIST_POISON1, node->prev == LIST_POISON2);
	return (node->next != LIST_POISON1);
}
So it's just testing that both pointers are poisoned, or neither is.
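For illustration only (this is not the SPL macro source, just a standalone sketch using the same poison values that appear in the panic and in the sdb dump further down), here's how that EQUIV() relates to the VERIFY3B in the panic message: it fires exactly when one pointer is poisoned and the other isn't.

#include <stdio.h>

/* Poison values as seen in the panic: 0x100/0x122 + the poison delta. */
#define POISON_DELTA	0xdead000000000000UL
#define LIST_POISON1	((void *)(0x100 + POISON_DELTA))
#define LIST_POISON2	((void *)(0x122 + POISON_DELTA))

struct list_node { void *next, *prev; };

static int
link_active(struct list_node *node)
{
	int next_poisoned = (node->next == LIST_POISON1);
	int prev_poisoned = (node->prev == LIST_POISON2);

	/* EQUIV(): both poisoned or neither; a mixed state is what panics. */
	if (next_poisoned != prev_poisoned) {
		fprintf(stderr, "torn state: next=%p prev=%p\n",
		    node->next, node->prev);
		return (-1);
	}
	return (!next_poisoned);
}

int
main(void)
{
	/* A node caught halfway through removal: next poisoned, prev not. */
	struct list_node torn = {
		.next = LIST_POISON1,
		.prev = (void *)0x1,
	};
	printf("active = %d\n", link_active(&torn));
	return (0);
}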
Calling function:
boolean_t
dnode_is_dirty(dnode_t *dn)
{
	mutex_enter(&dn->dn_mtx);
	for (int i = 0; i < TXG_SIZE; i++) {
		if (multilist_link_active(&dn->dn_dirty_link[i]) ||
		    !list_is_empty(&dn->dn_dirty_records[i])) {
			mutex_exit(&dn->dn_mtx);
			return (B_TRUE);
		}
	}
	mutex_exit(&dn->dn_mtx);
	return (B_FALSE);
}
multilist_link_active() is a thin wrapper around list_link_active():
int
multilist_link_active(multilist_node_t *link)
{
	return (list_link_active(link));
}
In crash debugger:
sdb> find_task 1895973 | frame 6 dn | member dn_dirty_link
(multilist_node_t [4]){
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
	{
		.next = (struct list_head *)0xdead000000000100,
		.prev = (struct list_head *)0xdead000000000122,
	},
}
So most likely we caught this in a transition from "active" to "inactive", that is, while the node was being removed from the list. That's list_del() in the kernel, which is list_remove*() for us:
static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}
This is wrapped by multilist_sublist_remove() and variants:
void
multilist_sublist_remove(multilist_sublist_t *mls, void *obj)
{
	ASSERT(MUTEX_HELD(&mls->mls_lock));
	list_remove(&mls->mls_list, obj);
}
dn_dirty_link is the list linkage node for os_dirty_dnodes and os_synced_dnodes:
int
dmu_objset_open_impl(spa_t *spa, dsl_dataset_t *ds, blkptr_t *bp,
    objset_t **osp)
{
	...
	for (i = 0; i < TXG_SIZE; i++) {
		multilist_create(&os->os_dirty_dnodes[i], sizeof (dnode_t),
		    offsetof(dnode_t, dn_dirty_link[i]),
		    dnode_multilist_index_func);

void
dmu_objset_sync(objset_t *os, zio_t *pio, dmu_tx_t *tx)
{
	...
	multilist_create(&os->os_synced_dnodes, sizeof (dnode_t),
	    offsetof(dnode_t, dn_dirty_link[txgoff]),
	    dnode_multilist_index_func);
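Since multilist_create() is given the offset of the embedded link node, both lists end up threading through the same dn_dirty_link[txg] node inside the dnode. A generic sketch of that offsetof()-style intrusive linkage (not ZFS code; all names here are made up for illustration):

#include <stddef.h>
#include <stdio.h>

struct link { struct link *next, *prev; };

/* Stand-in for dnode_t: an object with one embedded link per txg slot. */
struct object {
	int		id;
	struct link	dirty_link[4];
};

/* Recover the containing object from an embedded link node, given the
 * offset the list was created with. */
static struct object *
obj_from_link(struct link *l, size_t link_offset)
{
	return ((struct object *)((char *)l - link_offset));
}

int
main(void)
{
	struct object o = { .id = 42 };
	size_t off = offsetof(struct object, dirty_link[2]);

	printf("recovered id = %d\n",
	    obj_from_link(&o.dirty_link[2], off)->id);
	return (0);
}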
There appear to be three calls to multilist_sublist_remove() for one of these lists, in:
dmu_objset_sync_dnodes()
userquota_updates_task()
dnode_rele_task()
None of them hold dn_mtx while removing the dnode from the list, which is the lock dnode_is_dirty() uses to protect access to the node, so there's nothing stopping it from observing the removal in flight, tripping the assert and crashing.
The "fixes" here seem to be to either take the sublist lock while checking dirtiness, or cache the dirty state on the dnode. @rrevans had some good thoughts about this in #15615 (comment); now is probably the time to dust that off and have a go at it.