Pwx 42981 main #361

Open
sreymondjohn wants to merge 9 commits into main from
PWX-42981_main

@sreymondjohn sreymondjohn commented May 8, 2025

Test notes

Available trace

ls -l /sys/kernel/debug/tracing/events/pxd/
total 0
drwxr-x--- 2 root root 0 May  8 18:41 copy_in_read_data_iovec
-rw-r----- 1 root root 0 May  8 18:41 enable
drwxr-x--- 2 root root 0 May  8 18:41 end_clone_bio
-rw-r----- 1 root root 0 May  8 18:41 filter
drwxr-x--- 2 root root 0 May  8 18:41 fp_discard_reply
drwxr-x--- 2 root root 0 May  8 18:41 fuse_notify_add_ext
drwxr-x--- 2 root root 0 May  8 18:41 fuse_notify_read_data_copy
drwxr-x--- 2 root root 0 May  8 18:41 fuse_notify_read_data_finalcopy
drwxr-x--- 2 root root 0 May  8 18:41 fuse_notify_read_data_request
drwxr-x--- 2 root root 0 May  8 18:41 fuse_notify_read_data_segment_info
drwxr-x--- 2 root root 0 May  8 18:41 pxd_close_ctrl_fd
drwxr-x--- 2 root root 0 May  8 18:41 pxd_export
drwxr-x--- 2 root root 0 May  8 18:41 pxd_fastpath_reset_device
drwxr-x--- 2 root root 0 May  8 18:41 pxd_get_fuse_req
drwxr-x--- 2 root root 0 May  8 18:41 pxd_get_fuse_req_result
drwxr-x--- 2 root root 0 May  8 18:41 pxd_initiate_failover
drwxr-x--- 2 root root 0 May  8 18:41 pxd_initiate_fallback
drwxr-x--- 2 root root 0 May  8 18:41 pxd_ioctl
drwxr-x--- 2 root root 0 May  8 18:41 pxd_ioc_update_size
drwxr-x--- 2 root root 0 May  8 18:41 pxd_ioswitch_complete
drwxr-x--- 2 root root 0 May  8 18:41 pxd_open
drwxr-x--- 2 root root 0 May  8 18:41 pxd_queue_rq
drwxr-x--- 2 root root 0 May  8 18:41 pxd_release
drwxr-x--- 2 root root 0 May  8 18:41 pxd_reply
drwxr-x--- 2 root root 0 May  8 18:41 pxd_request
drwxr-x--- 2 root root 0 May  8 18:41 pxd_request_complete
drwxr-x--- 2 root root 0 May  8 18:41 pxd_reroute_slowpath_transition
drwxr-x--- 2 root root 0 May  8 18:41 pxd_rq_fn
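
A minimal sketch of how the events listed above could be enumerated and switched on from a script. It relies on the standard tracefs layout (one subdirectory per event, plus a group-level `enable` file that accepts `1` or `0`). The helper names `pxd_event_names` and `set_pxd_tracing` are hypothetical and not part of this PR.

```python
from pathlib import Path

# Tracefs location as it appears in the listing above; some systems expose
# the same tree at /sys/kernel/tracing instead.
TRACING_ROOT = Path("/sys/kernel/debug/tracing")


def pxd_event_names(tracing_root=TRACING_ROOT):
    """Return the pxd event names (each event is a subdirectory)."""
    events_dir = Path(tracing_root) / "events" / "pxd"
    return sorted(p.name for p in events_dir.iterdir() if p.is_dir())


def set_pxd_tracing(enabled, tracing_root=TRACING_ROOT):
    """Write 1 or 0 to the group-level enable file to toggle all pxd events."""
    enable_file = Path(tracing_root) / "events" / "pxd" / "enable"
    enable_file.write_text("1" if enabled else "0")
```

Running this against the real tracefs needs root. Once enabled, the events show up in the `trace` or `trace_pipe` file under the same tracing root.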

native path

native path - IO request and response

probe-bcache-909752 [002] 1385629.724625: pxd_queue_rq:         dev_id 838795747660972637 minor 1 dir 0 op 0 rq_offset 1073676288 size 4096 nr_phys_segments 1 flags 80700 bio 0xffff996519edad00 bio_tail 0xffff996519edad00 single_bio 1 bio_offset 1073676288
probe-bcache-909752 [002] 1385629.724628: pxd_request:          dev_id 838795747660972637 minor 1 unique 262144 off 1073676288 size 4096 req_op 0 req_flags 80700 pxd_op 8194 pxd_flags 0
px-storage-909170 [000] 1385629.733469: pxd_request_complete: dev_id 838795747660972637 minor 1 unique 262144 offset 1073676288 len 4096 op 0 flags 80700 status 0
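
Since pxd_request and pxd_request_complete carry the same `unique` id, the two events can be paired to get a per-request latency for the native path. The following is a rough sketch under that assumption. The parser and the `native_latencies` helper are illustrative only, and the sample lines are copied from the trace above.

```python
import re
from decimal import Decimal

# Matches "task-pid [cpu] timestamp: event: key value key value ...".
LINE_RE = re.compile(
    r"^\s*\S+-\d+\s+\[\d+\]\s+(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<body>.*)$"
)


def parse_line(line):
    """Split one trace line into timestamp, event name and key/value fields."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    tokens = m.group("body").split()
    return {
        "ts": Decimal(m.group("ts")),  # Decimal keeps the microsecond digits exact
        "event": m.group("event"),
        "fields": dict(zip(tokens[0::2], tokens[1::2])),
    }


def native_latencies(lines):
    """Map (dev_id, unique) to seconds between pxd_request and its completion."""
    started, latencies = {}, {}
    for line in lines:
        ev = parse_line(line)
        if ev is None:
            continue
        key = (ev["fields"].get("dev_id"), ev["fields"].get("unique"))
        if ev["event"] == "pxd_request":
            started[key] = ev["ts"]
        elif ev["event"] == "pxd_request_complete" and key in started:
            latencies[key] = ev["ts"] - started.pop(key)
    return latencies


SAMPLE = """\
probe-bcache-909752 [002] 1385629.724628: pxd_request:          dev_id 838795747660972637 minor 1 unique 262144 off 1073676288 size 4096 req_op 0 req_flags 80700 pxd_op 8194 pxd_flags 0
px-storage-909170 [000] 1385629.733469: pxd_request_complete: dev_id 838795747660972637 minor 1 unique 262144 offset 1073676288 len 4096 op 0 flags 80700 status 0
"""

for key, seconds in native_latencies(SAMPLE.splitlines()).items():
    print(key, seconds)
```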

native path - IO processing, iovec < 64

px-storage-909170 [007] 1385629.767994: copy_in_read_data_iovec: req_id 262273 prev_iovcnt 130 curr_iovcnt 66
px-storage-909170 [007] 1385629.767995: fuse_notify_read_data_request: devid 838795747660972637 req_id 262273 rq_offset 0 rq_size 532480 rdwr_offset 0 read_data_p_offset 0
px-storage-909170 [007] 1385629.767996: fuse_notify_read_data_segment_info: devid 838795747660972637 req_id 262273 bv_offset 0 bv_len 4096
px-storage-909170 [007] 1385629.767998: fuse_notify_read_data_copy: devid 838795747660972637 req_id 262273 copied 0 copy_this 4096 bv_offset 0 offset 0 bv_len 4096 len 4096 iter_count 1056

native path - IO processing, iovec > 64

px-storage-909170 [007] 1385629.768114: copy_in_read_data_iovec: req_id 262273 prev_iovcnt 66 curr_iovcnt 2
px-storage-909170 [007] 1385629.768116: fuse_notify_read_data_finalcopy: devid 838795747660972637 req_id 262273 len 4096 copied 4096 bv_offset 0 offset 4096 bv_len 4096
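
The two copy_in_read_data_iovec samples above (130 to 66, then 66 to 2) suggest that each pass consumes 64 iovecs. That is an inference from the trace, not something this PR states. A small sketch for checking it against a captured trace follows; the helper name `iovec_progress` is hypothetical.

```python
import re

# Pulls req_id and the before/after iovec counts out of each event line.
IOVEC_RE = re.compile(
    r"copy_in_read_data_iovec:\s+req_id\s+(\d+)\s+prev_iovcnt\s+(\d+)\s+curr_iovcnt\s+(\d+)"
)


def iovec_progress(lines):
    """Return (req_id, prev_iovcnt, curr_iovcnt, consumed) for each copy pass."""
    passes = []
    for line in lines:
        m = IOVEC_RE.search(line)
        if m:
            prev_cnt, curr_cnt = int(m.group(2)), int(m.group(3))
            passes.append((m.group(1), prev_cnt, curr_cnt, prev_cnt - curr_cnt))
    return passes


SAMPLE = """\
px-storage-909170 [007] 1385629.767994: copy_in_read_data_iovec: req_id 262273 prev_iovcnt 130 curr_iovcnt 66
px-storage-909170 [007] 1385629.768114: copy_in_read_data_iovec: req_id 262273 prev_iovcnt 66 curr_iovcnt 2
"""

for entry in iovec_progress(SAMPLE.splitlines()):
    print(entry)
```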

fastpath

fastpath setup

px-storage-911098 [012] 1386280.179795: fuse_notify_add_ext:  dev_id 899908062089615119 size 2147483648 queue_depth 128 discard_size 1048576 open_mode 40002 enable_fp 1 path_count 1

fastpath IO request and completion

probe-bcache-912110 [007] 1386280.182950: pxd_queue_rq:         dev_id 899908062089615119 minor 1 dir 0 op 0 rq_offset 10737352704 size 4096 nr_phys_segments 1 flags 80700 bio 0xffff996482cbb700 bio_tail 0xffff996482cbb700 single_bio 1 bio_offset 10737352704
pxfpn0c3-907865 [003] 1386280.183702: end_clone_bio:        dev_id 899908062089615119 minor 1 bio_op 0 bio_offset 10737356800 bio_size 0 rq_op 0 rq_offset 10737352704 rq_size 4096 status 0 bio 0xffff996482cbb700 biotail 0xffff996482cbb700

fallback and failover

px-storage-909156 [011] 1386960.703606: pxd_initiate_failover: dev_id 855961816184237243 minor 1 reason 1
px-storage-909156 [011] 1386960.751059: pxd_request:          dev_id 855961816184237243 minor 1 unique 524372 off 0 size 0 req_op 4294967295 req_flags ffffffff pxd_op 8208 pxd_flags 0
px-storage-914457 [000] 1386960.813009: fuse_notify_add_ext:  dev_id 855961816184237243 size 2147483648 queue_depth 128 discard_size 1048576 open_mode 40002 enable_fp 1 path_count 0
px-storage-909157 [006] 1386960.813476: pxd_ioswitch_complete: dev_id 855961816184237243 minor 1 opcode 8208
px-storage-909156 [008] 1386982.080058: pxd_initiate_fallback: dev_id 855961816184237243 minor 1
px-storage-909156 [008] 1386982.102290: pxd_request:          dev_id 855961816184237243 minor 1 unique 524726 off 0 size 0 req_op 4294967295 req_flags ffffffff pxd_op 8209 pxd_flags 0
px-storage-914605 [011] 1386982.122808: fuse_notify_add_ext:  dev_id 855961816184237243 size 2147483648 queue_depth 128 discard_size 1048576 open_mode 40002 enable_fp 1 path_count 1
px-storage-909156 [008] 1386982.123058: pxd_ioswitch_complete: dev_id 855961816184237243 minor 1 opcode 8209
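
One way to use these events is to measure how long a path switch takes, by pairing each pxd_initiate_failover or pxd_initiate_fallback with the next pxd_ioswitch_complete for the same dev_id. The sketch below assumes that pairing rule, which the PR does not spell out. `switch_durations` is a hypothetical helper, and the sample lines are taken from the trace above.

```python
import re
from decimal import Decimal

# Only the three switch-related events are of interest here.
SWITCH_RE = re.compile(
    r"^\s*\S+-\d+\s+\[\d+\]\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>pxd_initiate_failover|pxd_initiate_fallback|pxd_ioswitch_complete):"
    r"\s+dev_id\s+(?P<dev_id>\d+)"
)


def switch_durations(lines):
    """Return (dev_id, kind, seconds) for each completed failover or fallback."""
    pending = {}  # dev_id -> (kind, start timestamp)
    done = []
    for line in lines:
        m = SWITCH_RE.match(line)
        if m is None:
            continue
        ts, event, dev_id = Decimal(m.group("ts")), m.group("event"), m.group("dev_id")
        if event == "pxd_ioswitch_complete":
            if dev_id in pending:
                kind, start = pending.pop(dev_id)
                done.append((dev_id, kind, ts - start))
        else:
            kind = "failover" if event == "pxd_initiate_failover" else "fallback"
            pending[dev_id] = (kind, ts)
    return done


SAMPLE = """\
px-storage-909156 [011] 1386960.703606: pxd_initiate_failover: dev_id 855961816184237243 minor 1 reason 1
px-storage-909157 [006] 1386960.813476: pxd_ioswitch_complete: dev_id 855961816184237243 minor 1 opcode 8208
px-storage-909156 [008] 1386982.080058: pxd_initiate_fallback: dev_id 855961816184237243 minor 1
px-storage-909156 [008] 1386982.123058: pxd_ioswitch_complete: dev_id 855961816184237243 minor 1 opcode 8209
"""

for dev_id, kind, seconds in switch_durations(SAMPLE.splitlines()):
    print(dev_id, kind, seconds)
```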

fastpath IO failure => failover; captures the IO failing in fastpath and then being queued in the native path

In pxd_initiate_failover, reason = 0 => IO_FAILURE triggered a failover

pxfpn0c0-920575 [000] 1387903.832848: end_clone_bio:        dev_id 861218944645956711 minor 1 bio_op 1 bio_offset 33136640 bio_size 0 rq_op 1 rq_offset 33132544 rq_size 4096 status -67 bio 0xffff9966b3718718 biotail 0xffff9966b3718718
pxfpn0c3-920578 [003] 1387903.832865: pxd_initiate_failover: dev_id 861218944645956711 minor 1 reason 0
....
px-storage-921520 [008] 1387903.929122: fuse_notify_add_ext:  dev_id 861218944645956711 size 1073741824 queue_depth 128 discard_size 1048576 open_mode 40002 enable_fp 1 path_count 0
px-storage-921918 [002] 1387903.929342: pxd_ioswitch_complete: dev_id 861218944645956711 minor 1 opcode 8208

px-storage-921918 [002] 1387903.930703: pxd_reroute_slowpath_transition: dev_id 861218944645956711 minor 1 transition 1 dir 1 op 1 offset 33132544 size 4096 nr_phys_segments 1 flags 8801

fastpath node down => failover

In pxd_initiate_failover, reason = 1 => userspace request (e.g. NodeDown) triggered a failover

px-storage-921919 [007] 1388819.017559: pxd_initiate_failover: dev_id 24825835109670207 minor 1 reason 1
px-storage-925822 [012] 1388819.076087: fuse_notify_add_ext:  dev_id 24825835109670207 size 1073741824 queue_depth 128 discard_size 1048576 open_mode 40002 enable_fp 1 path_count 0
px-storage-921917 [011] 1388819.076825: pxd_ioswitch_complete: dev_id 24825835109670207 minor 1 opcode 8208

...once the node comes back up, fallback is eventually triggered
px-storage-921919 [012] 1388920.940351: pxd_initiate_fallback: dev_id 24825835109670207 minor 1
px-storage-923395 [008] 1388922.140304: fuse_notify_add_ext:  dev_id 24825835109670207 size 1073741824 queue_depth 128 discard_size 1048576 open_mode 40002 enable_fp 1 path_count 1
px-storage-921918 [013] 1388922.140571: pxd_ioswitch_complete: dev_id 24825835109670207 minor 1 opcode 8209
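
For quick triage, the failover reason can be decoded straight from the trace. The mapping below is inferred from the traces in these notes (reason 0 on the IO failure path, reason 1 on the userspace-triggered path). The descriptive strings and the `describe_failovers` helper are illustrative and are not the driver's own identifiers.

```python
import re

# Reason values as observed in the traces above; labels are descriptive only.
FAILOVER_REASONS = {
    0: "IO failure in fastpath",
    1: "userspace request (e.g. NodeDown)",
}

FAILOVER_RE = re.compile(
    r"pxd_initiate_failover:\s+dev_id\s+(\d+)\s+minor\s+(\d+)\s+reason\s+(\d+)"
)


def describe_failovers(lines):
    """Return (dev_id, description) for every pxd_initiate_failover event."""
    results = []
    for line in lines:
        m = FAILOVER_RE.search(line)
        if m:
            reason = int(m.group(3))
            label = FAILOVER_REASONS.get(reason, f"unknown reason {reason}")
            results.append((m.group(1), label))
    return results


SAMPLE = """\
pxfpn0c3-920578 [003] 1387903.832865: pxd_initiate_failover: dev_id 861218944645956711 minor 1 reason 0
px-storage-921919 [007] 1388819.017559: pxd_initiate_failover: dev_id 24825835109670207 minor 1 reason 1
"""

for dev_id, label in describe_failovers(SAMPLE.splitlines()):
    print(dev_id, label)
```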

What this PR does / why we need it:
Adds traces to track the native path and fastpath.

Which issue(s) this PR fixes (optional)
Closes # PWX-42981

Special notes for your reviewer:

Sebas added 9 commits May 8, 2025 17:56
* tracks requests into px-fuse (added op to track)
* tracks multiple states during data copy from px-fuse to px-storage
  for write request
* tracks fastpath start and end
* tracks rerouting IO to native path
* tracks request complete in native path
* tracks failover/fallback start and end
* for failover, since it can happen because of IO failure
  and also from userspace, specify the reason as well
* tracks PXD_ADD_EXT, PXD_REMOVE, PXD_EXPORT_DEV and
  fastpath_reset_device
* adds trace for pxd_request in fuse_request_send_nowait since
  it has access to the unique id
* adds trace for tracking the block resize
* on updating the block device's size as part of resize, make sure
  to update the pxd_dev->size as well