Hangs writing to ZFS with fixed buffers #113

@talex5

When run on a ZFS partition (Ubuntu 22.04.4 LTS, GNU/Linux 5.15.0-102-generic x86_64), this program spins forever and cannot be killed:

(* Wait for the next completion, retrying if the wait was interrupted. *)
let rec wait_with_retry uring =
  match Uring.wait uring with
  | None -> wait_with_retry uring        (* Interrupted *)
  | Some { result; data } -> result, data

let () =
  let uring = Uring.create ~queue_depth:2 () in
  let buf = Cstruct.of_string "ab" in
  (* Register the 2-byte buffer as the ring's fixed buffer. *)
  Uring.set_fixed_buffer uring buf.buffer |> Result.get_ok;
  let fd = Unix.openfile "test.data" [O_CREAT; O_TRUNC; O_RDWR] 0o600 in
  for i = 0 to 1 do
    (* Write byte i of the fixed buffer to file offset i. *)
    let job = Uring.write_fixed uring fd ~file_offset:(Optint.Int63.of_int i) ~off:i ~len:1 () in
    assert (Option.is_some job);
    let x = Uring.submit uring in
    assert (x = 1);
    (* Each one-byte write should complete with result = 1. *)
    let result, () = wait_with_retry uring in
    assert (result = 1);
  done
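
As a possible control, the same two one-byte writes can be done with plain `Unix` calls, to check whether the filesystem handles this access pattern outside io_uring. This is only a sketch: the file name `test-control.data` and the helpers `write_bytes_individually` and `read_all` are made up for illustration, and it says nothing about what actually happens on ZFS.

```ocaml
(* Control experiment: the same two 1-byte writes at offsets 0 and 1,
   issued with lseek + write instead of io_uring fixed buffers.
   Needs only the unix library shipped with OCaml (4.08+ for Fun.protect). *)

(* Hypothetical file name, chosen so the reproduction's test.data is left alone. *)
let control_path = "test-control.data"

(* Write each byte of [data] separately, at the file offset equal to its index. *)
let write_bytes_individually path data =
  let fd = Unix.openfile path [Unix.O_CREAT; Unix.O_TRUNC; Unix.O_RDWR] 0o600 in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () ->
    String.iteri (fun i _ ->
      let pos = Unix.lseek fd i Unix.SEEK_SET in
      assert (pos = i);
      let written = Unix.write_substring fd data i 1 in
      assert (written = 1))
      data)

(* Read the whole file back as a string. *)
let read_all path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    really_input_string ic (in_channel_length ic))

let () =
  write_bytes_individually control_path "ab";
  assert (read_all control_path = "ab");
  print_endline "control writes completed"
```

If this completes on the affected partition while the program above hangs, that would point at the io_uring write path rather than at one-byte writes in general.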

Based on original report by @patricoferris at ocaml-multicore/eio#715 (comment).

pidstat -t 1 shows:

10:40:58      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:40:59     1000      1027         -    0.00   99.02    0.00    0.00   99.02     0  main.exe
10:40:59     1000         -      1048    0.00   98.04    0.00    0.98   98.04     0  |__iou-wrk-1027

perf record -g shows:

   - zpl_iter_write
      - 98.63% zfs_write
         + 27.21% dmu_tx_assign
         + 26.45% dmu_tx_commit
         + 19.34% dmu_write_uio_dbuf
         + 11.55% dmu_tx_hold_write_by_dnode
         + 5.20% dmu_tx_create
         + 3.83% dmu_tx_hold_sa
           0.82% zfs_clear_setid_bits_if_necessary

The process cannot be killed, even with kill -9.

Labels: bug
