@eddyb commented Dec 17, 2024

This unbreaks Wayland. (I had been using WAYLAND_DISPLAY= cargo run ... for ages instead of investigating it; it turns out to have been something very silly.)

This is what the bug looked like:

wp_linux_drm_syncobj_manager_v1#63: error 0: surface already exists
thread 'main' panicked at /home/eddy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-22.1.0/src/device/global.rs:1930:25:
internal error: entered unreachable code: Fallback system failed to choose present mode. This is a bug. Mode: AutoVsync, Options: []
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I kept thinking this might be a Wayland protocol mismatch or something, but no, Mesa (the open-source GPU driver stack for Linux) has a bug:

On Wayland, we had accidentally ended up with this broken scenario:

  • two wgpu::Surfaces for the same wl_surface (the Wayland window object)
    • (technically we had this issue on other platforms too, but they seem to care less?)
  • both surfaces had .configure(...) called on them
    • AIUI, this is where vkCreateSwapchainKHR gets called
  • the second vkCreateSwapchainKHR call fails to acquire an exclusive resource
    • i.e. wp_linux_drm_syncobj_manager_v1#63: error 0: surface already exists
    • however, due to that Mesa bug, this error isn't propagated to the caller
    • wgpu now thinks it has a valid swapchain for the second surface, too
  • the second Vulkan surface/swapchain is, however, partially broken
    • this makes various operations on that Vulkan surface/swapchain fail
    • in particular, wgpu fails to query various surface properties
    • somewhat indirectly, it eventually panics after failing to find a present mode
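
The core of the failure above is that a per-window exclusive resource gets claimed twice, and the second failure is swallowed. A minimal std-only Rust model of the behavior one would expect instead, where the second claim is reported back to the caller. `Compositor`, `Swapchain` and `SwapchainError` are hypothetical stand-ins (not wgpu, Vulkan or Mesa types), and a plain `u32` stands in for the `wl_surface`:

```rust
use std::collections::HashSet;

/// Hypothetical error; the variant name mirrors VK_ERROR_NATIVE_WINDOW_IN_USE_KHR.
#[derive(Debug, PartialEq, Eq)]
enum SwapchainError {
    NativeWindowInUse,
}

/// Hypothetical swapchain handle for one window.
#[derive(Debug, PartialEq, Eq)]
struct Swapchain {
    window_id: u32,
}

/// Hypothetical stand-in for the compositor-side exclusive resource, loosely
/// modeling the per-wl_surface object whose second creation produced
/// "surface already exists" in the log above.
#[derive(Default)]
struct Compositor {
    claimed: HashSet<u32>, // ids of windows that already have a swapchain
}

impl Compositor {
    /// Claims the window's exclusive resource and reports failure to the
    /// caller instead of swallowing it.
    fn create_swapchain(&mut self, window_id: u32) -> Result<Swapchain, SwapchainError> {
        // HashSet::insert returns false if the id was already claimed.
        if self.claimed.insert(window_id) {
            Ok(Swapchain { window_id })
        } else {
            Err(SwapchainError::NativeWindowInUse)
        }
    }
}

fn main() {
    let mut compositor = Compositor::default();
    // Two surfaces for the same window, both configured.
    let first = compositor.create_swapchain(1);
    let second = compositor.create_swapchain(1);
    println!("{first:?}"); // Ok(Swapchain { window_id: 1 })
    println!("{second:?}"); // Err(NativeWindowInUse)
}
```

Because the failure comes back as a `Result`, the caller can stop at the second `create_swapchain` instead of carrying on with a half-broken swapchain.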

With RUST_LOG=wgpu_hal=error I was able to see these VK_ERROR_SURFACE_LOST_KHR errors
(which wgpu largely ignores, leading to 0 supported present modes/formats):

[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_present_modes: ERROR_SURFACE_LOST_KHR
[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_formats: ERROR_SURFACE_LOST_KHR
[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_present_modes: ERROR_SURFACE_LOST_KHR
[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_formats: ERROR_SURFACE_LOST_KHR
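
To see why an empty list ends in a panic rather than an error, here is a simplified, std-only model of the kind of fallback the panic message mentions. It is not wgpu-core's actual code, and the fallback orders used (`FifoRelaxed` then `Fifo` for `AutoVsync`; `Immediate`, `Mailbox`, `Fifo` for `AutoNoVsync`) are assumptions for illustration. Vulkan requires FIFO to always be supported, so with working queries the search should never come up empty; once the `ERROR_SURFACE_LOST_KHR` results are ignored and the list is `[]`, nothing can match:

```rust
/// Simplified present modes (a hypothetical mirror, not wgpu's enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PresentMode {
    AutoVsync,
    AutoNoVsync,
    Fifo,
    FifoRelaxed,
    Immediate,
    Mailbox,
}

/// Picks a concrete mode for a request from the modes the surface reports.
/// Returns None instead of panicking when nothing matches.
fn choose_present_mode(requested: PresentMode, supported: &[PresentMode]) -> Option<PresentMode> {
    use PresentMode::*;
    let fallbacks: &[PresentMode] = match requested {
        // Assumed order: prefer FifoRelaxed, then plain Fifo.
        AutoVsync => &[FifoRelaxed, Fifo],
        // Assumed order: lowest latency first, ending in Fifo.
        AutoNoVsync => &[Immediate, Mailbox, Fifo],
        // Explicit modes are used only if the surface reports them.
        explicit => return supported.contains(&explicit).then_some(explicit),
    };
    fallbacks.iter().copied().find(|mode| supported.contains(mode))
}

fn main() {
    // Healthy surface: FIFO is always present.
    let healthy = [PresentMode::Fifo, PresentMode::Mailbox];
    println!("{:?}", choose_present_mode(PresentMode::AutoVsync, &healthy)); // Some(Fifo)

    // Lost surface: the query error was ignored, so the list is empty.
    let lost: [PresentMode; 0] = [];
    println!("{:?}", choose_present_mode(PresentMode::AutoVsync, &lost)); // None
}
```

This model returns `None` where the real code hits `unreachable!`, which is the `Options: []` case in the panic above.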

(Maybe we should run with at least the equivalent of RUST_LOG=error by default? I remember being frustrated that warn!/error! were silent while working on rustc self-profiling code: it didn't necessarily need nice user-facing diagnostics, but it also had no good way to emit them from the separate measureme library anyway.)


Fixing the Mesa bug wouldn't prevent the second wgpu::Surface from being created (via instance.create_surface(&window)), but swapchain creation could at least fail with a better error (e.g. VK_ERROR_NATIVE_WINDOW_IN_USE_KHR), which would make the situation less confusing.
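
Since nothing below the application stops a second surface from being created, the "one surface per window" invariant has to be kept by the application itself. A minimal std-only sketch of one way to enforce it, assuming hypothetical `SurfaceCache`/`Surface` types and a `u32` window id in place of the real windowing and wgpu types (in real code the cached value would be whatever instance.create_surface(&window) returned). This is an illustration of the invariant, not necessarily how this PR changes the runner:

```rust
use std::collections::HashMap;

/// Hypothetical window id; real code would use its windowing crate's id type.
type WindowId = u32;

/// Hypothetical stand-in for a graphics surface.
#[derive(Debug)]
struct Surface {
    serial: u32, // which creation produced this surface
}

/// Creates each window's surface once and hands back the existing one on
/// repeat requests, so a window never ends up with two surfaces.
#[derive(Default)]
struct SurfaceCache {
    surfaces: HashMap<WindowId, Surface>,
    created: u32, // how many surfaces were actually created
}

impl SurfaceCache {
    fn surface_for(&mut self, window: WindowId) -> &mut Surface {
        // Borrow the counter separately from the map (disjoint fields).
        let created = &mut self.created;
        self.surfaces.entry(window).or_insert_with(|| {
            // Only runs the first time this window is seen.
            *created += 1;
            Surface { serial: *created }
        })
    }
}

fn main() {
    let mut cache = SurfaceCache::default();
    let first = cache.surface_for(1).serial;
    let again = cache.surface_for(1).serial; // same window: reused, not recreated
    let other = cache.surface_for(2).serial;
    println!("{first} {again} {other} created={}", cache.created); // 1 1 2 created=2
}
```

Keeping creation behind a single lookup means every code path that needs the surface (initial setup, reconfiguring on resize) goes through the same entry instead of calling create_surface again.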

I've mentioned some of these interactions in this wgpu issue:

@eddyb force-pushed the push-zqulmxwskwvp branch from f84f69b to edd713e on December 18, 2024 12:08
@eddyb added this pull request to the merge queue Dec 18, 2024
Merged via the queue into Rust-GPU:main with commit f069c58 Dec 18, 2024
7 checks passed
@eddyb deleted the push-zqulmxwskwvp branch December 18, 2024 18:01