Releases: HansKristian-Work/vkd3d-proton

Version 3.0b (bugfix)

10 Dec 09:49

Another tiny bugfix release:

  • Fix silly regression in synchronization when VK_KHR_unified_image_layouts is not supported.
  • Update shader workaround hash for Wuthering Waves

Version 3.0a (bugfix)

18 Nov 16:58

Tiny bugfix release that addresses a silly performance regression in the new unified image layout path.

Version 3.0

17 Nov 15:54

A new major release, yay!
A few milestones have been reached over the last year, warranting a new major bump.
It's been quite a while since the last release due to new things coming up constantly.
These tags are mostly arbitrary anyway, and tend to be done when islands of calm and stability emerge.

Major items

DXBC shader backend rewrite

@doitsujin rewrote the entire DXBC backend, replacing our legacy vkd3d-shader path.
DXVK and vkd3d-proton now share the same DXBC frontend which gives us clean,
"readable" (as readable as DXBC can be) and lean IR to work with.
As a result, the standalone dxil-spirv project now supports DXBC as well.

Lots of games which used to be completely broken before due to bugs and missing features
in the legacy vkd3d-shader backend are now fixed. E.g. Red Dead Redemption 2 runs just fine now in D3D12 mode.
Some recently released DXBC based games also only work on the new path.
The number of regressions found in DXBC games over the last few months has been very small,
but it's possible there are still bugs in this area.
However, given that DXVK uses it now as well, it's been battle tested quite extensively already.

FSR4 support

We added support for AGS WMMA intrinsics through VK_KHR_cooperative_matrix and VK_KHR_shader_float8,
which is enough to support FSR4.
Note that these shaders are tightly coded for AMD GPUs with some implementation defined behavior
(particularly around matrix layouts), and they will not necessarily work on other GPU vendors.

There is also a rather hacky emulation path which relies on int8 and float16 cooperative matrix support.
It can run on older GPUs at a significant performance cost (and some cost to theoretical correctness).

Note that the default "official" build of vkd3d-proton only exposes this feature when the native
VK_KHR_shader_float8 is properly supported, i.e. RDNA4+ only.
The emulation path is available when building from source with the appropriate build flags.
The decision not to include this emulation path by default is above my pay grade.
The aim is to be able to ship FSR4 in a more proper way in Proton.

Features

We've more or less caught up on the things we can feasibly implement,
so there isn't much exciting stuff happening on the feature front.

  • Implemented experimental support for D3D12 work graphs. No real-world content ships this yet.
    This implementation is far from complete,
    but it works on "any" GPU since we emulate the feature with normal compute shaders.
    Funnily enough, this emulation can massively outperform native driver implementations of the feature
    in many scenarios we've tested (at the cost of some extra VRAM usage).
    See docs/ for more details on implementation and some performance numbers.
  • Expose AdvancedTextureOpsSupported by default from SM 6.7 if VK_KHR_maintenance8 is supported.
  • Expose the recently added sparse TIER_4.
  • Bump exposed D3D12SDKVersion to latest 618.
  • Experimentally expose support for opacity micromaps.
    There are some details which aren't quite compatible with the D3D12 API, but some basic demo content is working fine.
  • Add support for AMD_anti_lag when exposed. The current implementation does not take frame-gen into account.
  • Implement support for tight alignment from recent AgilitySDK.
  • Add support for shared resource path on upstream Wine.

Performance

  • Overhaul the texture copy batching situation.
    The new batching logic should be able to improve performance in many more cases than before.
    • Implemented support for VK_KHR_unified_image_layouts.
      Image copy batching in particular can take advantage of this to avoid a lot of unnecessary barriers.
  • Removed manual clear workaround on newer (6.15.9+) kernels on AMD, where an old kernel regression was finally fixed.
    The workaround is not applied on kernels older than 6.10 either.
  • Use push descriptor path on Qualcomm GPUs over BDA for speed.
  • Improve handling of GDeflate when decompression extension is not available.
    We now ship our own fallback shader in GLSL instead of the more awkward HLSL shader that dstorage ships.
  • Bump DGC scratch size on NVIDIA. Should avoid some massive perf drops in Halo Infinite on NVIDIA.
  • Add performance optimization for The Last of Us Part 1 to prefer 2D tiling on 3D images.
    Requires an update to Mesa as well to get the proper effect.
  • Handle depth/stencil <-> color image copies better when VK_KHR_maintenance8 is supported.
  • Make use of VK_EXT_zero_initialize_device_memory to avoid manual clears on allocation.

Fixes

  • Emit render pass barriers as expected on tiled GPUs. Fixes misc rendering bugs reported on e.g. Turnip.
    • For performance reasons, we deliberately skirt the spec a bit on desktop GPUs.
  • Fixed a bunch of minor correctness problems exposed by new Vulkan-ValidationLayers.
  • Adjust how PointSamplingAddressesNeverRoundUp is reported to match recent driver behaviors.
  • Fix overflow bugs in massive (> 4GiB) sparse resource handling.
  • Fix reporting of some esoteric format properties to better match native drivers.
  • Fix handling of NULL acceleration structure descriptors.
  • Fix some texturing bugs in Helldivers II on NVIDIA.
  • Fix some bugs with memory type handling on very old NVIDIA GPUs.
  • Fix bug when pixel shader includes root signature.
  • Make ClearUAV barrier insertion the default now.
    Too many games screw this up, and D3D12 drivers seem to do it by default.
  • Fix shared fences when initial value is not 0. Fixes some Star Citizen issues.
  • Fix rare deadlock scenario in Ninja Gaiden 4.
    Fixes some long-standing issues with how we deal with fence rewinds.
  • Fix some long-standing issues with how we deal with placed MSAA resources and alignment.
  • Make sure we don't clear memory of imported resources.
    This doesn't fix any known games, but you never know :V
  • Improve correctness for many odd GS/HS/DS corner cases with primitive types and API validation.
  • Fix crashes when index buffer SizeInBytes = 0, but VA was invalid.
    Seen in some Saber Interactive games.
  • Fix some potential deadlocks in VR interop APIs when multiple threads attempt to acquire the Vulkan queue.
  • Fix 16-bit aligned structured buffer strides. Not observed in any real content, but you never know!
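
The release notes do not say where the > 4 GiB sparse overflow bugs were, so the following is only a sketch of the general class of bug, not the actual fix. It assumes a byte offset computed from a tile index in 32-bit arithmetic; the helper names are hypothetical. D3D12 tiled resources use 64 KiB tiles, so the first tile past 4 GiB is exactly where a 32-bit offset wraps back to zero.

```python
MASK32 = 0xFFFFFFFF
TILE_SIZE = 64 * 1024  # D3D12 tiled (sparse) resources use 64 KiB tiles


def tile_offset_32(tile_index: int) -> int:
    """Byte offset as a 32-bit unsigned register would compute it (wraps)."""
    return (tile_index * TILE_SIZE) & MASK32


def tile_offset_64(tile_index: int) -> int:
    """Byte offset computed with enough bits to cover > 4 GiB resources."""
    return tile_index * TILE_SIZE


# Index of the first tile that starts at or beyond the 4 GiB boundary.
first_tile_past_4gib = (4 * 1024**3) // TILE_SIZE

for index in (first_tile_past_4gib - 1, first_tile_past_4gib):
    print(index, hex(tile_offset_32(index)), hex(tile_offset_64(index)))
```

Below the boundary both helpers agree; at the boundary the 32-bit variant silently aliases tile 0.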

Workarounds

  • Add workarounds for sync bugs in FF VII Rebirth. Fixes some rare GPU hangs.
  • Add misc AMD workarounds for Monster Hunter Wilds caused by bugged hardware around sparse SMEM.
    • A proper hardware workaround in RADV is still pending.
  • Workaround some Starfield bugs around NonUniformResourceIndex use.
  • Add performance workarounds for extremely large tessellation factors used in misc new Koei Tecmo games.
  • Add Wreckfest 2 workarounds for illegal texture placement aliasing. Fixes some broken textures.
  • Add a barrier that Satisfactory missed. Fixes some corrupt rendering, especially on AMD.
  • Ignore NOT_CLEARED flags on allocation in all games now. Native drivers seem to always clear regardless of the flag,
    and e.g. Street Fighter 6 relies on NOT_CLEARED memory to actually be cleared :(
  • Workaround some issues with RGB9E5 and alpha write masks observed in Ninja Gaiden 4.
  • Add missing barrier in Death Stranding (the older build, not Director's Cut).
  • Add missing barrier in Wuthering Waves.
  • Workaround bugged uninitialized loop variable in Dune MMO.
  • Disable UAV compression in Spider-Man Remastered. Fixes some weird RT issues on RDNA2.
  • Add Root CBV robustness workaround for Gray Zone Warfare.
  • Disable color compression in Rise of the Tomb Raider. Fixes some glitches due to game bug on AMD.
  • Workaround some bugs in Port Royal benchmark.
  • Workaround Mafia: Definitive Edition hanging GPU when using FSR on startup due to use-after-free.
    • The workaround applies to all uses of FSR. It plausibly works around a hang in MGS: Delta as well, but it is not confirmed that it was this bug.
  • Workaround Control RT path occasionally observing NaNs due to bad normalize() patterns.
  • Workaround Final Fantasy Tactics Ivalice Chronicles illegally using dynamically indexed root constants.
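
The exact normalize() pattern in Control is not described here, so this is only a sketch of one common way such code produces NaNs: normalizing a zero-length vector as v * rsqrt(dot(v, v)). The helper names and the epsilon guard are assumptions for illustration, not the workaround vkd3d-proton applies.

```python
import math


def rsqrt(x: float) -> float:
    """Reciprocal square root with IEEE-style behaviour: rsqrt(0) = +inf."""
    return math.inf if x == 0.0 else 1.0 / math.sqrt(x)


def normalize_naive(v):
    """v * rsqrt(dot(v, v)); a zero vector gives 0 * inf = NaN."""
    scale = rsqrt(sum(c * c for c in v))
    return tuple(c * scale for c in v)


def normalize_guarded(v, eps=1e-30):
    """Same computation, but the squared length is clamped away from zero."""
    scale = rsqrt(max(sum(c * c for c in v), eps))
    return tuple(c * scale for c in v)


zero = (0.0, 0.0, 0.0)
print(normalize_naive(zero))    # every component is NaN
print(normalize_guarded(zero))  # every component stays 0.0
```

Once a NaN enters a ray tracing or lighting accumulation it tends to spread, which is why a single degenerate vector can be visible on screen.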

Misc

  • Added a lot more debug instrumentation as usual.
    • Not user facing, so omitting details.
  • Make it a bit easier to use vkd3d-proton in Linux-native projects.
  • Remove DXVK_FRAME_RATE to align with DXVK's removal. Only VKD3D_FRAME_RATE remains (at least for now).

Version 2.14.1

10 Jan 13:55

This is a bug-fix release which resolves some regressions introduced in 2.14.

  • Fix a crash on start-up which affected GPUs without sparse support. E.g. Intel iGPU or Turnip.
    Crash could happen even if that GPU was the secondary GPU on the system.
  • Fix a memory allocation issue affecting NVK.
  • Fix a CPU performance regression issue affecting Horizon Zero Dawn Remastered on NVIDIA GPUs.
    This fix might improve CPU performance in other games too, but unverified.
  • Not a regression fix, but add a no_upload_hvv workaround for Arma Reforger to work around weird asset loading behavior.

Version 2.14

13 Dec 13:57

Rolls up the usual collection of new features, performance improvements, bug fixes and the copious amount of game workarounds,
just in time for the holidays.

Features

  • Implement DXGI frame statistics (exposed by DXVK DXGI).
  • Implement a global frame rate limiter (see VKD3D_FRAME_RATE or DXVK_FRAME_RATE).
    Also improves behavior of presentation with swap interval > 1 since we use frame limiter instead
    of duplicated presents now. Also allows support for full-screen frame rate targets in DXGI which normally would imply a mode change.
  • Implement support for planar video formats such as NV12.
  • Implement D24 depth bias correctly now on AMD when VK_EXT_depth_bias_control is supported.
  • Expose a new command interop interface that allows e.g. dxvk-nvapi to implement DLSS3 frame generation.
  • Use VK_KHR_compute_shader_derivatives when available.
  • Use VK_EXT_device_generated_commands when available. Expose execute indirect tier 1.1.
  • Implement GPU upload heap from latest AgilitySDKs. Allows explicit control over ReBAR instead of heuristic based hacks in games that use the new API.
  • Implement ID3DDestructionNotifier. Fixes some particular games that expect this to be supported.
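
As a rough illustration of the frame limiter item above: instead of presenting the same image several times for swap interval N, a limiter can pace presents at refresh rate / N. This is a minimal sketch under that assumption; the class, its re-anchoring policy and all names are hypothetical and not taken from the vkd3d-proton sources.

```python
def target_frame_rate(refresh_hz: float, swap_interval: int) -> float:
    """Frame rate that a swap interval of N implies on a given display."""
    if swap_interval < 1:
        raise ValueError("swap interval must be >= 1")
    return refresh_hz / swap_interval


class FrameLimiter:
    """Computes how long to wait before each present to hit a target rate."""

    def __init__(self, target_fps: float):
        self.interval = 1.0 / target_fps
        self.next_deadline = None

    def wait_time(self, now: float) -> float:
        # First frame, or we fell more than one interval behind: re-anchor
        # to the current time instead of trying to catch up.
        if self.next_deadline is None or now > self.next_deadline + self.interval:
            self.next_deadline = now
        delay = max(0.0, self.next_deadline - now)
        self.next_deadline += self.interval
        return delay


limiter = FrameLimiter(target_frame_rate(60.0, 2))  # 30 FPS pacing
for t in (0.0, 0.01, 1.0):
    print(t, limiter.wait_time(t))
```

A real limiter would sleep for the returned delay using a monotonic clock; the timestamps are passed in here only to keep the sketch deterministic.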

Performance

  • Reduce some VRAM bloat on RDNA2 and 3 GPUs when VK_MESA_image_alignment_control is exposed.
  • Improve CPU overhead for games that query swapchain format support over and over.
  • Remove old heuristic that preferred 2 frames of latency depending on BufferCount used.
    The default on DXGI is 3, and using 2 caused some performance issues in various games with GPU starvation,
    especially on Deck. VKD3D_SWAPCHAIN_LATENCY_FRAMES is still available as an override to force a tighter default.
  • Rewrite queue submission logic to deal better with difficult submission patterns such as FSR3 3.1 Frame Generation.
    On implementations with only one graphics queue, vkd3d-proton will now attempt to do basic software scheduling of GPU work.
    This may regress GPU performance in some other cases and VKD3D_CONFIG=no_staggered_submit is a way to disable this code path.
    One particularly big improvement is FF XVI on RADV with FSR 3 frame-gen, with almost doubled performance in some cases.
    We are still awaiting a proper kernel-level fix for this problem to be fully resolved.
  • Rewrite queue submission logic to use fewer "dummy" wait/signal submissions.
    Works around pathological CPU overhead in amdgpu taking 20ms+ to submit work in some cases.
  • Rewrite queue submission logic for sparse updates to be more efficient.

Fixes and workarounds

  • Rework various multi-sampling queries to be more spec correct.
  • Workaround bugged MSAA behavior in World of Warcraft.
  • Workaround buggy/questionable use of ID3D12PipelineLibrary in FF XVI.
  • Always use native 16-bit integers for min16int. Fixes some real-world bugs where shaders expect min16int is always implemented as 16-bit.
  • Workaround game bug leading to GPU hang in Dragon Age: Veilguard on RADV.
  • Always emit proper floating-point environment modes in DXBC shaders. Fixes glitched eyes in Dragon Age: Veilguard on NV.
  • Fix potential use-after-free bug for some sparse resource update cases.
  • Correctly validate when application attempts to allocate a too large descriptor heap.
    Fixes Stalker 2 entering into undefined behavior.
  • A lot of misc fixes in dxil-spirv as usual.
  • Workaround broken amdgpu zerovram behavior on 6.10+ kernels. Fixes random extreme glitchiness in Helldivers 2 on AMD.
  • Workaround NV issue which led to a GPU hang when loading a save file in Star Wars: Outlaws.
  • Fix copying between BC <-> RGBA images in some cases.
  • Add workaround for a game bug in The First Descendant which led to broken cubemap reflections in some cases.
  • Workaround Skull & Bones crashing on startup on NV GPUs by disabling Reflex support.
  • Workaround Hunt: Showdown missing precise qualifiers on vertex shaders, leading to glitched rendering.
  • Workaround poor CPU performance in Red Dead Redemption.
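
To illustrate the min16int item above: HLSL only guarantees that min16int has at least 16 bits, so an implementation may back it with 32-bit registers. A shader that silently relies on 16-bit wrap-around then behaves differently. The helper below is a hypothetical sketch that models two's-complement wrapping at a given width.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer to a two's-complement signed value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, reinterpret as a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


a, b = 32767, 1
print(wrap_signed(a + b, 16))  # native 16-bit: wraps to -32768
print(wrap_signed(a + b, 32))  # promoted to 32-bit: stays 32768
```

Always using native 16-bit integers gives the first result, which is what the affected shaders expect.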

Misc / Debug

  • Add support for instruction_qa_checks. For deep debug, allows us to be notified when NaNs and Infs are generated in shaders.
    For internal QA use.
  • Add fine-grained control of QA behavior on a per-shader basis. For narrowing down issues.
  • Remove a bunch of old and obsolete workarounds for NV drivers. New cutoff is 535 series.
  • Bump exposed SDKVersion to 614 to match latest stable AgilitySDK.
  • Add an optional code path to support DXBC via the official dxilconv library.
    This code is not enabled in release builds,
    and is currently only intended as a path to take advantage of QA instrumentation for DXBC shaders.

Version 2.13

21 Jun 15:03

Features

  • Implement Shader Model 6.8 min-spec
    • SV_StartInstanceLocation
    • SV_StartVertexLocation
    • WaveSize range
    • Implement Vulkan texturing catch-up features (esoteric comparison sampling functions)
  • Implement interop for OpenVR / OpenXR on Proton
  • Correctly support NULL index buffers with VK_KHR_maintenance6.
  • Implement VK_MESA_image_alignment_control. Reduces memory bloat on AMD cards in particular.

Fixes

  • Reimplement VK_NV_low_latency2 to fix some issues with heavy stuttering caused by non-monotonic frame IDs.
    Relies on a more recent dxvk-nvapi which can paper over API design issues in Reflex API.
    Requires a more recent NVIDIA driver which fixes some bugs exposed in this new code.
    On older NVIDIA drivers, it should run, but low-latency will not kick in as expected.
  • Explicitly disable variable-rate shading when depth-stencil is written in shader.
    Fixes glitched hair rendering in Hellblade 2.
  • Correctly expose MSAA features for depth-stencil. Fixes Arma Reforger.
  • Fix bugs in MSAA resolve implementation when dealing with custom resolve formats. Fixes Arma Reforger.
  • Fix validation error in internal query resolve shader.
  • Fix some bugs in wave-ops where helper lanes participated where they were not supposed to.
    Fixes some WaveMatch / WaveMultiPrefix use-cases in the wild.
  • Various dxil-spirv fixes to fix invalid control-flow as always.

Performance

  • Tweak how we opt-in to ReBAR for UPLOAD heaps. Now, only > 8 GB cards will get it.
    On 8 GB cards, we were regularly hitting the upper limits of what the GPU could hold in VRAM,
    and using ReBAR would be detrimental to performance since there was risk of more important
    memory being demoted to system memory. Works well together with VK_MESA_image_alignment_control
    to free up significant amounts of VRAM. Performance gains from ReBAR on 8 GB were also found to be minimal
    compared to the larger GPUs since we quickly exhausted the limited 512 MiB budget anyway.
  • Sub-allocate small image heaps. Avoids heavy stutter in Ghost of Tsushima on desktop.
    (Steam Deck code path does not seem to use small heaps to begin with).
  • Improve performance with ROV when used with more complicated shader code patterns.

Workarounds

  • Implement a crude workaround for depth-stencil sparse and MSAA sparse.
    • Just allocates a committed resource instead. Not correct, but good enough band-aid.
    • Allows SottR to run on RADV.
  • Disable NV_dgcc on Halo Infinite on NV drivers.
  • Workaround a missing barrier in AC: Mirage causing random corrupt geometry.

Misc

  • Split vkd3d-proton shader cache up by .exe name when using a unified directory with VKD3D_SHADER_CACHE_PATH.
  • Implement VK_EXT_device_address_binding_report.

Version 2.12

15 Mar 16:57

Features

  • Implement support for NVIDIA Reflex through VK_NV_low_latency2. Thanks to NVIDIA for contributing the implementation
  • Implement D3D12 render pass API (tier 0)
  • Implement ID3D12DeviceRemovedExtendedDataSettings stubs. Fixes some games that rely on this existing
  • Implement VK_EXT_device_fault. Makes it possible to grab fault information and vendor binary if supported
  • Implement VK_EXT_swapchain_maintenance1
    • Allows seamless transition between V-Sync and tearing present modes without stutter
    • Implemented on both Mesa and NV drivers
  • Expose Shader Model 6.7 by default if
    VK_KHR_shader_maximal_reconvergence and VK_KHR_shader_quad_control are supported
  • Add optimized descriptor copy path on Intel Arc GPUs that support VK_EXT_descriptor_buffer
  • Implement fallback for compute shader derivatives on NVIDIA Pascal and older GPUs.
    Allows exposing Shader Model 6.7 by default on Pascal as well (albeit with some known cases where it does not work).
    The workaround is expected to work with any known use of SM 6.6 compute derivatives in the wild

Fixes

  • Fix Atlas Fallen black screen due to edge case with MinLODClamp
  • Correctly disable alpha-to-coverage if sample mask is exported
  • Fix format feature reports for DXGI_FORMAT_UNKNOWN
  • Relax root signature compatibility rules when compiling Ray Tracing pipelines.
    Fixes GPU hang on NV in Warhammer: Darktide
  • Fix GPU hang on NV in UE5 Lyra demo
  • Explicitly validate stage IO signatures in PSO creation similar to native D3D12 runtime.
    Fixes some scenarios where a game attempts to create an invalid pipeline that should have failed creation
    on native D3D12

Workarounds

  • Workaround crash in Resident Evil 4 RT mode when tessellation is enabled
  • Workaround mesh shader glitches on NVIDIA in several UE5 titles
  • Workaround GPU hang on NVIDIA in World of Warcraft when MSAA is enabled
  • Disable RT by default in Persona 3 Reload on Deck

Performance

  • Implement VK_NV_raw_access_chains. Significantly improves GPU performance on NV GPUs in some games.
    Games using DXBC instead of DXIL are expected to see more improvements.
    Not all games are expected to see an uplift
  • Fix extremely poor GPU performance in some locations in Persona 3 Reload

Debug

  • Add support for VKD3D_QUEUE_PROFILE, a simple system profiling method
    • Includes VK_NV_low_latency2 support to debug NVIDIA Reflex sleeps
  • Root signature blobs are also dumped when dumping shaders
    • A simple CLI tool to inspect the root-sig blobs is included in programs/
  • Misc improvements to breadcrumbs, debug ring, etc
  • Pipeline creation failure now dumps PSO creation commands in log

Version 2.11.1

15 Dec 15:51

This release is a minor bug-fix release before the holidays.

  • Implement COLOR -> STENCIL fallback copy on NVIDIA
  • Implement SM 6.6 ResourceDescriptorHeap[] + UAV counters correctly on RADV
  • Fix bugged implementation of DXBC resinfo instruction, affecting Avatar: Frontiers of Pandora
  • Fix memory type used for DGC preprocess memory on NVIDIA (~5% performance, YMMV)
  • Fix crash in Callisto Protocol when booting game with DXR support

More complete MSAA resolve implementation

  • Add depth-stencil resolve
  • Support typeless formats
  • Add MIN/MAX resolve modes
  • Implement missing code paths on NVIDIA

Workarounds

  • Update workaround for GPU hang in CP77 when using DXR for patch 2.1.
  • Remove workaround for NO_DGCC in Halo Infinite on NVIDIA.
  • Workaround game bug in Pioneers of Pagonia causing GPU hangs on RADV.

Version 2.11

24 Nov 16:18

This release rolls up a bunch of features, perf improvements and bug fixes / workarounds as usual.

Features

DXR enabled by default

DXR is now enabled by default, so VKD3D_CONFIG=dxr is no longer needed.
There are some special cases where DXR is not enabled by default. The only current example is
"Hellblade: Senua's Sacrifice" on Deck, since the game force-enables DXR whenever it is supported.
New semantics are:

  • dxr: Force-enable DXR, even when it is considered unsafe
  • nodxr: Disable DXR
  • dxr11: Removed. dxr already implied DXR 1.1 anyway
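
The semantics above can be sketched as a small resolver. This is an illustration only: the function and parameter names are hypothetical, the config string is assumed to be a comma-separated list, and letting nodxr win when both flags are present is an assumption rather than documented behavior.

```python
def resolve_dxr(config: str, device_supports_rt: bool,
                considered_unsafe: bool = False) -> bool:
    """Decide whether DXR is exposed, given a VKD3D_CONFIG-style string."""
    flags = {flag.strip() for flag in config.split(",") if flag.strip()}
    if "nodxr" in flags:
        return False            # explicit opt-out (assumed to take priority)
    if not device_supports_rt:
        return False            # nothing to expose without ray tracing support
    if "dxr" in flags:
        return True             # force-enable, even when considered unsafe
    return not considered_unsafe  # default: on, except for special cases


print(resolve_dxr("", True))                          # default: enabled
print(resolve_dxr("", True, considered_unsafe=True))  # special case: disabled
print(resolve_dxr("dxr", True, considered_unsafe=True))
print(resolve_dxr("nodxr", True))
```

The removed dxr11 flag is simply ignored by this sketch, matching the note that dxr already implied DXR 1.1.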

Sampler feedback

This feature was the last feature required for FL 12.2 and is implemented through emulation.
As demonstrated in the implementation docs, all
native implementations of this feature are fundamentally broken in some way.
There's also no known game that ships requiring this feature, so we just consider this a checkbox feature.

DX Ultimate (FL 12.2) now exposed by default

On RDNA2+ and Turing+ we can finally expose the DX Ultimate feature set!

Misc

  • Implement a bunch of missing "Vulkan-on-D3D12" features
    • IndependentFrontAndBackStencilRefMaskSupported
    • TriangleFanSupported
    • DynamicIndexBufferStripCutSupported
    • DynamicDepthBiasSupported
    • NonNormalizedCoordinateSamplersSupported
    • MismatchingOutputDimensionsSupported
    • PointSamplingAddressesNeverRoundUp
    • RasterizerDesc2Supported
      • Explicit line rasterization mode
    • NarrowQuadrilateralLinesSupported
    • AnisoFilterWithPointMipSupported
  • Implement missing MSAD instruction in DXIL, allowing FSR3 to run
  • Implement some esoteric DXR features
    • Implement support for multiple mismatching global root signatures in DXR
      • Fixes crash in Battlefield V
    • Implement support for LOCAL_ON_EXTERNAL dependencies in DXR
      • Fixes DXR in Warhammer: Darktide
  • Implement support for ExecuteIndirect + Mesh shaders with state changes
    • Currently unused by games

Performance

  • Improve performance of NV_device_generated_commands and NV_device_generated_commands_compute by
    reordering and batching command preprocessing
    • We have observed 15% FPS gains in Halo Infinite on RADV
    • 1-2% in Starfield in some test locations
    • Needs pending Mesa work to land to take advantage of this improvement
  • Tune memory allocation patterns for DGC preprocess buffers
    • Avoids a lot of allocation churn
    • Greatly reduces CPU overhead on NV

Workarounds

  • Work around RADV bug causing GPU hang in RE4: Separate Ways DLC
  • Work around RADV bug causing GPU hang in Lords of the Fallen
  • Work around Witcher 3 bug causing broken shadows and GPU hangs when enabling DXR
  • Work around Cyberpunk 2077 bug when RT is enabled, where game would cause spurious GPU hangs due to accessing descriptor heap out of bounds
  • Work around Windjammers 2 bug causing random crashes on startup
  • Add support for VK_EXT_image_compression_control to allow for more fine-grained workarounds for broken games running on RADV
  • Enable NV_device_generated_commands_compute on latest NV beta drivers
    • 545.x drivers are still disabled until a fix can be confirmed on shipping drivers
  • Remove CURB_MEMORY_PSO_CACHE workaround on Mesa 23.2+
    • Should reduce overhead in PSO creation

Fixes

  • Misc dxil-spirv changes to fix various bugs in game shaders as usual
  • Fix Jurassic World Evolution 2 crashing when enabling DXR
  • Fix some deprecation warnings in Meson build system
    • Some submodule locations moved, which may cause minor disruption

Version 2.10

11 Sep 13:53

This release rolls up a ton of bug fixes, game and driver workarounds, and other improvements.

Features

DirectStorage MetaCommands

We can now make use of NV_memory_decompression to implement
GPU-accelerated GDeflate decompression in DirectStorage.
This is demonstrated to work in Ratchet & Clank: Rift Apart.

We also worked around an NV driver bug when using the fallback GDeflate shader.
The fallback works on RADV.

Enhanced Barriers

NOTE: This isn't all that well tested because there are no games shipping with this yet to our knowledge.

Device generated commands for compute

With NV_device_generated_commands_compute we can efficiently implement
Starfield's use of ExecuteIndirect which hammers multi-dispatch COMPUTE + root parameter changes.
Previously, we would rely on a very slow workaround.

NOTE: This feature is currently only enabled on RADV due to driver issues.

Misc

  • Support Root Signature version 1.2
  • Implement Shader Model 6.7
    • Includes all SM 6.7 features like AdvancedTextureOps, WaveOpsIncludeHelperLanes
    • Caveat: Technically not a Vulkan spec compliant implementation, but works fine on at least NV and RADV. Implemented as an opt-in option for now in case some game relies on it to work
  • Implement CreateSampler2
  • Expose inverted viewport / height feature
  • Implement RelaxedFormatCasting feature from Enhanced Barriers
  • Implement support for adjacency topologies
  • Support A8_UNORM format properly by using VK_KHR_maintenance5, allowing A8_UNORM UAVs to work correctly
  • Handle range checked index buffers correctly with VK_KHR_maintenance5

New extension use

  • VK_EXT_dynamic_rendering_unused_attachments
  • VK_KHR_maintenance5
  • VK_NV_device_generated_commands_compute

Performance

  • Batch acceleration structure builds. Massively improves build performance on at least RADV.
  • Massively improve ExecuteIndirect performance when using COMPUTE + root parameter changes when VK_NV_device_generated_commands_compute is enabled.

Fixes

  • Fix root signature creation from DXIL library target (DXR) blobs
  • Fix some dual source blending PSOs scenarios. Fixes Star Wars Battlefront II
  • Implement wave operations in pixel shaders more strictly according to D3D12 rules
  • Fix spurious hangs in Ashes of the Singularity when using shared fences and wait-before-signal
  • Fix PSO caching bug in mesh shaders. Fixes mesh shaders in Unreal Engine 5
  • Fix udiv remainder in DXBC, which fixes some Xenia bugs
  • Fix query heap tracking bug that was exposed by NV Streamline
  • Various DXIL -> SPIR-V fixes as usual
  • Rewrote descriptor set layouts to be more robust against application bugs
    • Motivated by Armored Core VI bug (see below)
    • Native D3D12 drivers are also robust against these application bugs :(

Workarounds

  • Workaround bad ReBAR performance in Age of Wonders 4
  • Remove workaround for KHR_present_wait on NV 535+ drivers
  • Workaround Starfield memory corruption issue where it does not correctly query for 4 KiB alignment
  • Disable ReBAR usage on Halo Infinite to workaround very poor CPU performance
  • Workaround Street Fighter 6 bug causing spurious GPU hangs
    • Also appears to have worked around GPU hangs in Resident Evil 2
  • Workaround Armored Core VI bug causing GPU hang on Balteus fight in chapter 1
  • Workaround "firefly" glitches in Resident Evil 4 caused by dubious min16float usage
  • Workaround "firefly" glitches in Monster Hunter Rise caused by dubious shader requiring particular precise math
  • Workaround Unreal Engine 5 breaking if mesh shaders are exposed, but not barycentrics
  • Workaround NV driver bug with TIMESTAMP query heaps that could cause spurious GPU hangs
  • Workaround broken CFG code generation in Xenia's DXBC emitter