Commit a2f417a

Make BDA Alignment a dedicated chapter (#355)
1 parent b25a71e commit a2f417a

File tree

5 files changed

+200
-92
lines changed

README.adoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,8 @@ The Vulkan Guide content is also viewable from https://docs.vulkan.org/guide/lat
110110

111111
* `VK_KHR_buffer_device_address`, `VK_EXT_buffer_device_address`
112112

113+
==== xref:{chapters}buffer_device_address_alignment.adoc[Buffer Device Address - Alignment]
114+
113115
== xref:{chapters}pipeline_cache.adoc[Pipeline Caching/Derivatives]
114116

115117
== xref:{chapters}threading.adoc[Threading]

antora/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@
4141
*** xref:{chapters}sparse_resources.adoc[]
4242
*** xref:{chapters}protected.adoc[]
4343
*** xref:{chapters}buffer_device_address.adoc[]
44+
**** xref:{chapters}buffer_device_address_alignment.adoc[]
4445
** xref:{chapters}pipeline_cache.adoc[]
4546
** xref:{chapters}threading.adoc[]
4647
** xref:{chapters}depth.adoc[]

chapters/buffer_device_address.adoc

Lines changed: 1 addition & 92 deletions
Original file line numberDiff line numberDiff line change
@@ -71,98 +71,7 @@ Some device migth support `bufferDeviceAddress`, but not `shaderInt64`. The way
7171

7272
=== Alignment
7373

74-
All variables accessed with `PhysicalStorageBuffer` must have an `Aligned` memory operand to it.
75-
76-
[source,swift]
77-
----
78-
%x = OpLoad %type %ptr Aligned 16
79-
OpStore %ptr %obj Aligned 16
80-
----
81-
82-
Shading languages will have a default, but can allow you to align it explicitly (ex `buffer_reference_alignment`).
83-
84-
The goal of this alignment is this is a promise for how aligned this specific pointer is.
85-
The compiler has no idea what the address will be when the shader is compiled.
86-
By providing an alignment it can generate valid code to match the requirement.
87-
The user is responsible to confirm the address they use is aligned to it.
88-
89-
[source,glsl]
90-
----
91-
layout(buffer_reference, buffer_reference_align = 64) buffer MyBDA {
92-
uint data;
93-
};
94-
95-
MyBDA ptr_a; // at 0x1000
96-
MyBDA ptr_b; // at 0x1010
97-
MyBDA ptr_c; // at 0x1040
98-
99-
ptr_a.data = 0; // (Aligned 64) valid!
100-
ptr_b.data = 0; // (Aligned 64) invalid!
101-
ptr_c.data = 0; // (Aligned 64) valid!
102-
----
103-
104-
When deciding on an alignment, the minimum value will always be the size greater than or equal to the largest scalar/component type in the block.
105-
106-
[source,glsl]
107-
----
108-
// alignment must be at least 4
109-
layout(buffer_reference) buffer MyBDA {
110-
vec4 a; // scalar is float
111-
};
112-
113-
// alignment must be at least 1
114-
layout(buffer_reference) buffer MyBDA {
115-
uint8_t a; // scalar is 8-bit int
116-
};
117-
118-
// alignment must be at least 8
119-
layout(buffer_reference) buffer MyBDA {
120-
uint a; // 32-bit
121-
double b; // 64-bit
122-
};
123-
----
124-
125-
=== Alignment Example
126-
127-
To help explain alignment, lets take an example of loading an array of vectors
128-
129-
[source,glsl]
130-
----
131-
layout(buffer_reference, buffer_reference_align = ???) buffer MyBDA {
132-
uvec4 data[];
133-
};
134-
135-
MyBDA ptr; // at 0x1000
136-
ptr.data[i] = uvec4(0);
137-
----
138-
139-
Here we have 2 options, we could set the `Aligned` to be `4` or `16`.
140-
141-
If we set alignment to `16` we are letting the compiler know it can load 16 bytes at a time, so it will hopefully do a vector load/store on the memory.
142-
143-
If we set alignment to `4` the compiler will likely have no way to infer the real alignment and will now do 4 scalar int load/store on the memory.
144-
145-
[NOTE]
146-
====
147-
Some GPUs can do vector load/store even on unaligned addresses.
148-
====
149-
150-
For the next case, if we had `uvec3` instead of `uvec4` such as
151-
152-
[source,glsl]
153-
----
154-
layout(buffer_reference, buffer_reference_align = 4, scalar) buffer MyBDA {
155-
uvec3 data[];
156-
};
157-
158-
data[0]; // 0x1000
159-
data[1]; // 0x100C
160-
data[2]; // 0x1018
161-
data[3]; // 0x1024
162-
----
163-
164-
We know that setting the alignment to `16` would be violated at `data[1]` and therefore we need to use an alignment of `4` in this case.
165-
Luckily shading languages will help do this for you as seen in both link:https://godbolt.org/z/jWGKax1ed[glslang] and link:https://godbolt.org/z/Y7xW3Mfd4[slang] .
74+
See the dedicated xref:{chapters}buffer_device_address_alignment.adoc#buffer-device-address-alignment[BDA Alignment chapter].
16675

16776
=== Nullptr
16877

chapters/buffer_device_address_alignment.adoc

Lines changed: 192 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,192 @@
1+
// Copyright 2024 The Khronos Group, Inc.
2+
// SPDX-License-Identifier: CC-BY-4.0
3+
4+
// Required for both single-page and combined guide xrefs to work
5+
ifndef::chapters[:chapters:]
6+
ifndef::images[:images: images/]
7+
8+
[[buffer-device-address-alignment]]
9+
= Buffer Device Address Alignment
10+
11+
All variables accessed with `PhysicalStorageBuffer` must have an `Aligned` memory operand.
12+
13+
[source,swift]
14+
----
15+
%x = OpLoad %type %ptr Aligned 16
16+
OpStore %ptr %obj Aligned 16
17+
----
18+
19+
Shading languages will have a default, but can allow you to set the alignment explicitly (e.g. `buffer_reference_align`).
20+
21+
This alignment is a promise of how aligned this specific pointer is.
22+
The compiler has no idea what the address will be when the shader is compiled.
23+
By providing an alignment it can generate valid code to match the requirement.
24+
The user is responsible for ensuring the address they use is aligned accordingly.
25+
26+
[source,glsl]
27+
----
28+
layout(buffer_reference, buffer_reference_align = 64) buffer MyBDA {
29+
uint data;
30+
};
31+
32+
MyBDA ptr_a; // at 0x1000
33+
MyBDA ptr_b; // at 0x1010
34+
MyBDA ptr_c; // at 0x1040
35+
36+
ptr_a.data = 0; // (Aligned 64) valid!
37+
ptr_b.data = 0; // (Aligned 64) invalid!
38+
ptr_c.data = 0; // (Aligned 64) valid!
39+
----
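
As a rough host-side sketch of the check the user is responsible for, the following C++ tests the three addresses from the example above against an alignment of 64. `IsAligned` is a hypothetical helper, not part of any Vulkan API, and it assumes the alignment is a power of two.

```cpp
#include <cstdint>

// Hypothetical helper (not a Vulkan API): true if `address` is a multiple of
// `alignment`. Assumes `alignment` is a non-zero power of two.
constexpr bool IsAligned(std::uint64_t address, std::uint64_t alignment) {
    return (address & (alignment - 1)) == 0;
}

// The three addresses from the GLSL example, checked against `Aligned 64`.
static_assert(IsAligned(0x1000, 64), "ptr_a satisfies Aligned 64");
static_assert(!IsAligned(0x1010, 64), "ptr_b is only 16-byte aligned");
static_assert(IsAligned(0x1040, 64), "ptr_c satisfies Aligned 64");
```

A similar check could be run on the value returned by `vkGetBufferDeviceAddress`, plus any offset added to it, before handing the address to a shader.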
40+
41+
When deciding on an alignment, the value must always be greater than or equal to the size of the largest scalar/component type in the block.
42+
43+
[source,glsl]
44+
----
45+
// alignment must be at least 4
46+
layout(buffer_reference) buffer MyBDA {
47+
vec4 a; // scalar is float
48+
};
49+
50+
// alignment must be at least 1
51+
layout(buffer_reference) buffer MyBDA {
52+
uint8_t a; // scalar is 8-bit int
53+
};
54+
55+
// alignment must be at least 8
56+
layout(buffer_reference) buffer MyBDA {
57+
uint a; // 32-bit
58+
double b; // 64-bit
59+
};
60+
----
61+
62+
== Setting Alignment Example
63+
64+
To help explain alignment, let's take an example of loading an array of vectors.
65+
66+
[source,glsl]
67+
----
68+
layout(buffer_reference, buffer_reference_align = ???) buffer MyBDA {
69+
uvec4 data[];
70+
};
71+
72+
MyBDA ptr; // at 0x1000
73+
ptr.data[i] = uvec4(0);
74+
----
75+
76+
Here we have two options: we could set the `Aligned` value to `4` or `16`.
77+
78+
If we set alignment to `16` we are letting the compiler know it can load 16 bytes at a time, so it will hopefully do a vector load/store on the memory.
79+
80+
If we set alignment to `4`, the compiler will likely have no way to infer the real alignment and will instead do four scalar int loads/stores on the memory.
81+
82+
[NOTE]
83+
====
84+
Some GPUs can do vector load/store even on unaligned addresses.
85+
====
86+
87+
For the next case, suppose we had `uvec3` instead of `uvec4`, such as:
88+
89+
[source,glsl]
90+
----
91+
layout(buffer_reference, buffer_reference_align = 4, scalar) buffer MyBDA {
92+
uvec3 data[];
93+
};
94+
95+
data[0]; // 0x1000
96+
data[1]; // 0x100C
97+
data[2]; // 0x1018
98+
data[3]; // 0x1024
99+
----
100+
101+
We know that setting the alignment to `16` would be violated at `data[1]` and therefore we need to use an alignment of `4` in this case.
102+
Luckily shading languages will help do this for you as seen in both link:https://godbolt.org/z/jWGKax1ed[glslang] and link:https://godbolt.org/z/Y7xW3Mfd4[slang].
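
The address arithmetic behind this can be sketched in C++ as below. The helper names are hypothetical; the base address `0x1000` and the 12-byte stride (three 4-byte components under `scalar` layout) are taken from the example above.

```cpp
#include <cstdint>

// Hypothetical helper: address of element `index` in an array with the given
// byte stride, starting at `base`.
constexpr std::uint64_t ElementAddress(std::uint64_t base, std::uint64_t stride,
                                       std::uint64_t index) {
    return base + stride * index;
}

// Hypothetical helper: true if `address` is a multiple of `alignment`.
constexpr bool IsAligned(std::uint64_t address, std::uint64_t alignment) {
    return (address % alignment) == 0;
}

constexpr std::uint64_t kBase = 0x1000;  // address of data[0] in the example
constexpr std::uint64_t kStride = 12;    // uvec3 with scalar layout: 3 * 4 bytes

static_assert(ElementAddress(kBase, kStride, 1) == 0x100C, "matches data[1]");
static_assert(!IsAligned(ElementAddress(kBase, kStride, 1), 16),
              "data[1] would violate Aligned 16");
static_assert(IsAligned(ElementAddress(kBase, kStride, 1), 4),
              "data[1] satisfies Aligned 4");
static_assert(IsAligned(ElementAddress(kBase, kStride, 3), 4),
              "data[3] satisfies Aligned 4");
```

Because the stride is a multiple of 4 but not of 16, every element stays 4-byte aligned while only some land on a 16-byte boundary.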
103+
104+
== Matching Alignment From The Host
105+
106+
When dealing with buffer device address, you are able to do a simple `memcpy` to that memory on the host, which can easily lead to bugs if you aren't careful about things being aligned.
107+
108+
[NOTE]
109+
====
110+
The following issues are not directly tied to Buffer Device Address, and can still occur with any uniform or storage buffer.
111+
====
112+
113+
Take the following GLSL code as an example (link:https://godbolt.org/z/G4P8GdG9q[view online])
114+
115+
[source,glsl]
116+
----
117+
// ArrayStride is 16
118+
struct Metadata {
119+
uint64_t address;
120+
uint status;
121+
};
122+
123+
layout(buffer_reference, buffer_reference_align = 8, scalar) readonly buffer Payload {
124+
uint count; // offset 0
125+
Metadata meta[]; // offset 8
126+
};
127+
128+
layout(set = 0, binding = 0) buffer SSBO_0 {
129+
Payload data;
130+
};
131+
----
132+
133+
Because the `uint64_t` needs to be accessed at an 8-byte alignment, `glslang` (and any other compiler) will be smart and pack things as tightly as possible for you.
134+
135+
The first thing you might notice is `Metadata` needs to have an array stride of 16 instead of 12. This is because otherwise `uint64_t address` will land on a non 8-byte alignment every other instance of the array.
136+
137+
The next thing to notice is that because the **largest scalar** in `struct Metadata` is an 8-byte value, the compiler knows to place `meta` at offset `8` instead of `4`. This is why trying to change the struct to
138+
139+
[source,glsl]
140+
----
141+
struct Metadata {
142+
uint status;
143+
uint64_t address;
144+
};
145+
----
146+
147+
or
148+
149+
[source,glsl]
150+
----
151+
struct Metadata {
152+
uint64_t address;
153+
uint status;
154+
uint pad;
155+
};
156+
----
157+
158+
won't change the offset from `8`.
159+
160+
Here is how the data is laid out in memory:
161+
162+
image::{images}buffer_device_address_alignment_1.svg[buffer_device_address_alignment_1.svg]
163+
164+
The issue arises when we try to map our host memory. When you call `vkMapMemory` and get a `void*`, you need to be careful that the memory is laid out the same as in the diagram above. One way to ensure this is to use a struct on the host that matches the shader code.
165+
166+
[source,c++]
167+
----
168+
struct Metadata {
169+
uint64_t address;
170+
uint32_t status;
171+
};
172+
173+
struct Payload {
174+
uint32_t count;
175+
Metadata meta[2];
176+
} payload;
177+
178+
payload.count = 2;
179+
payload.meta[0].address = 0xDEADBEEF;
180+
payload.meta[0].status = 20;
181+
payload.meta[1].address = 0xDEADBEEF;
182+
payload.meta[1].status = 5;
183+
184+
void* data;
185+
vkMapMemory(device, device_memory, 0, VK_WHOLE_SIZE, 0, &data);
186+
187+
// You can also just memcpy here as well!
188+
Payload *payload_ptr = (Payload*)data;
189+
*payload_ptr = payload;
190+
----
191+
192+
If we examine the C++ code here (https://godbolt.org/z/Gq75qq1x6), we can see that the compiler automatically lays out the offsets the same way as the GLSL code above!
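
One way to guard against the host and shader layouts drifting apart is to assert the expected offsets at compile time. The following is a minimal sketch, assuming a typical 64-bit ABI where `uint64_t` is 8-byte aligned. On a target where that does not hold, the assertions would fail and explicit padding would be needed.

```cpp
#include <cstddef>
#include <cstdint>

// Host-side mirrors of the GLSL structs above.
struct Metadata {
    std::uint64_t address;
    std::uint32_t status;
};

struct Payload {
    std::uint32_t count;
    Metadata meta[2];
};

// Values expected by the shader: ArrayStride of 16 and `meta` at offset 8.
static_assert(sizeof(Metadata) == 16, "Metadata array stride must be 16");
static_assert(offsetof(Metadata, status) == 8, "status must follow address");
static_assert(offsetof(Payload, count) == 0, "count must be at offset 0");
static_assert(offsetof(Payload, meta) == 8, "meta must start at offset 8");
```

If the structs are later reordered or extended, a failed assertion flags the mismatch before any data is copied into the mapped memory.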

chapters/images/buffer_device_address_alignment_1.svg

Lines changed: 4 additions & 0 deletions
