@@ -8,22 +8,22 @@ layout: learningpathall

 # Optimizing graphics vertex efficiency for Arm GPUs

-You're writing a graphics application targeting an Arm Immortalis
+You are writing a graphics application targeting an Arm Immortalis
 GPU, and not hitting your desired performance. When running the Arm
 Frame Advisor tool, you spot that the draw calls in your shadow map
 creation pass have poor Vertex Memory Efficiency (VME) scores. How
 should you go about improving this?

 ![Frame Advisor screenshot](fa-found-bad-vme-in-content-metrics.png)

-In this article, I will outline a common source of rendering
+In this Learning Path, you will learn about a common source of rendering
 inefficiency, how to spot the issue using Arm Frame Advisor, and how
 to rectify it.


 ## Shadow mapping

-In this scenario, draw calls in our shadow map render pass are the
+In this scenario, draw calls in the shadow map render pass are the
 source of our poor VME scores. Let's start by reviewing exactly what
 these draws are doing.

@@ -37,7 +37,7 @@ to the light are lit, and any part that is occluded must be in shadow.
 ## Mesh layout

 The primary input into shadow map creation is the object geometry for
-all of the objects that are shadow casters. In our scenario, let's
+all of the objects that cast shadows. In this scenario, let's
 assume that the vertex data for each object is stored in memory as an
 array structure, which is a commonly used layout in many applications:

@@ -63,11 +63,11 @@ This would give the mesh the following layout in memory:
 This looks like a standard way of passing mesh data into a GPU,
 so where is the inefficiency coming from?

-The vertex data we have defined contains all of the attributes that
-we need for our object, including those that are needed to compute
+The vertex data that is defined contains all of the attributes that
+you need for your object, including those that are needed to compute
 color in the main lighting pass. When generating the shadow map,
-we only actually need to compute the position of the object, so most
-of our vertex attributes will be unused by the shadow map generation
+you only need to compute the position of the object, so most
+of your vertex attributes will be unused by the shadow map generation
 draw calls.

 The inefficiency comes from how hardware gets the data it needs from
@@ -76,9 +76,9 @@ single values from DRAM, but instead fetch a small neighborhood of
 data, because this is the most efficient way to read from DRAM. For Arm
 GPUs, the hardware will read an entire 64 byte cache line at a time.

-In our example, an attempt to fetch a vertex position during shadow
+In this example, an attempt to fetch a vertex position during shadow
 map creation would also load the nearby color and normal values,
-even though we do not need them.
+even though you do not need them.

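To make the cost of cache-line granularity concrete, the following is a minimal sketch that counts how many 64-byte cache lines are touched when only one attribute is read from each vertex. It is an assumption-laden model, not a description of real GPU cache behavior: the helper name `cacheLinesTouched` is hypothetical, the buffer is assumed to start on a cache-line boundary, and vertices are assumed to be read in order.

```cpp
#include <cassert>
#include <cstddef>

// Cache line size stated in the text for Arm GPUs.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical helper: counts the distinct cache lines touched when reading
// `attrBytes` bytes at `attrOffset` from each of `vertexCount` vertices that
// are `strideBytes` apart. Assumes the buffer starts on a cache-line boundary.
std::size_t cacheLinesTouched(std::size_t vertexCount, std::size_t strideBytes,
                              std::size_t attrOffset, std::size_t attrBytes) {
    assert(attrBytes > 0 && attrOffset + attrBytes <= strideBytes);
    std::size_t count = 0;
    std::size_t nextUncounted = 0;  // index of the first line not yet counted
    for (std::size_t v = 0; v < vertexCount; ++v) {
        const std::size_t begin = v * strideBytes + attrOffset;
        const std::size_t end = begin + attrBytes - 1;
        std::size_t firstLine = begin / kCacheLineBytes;
        const std::size_t lastLine = end / kCacheLineBytes;
        if (firstLine < nextUncounted) {
            firstLine = nextUncounted;  // skip lines already counted
        }
        if (lastLine >= firstLine) {
            count += lastLine - firstLine + 1;
            nextUncounted = lastLine + 1;
        }
    }
    return count;
}
```

Under these assumptions, reading a 12-byte position from 16 interleaved vertices with a 36-byte stride, `cacheLinesTouched(16, 36, 0, 12)`, touches 9 cache lines, while the same positions packed tightly with a 12-byte stride, `cacheLinesTouched(16, 12, 0, 12)`, touch only 3.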

 ## Detecting a sub-optimal layout
@@ -96,17 +96,17 @@ A VME of less than one indicates that unnecessary data is being loaded
 from memory, wasting bandwidth on data that is not being used in the
 computation on the GPU.

-In our mesh layout we are only using 12 bytes for the `position`
-field, out of a total vertex size of 36 bytes, so our VME score would
+In this mesh layout you are only using 12 bytes for the `position`
+field, out of a total vertex size of 36 bytes, so your VME score would
 be only 0.33.

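The arithmetic behind the 0.33 figure can be sketched as follows. This is a simplified per-vertex approximation (bytes used divided by bytes loaded), not the way Frame Advisor measures the metric. The `Vertex` struct is a hypothetical layout chosen only to match the 12-byte position and 36-byte vertex size quoted above, and `approximateVme` and `shadowPassVme` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved vertex, assumed for illustration only. The field
// sizes match the 12-byte position and 36-byte total quoted in the text.
struct Vertex {
    float position[3];  // the only attribute the shadow pass reads
    float color[3];
    float normal[3];
};
static_assert(sizeof(Vertex) == 36, "sketch assumes a tightly packed vertex");

// Simplified estimate: fraction of the loaded bytes that the shader uses.
double approximateVme(std::size_t usedBytes, std::size_t loadedBytes) {
    assert(loadedBytes > 0 && usedBytes <= loadedBytes);
    return static_cast<double>(usedBytes) / static_cast<double>(loadedBytes);
}

// Shadow pass on the interleaved layout: 12 useful bytes out of 36 loaded.
double shadowPassVme() {
    return approximateVme(sizeof(Vertex::position), sizeof(Vertex));
}
```

With this model, a position-only stream would score `approximateVme(12, 12)`, which is 1.0, because every loaded byte is used.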

 ## Fixing a sub-optimal layout

-Shadow mapping only needs to load position, so to fix this issue we
+Shadow mapping only needs to load position, so to fix this issue you
 need to use a memory layout that allows position to be fetched in
 isolation from the other data. It is still preferable to leave the
-other attributes interleaved. This would look like this on the CPU:
+other attributes interleaved. On the CPU, this would look like the following:

 ``` C++
 struct VertexPart1 {
@@ -134,10 +134,10 @@ object will then read from both memory regions.
 The good news is that this technique is actually a useful one to apply
 all of the time, even for the main lighting pass! Many mobile GPUs,
 including Arm GPUs, process geometry in two passes. The first pass
-computes only the primitive position, and second pass will processes
+computes only the primitive position, and the second pass processes
 the remainder of the vertex shader only for the primitives that are
 visible after primitive culling has been performed. By splitting
-the position attributes into a separate stream, we avoid wasting
+the position attributes into a separate stream, you avoid wasting
 memory bandwidth fetching non-position data for primitives that are
 ultimately discarded by primitive culling tests.

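If existing assets are stored interleaved, one way to adopt the split layout is a conversion step at load time. The following is a minimal CPU-side sketch under stated assumptions: all type and function names (`InterleavedVertex`, `PositionOnly`, `ShadingAttributes`, `SplitMesh`, `splitStreams`) are hypothetical and are not the structures used in this Learning Path, and the attribute set (position, color, normal) is assumed from the surrounding text. Uploading the two arrays as separate vertex buffers is left to the graphics API in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical source layout: every attribute interleaved per vertex.
struct InterleavedVertex {
    float position[3];
    float color[3];
    float normal[3];
};

// Hypothetical split layout: position alone in one stream, and the
// remaining attributes kept interleaved in a second stream.
struct PositionOnly {
    float position[3];
};
struct ShadingAttributes {
    float color[3];
    float normal[3];
};

struct SplitMesh {
    std::vector<PositionOnly> positions;     // read by position-only passes
    std::vector<ShadingAttributes> shading;  // read only when shading
};

// Copies each interleaved vertex into the two separate streams, keeping
// vertex order so that the same index buffer remains valid for both.
SplitMesh splitStreams(const std::vector<InterleavedVertex>& vertices) {
    SplitMesh mesh;
    mesh.positions.reserve(vertices.size());
    mesh.shading.reserve(vertices.size());
    for (const InterleavedVertex& v : vertices) {
        PositionOnly p{};
        ShadingAttributes s{};
        for (std::size_t i = 0; i < 3; ++i) {
            p.position[i] = v.position[i];
            s.color[i] = v.color[i];
            s.normal[i] = v.normal[i];
        }
        mesh.positions.push_back(p);
        mesh.shading.push_back(s);
    }
    assert(mesh.positions.size() == mesh.shading.size());
    return mesh;
}
```

Keeping the non-position attributes interleaved in the second stream follows the recommendation above, and preserving vertex order means the existing indices can be reused unchanged.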