Commit 5df4249

Merge pull request #1997 from pareenaverma/content_review
VME LP review
2 parents b51bac0 + d7e1957 commit 5df4249

3 files changed, 22 additions and 20 deletions

assets/contributors.csv

Lines changed: 2 additions & 0 deletions
@@ -89,3 +89,5 @@ Nina Drozd,Arm,NinaARM,ninadrozd,,
 Jun He,Arm,JunHe77,jun-he-91969822,,
 Gian Marco Iodice,Arm,,,,
 Aude Vuilliomenet,Arm,,,,
+Andrew Kilroy,Arm,,,,
+Peter Harris,Arm,,,,

content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@ cascade:

 minutes_to_complete: 10

-who_is_this_for: Android graphics application developers
+who_is_this_for: This is an advanced topic for Android graphics application developers.

 learning_objectives:
 - Optimize vertex representations on Arm GPUs
@@ -22,8 +22,8 @@ author:
 - Peter Harris

 ### Tags
-skilllevels: Intermediate
-subjects: Performance
+skilllevels: Advanced
+subjects: Performance and Architecture
 armips:
 - Immortalis
 - Mali
@@ -39,7 +39,7 @@ further_reading:
 link: https://developer.arm.com/documentation/101897/0304/Vertex-shading/Attribute-layout
 type: documentation
 - resource:
-title: Frame Advisor user guide
+title: Frame Advisor User Guide
 link: https://developer.arm.com/documentation/102693/latest/
 type: documentation
 - resource:

content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md

Lines changed: 16 additions & 16 deletions
@@ -8,22 +8,22 @@ layout: learningpathall

 # Optimizing graphics vertex efficiency for Arm GPUs

-You're writing a graphics application targeting an Arm Immortalis
+You are writing a graphics application targeting an Arm Immortalis
 GPU, and not hitting your desired performance. When running the Arm
 Frame Advisor tool, you spot that the draw calls in your shadow map
 creation pass have poor Vertex Memory Efficiency (VME) scores. How
 should you go about improving this?

 ![Frame Advisor screenshot](fa-found-bad-vme-in-content-metrics.png)

-In this article, I will outline a common source of rendering
+In this Learning Path, you will learn about a common source of rendering
 inefficiency, how to spot the issue using Arm Frame Advisor, and how
 to rectify it.


 ## Shadow mapping

-In this scenario, draw calls in our shadow map render pass are the
+In this scenario, draw calls in the shadow map render pass are the
 source of our poor VME scores. Let's start by reviewing exactly what
 these draws are doing.

@@ -37,7 +37,7 @@ to the light are lit, and any part that is occluded must be in shadow.
 ## Mesh layout

 The primary input into shadow map creation is the object geometry for
-all of the objects that are shadow casters. In our scenario, let's
+all of the objects that cast shadows. In this scenario, let's
 assume that the vertex data for each object is stored in memory as an
 array structure, which is a commonly used layout in many applications:

@@ -63,11 +63,11 @@ This would give the mesh the following layout in memory:
 This looks like a standard way of passing mesh data into a GPU,
 so where is the inefficiency coming from?

-The vertex data we have defined contains all of the attributes that
-we need for our object, including those that are needed to compute
+The vertex data that is defined contains all of the attributes that
+you need for your object, including those that are needed to compute
 color in the main lighting pass. When generating the shadow map,
-we only actually need to compute the position of the object, so most
-of our vertex attributes will be unused by the shadow map generation
+you only need to compute the position of the object, so most
+of your vertex attributes will be unused by the shadow map generation
 draw calls.

 The inefficiency comes from how hardware gets the data it needs from
@@ -76,9 +76,9 @@ single values from DRAM, but instead fetch a small neighborhood of
 data, because this is the most efficient way to read from DRAM. For Arm
 GPUs, the hardware will read an entire 64 byte cache line at a time.

-In our example, an attempt to fetch a vertex position during shadow
+In this example, an attempt to fetch a vertex position during shadow
 map creation would also load the nearby color and normal values,
-even though we do not need them.
+even though you do not need them.


 ## Detecting a sub-optimal layout
@@ -96,17 +96,17 @@ A VME of less than one indicates that unnecessary data is being loaded
 from memory, wasting bandwidth on data that is not being used in the
 computation on the GPU.

-In our mesh layout we are only using 12 bytes for the `position`
-field, out of a total vertex size of 36 bytes, so our VME score would
+In this mesh layout you are only using 12 bytes for the `position`
+field, out of a total vertex size of 36 bytes, so your VME score would
 be only 0.33.


 ## Fixing a sub-optimal layout

-Shadow mapping only needs to load position, so to fix this issue we
+Shadow mapping only needs to load position, so to fix this issue you
 need to use a memory layout that allows position to be fetched in
 isolation from the other data. It is still preferable to leave the
-other attributes interleaved. This would look like this on the CPU:
+other attributes interleaved. On the CPU, this would look like the following:

 ``` C++
 struct VertexPart1 {
@@ -134,10 +134,10 @@ object will then read from both memory regions.
 The good news is that this technique is actually a useful one to apply
 all of the time, even for the main lighting pass! Many mobile GPUs,
 including Arm GPUs, process geometry in two passes. The first pass
-computes only the primitive position, and second pass will processes
+computes only the primitive position, and second pass will process
 the remainder of the vertex shader only for the primitives that are
 visible after primitive culling has been performed. By splitting
-the position attributes into a separate stream, we avoid wasting
+the position attributes into a separate stream, you avoid wasting
 memory bandwidth fetching non-position data for primitives that are
 ultimately discarded by primitive culling tests.
