Merge pull request #1997 from pareenaverma/content_review

pareenaverma · web-flow · commit 5df4249497de · 2025-05-30T10:28:13.000-04:00
VME LP review
diff --git a/assets/contributors.csv b/assets/contributors.csv
@@ -89,3 +89,5 @@ Nina Drozd,Arm,NinaARM,ninadrozd,,
 Jun He,Arm,JunHe77,jun-he-91969822,,
 Gian Marco Iodice,Arm,,,,
 Aude Vuilliomenet,Arm,,,,
+Andrew Kilroy,Arm,,,,
+Peter Harris,Arm,,,,
diff --git a/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md b/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/_index.md
@@ -7,7 +7,7 @@ cascade:
 
 minutes_to_complete: 10
 
-who_is_this_for: Android graphics application developers
+who_is_this_for: This is an advanced topic for Android graphics application developers.
 
 learning_objectives:
     - Optimize vertex representations on Arm GPUs
@@ -22,8 +22,8 @@ author:
     - Peter Harris
 
 ### Tags
-skilllevels: Intermediate
-subjects: Performance
+skilllevels: Advanced
+subjects: Performance and Architecture
 armips:
     - Immortalis
     - Mali
@@ -39,7 +39,7 @@ further_reading:
         link:  https://developer.arm.com/documentation/101897/0304/Vertex-shading/Attribute-layout
         type:  documentation
     - resource:
-        title: Frame Advisor user guide
+        title: Frame Advisor User Guide
         link: https://developer.arm.com/documentation/102693/latest/
         type: documentation
     - resource:
diff --git a/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md b/content/learning-paths/mobile-graphics-and-gaming/optimizing-vertex-efficiency/vme-learning-path.md
@@ -8,22 +8,22 @@ layout: learningpathall
 
 # Optimizing graphics vertex efficiency for Arm GPUs
 
-You're writing a graphics application targeting an Arm Immortalis
+You are writing a graphics application targeting an Arm Immortalis
 GPU, and not hitting your desired performance. When running the Arm
 Frame Advisor tool, you spot that the draw calls in your shadow map
 creation pass have poor Vertex Memory Efficiency (VME) scores. How
 should you go about improving this?
 
 ![Frame Advisor screenshot](fa-found-bad-vme-in-content-metrics.png)
 
-In this article, I will outline a common source of rendering
+In this Learning Path, you will learn about a common source of rendering
 inefficiency, how to spot the issue using Arm Frame Advisor, and how
 to rectify it.
 
 
 ## Shadow mapping
 
-In this scenario, draw calls in our shadow map render pass are the
+In this scenario, draw calls in the shadow map render pass are the
 source of our poor VME scores. Let's start by reviewing exactly what
 these draws are doing.
 
@@ -37,7 +37,7 @@ to the light are lit, and any part that is occluded must be in shadow.
 ## Mesh layout
 
 The primary input into shadow map creation is the object geometry for
-all of the objects that are shadow casters. In our scenario, let's
+all of the objects that cast shadows. In this scenario, let's
 assume that the vertex data for each object is stored in memory as an
 array structure, which is a commonly used layout in many applications:
 
@@ -63,11 +63,11 @@ This would give the mesh the following layout in memory:
 This looks like a standard way of passing mesh data into a GPU,
 so where is the inefficiency coming from?
 
-The vertex data we have defined contains all of the attributes that
-we need for our object, including those that are needed to compute
+The vertex data defined above contains all of the attributes that
+you need for your object, including those that are needed to compute
 color in the main lighting pass. When generating the shadow map,
-we only actually need to compute the position of the object, so most
-of our vertex attributes will be unused by the shadow map generation
+you only need to compute the position of the object, so most
+of your vertex attributes will be unused by the shadow map generation
 draw calls.
 
 The inefficiency comes from how hardware gets the data it needs from
@@ -76,9 +76,9 @@ single values from DRAM, but instead fetch a small neighborhood of
 data, because this is the most efficient way to read from DRAM. For Arm
 GPUs, the hardware will read an entire 64 byte cache line at a time.
 
-In our example, an attempt to fetch a vertex position during shadow
+In this example, an attempt to fetch a vertex position during shadow
 map creation would also load the nearby color and normal values,
-even though we do not need them.
+even though you do not need them.
 
 
 ## Detecting a sub-optimal layout
@@ -96,17 +96,17 @@ A VME of less than one indicates that unnecessary data is being loaded
 from memory, wasting bandwidth on data that is not being used in the
 computation on the GPU.
 
-In our mesh layout we are only using 12 bytes for the `position`
-field, out of a total vertex size of 36 bytes, so our VME score would
+In this mesh layout you are only using 12 bytes for the `position`
+field, out of a total vertex size of 36 bytes, so your VME score would
 be only 0.33.
 
 
 ## Fixing a sub-optimal layout
 
-Shadow mapping only needs to load position, so to fix this issue we
+Shadow mapping only needs to load position, so to fix this issue you
 need to use a memory layout that allows position to be fetched in
 isolation from the other data. It is still preferable to leave the
-other attributes interleaved. This would look like this on the CPU:
+other attributes interleaved. On the CPU, this would look like the following:
 
 ``` C++
 struct VertexPart1 {
@@ -134,10 +134,10 @@ object will then read from both memory regions.
 The good news is that this technique is actually a useful one to apply
 all of the time, even for the main lighting pass! Many mobile GPUs,
 including Arm GPUs, process geometry in two passes. The first pass
-computes only the primitive position, and second pass will processes
+computes only the primitive position, and the second pass processes
 the remainder of the vertex shader only for the primitives that are
 visible after primitive culling has been performed. By splitting
-the position attributes into a separate stream, we avoid wasting
+the position attributes into a separate stream, you avoid wasting
 memory bandwidth fetching non-position data for primitives that are
 ultimately discarded by primitive culling tests.
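
As a rough illustration of the numbers quoted in the revised text, the sketch below compares the interleaved layout with the split layout. The struct bodies are not shown in this diff, so the field definitions here are assumptions chosen to match the 12-byte position and 36-byte vertex sizes mentioned above. `estimate_vme` is a hypothetical helper, not part of Frame Advisor, and it uses a simplified model (bytes used divided by bytes per vertex in the fetched stream) that ignores cache-line granularity.

``` C++
#include <cstddef>

// Assumed interleaved layout: three 3-float attributes, 36 bytes per vertex.
struct Vertex {
    float position[3];
    float color[3];
    float normal[3];
};

// Assumed split layout: position lives in its own stream.
struct VertexPart1 {
    float position[3];
};

// Remaining attributes stay interleaved in a second stream.
struct VertexPart2 {
    float color[3];
    float normal[3];
};

// Hypothetical helper: fraction of each fetched vertex that the pass uses.
constexpr double estimate_vme(std::size_t used_bytes, std::size_t stride_bytes) {
    return static_cast<double>(used_bytes) / static_cast<double>(stride_bytes);
}

// Sanity checks on the assumed sizes (true where float is 4 bytes).
static_assert(sizeof(Vertex) == 36, "interleaved vertex should be 36 bytes");
static_assert(sizeof(VertexPart1) == 12, "position stream should be 12 bytes");
static_assert(sizeof(VertexPart2) == 24, "non-position stream should be 24 bytes");

// Shadow pass reading position from the interleaved layout: about 0.33.
constexpr double kInterleavedVme = estimate_vme(sizeof(VertexPart1), sizeof(Vertex));

// Shadow pass reading position from the split stream: 1.0.
constexpr double kSplitVme = estimate_vme(sizeof(VertexPart1), sizeof(VertexPart1));
```

Under this model, the shadow pass improves from roughly 0.33 to 1.0 because it only touches the `VertexPart1` stream.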