Commit e583f36

Merge pull request #1805 from jasonrandrews/review
Refine content for render graph optimization learning path, including…
2 parents 7935c6c + 3a0373b commit e583f36

9 files changed: 63 additions, 112 deletions

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/_index.md

Lines changed: 10 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -7,34 +7,34 @@ cascade:
77

88
minutes_to_complete: 30
99

10-
who_is_this_for: Application developers who wish to improve graphics performance.
10+
who_is_this_for: Mobile application developers who wish to improve graphics performance.
1111

1212
learning_objectives:
13-
- Understand Frame Advisor's Render Graph view
14-
- Use the Render Graph view to identify and resolve performance issues in your application
13+
- Understand Frame Advisor's Render Graph view.
14+
- Use the Render Graph view to identify and resolve performance issues in your application.
1515

1616
prerequisites:
17-
- An installed copy of Frame Advisor, part of Arm Performance Studio. [You can download Arm Performance Studio for free.](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads)
18-
- A supported Android device, if you wish to analyze your own applications.
19-
- Some basic familiarity with Frame Advisor. To get started, read [“Frame Advisor”](../ams/fa) in _Get started with Arm Performance Studio for Mobile_.
17+
- Frame Advisor, part of Arm Performance Studio, installed. Refer to the [Arm Performance Studio](/install-guides/ams/) install guide.
18+
- If you wish to analyze your own applications you will need a supported Android device.
19+
- Some basic familiarity with Frame Advisor. Review the [Frame Advisor](/learning-paths/mobile-graphics-and-gaming/ams/fa/) section in [Get started with Arm Performance Studio for mobile](/learning-paths/mobile-graphics-and-gaming/ams/).
2020

2121
author: Mark Thurman
2222

2323
further_reading:
2424
- resource:
25-
title: Frame Advisor user guide
25+
title: Frame Advisor User Guide
2626
link: https://developer.arm.com/documentation/102693/latest/
2727
type: documentation
2828
- resource:
29-
title: Arm Performance Studio (main site)
29+
title: Arm Performance Studio
3030
link: https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio%20for%20Mobile
3131
type: website
3232
- resource:
33-
title: Learning path – Get started with Arm Performance Studio for mobile
33+
title: Get started with Arm Performance Studio for mobile
3434
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/ams/fa
3535
type: website
3636
- resource:
37-
title: Learning path – Analyze a frame with Frame Advisor
37+
title: Analyze a frame with Frame Advisor
3838
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/analyze_a_frame_with_frame_advisor
3939
type: website
4040
- resource:

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/_review.md

Lines changed: 0 additions & 51 deletions
This file was deleted.

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/generating-a-render-graph-for-your-application.md

Lines changed: 8 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ layout: learningpathall
1010

1111
Your first step is to identify which parts of your application are limited by GPU performance.
1212

13-
[Arm Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a good place to start. This is included as another part of Arm Performance Studio.
13+
[Streamline Performance Analyzer](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a good place to start. It is included in Arm Performance Studio.
1414

1515
### Setting up Streamline
1616

@@ -23,36 +23,34 @@ If you have an Arm GPU, basic configuration is simple:
2323
If you have some other GPU, or want more control over the data collected, you'll need to select GPU counters manually:
2424
- Select the “Use advanced mode” checkbox.
2525
- Click the “Select counters” button to open the Counters window.
26-
- Inside the Counters window, select the counters you wish to analyze. (The Arm website has [details of counters available on Arm GPUs](https://developer.arm.com/documentation#numberOfResults=48&q=Performance%20Counters&sort=relevancy&f:@navigationhierarchiesproducts=[IP%20Products,Graphics%20and%20Multimedia%20Processors,Mali%20GPUs]).)
26+
- Inside the Counters window, select the counters you wish to analyze. [Reference Guides are available for Arm GPUs](https://developer.arm.com/documentation#numberOfResults=48&q=Performance%20Counters&sort=relevancy&f:@navigationhierarchiesproducts=[IP%20Products,Graphics%20and%20Multimedia%20Processors,Mali%20GPUs]).
2727
- Close the Counters window.
2828

29-
For more details, refer to the [Get Started with Streamline” tutorial](https://developer.arm.com/documentation/102477/0900/Overview), or [Starting a capture](https://developer.arm.com/documentation/101816/0905/Capture-a-Streamline-profile/Starting-a-capture) in the Arm Streamline user guide.
29+
For more details, refer to the [Get Started with Streamline](https://developer.arm.com/documentation/102477/0900/Overview) tutorial, or [Starting a capture](https://developer.arm.com/documentation/101816/0905/Capture-a-Streamline-profile/Starting-a-capture) in the Arm Streamline User Guide.
3030

3131
### Capturing GPU data in Streamline
3232

3333
Once you have chosen GPU counters, click the “Start capture” button to begin your capture.
3434

35-
Streamline will produce a graph showing the most GPU-heavy parts of your application (see [Timeline overview](https://developer.arm.com/documentation/101816/0905/Analyze-your-capture/Timeline-overview?lang=en) in the Arm Streamline user guide).
35+
Streamline will produce a graph showing the most GPU-heavy parts of your application. Refer to the [Timeline overview](https://developer.arm.com/documentation/101816/0905/Analyze-your-capture/Timeline-overview?lang=en) in the Arm Streamline User Guide.
3636

3737
## Capturing a render graph
3838

3939
Now that you have identified areas of your application that you want to optimize, you can turn from Streamline to Frame Advisor.
4040

41-
To ask Frame Advisor to capture data relating to the problem areas you have seen:
41+
Ask Frame Advisor to capture data relating to the problem areas you have observed:
4242

4343
- Click “Capture new trace” in Frame Advisor's launch screen
4444
- Connect to your application
4545
- In the Capture screen, select the number of frames you wish to capture
4646
- When you've reached the GPU-heavy part of the application run, click “Capture”, then “Analyze” to advance to the Analysis screen
4747

48-
For more details, refer to [the “Frame Advisor” section](../../ams/fa) of “Get started with Arm Performance Studio for mobile”.
49-
5048
## Viewing the render graph
5149

5250
Observe that part of the Frame Advisor window is labelled “Render Graph”. This contains the render graph relating to the frames you asked Frame Advisor to analyze.
5351

54-
For the purpose of this Learning Path, we will assume that you've captured the following render graph:
52+
Assume that you've captured the following render graph:
5553

56-
![An inefficient render graph in need of optimization#center](inefficient-render-graph.svg "Figure 2. An inefficient render graph in need of optimization")
54+
![An inefficient render graph in need of optimization#center](inefficient-render-graph.svg "An inefficient render graph in need of optimization")
5755

58-
In the next section, we will use this graph to illustrate some common application faults.
56+
In the next section, you will use this graph to understand some common application faults.

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/inefficient-transfer-workloads.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ To find which API calls your application uses to start transfer workloads:
2828
- Now move to the API Calls view (labelled “API Calls”)
2929
- Observe the API calls in use.
3030

31-
## Problem: Inefficient clear routines
31+
## Problem: inefficient clear routines
3232

3333
`vkCmdClearColorImage()` is an inefficient way to clear an image.
3434

@@ -38,6 +38,6 @@ A more efficient way to clear an image attachment is to clear the attachment at
3838
- Attach the `VkAttachmentDescription` to a `VkRenderPassCreateInfo`
3939
- Supply this to API `vkCreateRenderPass()`
4040

41-
## Problem: Inefficient image resolutions
41+
## Problem: inefficient image resolutions
4242

43-
[In the previous section](../textures-with-excessive-resolution), we looked at an issue where unnecessarily large textures were inputs to a render pass. Similar problems can be seen in the context of transfer operations, which should operate over the smallest practicable area. To achieve this, change the values in the `VkBufferCopy` structure passed to `vkCmdCopyBuffer()`.
43+
In the previous section, you saw an issue where unnecessarily large textures were inputs to a render pass. Similar problems can be seen in the context of transfer operations, which should operate over the smallest practicable area. To achieve this, change the values in the `VkBufferCopy` structure passed to `vkCmdCopyBuffer()`.

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/textures-with-excessive-resolution.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,9 +12,9 @@ Some textures used in your application may be unnecessarily large.
1212

1313
Frame Advisor provides an easy way to detect this situation. It shows the resolution of the image being rendered in each execution node.
1414

15-
There is an example of this in the render graph we looked at in a previous section:
15+
There is an example of this in the render graph you looked at in a previous section:
1616

17-
![Textures with excessive resolution#center](excessive-resolution.png "Figure 1. Textures with excessive resolution")
17+
![Textures with excessive resolution#center](excessive-resolution.png "Textures with excessive resolution")
1818

1919
This graph shows three execution nodes, through which data flows from left to right. These are:
2020

@@ -26,5 +26,5 @@ The resolution given in the top left-hand corner of the execution nodes reduces
2626

2727
## Solution
2828

29-
Make the computation shown in the graph produce the smaller final texture from smaller input textures. In this example, try to reduce the resolution of the inputs to Render Pass 0 (`RP0`).
29+
Make the computation shown in the graph produce the smaller final texture from smaller input textures. In this example, reduce the resolution of the inputs to Render Pass 0 (`RP0`).
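
As a rough illustration of the check described above, the sketch below walks a left-to-right chain of execution nodes and reports each step where the rendered resolution shrinks. The node names `RP1` and `RP2`, the resolutions, and the `resolution_drops` helper are all hypothetical; Frame Advisor shows this information visually and does not expose such a function.

```python
def resolution_drops(chain):
    """Return (upstream, downstream) pairs where the pixel count shrinks.

    `chain` is a list of (node_name, (width, height)) tuples in the order the
    nodes appear from left to right in the render graph.
    """
    drops = []
    for (up_name, (up_w, up_h)), (down_name, (down_w, down_h)) in zip(chain, chain[1:]):
        if up_w * up_h > down_w * down_h:
            drops.append((up_name, down_name))
    return drops


# Hypothetical resolutions for three execution nodes (illustrative values only).
chain = [("RP0", (2048, 2048)), ("RP1", (1024, 1024)), ("RP2", (512, 512))]

for upstream, downstream in resolution_drops(chain):
    print(f"{upstream} renders more pixels than {downstream} consumes")
```

Each reported pair is a candidate for rendering the upstream node at a lower resolution, which is the change the Solution suggests for the inputs to `RP0`.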
3030

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/understanding-your-render-graph.md

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -6,17 +6,15 @@ weight: 4
66
layout: learningpathall
77
---
88

9-
So that we can use render graphs to find application faults, we must first understand what they are telling us.
10-
119
## Components of a render graph
1210

1311
Here's the graph from the previous section:
1412

15-
![An inefficient render graph in need of optimization#center](inefficient-render-graph.svg "Figure 1. An inefficient render graph in need of optimization")
13+
![An inefficient render graph in need of optimization#center](inefficient-render-graph.svg "An inefficient render graph in need of optimization")
1614

17-
Notice that a basic level, render graphs consist of _boxes_ and _arrows_:
15+
At a basic level, render graphs consist of _boxes_ and _arrows_:
1816

19-
* Boxes, formally called _nodes_, represent rendering operations and resources (in other words, processing operations and data). We formally refer to nodes for rendering operations as _execution nodes_.
17+
* Boxes, formally called _nodes_, represent rendering operations and resources. Nodes for rendering operations are formally called _execution nodes_.
2018
* Arrows, formally called _edges_, show the movement of data between rendering operations.
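
To make the boxes-and-arrows vocabulary concrete, here is a minimal sketch of a render graph as a data structure. It is a toy model, not Frame Advisor's internal representation. The `RenderGraph` class is hypothetical, the texture name `T1` is an assumption, and so is the direct edge from `Tr2` to the swapchain.

```python
from dataclasses import dataclass, field


@dataclass
class RenderGraph:
    """Toy model: nodes are the boxes, edges are the arrows."""
    execution_nodes: set = field(default_factory=set)  # render passes, transfers, ...
    resource_nodes: set = field(default_factory=set)   # textures, render buffers, swapchain
    edges: list = field(default_factory=list)          # (source, destination) pairs

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

    def inputs_of(self, node):
        """Nodes with an arrow pointing into `node`."""
        return sorted(src for src, dst in self.edges if dst == node)

    def outputs_of(self, node):
        """Nodes that `node` points an arrow at."""
        return sorted(dst for src, dst in self.edges if src == node)


# Assumed example: one render pass, one transfer, and four resources.
graph = RenderGraph(
    execution_nodes={"RP0", "Tr2"},
    resource_nodes={"T1", "RB1.d", "RB1.s", "swapchain"},
)
for src, dst in [("RP0", "T1"), ("RP0", "RB1.d"), ("RP0", "RB1.s"),
                 ("T1", "Tr2"), ("Tr2", "swapchain")]:
    graph.add_edge(src, dst)

print(graph.outputs_of("RP0"))  # ['RB1.d', 'RB1.s', 'T1']
```

Storing edges as producer-to-consumer pairs mirrors the left-to-right reading order of the graph.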
2119

2220
Render graphs describe rendering performed by the GPU while it is constructing a single frame. This rendering starts and ends with resources. As a result of graphs being read from left to right:
@@ -33,7 +31,7 @@ In a suboptimally-generated frame, there may be other outputs which are not sent
3331

3432
## Graph nodes in detail
3533

36-
Let's take a closer look at what can be represented in a node on the render graph.
34+
Take a closer look at what can be represented in a node on the render graph.
3735

3836
### Render passes
3937

@@ -46,7 +44,7 @@ Render passes transform sets of input images into sets of output images using a
4644
Within each execution node representing a render pass, Frame Advisor gives information such as:
4745

4846
- The resolution of the texture being rendered
49-
- A list of the attachments passed to the render pass and the name of each of these attachments.
47+
- A list of the attachments provided to the render pass and the name of each of these attachments.
5048
- The number of API draw calls within the render pass.
5149

5250
{{% notice Tip %}}
@@ -55,11 +53,11 @@ When you click an execution node, such as a render pass, Frame Advisor navigates
5553

5654
### Other types of execution node
5755

58-
The graph also shows a transfer node, colored blue. This is labelled ”Tr…”
56+
The graph also shows a transfer node, colored blue, and labelled “Tr…”.
5957

60-
![A transfer node#center](transfer-node.png "Figure 3. A transfer node")
58+
![A transfer node#center](transfer-node.png "A transfer node")
6159

62-
Transfer nodes represent data movement between resource locations in memory. They are discussed in more depth in [a later problem-solving section](../inefficient-transfer-workloads).
60+
Transfer nodes represent data movement between resource locations in memory.
6361

6462
You may also see other types of execution node. For example, you may see compute nodes if your application uses compute shaders.
6563

@@ -69,7 +67,9 @@ Resource nodes show inputs and outputs of the execution nodes. They are shown as
6967

7068
There are different types of resource node:
7169

72-
- The swapchain: this represents the output of the computation. There is one swapchain on every graph. ![A swapchain node#center](swapchain-node.png "Figure 4. A swapchain node")
73-
- [Textures](https://www.khronos.org/opengl/wiki/Texture): In the graph, these are marked with a leading letter `T`. ![A texture node#center](texture-node.png "Figure 5. A node for texture 1")
74-
- [Render buffers](https://www.khronos.org/opengl/wiki/Renderbuffer_Object): Like textures, these represent images. The title of render buffer nodes begins with letters `RB`. The title ends with a code indicating the class of render buffer – for example, `.s` for stencil and `.d` for depth. ![A render buffer node#center](render-buffer-node.png "Figure 6. A node for render buffer 1, representing depth. Figure 1 also contains a render buffer node representing a stencil.")
70+
- The swapchain: this represents the output of the computation. There is one swapchain on every graph. ![A swapchain node#center](swapchain-node.png "A swapchain node")
71+
- [Textures](https://www.khronos.org/opengl/wiki/Texture): these are marked with a leading letter `T`. ![A texture node#center](texture-node.png "A node for texture 1")
72+
- [Render buffers](https://www.khronos.org/opengl/wiki/Renderbuffer_Object): Like textures, these represent images. The title of render buffer nodes begins with letters `RB`. The title ends with a code indicating the class of render buffer – for example, `.s` for stencil and `.d` for depth. ![A render buffer node#center](render-buffer-node.png "A node for render buffer 1, representing depth")
73+
74+
The original render graph also contains a render buffer node representing a stencil.
7575

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/unused-resources.md

Lines changed: 2 additions & 2 deletions
@@ -16,9 +16,9 @@ This excerpt shows a render pass (`RP0`) which writes three output resources to
1616

1717
The texture is fed into another execution node, `Tr2`. This is a transfer node, number 2. From here, the data ultimately makes its way into the swapchain – and is therefore seen by the user.
1818

19-
The renderbuffer nodes `RB1.d` and `RB1.s` are different. They are not sent to another execution node. Instead, they are unused.
19+
The renderbuffer nodes `RB1.d` and `RB1.s` are different. They are not sent to another execution node; they are unused.
2020

21-
*This wastes both bandwidth and power.*
21+
This wastes both bandwidth and power.
2222

2323
## Solution
2424

content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/unwanted-execution-nodes.md

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -8,16 +8,18 @@ layout: learningpathall
88

99
## Problem
1010

11-
Your application might contain execution nodes which are _entirely_ unnecessary. For example:
11+
Your application might contain execution nodes which are _entirely_ unnecessary.
1212

13-
![An unnecessary execution node (highlighted)#center](unused-execution-node.png "Figure 1. An unnecessary execution node (highlighted)")
13+
For example, look at the render graph below:
1414

15-
This is a more extreme version of the problem discussed in [the previous section](../unwanted-outputs). There, we looked at execution nodes which produced _some_ outputs which are unnecessary. Here, _all_ outputs are unnecessary.
15+
![An unnecessary execution node (highlighted)#center](unused-execution-node.png "An unnecessary execution node (highlighted)")
1616

17-
*It therefore follows that the computation producing the output is itself unnecessary.*
17+
This is a more extreme version of the problem discussed in the previous section. Previously, you saw execution nodes which produced _some_ outputs which are unnecessary. Here, _all_ outputs are unnecessary.
18+
19+
You can conclude that the computation producing the output is unnecessary.
1820

1921
## Solution
2022

21-
The solution is similar to that in the previous section. Remove any API calls which represent the unused computation.
23+
Remove any API calls which represent the unused computation.
2224

23-
*Be careful, however: your application may be using an apparently “unused” output of an execution node in a later frame.*
25+
Be careful: your application may be using an apparently “unused” output of an execution node in a later frame.
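
One way to reason about unused outputs and unnecessary execution nodes is as a reachability question: does any path lead from the node to the swapchain? The sketch below is an assumption-laden illustration of that idea. The `nodes_not_reaching` helper and the edge list (including `RP1`, `T1`, and `T3`) are hypothetical, and the sketch looks at a single frame only, so it cannot see outputs that are consumed in a later frame.

```python
from collections import defaultdict


def nodes_not_reaching(edges, sink="swapchain"):
    """Return nodes with no path to `sink`, searching backwards along edges."""
    producers = defaultdict(set)  # node -> nodes with an edge into it
    all_nodes = set()
    for src, dst in edges:
        producers[dst].add(src)
        all_nodes.update((src, dst))

    # Walk upstream from the sink, marking everything that contributes to it.
    reached, stack = {sink}, [sink]
    while stack:
        for upstream in producers[stack.pop()]:
            if upstream not in reached:
                reached.add(upstream)
                stack.append(upstream)
    return sorted(all_nodes - reached)


# Hypothetical single-frame graph as (producer, consumer) pairs.
edges = [
    ("RP0", "T1"), ("T1", "Tr2"), ("Tr2", "swapchain"),  # contributes to the frame
    ("RP0", "RB1.d"), ("RP0", "RB1.s"),                  # written but never read
    ("RP1", "T3"),                                       # whole pass is unused
]

print(nodes_not_reaching(edges))  # ['RB1.d', 'RB1.s', 'RP1', 'T3']
```

Treat anything this kind of check reports as a candidate for review rather than something to delete outright, given the cross-frame caveat above.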
