content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/_index.md
Lines changed: 10 additions & 10 deletions
@@ -7,34 +7,34 @@ cascade:
 
 minutes_to_complete: 30
 
-who_is_this_for: Application developers who wish to improve graphics performance.
+who_is_this_for: Mobile application developers who wish to improve graphics performance.
 
 learning_objectives:
-- Understand Frame Advisor's Render Graph view
-- Use the Render Graph view to identify and resolve performance issues in your application
+- Understand Frame Advisor's Render Graph view.
+- Use the Render Graph view to identify and resolve performance issues in your application.
 
 prerequisites:
-- An installed copy of Frame Advisor, part of Arm Performance Studio. [You can download Arm Performance Studio for free.](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Studio#Downloads)
-- A supported Android device, if you wish to analyze your own applications.
-- Some basic familiarity with Frame Advisor. To get started, read [“Frame Advisor”](../ams/fa) in _Get started with Arm Performance Studio for Mobile_.
+- Frame Advisor, part of Arm Performance Studio, installed. Refer to the [Arm Performance Studio](/install-guides/ams/) install guide.
+- If you wish to analyze your own applications you will need a supported Android device.
+- Some basic familiarity with Frame Advisor. Review the [Frame Advisor](/learning-paths/mobile-graphics-and-gaming/ams/fa/) section in [Get started with Arm Performance Studio for mobile](/learning-paths/mobile-graphics-and-gaming/ams/).
content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/generating-a-render-graph-for-your-application.md
Lines changed: 8 additions & 10 deletions
@@ -10,7 +10,7 @@ layout: learningpathall
 
 Your first step is to identify which parts of your application are limited by GPU performance.
 
-[Arm Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a good place to start. This is included as another part of Arm Performance Studio.
+[Streamline Performance Analyzer](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a good place to start, it is included in Arm Performance Studio.
 
 ### Setting up Streamline
 
@@ -23,36 +23,34 @@ If you have an Arm GPU, basic configuration is simple:
 If you have some other GPU, or want more control over the data collected, you'll need to select GPU counters manually:
 - Select the “Use advanced mode” checkbox.
 - Click the “Select counters” button to open the Counters window.
-- Inside the Counters window, select the counters you wish to analyze. (The Arm website has [details of counters available on Arm GPUs](https://developer.arm.com/documentation#numberOfResults=48&q=Performance%20Counters&sort=relevancy&f:@navigationhierarchiesproducts=[IP%20Products,Graphics%20and%20Multimedia%20Processors,Mali%20GPUs]).)
+- Inside the Counters window, select the counters you wish to analyze. [Reference Guides are available for Arm GPUs](https://developer.arm.com/documentation#numberOfResults=48&q=Performance%20Counters&sort=relevancy&f:@navigationhierarchiesproducts=[IP%20Products,Graphics%20and%20Multimedia%20Processors,Mali%20GPUs]).
 - Close the Counters window.
 
-For more details, refer to the [“Get Started with Streamline” tutorial](https://developer.arm.com/documentation/102477/0900/Overview), or [“Starting a capture”](https://developer.arm.com/documentation/101816/0905/Capture-a-Streamline-profile/Starting-a-capture) in the Arm Streamline user guide.
+For more details, refer to the [Get Started with Streamline](https://developer.arm.com/documentation/102477/0900/Overview) tutorial, or [Starting a capture](https://developer.arm.com/documentation/101816/0905/Capture-a-Streamline-profile/Starting-a-capture) in the Arm Streamline User Guide.
 
 ### Capturing GPU data in Streamline
 
 Once you have chosen GPU counters, click the “Start capture” button to begin your capture.
 
-Streamline will produce a graph showing the most GPU-heavy parts of your application (see [“Timeline overview”](https://developer.arm.com/documentation/101816/0905/Analyze-your-capture/Timeline-overview?lang=en) in the Arm Streamline user guide).
+Streamline will produce a graph showing the most GPU-heavy parts of your application. Refer to the [Timeline overview](https://developer.arm.com/documentation/101816/0905/Analyze-your-capture/Timeline-overview?lang=en) in the Arm Streamline User Guide.
 
 ## Capturing a render graph
 
 Now that you have identified areas of your application that you want to optimize, you can turn from Streamline to Frame Advisor.
 
-To ask Frame Advisor to capture data relating to the problem areas you have seen:
+Ask Frame Advisor to capture data relating to the problem areas you have observed:
 
 - Click “Capture new trace” in Frame Advisor's launch screen
 - Connect to your application
 - In the Capture screen, select the number of frames you wish to capture
 - When you've reached the GPU-heavy part of the application run, click “Capture”, then “Analyze” to advance to the Analysis screen
 
-For more details, refer to [the “Frame Advisor” section](../../ams/fa) of “Get started with Arm Performance Studio for mobile”.
-
 ## Viewing the render graph
 
 Observe that part of the Frame Advisor window is labelled “Render Graph”. This contains the render graph relating to the frames you asked Frame Advisor to analyze.
 
-For the purpose of this Learning Path, we will assume that you've captured the following render graph:
+Assume that you've captured the following render graph:
 
-
+
 
-In the next section, we will use this graph to illustrate some common application faults.
+In the next section, you will use this graph to understand some common application faults.
content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/inefficient-transfer-workloads.md
Lines changed: 3 additions & 3 deletions
@@ -28,7 +28,7 @@ To find which API calls your application uses to start transfer workloads:
 - Now move to the API Calls view (labelled “API Calls”)
 - Observe the API calls in use.
 
-## Problem: Inefficient clear routines
+## Problem: inefficient clear routines
 
 `vkCmdClearColorImage()` is an inefficient way to clear an image.
 
@@ -38,6 +38,6 @@ A more efficient way to clear an image attachment is to clear the attachment at
 - Attach the `VkAttachmentDescription` to a `VkRenderPassCreateInfo`
 - Supply this to API `vkCreateRenderPass()`
 
-## Problem: Inefficient image resolutions
+## Problem: inefficient image resolutions
 
-[In the previous section](../textures-with-excessive-resolution), we looked at an issue where unnecessarily large textures were inputs to a render pass. Similar problems can be seen in the context of transfer operations, which should operate over the smallest practicable area. To achieve this, change the values in the `VkBufferCopy` structure passed to `vkCmdCopyBuffer()`.
+In the previous section, you saw an issue where unnecessarily large textures were inputs to a render pass. Similar problems can be seen in the context of transfer operations, which should operate over the smallest practicable area. To achieve this, change the values in the `VkBufferCopy` structure passed to `vkCmdCopyBuffer()`.
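The last change in `inefficient-transfer-workloads.md` says transfers should cover the smallest practicable area by adjusting the `VkBufferCopy` passed to `vkCmdCopyBuffer()`. The following is a minimal sketch of that idea. It assumes the application already knows the byte range `[usedBegin, usedEnd)` of the source buffer that actually needs copying. `BufferCopyRegion` and `smallest_copy_region()` are hypothetical names. The struct only mirrors the three `VkBufferCopy` fields (`srcOffset`, `dstOffset`, `size`) so that the sketch builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for VkBufferCopy so this sketch builds without the Vulkan headers.
 * The real struct has these three fields, each a VkDeviceSize (uint64_t). */
typedef struct {
    uint64_t srcOffset;
    uint64_t dstOffset;
    uint64_t size;
} BufferCopyRegion;

/* Hypothetical helper: describe a copy of only the bytes [usedBegin, usedEnd)
 * of the source buffer, placed at the same relative position after dstBase
 * in the destination buffer, instead of copying the whole buffer. */
static BufferCopyRegion smallest_copy_region(uint64_t usedBegin,
                                             uint64_t usedEnd,
                                             uint64_t dstBase) {
    /* Vulkan requires a copy region's size to be greater than zero. */
    assert(usedBegin < usedEnd);

    BufferCopyRegion region;
    region.srcOffset = usedBegin;
    region.dstOffset = dstBase + usedBegin;
    region.size = usedEnd - usedBegin;
    return region;
}
```

In a real application, the fields would be copied into a `VkBufferCopy` and passed as `pRegions` in `vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &region)`. Keeping the destination offset relative to the source offset is a design choice made for this sketch only. A different buffer layout would need a different mapping.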
content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/textures-with-excessive-resolution.md
Lines changed: 3 additions & 3 deletions
@@ -12,9 +12,9 @@ Some textures used in your application may be unnecessarily large.
 
 Frame Advisor provides an easy way to detect this situation. It shows the resolution of the image being rendered in each execution node.
 
-There is an example of this in the render graph we looked at in a previous section:
+There is an example of this in the render graph you looked at in a previous section:
 
-
+
 
 This graph shows three execution nodes, through which data flows from left to right. These are:
 
@@ -26,5 +26,5 @@ The resolution given in the top left-hand corner of the execution nodes reduces
 
 ## Solution
 
-Make the computation shown in the graph produce the smaller final texture from smaller input textures. In this example, try to reduce the resolution of the inputs to Render Pass 0 (`RP0`).
+Make the computation shown in the graph produce the smaller final texture from smaller input textures. In this example, reduce the resolution of the inputs to Render Pass 0 (`RP0`).
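In `textures-with-excessive-resolution.md`, the check for oversized inputs is done by eye in Frame Advisor. The reasoning behind it can still be sketched in C. This sketch assumes you have recorded, for each execution node in left-to-right order, the resolution shown in its top left-hand corner. `ExecNode` and `first_oversized_node()` are hypothetical names and are not part of any Frame Advisor API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what Frame Advisor displays for one execution
 * node: its label and the resolution shown in the node's top-left corner. */
typedef struct {
    const char *label;
    uint32_t width;
    uint32_t height;
} ExecNode;

/* Returns the index of the first node (in left-to-right data-flow order)
 * that renders more pixels than the last node in the chain, or -1 if
 * every earlier node is at or below the final resolution. */
static int first_oversized_node(const ExecNode *chain, size_t count) {
    assert(chain != NULL && count > 0);
    uint64_t finalPixels =
        (uint64_t)chain[count - 1].width * chain[count - 1].height;
    for (size_t i = 0; i + 1 < count; ++i) {
        uint64_t pixels = (uint64_t)chain[i].width * chain[i].height;
        if (pixels > finalPixels) {
            return (int)i;
        }
    }
    return -1;
}
```

Comparing pixel counts rather than width and height separately is a simplification chosen for this sketch. A flagged node is a candidate for the fix described in the Solution text: feed it smaller inputs.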
content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/understanding-your-render-graph.md
Lines changed: 13 additions & 13 deletions
@@ -6,17 +6,15 @@ weight: 4
 layout: learningpathall
 ---
 
-So that we can use render graphs to find application faults, we must first understand what they are telling us.
-
 ## Components of a render graph
 
 Here's the graph from the previous section:
 
-
+
 
-Notice that a basic level, render graphs consist of _boxes_ and _arrows_:
+At a basic level, render graphs consist of _boxes_ and _arrows_:
 
-* Boxes, formally called _nodes_, represent rendering operations and resources (in other words, processing operations and data). We formally refer to nodes for rendering operations as_execution nodes_.
+* Boxes, formally called _nodes_, represent rendering operations and resources. Nodes for rendering operations are formally called _execution nodes_.
 * Arrows, formally called _edges_, show the movement of data between rendering operations.
 
 Render graphs describe rendering performed by the GPU while it is constructing a single frame. This rendering starts and ends with resources. As a result of graphs being read from left to right:
@@ -33,7 +31,7 @@ In a suboptimally-generated frame, there may be other outputs which are not sent
 
 ## Graph nodes in detail
 
-Let's take a closer look at what can be represented in a node on the render graph.
+Take a closer look at what can be represented in a node on the render graph.
 
 ### Render passes
 
@@ -46,7 +44,7 @@ Render passes transform sets of input images into sets of output images using a
 Within each execution node representing a render pass, Frame Advisor gives information such as:
 
 - The resolution of the texture being rendered
-- A list of the attachments passed to the render pass and the name of each of these attachments.
+- A list of the attachments provided to the render pass and the name of each of these attachments.
 - The number of API draw calls within the render pass.
 
 {{% notice Tip %}}
@@ -55,11 +53,11 @@ When you click an execution node, such as a render pass, Frame Advisor navigates
 
 ### Other types of execution node
 
-The graph also shows a transfer node, colored blue. This is labelled ”Tr…”
+The graph also shows a transfer node, colored blue, and labelled ”Tr…”.
 
-
+
 
-Transfer nodes represent data movement between resource locations in memory. They are discussed in more depth in [a later problem-solving section](../inefficient-transfer-workloads).
+Transfer nodes represent data movement between resource locations in memory.
 
 You may also see other types of execution node. For example, you may see compute nodes if your application uses compute shaders.
 
@@ -69,7 +67,9 @@ Resource nodes show inputs and outputs of the execution nodes. They are shown as
 
 There are different types of resource node:
 
-- The swapchain: this represents the output of the computation. There is one swapchain on every graph. 
--[Textures](https://www.khronos.org/opengl/wiki/Texture): In the graph, these are marked with a leading letter `T`. 
--[Render buffers](https://www.khronos.org/opengl/wiki/Renderbuffer_Object): Like textures, these represent images. The title of render buffer nodes begins with letters `RB`. The title ends with a code indicating the class of render buffer – for example, `.s` for stencil and `.d` for depth. 
+- The swapchain: this represents the output of the computation. There is one swapchain on every graph. 
+-[Textures](https://www.khronos.org/opengl/wiki/Texture): these are marked with a leading letter `T`. 
+-[Render buffers](https://www.khronos.org/opengl/wiki/Renderbuffer_Object): Like textures, these represent images. The title of render buffer nodes begins with letters `RB`. The title ends with a code indicating the class of render buffer – for example, `.s` for stencil and `.d` for depth. 
+
+The original render graph also contains a render buffer node representing a stencil.
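`understanding-your-render-graph.md` lists the naming conventions for resource nodes: a leading `T` marks a texture, and a leading `RB` plus a `.s` or `.d` suffix marks a stencil or depth render buffer. These rules can be expressed as a small classifier. This is a sketch only, and `classify_resource_title()` and `ResourceKind` are hypothetical names. The text does not give the swapchain's title, so the swapchain falls into `RESOURCE_UNKNOWN`. The function also assumes the title already belongs to a resource node, because execution nodes such as `Tr2` also start with `T`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical categories matching the naming rules in the text. */
typedef enum {
    RESOURCE_TEXTURE,
    RESOURCE_RENDER_BUFFER_DEPTH,
    RESOURCE_RENDER_BUFFER_STENCIL,
    RESOURCE_RENDER_BUFFER_OTHER,
    RESOURCE_UNKNOWN
} ResourceKind;

/* Returns nonzero if s ends with suffix. */
static int ends_with(const char *s, const char *suffix) {
    size_t n = strlen(s);
    size_t m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Classify a resource node by its title. Only valid for titles already
 * known to be resource nodes: transfer nodes like "Tr2" also start with T. */
static ResourceKind classify_resource_title(const char *title) {
    assert(title != NULL);
    /* Check the two-letter RB prefix before the single-letter T prefix. */
    if (strncmp(title, "RB", 2) == 0) {
        if (ends_with(title, ".d")) {
            return RESOURCE_RENDER_BUFFER_DEPTH;
        }
        if (ends_with(title, ".s")) {
            return RESOURCE_RENDER_BUFFER_STENCIL;
        }
        return RESOURCE_RENDER_BUFFER_OTHER;
    }
    if (title[0] == 'T') {
        return RESOURCE_TEXTURE;
    }
    return RESOURCE_UNKNOWN;
}
```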
content/learning-paths/mobile-graphics-and-gaming/render-graph-optimization/unused-resources.md
Lines changed: 2 additions & 2 deletions
@@ -16,9 +16,9 @@ This excerpt shows a render pass (`RP0`) which writes three output resources to
 
 The texture is fed into another execution node, `Tr2`. This is a transfer node, number 2. From here, the data ultimately makes its way into the swapchain – and is therefore seen by the user.
 
-The renderbuffer nodes `RB1.d` and `RB1.s` are different. They are not sent to another execution node. Instead, they are unused.
-This is a more extreme version of the problem discussed in [the previous section](../unwanted-outputs). There, we looked at execution nodes which produced _some_ outputs which are unnecessary. Here, _all_ outputs are unnecessary.
-*It therefore follows that the computation producing the output is itself unnecessary.*
+The renderbuffer nodes `RB1.d` and `RB1.s` are different. They are not sent to another execution node, they are unused.
+This is a more extreme version of the problem discussed in the previous section. Previously, you saw execution nodes which produced _some_ outputs which are unnecessary. Here, _all_ outputs are unnecessary.
+
+You can conclude that the computation producing the output is unnecessary.
 
 ## Solution
 
-The solution is similar to that in the previous section. Remove any API calls which represent the unused computation.
+Remove any API calls which represent the unused computation.
 
-*Be careful, however: your application may be using an apparently “unused” output of an execution node in a later frame.*
+Be careful, your application may be using an apparently “unused” output of an execution node in a later frame.
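`unused-resources.md` makes two points. A resource that no execution node reads is unused. An execution node whose outputs are all unused is itself unnecessary. That reasoning can be sketched over a toy graph model. `NodeType`, `Edge`, `resource_is_unused()` and `execution_is_unnecessary()` are hypothetical names, because Frame Advisor presents this information visually rather than through such an API. The model covers a single frame only. As the text warns, an output flagged here may still be read by the application in a later frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one frame's render graph. */
typedef enum { NODE_EXECUTION, NODE_RESOURCE, NODE_SWAPCHAIN } NodeType;

/* An edge means data flows from node `from` into node `to`. */
typedef struct {
    size_t from;
    size_t to;
} Edge;

/* A resource node is unused if no edge leaves it, that is, no execution
 * node reads it. The swapchain is never reported as unused. */
static bool resource_is_unused(size_t node, const NodeType *types,
                               const Edge *edges, size_t edgeCount) {
    assert(types != NULL && (edges != NULL || edgeCount == 0));
    if (types[node] != NODE_RESOURCE) {
        return false;
    }
    for (size_t i = 0; i < edgeCount; ++i) {
        if (edges[i].from == node) {
            return false;
        }
    }
    return true;
}

/* An execution node is unnecessary in this frame if it writes at least one
 * output and every output it writes is an unused resource. */
static bool execution_is_unnecessary(size_t node, const NodeType *types,
                                     const Edge *edges, size_t edgeCount) {
    if (types[node] != NODE_EXECUTION) {
        return false;
    }
    bool writesSomething = false;
    for (size_t i = 0; i < edgeCount; ++i) {
        if (edges[i].from != node) {
            continue;
        }
        writesSomething = true;
        if (!resource_is_unused(edges[i].to, types, edges, edgeCount)) {
            return false;
        }
    }
    return writesSomething;
}
```

Mapped onto the excerpt in the text, `RP0` would not be flagged because one of its outputs, the texture, feeds `Tr2`. The `RB1.d` and `RB1.s` nodes would be reported as unused resources.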