Commit 0f26abe

HDInsight LLAP cluster sizing guide
1 parent 2d6dc4b commit 0f26abe

File tree

1 file changed: +71 -51 lines changed


articles/hdinsight/interactive-query/interactive-query-troubleshoot-llap-sizing-guide.md

Lines changed: 71 additions & 51 deletions
@@ -1,4 +1,3 @@
-
---
title: HDInsight Interactive Query Cluster(LLAP) sizing guide
description: LLAP sizing guide
@@ -12,7 +11,8 @@ ms.date: 05/05/2020

# Azure HDInsight Interactive Query Cluster (Hive LLAP) sizing guide

This document describes the sizing of the HDInsight Interactive Query Cluster (Hive LLAP cluster) for a typical workload to achieve reasonable performance. Please note that the recommendations provided in this document are generic guidelines and specific workloads may need specific tuning.

### **Azure Default VM Types for HDInsight Interactive Query Cluster(LLAP)**

@@ -30,37 +30,38 @@ This document describes the sizing of the HDInsight Interactive Query Cl
| yarn.nodemanager.resource.memory-mb | 102400 (MB) | Total memory given, in MB, for all YARN containers on a node |
| yarn.scheduler.maximum-allocation-mb | 102400 (MB) | The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this value won't take effect |
| yarn.scheduler.maximum-allocation-vcores | 12 | The maximum number of CPU cores for every container request at the Resource Manager. Requests higher than this value won't take effect. |
+ | yarn.nodemanager.resource.cpu-vcores | 12 | Total number of CPU cores that can be allocated for YARN containers on a node |
+ | yarn.scheduler.capacity.root.llap.capacity | 80 (%) | YARN capacity allocation for the llap queue |
| hive.server2.tez.sessions.per.default.queue | <number_of_worker_nodes> | The number of sessions for each queue named in hive.server2.tez.default.queues. This number corresponds to the number of query coordinators (Tez AMs) |
| tez.am.resource.memory.mb | 4096 (MB) | The amount of memory in MB to be used by the Tez AppMaster |
| hive.tez.container.size | 4096 (MB) | Specified Tez container size in MB |
- | yarn.scheduler.capacity.root.llap.capacity | 90% | YARN capacity allocation for llap queue |
| hive.llap.daemon.num.executors | 12 | Number of executors per LLAP daemon |
| hive.llap.io.threadpool.size | 12 | Thread pool size for executors |
- | hive.llap.daemon.yarn.container.mb | 86016 (MB) | Total memory in MB used by individual LLAP daemons (Memory per daemon) |
- | hive.llap.io.memory.size | 409600 (MB) | Cache size in MB per LLAP daemon, provided SSD cache is enabled |
+ | hive.llap.daemon.yarn.container.mb | 77824 (MB) | Total memory in MB used by individual LLAP daemons (Memory per daemon) |
+ | hive.llap.io.memory.size | 235520 (MB) | Cache size in MB per LLAP daemon, provided SSD cache is enabled |
| hive.auto.convert.join.noconditionaltask.size | 2048 (MB) | Memory size in MB to perform Map Join |


### **LLAP Daemon size estimations:**

- #### **1. Determining YARN total memory allocation for all YARN containers on a node**
+ #### **1. Determining total YARN memory allocation for all containers on a node**
Configuration: ***yarn.nodemanager.resource.memory-mb***

- This value indicates a maximum sum of memory in MB used by the YARN containers on each node. It specifies the amount of memory YARN can utilize on this node and therefore this value should be lesser than the total memory on that node.
- Set this value [Total physical memory on node] – [ memory for OS + Other services ]
+ This value indicates the maximum sum of memory, in MB, that can be used by the YARN containers on each node. The value specified should be less than the total amount of physical memory on that node.
+ Total memory for all YARN containers on a node = [Total physical memory] - [memory for OS + other services]
It is recommended to set this value to ~90% of the available RAM.
For D14 v2, the recommended value is **102400 MB**

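As a quick illustration of the arithmetic above (a sketch only; the 112 GB figure is the advertised RAM of a D14 v2 node and is an assumption, not part of the table above):

```python
# Sketch: estimating yarn.nodemanager.resource.memory-mb for a D14 v2 worker node.
node_ram_mb = 112 * 1024        # assumed physical RAM on a D14 v2 node (112 GB)
yarn_share = 0.90               # ~90% of RAM for YARN; the rest for OS and other services
yarn_memory_mb = int(node_ram_mb * yarn_share)   # ~103219 MB
# The guide rounds this down to 102400 MB (100 GB).
```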
- #### **2. Determining YARN scheduler maximum allocation size per request**
+ #### **2. Determining maximum amount of memory per YARN container request**
Configuration: ***yarn.scheduler.maximum-allocation-mb***

- This value indicates the maximum allocation for every container request at the Resource Manager, in MB. Memory requests higher than the specified value will not take effect. The Resource Manager can only allocate memory to containers in increments of *yarn.scheduler.minimum-allocation-mb* and cannot exceed the size specified by *yarn.scheduler.maximum-allocation-mb*. This value should not be more than the total allocated memory of the node, which is specified by *yarn.nodemanager.resource.memory-mb*.
+ This value indicates the maximum allocation for every container request at the Resource Manager, in MB. Memory requests higher than the specified value will not take effect. The Resource Manager can allocate memory to containers in increments of *yarn.scheduler.minimum-allocation-mb* and cannot exceed the size specified by *yarn.scheduler.maximum-allocation-mb*. The value specified should not be more than the total allocated memory for all containers on the node, as specified by *yarn.nodemanager.resource.memory-mb*.
For D14 v2 worker nodes, the recommended value is **102400 MB**

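Since the LLAP daemon runs as one large YARN container (see step 7), the per-request ceiling has to be at least the daemon container size; a small sanity-check sketch using values from the table above:

```python
# Sketch: the LLAP daemon container must fit within the per-container ceiling,
# which in turn cannot exceed the node's total YARN memory.
yarn_node_memory_mb = 102400   # yarn.nodemanager.resource.memory-mb
max_allocation_mb   = 102400   # yarn.scheduler.maximum-allocation-mb
llap_daemon_mb      = 77824    # hive.llap.daemon.yarn.container.mb

assert max_allocation_mb <= yarn_node_memory_mb
assert llap_daemon_mb <= max_allocation_mb
```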
- #### **3. Determining maximum amount of vcores per YARN container**
+ #### **3. Determining maximum amount of vcores per YARN container request**
Configuration: ***yarn.scheduler.maximum-allocation-vcores***

This value indicates the maximum number of virtual CPU cores for every container request at the Resource Manager. Requesting a higher value than this will not take effect. This is a global property of the YARN scheduler. For the LLAP daemon container, this value can be set to 75% of the total available vcores. The remaining 25% should be reserved for NodeManager, DataNode, and other services running on the worker nodes.
- For D14 v2 worker nodes, there are 16 vcores and 75% of 16 vcores can be allocated, therefore the recommended value for LLAP daemon container is **12**.
+ For D14 v2 worker nodes, there are 16 vcores and 75% of 16 vcores can be used by the LLAP daemon container, therefore the recommended value for the LLAP daemon container is **12**.

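The vcore split can be written out the same way (a sketch, assuming the 16 vcores of a D14 v2 node):

```python
# Sketch: 75% of the node's vcores go to the LLAP daemon container,
# the remaining 25% stay with NodeManager, DataNode, and other services.
total_vcores = 16
llap_vcores = int(total_vcores * 0.75)   # 12 -> yarn.scheduler.maximum-allocation-vcores
```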
#### **4. Number of concurrent queries**
Configuration: ***hive.server2.tez.sessions.per.default.queue***
@@ -76,72 +77,91 @@ Configuration: ***tez.am.resource.memory.mb, hive.tez.container.size***
The recommended value is **4096 MB**.

*hive.tez.container.size* - defines the amount of memory allocated for Tez container. This value must be set between the YARN minimum container size(*yarn.scheduler.minimum-allocation-mb*) and the YARN maximum container size(*yarn.scheduler.maximum-allocation-mb*).
- It is recommended to be set to **4096 MB**.
+ It is recommended to be set to **4096 MB**. The LLAP daemon executors use this configuration to limit memory usage per executor.

- A general rule of thumb is to keep it lesser than the amount of memory per processor considering one processor per container. You should reserve memory for number of Tez AMs on a node before allocating the memory for LLAP daemon. For instance, if you are using two Tez AMs(4 GB each) per node you should allocate only 82 GB out of 90 GB for LLAP daemon reserving 8 GB for two Tez AMs.
+ You should reserve some memory for Tez AMs on a node before allocating the memory for the LLAP daemon container. For instance, if you are using two Tez AMs (4 GB each) per node, you should allocate only 82 GB out of 90 GB for the LLAP daemon, reserving 8 GB for the two Tez AMs.

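The reservation example above, written out as a sketch (the two-AM case is the document's own example, not the default one-AM-per-node layout):

```python
# Sketch: reserve Tez AM memory on a node before sizing the LLAP daemon container.
available_gb   = 90   # memory available on the node in this example
tez_am_count   = 2    # two Tez AMs on this node
tez_am_size_gb = 4    # tez.am.resource.memory.mb

llap_daemon_gb = available_gb - tez_am_count * tez_am_size_gb   # 90 - 8 = 82 GB
```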
#### **6. LLAP Queue capacity allocation**
Configuration: ***yarn.scheduler.capacity.root.llap.capacity***

- This value indicates a percentage of capacity allocated for llap queue. The HDInsights Interactive query cluster allocates 90% of the total capacity for llap queue and the remaining 10% is set to default queue for other container allocations.
- For D14v2 worker nodes, the recommended value is **90** for llap queue.
+ This value indicates the percentage of capacity allocated to the llap queue. The capacity allocation may vary by workload depending on how the YARN queues are configured. If your workload consists of read-only operations, setting it as high as 90% of the capacity should work. However, if your workload is a mix of update/delete/merge operations using managed tables, it is recommended to assign 80% of the capacity to the llap queue. The remaining 20% of capacity can be used by other internally invoked tasks, such as compaction, to allocate containers from the default queue without starving YARN resources.
+ For D14 v2 worker nodes, the recommended value is **80** for the llap queue. For read-only workloads, it can be increased up to 90 as suitable.

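A tiny sketch of the recommendation above (the helper name is illustrative only):

```python
# Sketch: pick yarn.scheduler.capacity.root.llap.capacity based on the workload.
def llap_queue_capacity(read_only_workload: bool) -> int:
    # 80% leaves 20% of capacity in the default queue for compaction and
    # other internally invoked tasks; read-only workloads can go up to 90%.
    return 90 if read_only_workload else 80
```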
#### **7. LLAP daemon container size**
Configuration: ***hive.llap.daemon.yarn.container.mb***

- The total memory size for LLAP daemon depends on following components
- 1. configuration of YARN container size (yarn.scheduler.maximum-allocation-mb, yarn.scheduler.maximum-allocation-mb, yarn.nodemanager.resource.memory-mb)
- 2. Heap memory used by executors (Xmx)
- 3. Off-heap in-memory cache per daemon (hive.llap.io.memory.size)
- 4. Headroom
-
- Memory per daemon = [In-memory cache size] + [Heap size] + [Head room]
- It can be calculated as follows;
- Tez AM memory per node = [ (Number of Tez AMs/Number of LLAP daemon nodes) * Tez AM size ]
- **LLAP daemon container size = [ 90% of YARN max container memory ][ Tez AM memory per node ]**
-
- For D14 v2 worker node, HDI 4.0 - the recommended value is (90 - (1/1 * 4 GB)) = **86 GB**
- (For HDI 3.6, recommended value is **84 GB** because you should reserve ~2 GB for slider AM.)
+ LLAP daemon is run as a YARN container on each worker node. The total memory size for the LLAP daemon container depends on the following factors:
+ 1. Configurations of YARN container size (yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb, yarn.nodemanager.resource.memory-mb)
+ 2. Number of Tez AMs on a node
+ 3. Total memory configured for all containers on a node and LLAP queue capacity

- **Headroom size**:
- It is a portion of off-heap memory used for Java VM overhead (metaspace, threads stack, gc data structures, etc.). This is observed to be around 6% of the heap size (Xmx). To be on the safer side, it can be calculated as 6% of total LLAP daemon memory size because it possible when SSD cache is enabled it will allow LLAP daemon to utilize all of the available in-memory space to be used only for heap.
- For D14 v2, the recommended value is ceil(86 GB x 0.06) ~= **6 GB**.
+ Memory needed by Tez Application Masters (Tez AMs) can be calculated as follows:
+ For an HDInsight Interactive Query cluster, by default, there is one Tez AM per worker node, which acts as a query coordinator. The number of Tez AMs can be configured based on the number of concurrent queries to be served.
+ It is recommended to have 4 GB of memory per Tez AM.

- **Heap size(Xmx)**:
- It is amount of RAM available after taking out Headroom size.
- For D14 v2, HDI 4.0 - this value is (86 GB - 6 GB) = 80 GB
- For D14 v2, HDI 3.6 - this is (84 GB - 6 GB) = 78 GB
+ Tez AM memory per node = [ number of Tez AMs x Tez AM container size ]
+ = (1 x 4 GB) = 4 GB

- #### **8. LLAP daemon cache size**
- Configuration: ***hive.llap.io.memory.size***
+ Total memory available for the LLAP queue per worker node can be calculated as follows:
+ This value depends on the total amount of memory available for all YARN containers on a node (*yarn.nodemanager.resource.memory-mb*) and the percentage of capacity configured for the llap queue (*yarn.scheduler.capacity.root.llap.capacity*).
+ Total memory for LLAP queue on worker node = Total memory available for all YARN containers on a node x Percentage of capacity for llap queue
+ For D14 v2, this value is [ 100 GB x 0.80 ] = 80 GB.

- This is the amount of memory available as cache for LLAP daemon.
- The LLAP daemons can use SSD as a cache. Setting *hive.llap.io.allocator.mmap* = true will enable SSD caching.
- The D14 v2 comes with ~800 GB of SSD and the SSD caching is enabled by default for interactive query Cluster (LLAP).
- It is configured to use 50% of the SSD space for off-heap cache.
- For D14 v2, the recommended value is **409600 MB**.
+ The LLAP daemon container size is calculated as follows:

- For other VMs, with no SSD caching enabled, it is beneficial to allocate portion of available RAM for LLAP caching to achieve better performance. Adjust the total memory size for LLAP daemon as follows:
- **Total LLAP daemon memory = [LLAP cache size] + [Heap size] + [Head room]**
- It is recommended to adjust the cache size and the heap size that is best suitable for your workload.
+ **LLAP daemon container size = [ Total memory available for LLAP queue ] - [ Tez AM memory per node ]**
+
+ For D14 v2 worker node, HDI 4.0 - the recommended value is (80 GB - 4 GB) = **76 GB**
+ (For HDI 3.6, the recommended value is **74 GB** because you should reserve an additional ~2 GB for the slider AM.)

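Putting the steps above together for a D14 v2 node on HDI 4.0 (a sketch of the arithmetic, using only values already derived in this section):

```python
# Sketch: LLAP daemon container size for a D14 v2 worker node (HDI 4.0).
yarn_node_memory_gb = 100    # yarn.nodemanager.resource.memory-mb (102400 MB)
llap_queue_fraction = 0.80   # yarn.scheduler.capacity.root.llap.capacity
tez_am_per_node     = 1      # one query coordinator (Tez AM) per worker node
tez_am_size_gb      = 4      # tez.am.resource.memory.mb

llap_queue_gb = yarn_node_memory_gb * llap_queue_fraction   # 80 GB
tez_am_gb = tez_am_per_node * tez_am_size_gb                # 4 GB
llap_daemon_container_gb = llap_queue_gb - tez_am_gb        # 76 GB
# For HDI 3.6, subtract an additional ~2 GB for the slider AM -> 74 GB.
```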
- #### **9. Determining number of executors per LLAP daemon**
+ #### **8. Determining number of executors per LLAP daemon**
Configuration: ***hive.llap.daemon.num.executors***, ***hive.llap.io.threadpool.size***

***hive.llap.daemon.num.executors***:
- This configuration controls the number of executors that can execute tasks in parallel per LLAP daemon. This value is a balance of number of available vcores, the amount of memory allocated per executor and the amount of total memory available per LLAP daemon. Usually, we would like this value to be as close as possible to the number of cores.
+ This configuration controls the number of executors that can execute tasks in parallel per LLAP daemon. This value is a balance of the number of available vcores, the amount of memory allocated per executor, and the total amount of memory available for the LLAP daemon. Usually, we would like this value to be as close as possible to the number of cores.
For D14 v2, there are 16 vcores available, however, not all of the vcores can be allocated because the worker nodes also run other services like NodeManager, DataNode, Metrics Monitor etc. that need some portion of available vcores.

This value can be configured up to 75% of the total vcores available on that node.
For D14 v2, the recommended value is (.75 X 16) = **12**

- It is recommended that you reserve ~6 GB of heap space per executor and adjust your number of executors based on available llap daemon size and number of available vcores per node.
+ If you need to adjust the number of executors, it is recommended that you consider 4 GB of memory per executor, as specified by *hive.tez.container.size*, and make sure the total memory needed for all executors does not exceed the total memory available for the LLAP daemon container.

***hive.llap.io.threadpool.size***:
This value specifies the thread pool size for executors. Since executors are fixed as specified, it will be the same as the number of executors per LLAP daemon.
For D14 v2, it is recommended to set this value to **12**.

- **Note:** This configuration cannot exceed yarn.nodemanager.resource.cpu-vcores value.

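A short sketch of the executor-count check described above:

```python
# Sketch: executor count is bounded by vcores and by the memory per executor.
total_vcores = 16
executor_mem_gb = 4              # hive.tez.container.size
llap_daemon_container_gb = 76    # hive.llap.daemon.yarn.container.mb (HDI 4.0)

num_executors = int(total_vcores * 0.75)      # 12 -> hive.llap.daemon.num.executors
assert num_executors * executor_mem_gb <= llap_daemon_container_gb   # 48 GB <= 76 GB
threadpool_size = num_executors               # hive.llap.io.threadpool.size
```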
+ #### **9. Determining LLAP daemon cache size**
+ Configuration: ***hive.llap.io.memory.size***
+
+ LLAP daemon container memory consists of the following components:
+ 1. Head room
+ 2. Heap memory used by executors (Xmx)
+ 3. In-memory cache per daemon (this is off-heap memory, not applicable when SSD cache is enabled)
+ 4. In-memory cache metadata size (applicable only when SSD cache is enabled)
+
+ **Headroom size**:
+ It is a portion of off-heap memory used for Java VM overhead (metaspace, thread stacks, GC data structures, etc.). This is observed to be around 6% of the heap size (Xmx). To be on the safer side, it can be calculated as 6% of the total LLAP daemon memory size.
+ For D14 v2, the recommended value is ceil(76 GB x 0.06) ~= **5 GB**.
+
+ **Heap size(Xmx)**:
+ It is the amount of heap memory available for all executors.
+ Total Heap size = Number of executors x 4 GB
+ For D14 v2, this value is 12 x 4 GB = 48 GB
+
+ When SSD cache is disabled, the in-memory cache is the amount of memory that is left after taking out the Headroom size and Heap size from the LLAP daemon container size.
+
+ Cache size calculation differs when SSD cache is enabled.
+ Setting *hive.llap.io.allocator.mmap* = true will enable SSD caching.
+ When SSD cache is enabled, some portion of the memory will be used to store metadata for the SSD cache. The metadata is stored in memory and is expected to be ~10% of the SSD cache size.
+ SSD cache in-memory metadata size = [ LLAP daemon container size ] - [ Head room + Heap size ]
+ For D14 v2, with HDI 4.0, SSD cache in-memory metadata size = [ 76 GB ] - [ 5 GB + 48 GB ] = 23 GB
+ For D14 v2, with HDI 3.6, SSD cache in-memory metadata size = [ 76 GB ] - [ 5 GB + 48 GB + 2 GB slider ] = 21 GB
+
+ Given the size of available memory for cache metadata, we can calculate the size of the SSD cache that can be supported.
+ Size of in-memory metadata for SSD cache = 10% of the size of SSD cache
+ Size of SSD cache = size of in-memory metadata for SSD cache x 10
+ For D14 v2, with HDI 4.0, the recommended SSD cache size = 23 GB x 10 = 230 GB
+ For D14 v2, with HDI 3.6, the recommended SSD cache size = 21 GB x 10 = 210 GB

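The cache arithmetic above, collected into one sketch for a D14 v2 node on HDI 4.0:

```python
import math

# Sketch: headroom, heap, and SSD cache sizing for a D14 v2 node (HDI 4.0).
llap_daemon_container_gb = 76
num_executors = 12
executor_mem_gb = 4

headroom_gb = math.ceil(llap_daemon_container_gb * 0.06)   # ~5 GB for JVM overhead
heap_gb = num_executors * executor_mem_gb                  # 48 GB (Xmx)

# With SSD cache enabled, the remainder holds the cache metadata (~10% of the cache).
metadata_gb = llap_daemon_container_gb - (headroom_gb + heap_gb)   # 23 GB
ssd_cache_gb = metadata_gb * 10                                    # 230 GB
# With SSD cache disabled, the same remainder is used as the in-memory (off-heap) cache.
```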
#### **10. Adjusting Map Join memory**
Configuration: ***hive.auto.convert.join.noconditionaltask.size***
