articles/hdinsight/hbase/apache-hbase-advisor.md
Lines changed: 9 additions & 9 deletions
@@ -6,7 +6,7 @@ author: yeturis
 ms.author: sairamyeturi
 ms.service: hdinsight
 ms.topic: conceptual
-ms.date: 07/20/2022
+ms.date: 09/15/2023
 #Customer intent: The azure advisories help to tune the cluster/query. This doc gives a much deeper understanding of the various advisories including the recommended configuration tunings.
 ---
 # Apache HBase advisories in Azure HDInsight
@@ -15,9 +15,9 @@ This article describes several advisories to help you optimize the Apache HBase
 
 ## Optimize HBase to read most recently written data
 
-If your usecase involves reading the most recently written data from HBase, this advisory can help you. For high performance, it's optimal that HBase reads are to be served from memstore, instead of the remote storage.
+If your use case involves reading the most recently written data from HBase, this advisory can help you. For high performance, it is optimal that HBase reads are to be served from `memstore`, instead of the remote storage.
 
-The query advisory indicates that for a given column family in a table > 75% reads that are getting served from memstore. This indicator suggests that even if a flush happens on the memstore the recent file needs to be accessed and that needs to be in cache. The data is first written to memstore the system accesses the recent data there. There's a chance that the internal HBase flusher threads detect that a given region has reached 128M (default) size and can trigger a flush. This scenario happens to even the most recent data that was written when the memstore was around 128M in size. Therefore, a later read of those recent records may require a file read rather than from memstore. Hence it is best to optimize that even recent data that is recently flushed can reside in the cache.
+The query advisory indicates that for a given column family in a table > 75% reads that are getting served from `memstore`. This indicator suggests that even if a flush happens on the `memstore` the recent file needs to be accessed and that needs to be in cache. The data is first written to `memstore` the system accesses the recent data there. There's a chance that the internal HBase flusher threads detect that a given region has reached 128M (default) size and can trigger a flush. This scenario happens to even the most recent data that was written when the `memstore` was around 128M in size. Therefore, a later read of those recent records may require a file read rather than from `memstore`. Hence it is best to optimize that even recent data that is recently flushed can reside in the cache.
 
 To optimize the recent data in cache, consider the following configuration settings:
 
@@ -41,25 +41,25 @@ To optimize the recent data in cache, consider the following configuration setti
 
 ## Optimize the flush queue
 
-This advisory indicates that HBase flushes may need tuning. The current configuration for flush handlers may not be high enough to handle with write traffic which may lead to slow down of flushes.
+This advisory indicates that HBase flushes may need tuning. The current configuration for flush handlers may not be high enough to handle with write traffic that may lead to slow down of flushes.
 
 In the region server UI, notice if the flush queue grows beyond 100. This threshold indicates the flushes are slow and you may have to tune the `hbase.hstore.flusher.count` configuration. By default, the value is 2. Ensure that the max flusher threads don't increase beyond 6.
 
-Additionally, see if you have a recommendation for region count tuning. If we yes, we suggest you to try the region tuning to see if that helps in faster flushes. Otherwise, tuning the flusher threads may help you.
+Additionally, see if you have a recommendation for region count tuning. If yes, we suggest you to try the region tuning to see if that helps in faster flushes. Otherwise, tuning the flusher threads may help you.
 
 ## Region count tuning
 
-The region count tuning advisory indicates that HBase has blocked updates, and the region count may be more than the optimally supported heap size. You can tune the heap size, memstore size, and the region count.
+The region count tuning advisory indicates that HBase has blocked updates, and the region count may be more than the optimally supported heap size. You can tune the heap size, `memstore` size, and the region count.
 
 As an example scenario:
 
-- Assume the heap size for the region server is 10 GB. By default the `hbase.hregion.memstore.flush.size` is `128M`. The default value for `hbase.regionserver.global.memstore.size` is `0.4`. Which means that out of the 10 GB, 4 GB is allocated for memstore (globally).
+- Assume the heap size for the region server is 10 GB. By default the `hbase.hregion.memstore.flush.size` is `128M`. The default value for `hbase.regionserver.global.memstore.size` is `0.4`. Which means that out of the 10 GB, 4 GB is allocated for `memstore` (globally).
 
 - Assume there's an even distribution of the write load on all the regions and assuming every region grows upto 128 MB only then the max number of regions in this setup is `32` regions. If a given region server is configured to have 32 regions, the system better avoids blocking updates.
 
-- With these settings in place, the number of regions is 100. The 4-GB global memstore is now split across 100 regions. So effectively each region gets only 40 MB for memstore. When the writes are uniform, the system does frequent flushes and smaller size of the order < 40 MB. Having many flusher threads might increase the flush speed `hbase.hstore.flusher.count`.
+- With these settings in place, the number of regions is 100. The 4-GB global `memstore` is now split across 100 regions. So effectively each region gets only 40 MB for `memstore`. When the writes are uniform, the system does frequent flushes and smaller size of the order < 40 MB. Having many flusher threads might increase the flush speed `hbase.hstore.flusher.count`.
 
-The advisory means that it would be good to reconsider the number of regions per server, the heap size, and the global memstore size configuration along with the tuning of flush threads to avoid updates getting blocked.
+The advisory means that it would be good to reconsider the number of regions per server, the heap size, and the global `memstore` size configuration along with the tuning of flush threads to avoid updates getting blocked.
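
The sizing arithmetic in the region count example above can be checked with a short sketch. This is only an illustration of the numbers quoted in the article, not a tool that ships with HBase or HDInsight: the function names are hypothetical, the inputs mirror the defaults the article cites (`hbase.regionserver.global.memstore.size` of `0.4`, `hbase.hregion.memstore.flush.size` of 128 MB), and it assumes 1 GB = 1024 MB and a perfectly even write load across regions.

```python
# Hypothetical helpers that reproduce the article's example arithmetic.
# Assumption: binary units (1 GB = 1024 MB) and evenly distributed writes.
MB_PER_GB = 1024


def global_memstore_mb(heap_gb, global_memstore_fraction=0.4):
    """Heap share reserved for all memstores on one region server, in MB."""
    return heap_gb * MB_PER_GB * global_memstore_fraction


def max_regions_at_full_flush_size(heap_gb, global_memstore_fraction=0.4,
                                   flush_size_mb=128):
    """Regions that can each grow to the flush size within the global limit."""
    total_mb = global_memstore_mb(heap_gb, global_memstore_fraction)
    return int(total_mb // flush_size_mb)


def memstore_per_region_mb(heap_gb, region_count, global_memstore_fraction=0.4):
    """Effective memstore share per region when the global limit is split evenly."""
    return global_memstore_mb(heap_gb, global_memstore_fraction) / region_count


if __name__ == "__main__":
    # 10 GB heap: 4096 MB global memstore, so 32 regions at 128 MB each.
    print(max_regions_at_full_flush_size(10))
    # With 100 regions, each gets about 40 MB, well below the 128 MB flush size.
    print(memstore_per_region_mb(10, 100))
```

With a 10 GB heap this gives 32 regions at the full flush size, and roughly 40 MB per region once the server hosts 100 regions, which matches the article's point that flushes then become smaller and more frequent.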