Skip to content

Commit c8f00b1

Browse files
authored
Update apache-hbase-advisor.md
1 parent a8383b1 commit c8f00b1

File tree

1 file changed

+8
-8
lines changed

1 file changed

+8
-8
lines changed

articles/hdinsight/hbase/apache-hbase-advisor.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,9 @@ This article describes several advisories to help you optimize the Apache HBase
1515

1616
## Optimize HBase to read most recently written data
1717

18-
If your usecase involves reading the most recently written data from HBase, this advisory can help you. For high performance, it's optimal that HBase reads are to be served from memstore, instead of the remote storage.
18+
If your use case involves reading the most recently written data from HBase, this advisory can help you. For high performance, it is optimal that HBase reads are to be served from `memstore`, instead of the remote storage.
1919

20-
The query advisory indicates that for a given column family in a table > 75% reads that are getting served from memstore. This indicator suggests that even if a flush happens on the memstore the recent file needs to be accessed and that needs to be in cache. The data is first written to memstore the system accesses the recent data there. There's a chance that the internal HBase flusher threads detect that a given region has reached 128M (default) size and can trigger a flush. This scenario happens to even the most recent data that was written when the memstore was around 128M in size. Therefore, a later read of those recent records may require a file read rather than from memstore. Hence it is best to optimize that even recent data that is recently flushed can reside in the cache.
20+
The query advisory indicates that for a given column family in a table > 75% reads that are getting served from `memstore`. This indicator suggests that even if a flush happens on the `memstore` the recent file needs to be accessed and that needs to be in cache. The data is first written to `memstore` the system accesses the recent data there. There's a chance that the internal HBase flusher threads detect that a given region has reached 128M (default) size and can trigger a flush. This scenario happens to even the most recent data that was written when the `memstore` was around 128M in size. Therefore, a later read of those recent records may require a file read rather than from `memstore`. Hence it is best to optimize that even recent data that is recently flushed can reside in the cache.
2121

2222
To optimize the recent data in cache, consider the following configuration settings:
2323

@@ -41,25 +41,25 @@ To optimize the recent data in cache, consider the following configuration setti
4141

4242
## Optimize the flush queue
4343

44-
This advisory indicates that HBase flushes may need tuning. The current configuration for flush handlers may not be high enough to handle with write traffic which may lead to slow down of flushes .
44+
This advisory indicates that HBase flushes may need tuning. The current configuration for flush handlers may not be high enough to handle with write traffic that may lead to slow down of flushes.
4545

4646
In the region server UI, notice if the flush queue grows beyond 100. This threshold indicates the flushes are slow and you may have to tune the `hbase.hstore.flusher.count` configuration. By default, the value is 2. Ensure that the max flusher threads don't increase beyond 6.
4747

48-
Additionally, see if you have a recommendation for region count tuning. If we yes, we suggest you to try the region tuning to see if that helps in faster flushes. Otherwise, tuning the flusher threads may help you.
48+
Additionally, see if you have a recommendation for region count tuning. If yes, we suggest you to try the region tuning to see if that helps in faster flushes. Otherwise, tuning the flusher threads may help you.
4949

5050
## Region count tuning
5151

52-
The region count tuning advisory indicates that HBase has blocked updates, and the region count may be more than the optimally supported heap size. You can tune the heap size, memstore size, and the region count.
52+
The region count tuning advisory indicates that HBase has blocked updates, and the region count may be more than the optimally supported heap size. You can tune the heap size, `memstore` size, and the region count.
5353

5454
As an example scenario:
5555

56-
- Assume the heap size for the region server is 10 GB. By default the `hbase.hregion.memstore.flush.size` is `128M`. The default value for `hbase.regionserver.global.memstore.size` is `0.4`. Which means that out of the 10 GB, 4 GB is allocated for memstore (globally).
56+
- Assume the heap size for the region server is 10 GB. By default the `hbase.hregion.memstore.flush.size` is `128M`. The default value for `hbase.regionserver.global.memstore.size` is `0.4`. Which means that out of the 10 GB, 4 GB is allocated for `memstore` (globally).
5757

5858
- Assume there's an even distribution of the write load on all the regions and assuming every region grows upto 128 MB only then the max number of regions in this setup is `32` regions. If a given region server is configured to have 32 regions, the system better avoids blocking updates.
5959

60-
- With these settings in place, the number of regions is 100. The 4-GB global memstore is now split across 100 regions. So effectively each region gets only 40 MB for memstore. When the writes are uniform, the system does frequent flushes and smaller size of the order < 40 MB. Having many flusher threads might increase the flush speed `hbase.hstore.flusher.count`.
60+
- With these settings in place, the number of regions is 100. The 4-GB global `memstore` is now split across 100 regions. So effectively each region gets only 40 MB for `memstore`. When the writes are uniform, the system does frequent flushes and smaller size of the order < 40 MB. Having many flusher threads might increase the flush speed `hbase.hstore.flusher.count`.
6161

62-
The advisory means that it would be good to reconsider the number of regions per server, the heap size, and the global memstore size configuration along with the tuning of flush threads to avoid updates getting blocked.
62+
The advisory means that it would be good to reconsider the number of regions per server, the heap size, and the global `memstore` size configuration along with the tuning of flush threads to avoid updates getting blocked.
6363

6464
## Compaction queue tuning
6565

0 commit comments

Comments
 (0)