
Commit b93acd7

freshness149
1 parent fda1f43 commit b93acd7

File tree

1 file changed: 14 additions, 15 deletions


articles/hdinsight/hbase/apache-hbase-phoenix-performance.md

Lines changed: 14 additions & 15 deletions
```diff
@@ -2,15 +2,14 @@
 title: Phoenix performance in Azure HDInsight
 description: Best practices to optimize Apache Phoenix performance for Azure HDInsight clusters
 author: ashishthaps
+ms.author: ashishth
 ms.reviewer: jasonh
-
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 01/22/2018
-ms.author: ashishth
-
+ms.custom: hdinsightactive
+ms.date: 12/27/2019
 ---
+
 # Apache Phoenix performance best practices
 
 The most important aspect of [Apache Phoenix](https://phoenix.apache.org/) performance is to optimize the underlying [Apache HBase](https://hbase.apache.org/). Phoenix creates a relational data model atop HBase that converts SQL queries into HBase operations, such as scans. The design of your table schema, the selection and ordering of the fields in your primary key, and your use of indexes all affect Phoenix performance.
```
```diff
@@ -23,7 +22,7 @@ The schema design of a Phoenix table includes the primary key design, column fam
 
 ### Primary key design
 
-The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in an HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they are defined in the primary key.
+The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in an HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they're defined in the primary key.
 
 For example, a table for contacts has the first name, last name, phone number, and address, all in the same column family. You could define a primary key based on an increasing sequence number:
 
```
```diff
@@ -48,13 +47,13 @@ With this new primary key the row keys generated by Phoenix would be:
 
 In the first row above, the data for the rowkey is represented as shown:
 
-|rowkey| key| value|
+|rowkey| key| value|
 |------|--------------------|---|
 | Dole-John-111|address |1111 San Gabriel Dr.|
 | Dole-John-111|phone |1-425-000-0002|
 | Dole-John-111|firstName |John|
 | Dole-John-111|lastName |Dole|
-| Dole-John-111|socialSecurityNum |111|
+| Dole-John-111|socialSecurityNum |111|
 
 This rowkey now stores a duplicate copy of the data. Consider the size and number of columns you include in your primary key, because this value is included with every cell in the underlying HBase table.
 
```
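The composite rowkeys above imply a table definition like the following. This is a hypothetical sketch, not DDL from the article: the table name, column types, and constraint name are assumptions, and only the primary-key column order is taken from the Dole-John-111 rowkeys shown.

```sql
-- Hypothetical sketch of the contact table behind the Dole-John-111 rowkeys.
-- Phoenix concatenates lastName, firstName, and socialSecurityNum, in this
-- order, to build each HBase rowkey.
CREATE TABLE CONTACTS (
    lastName          VARCHAR NOT NULL,
    firstName         VARCHAR NOT NULL,
    socialSecurityNum VARCHAR NOT NULL,
    phone             VARCHAR,
    address           VARCHAR
    CONSTRAINT pk PRIMARY KEY (lastName, firstName, socialSecurityNum)
);
```

Because every primary-key value is duplicated into the rowkey of each cell, keeping the key to a few short columns limits the per-cell overhead described above.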
```diff
@@ -68,8 +67,8 @@ Also, if certain columns tend to be accessed together, put those columns in the
 
 ### Column design
 
-* Keep VARCHAR columns under about 1 MB due to the I/O costs of large columns. When processing queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
-* Store column values using a compact format such as protobuf, Avro, msgpack, or BSON. JSON is not recommended, as it is larger.
+* Keep VARCHAR columns under about 1 MB because of the I/O costs of large columns. When processing queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
+* Store column values using a compact format such as protobuf, Avro, msgpack, or BSON. JSON isn't recommended, as it's larger.
 * Consider compressing data before storage to cut latency and I/O costs.
 
 ### Partition data
```
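One way to act on the compression bullet in the column-design list above is HBase block compression, which Phoenix can enable through table options in its DDL. A hedged sketch: the table, columns, and codec choice here are assumptions, and codec availability depends on how the cluster is configured.

```sql
-- Hypothetical sketch: ask HBase to compress this table's blocks on disk.
-- SNAPPY trades a little CPU for less I/O; GZ compresses more but is slower.
CREATE TABLE EVENTS (
    id      BIGINT NOT NULL PRIMARY KEY,
    payload VARBINARY
) COMPRESSION = 'SNAPPY';
```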
```diff
@@ -105,7 +104,7 @@ Secondary indexes can improve read performance by turning what would be a full t
 
 ### Use covered indexes
 
-Covered indexes are indexes that include data from the row in addition to the values that are indexed. After finding the desired index entry, there is no need to access the primary table.
+Covered indexes are indexes that include data from the row in addition to the values that are indexed. After finding the desired index entry, there's no need to access the primary table.
 
 For example, in the example contact table you could create a secondary index on just the socialSecurityNum column. This secondary index would speed up queries that filter by socialSecurityNum values, but retrieving other field values will require another read against the main table.
 
```
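The covered-index pattern above can be written in Phoenix SQL with the INCLUDE clause, which copies extra columns into the index. A hedged sketch using the article's contact example; the index name is an assumption.

```sql
-- Hypothetical sketch: a covered index on socialSecurityNum.
-- INCLUDE stores firstName and lastName inside the index, so queries
-- that select only these columns never read the primary table.
CREATE INDEX ssn_idx ON CONTACTS (socialSecurityNum)
    INCLUDE (firstName, lastName);
```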
```diff
@@ -149,7 +148,7 @@ In [SQLLine](http://sqlline.sourceforge.net/), use EXPLAIN followed by your SQL
 
 As an example, say you have a table called FLIGHTS that stores flight delay information.
 
-To select all the flights with an airlineid of `19805`, where airlineid is a field that is not in the primary key or in any index:
+To select all the flights with an airlineid of `19805`, where airlineid is a field that isn't in the primary key or in any index:
 
 select * from "FLIGHTS" where airlineid = '19805';
 
```
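To confirm how Phoenix executes the query above, prefix it with EXPLAIN in SQLLine. A hedged sketch; the exact plan wording varies by Phoenix version.

```sql
-- With no index on airlineid, the plan typically reports a FULL SCAN
-- over FLIGHTS; after adding an index, it should show a range scan instead.
EXPLAIN select * from "FLIGHTS" where airlineid = '19805';
```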
```diff
@@ -204,15 +203,15 @@ The following guidelines describe some common patterns.
 
 ### Read-heavy workloads
 
-For read-heavy use cases, make sure you are using indexes. Additionally, to save read-time overhead, consider creating covered indexes.
+For read-heavy use cases, make sure you're using indexes. Additionally, to save read-time overhead, consider creating covered indexes.
 
 ### Write-heavy workloads
 
-For write-heavy workloads where the primary key is monotonically increasing, create salt buckets to help avoid write hotspots, at the expense of overall read throughput due to the additional scans needed. Also, when using UPSERT to write a large number of records, turn off autoCommit and batch up the records.
+For write-heavy workloads where the primary key is monotonically increasing, create salt buckets to help avoid write hotspots, at the expense of overall read throughput because of the additional scans needed. Also, when using UPSERT to write a large number of records, turn off autoCommit and batch up the records.
 
 ### Bulk deletes
 
-When deleting a large data set, turn on autoCommit before issuing the DELETE query, so that the client does not need to remember the row keys for all deleted rows. AutoCommit prevents the client from buffering the rows affected by the DELETE, so that Phoenix can delete them directly on the region servers without the expense of returning them to the client.
+When deleting a large data set, turn on autoCommit before issuing the DELETE query, so that the client doesn't need to remember the row keys for all deleted rows. AutoCommit prevents the client from buffering the rows affected by the DELETE, so that Phoenix can delete them directly on the region servers without the expense of returning them to the client.
 
 ### Immutable and Append-only
 
```
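The write-heavy and bulk-delete guidance above can be sketched in Phoenix SQL. SALT_BUCKETS is a standard Phoenix table option; the table name, bucket count, and sample rows are hypothetical, and autoCommit itself is toggled from the client (for example, `!autocommit off` / `!autocommit on` in SQLLine).

```sql
-- Hypothetical sketch: salt a monotonically increasing key across 10
-- buckets so writes spread over region servers instead of hotspotting.
CREATE TABLE EVENT_LOG (
    id      BIGINT NOT NULL PRIMARY KEY,
    payload VARCHAR
) SALT_BUCKETS = 10;

-- With autoCommit off, UPSERTs accumulate client-side and are sent as
-- one batch when the client commits.
UPSERT INTO EVENT_LOG VALUES (1, 'event-1');
UPSERT INTO EVENT_LOG VALUES (2, 'event-2');

-- For bulk deletes, turn autoCommit back on first, so Phoenix deletes
-- matching rows on the region servers without buffering them on the client.
DELETE FROM EVENT_LOG WHERE id < 1000;
```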