articles/hdinsight/hbase/apache-hbase-phoenix-performance.md

---
title: Phoenix performance in Azure HDInsight
description: Best practices to optimize Apache Phoenix performance for Azure HDInsight clusters
author: ashishthaps
ms.author: ashishth
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive
ms.date: 12/27/2019
---

# Apache Phoenix performance best practices
The most important aspect of [Apache Phoenix](https://phoenix.apache.org/) performance is to optimize the underlying [Apache HBase](https://hbase.apache.org/). Phoenix creates a relational data model atop HBase that converts SQL queries into HBase operations, such as scans. The design of your table schema, the selection and ordering of the fields in your primary key, and your use of indexes all affect Phoenix performance.

The schema design of a Phoenix table includes the primary key design, column family design, individual column design, and how the data is partitioned.

### Primary key design

The primary key defined on a table in Phoenix determines how data is stored within the rowkey of the underlying HBase table. In HBase, the only way to access a particular row is with the rowkey. In addition, data stored in an HBase table is sorted by the rowkey. Phoenix builds the rowkey value by concatenating the values of each of the columns in the row, in the order they're defined in the primary key.

For example, a table for contacts has the first name, last name, phone number, and address, all in the same column family. You could define a primary key based on an increasing sequence number:
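
The original DDL isn't shown in this excerpt; a minimal sketch of such a definition, using the column names that appear in the rowkey table below, might look like this:

```sql
-- Hypothetical contact table keyed on an increasing sequence number;
-- all columns land in the default column family.
CREATE TABLE CONTACTS (
    ID                BIGINT NOT NULL PRIMARY KEY,  -- increasing sequence number
    FIRSTNAME         VARCHAR,
    LASTNAME          VARCHAR,
    PHONE             VARCHAR,
    ADDRESS           VARCHAR,
    SOCIALSECURITYNUM VARCHAR
);
```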

A key like this is easy to generate, but it makes a poor rowkey: monotonically increasing values concentrate writes on a single region server, and the key says nothing about the data it points to. You could instead build the primary key from the columns you search by, such as last name, first name, and social security number.

With this new primary key, the row keys generated by Phoenix would be values such as `Dole-John-111`.

For that row, the data for the rowkey is represented as shown:

|rowkey|key|value|
|------|---|-----|
|Dole-John-111|address|1111 San Gabriel Dr.|
|Dole-John-111|phone|1-425-000-0002|
|Dole-John-111|firstName|John|
|Dole-John-111|lastName|Dole|
|Dole-John-111|socialSecurityNum|111|

This rowkey now stores a duplicate copy of the data. Consider the size and number of columns you include in your primary key, because this value is included with every cell in the underlying HBase table.

Also, if certain columns tend to be accessed together, put those columns in the same column family.

### Column design

* Keep VARCHAR columns under about 1 MB because of the I/O costs of large columns. When processing queries, HBase materializes cells in full before sending them over to the client, and the client receives them in full before handing them off to the application code.
* Store column values using a compact format such as protobuf, Avro, msgpack, or BSON. JSON isn't recommended, as it's larger.
* Consider compressing data before storage to cut latency and I/O costs.

### Partition data

Phoenix can spread data evenly across region servers when you salt the table with SALT_BUCKETS, or you can control partitioning yourself by pre-splitting the table on chosen key boundaries at creation time.

## Index design

Secondary indexes can improve read performance by turning what would be a full table scan into a point lookup, at the cost of storage space and write speed.

### Use covered indexes

Covered indexes are indexes that include data from the row in addition to the values that are indexed. After finding the desired index entry, there's no need to access the primary table.

For example, in the contact table described earlier, you could create a secondary index on just the socialSecurityNum column. That index speeds up queries that filter by socialSecurityNum values, but retrieving any other field requires another read against the main table.
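
A sketch of that index, extended into a covered index (the index name and column spellings are illustrative, following the contact example):

```sql
-- Covered index: matching rows also carry the INCLUDE'd columns, so queries
-- selecting them are answered from the index without touching the data table.
CREATE INDEX SSN_IDX ON CONTACTS (SOCIALSECURITYNUM)
    INCLUDE (FIRSTNAME, LASTNAME, PHONE, ADDRESS);
```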

To check whether a query actually benefits from your schema and indexes, examine its execution plan. In [SQLLine](http://sqlline.sourceforge.net/), use EXPLAIN followed by your SQL query to get the execution plan without actually running the query.

As an example, say you have a table called FLIGHTS that stores flight delay information.
To select all the flights with an airlineid of `19805`, where airlineid is a field that isn't in the primary key or in any index:

```sql
select * from "FLIGHTS" where airlineid = '19805';
```

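As a sketch, prefixing that query with EXPLAIN would surface a plan like the following; the exact text is illustrative rather than captured from a real cluster, but a FULL SCAN entry is the signal that no index applies:

```sql
explain select * from "FLIGHTS" where airlineid = '19805';
-- Illustrative plan; a FULL SCAN line means every row is read and the
-- filter runs server-side instead of using an index:
--   CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER FLIGHTS
--       SERVER FILTER BY AIRLINEID = '19805'
```
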
## Scenarios

The following guidelines describe some common patterns.

### Read-heavy workloads
For read-heavy use cases, make sure you're using indexes. Additionally, to save read-time overhead, consider creating covered indexes.
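
If the optimizer doesn't choose an index for a given query, Phoenix accepts an index hint; this sketch reuses the hypothetical SSN_IDX covered index from earlier:

```sql
-- The /*+ INDEX(table index) */ hint asks Phoenix to resolve the query
-- through the named index instead of scanning the data table.
SELECT /*+ INDEX(CONTACTS SSN_IDX) */ FIRSTNAME, LASTNAME
FROM CONTACTS
WHERE SOCIALSECURITYNUM = '111';
```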
### Write-heavy workloads
For write-heavy workloads where the primary key is monotonically increasing, create salt buckets to help avoid write hotspots, at the expense of overall read throughput because of the additional scans needed. Also, when using UPSERT to write a large number of records, turn off autoCommit and batch up the records.
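
As a sketch (the table and bucket count are assumptions, not from this article), salting is declared when the table is created:

```sql
-- Hypothetical write-heavy table with a monotonically increasing key.
-- SALT_BUCKETS prefixes each rowkey with a hashed byte, spreading writes
-- across region servers; a value near the region server count is a common start.
CREATE TABLE EVENTS (
    EVENT_ID BIGINT NOT NULL PRIMARY KEY,
    PAYLOAD  VARCHAR
) SALT_BUCKETS = 10;
```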
### Bulk deletes
When deleting a large data set, turn on autoCommit before issuing the DELETE query, so that the client doesn't need to remember the row keys for all deleted rows. AutoCommit prevents the client from buffering the rows affected by the DELETE, so that Phoenix can delete them directly on the region servers without the expense of returning them to the client.
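
In SQLLine, the pattern might look like this sketch (the table and filter are hypothetical):

```sql
-- !autocommit is a SQLLine command. With autocommit on, Phoenix can run the
-- DELETE directly on the region servers instead of buffering rowkeys client-side.
!autocommit on
DELETE FROM EVENTS WHERE EVENT_TIME < TO_DATE('2019-01-01 00:00:00');
```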