Commit 6a6d39c

edit pass: benefits-of-migrating-to-hdinsight-40
1 parent 60d1075 commit 6a6d39c

1 file changed (+7 -7 lines changed)

articles/hdinsight/benefits-of-migrating-to-hdinsight-40.md

Lines changed: 7 additions & 7 deletions
@@ -43,7 +43,7 @@ HDInsight 4.0 has several advantages over HDInsight 3.6. Here's an overview of w
 ### HBase
 
 - Advanced features:
-- Procedure V2 (`procv2`), an updated framework for executing multistep HBase administrative operations.
+- Procedure V2 (procv2), an updated framework for executing multistep HBase administrative operations.
 - Fully off-heap read/write path.
 - In-memory compactions.
 - HBase cluster support of the Azure Data Lake Storage Gen2 Premium tier.
@@ -56,7 +56,7 @@ HDInsight 4.0 has several advantages over HDInsight 3.6. Here's an overview of w
 
 - Advanced features:
 - Kafka partition distribution on Azure fault domains.
-- Zstandard (`zstd`) compression support.
+- Zstandard (zstd) compression support.
 - Kafka Consumer Incremental Rebalance.
 - Support for MirrorMaker 2.0.
 - Performance advantage:
@@ -145,7 +145,7 @@ Vectorized query execution is a feature that greatly reduces the CPU usage for t
 - Aggregate
 - Join
 
-Vectorization is also implemented for the ORC format. Spark also uses whole-stage code generation and this vectorization (for Parquet) since Spark 2.0. There's an added timestamp column for Parquet vectorization and format under LLAP.
+Vectorization is also implemented for the ORC format. Spark also uses whole-stage code generation and this vectorization (for Parquet) since Spark 2.0. There's an added time-stamp column for Parquet vectorization and format under LLAP.
 
 > [!WARNING]
 > Parquet writes are slow when you convert to zoned times from the time stamp. For more information, see the [issue details](https://issues.apache.org/jira/browse/HIVE-24693) on the Apache Hive site.
@@ -183,17 +183,17 @@ For more information, see the [Azure blog post on Hive materialized views](https
 
 ## Surrogate keys
 
-Use the built-in `SURROGATE_KEY` user-defined function (UDF) to automatically generate numerical IDs for rows as you enter data into a table. The generated surrogate keys can replace wide, multiple composite keys.
+Use the built-in `SURROGATE_KEY` UDF to automatically generate numerical IDs for rows as you enter data into a table. The generated surrogate keys can replace wide, multiple composite keys.
 
 Hive supports the surrogate keys on ACID tables only. The table that you want to join by using surrogate keys can't have column types that need to cast. These data types must be primitives, such as `INT` or `STRING`.
 
 Joins that use the generated keys are faster than joins that use strings. Using generated keys doesn't force data into a single node by a row number. You can generate keys as abstractions of natural keys. Surrogate keys have an advantage over universally unique identifiers (UUIDs), which are slower and probabilistic.
 
 The `SURROGATE_KEY` UDF generates a unique ID for every row that you insert into a table. It generates keys based on the execution environment in a distributed system, which includes many factors such as:
 
-- Internal data structures.
-- State of a table.
-- Last transaction ID.
+- Internal data structures
+- State of a table
+- Last transaction ID
 
 Surrogate key generation doesn't require any coordination between compute tasks. The UDF takes no arguments, or two arguments are:
 