Commit 2b5c2a4

Merge pull request #111368 from dagiro/freshness53

freshness53

2 parents 47b3583 + 98a23ee

File tree

1 file changed: +14 -14 lines changed

articles/hdinsight/hdinsight-version-release.md

Lines changed: 14 additions & 14 deletions
@@ -6,34 +6,34 @@ ms.author: hrasheed
 ms.reviewer: hrasheed
 ms.service: hdinsight
 ms.topic: conceptual
-ms.date: 10/22/2019
+ms.date: 04/14/2020
 ---
 
 # Azure HDInsight 4.0 overview
 
-Azure HDInsight is one of the most popular services among enterprise customers for open-source Apache Hadoop and Apache Spark analytics on Azure. HDInsight 4.0 is a cloud distribution of Apache Hadoop components. This article provides information about the most recent Azure HDInsight release and how to upgrade.
+Azure HDInsight is one of the most popular services among enterprise customers for Apache Hadoop and Apache Spark. HDInsight 4.0 is a cloud distribution of Apache Hadoop components. This article provides information about the most recent Azure HDInsight release and how to upgrade.
 
 ## What's new in HDInsight 4.0?
 
-### Apache Hive 3.0 and LLAP
+### Apache Hive 3.0 and low-latency analytical processing
 
-Apache Hive low-latency analytical processing (LLAP) uses persistent query servers and in-memory caching to deliver quick SQL query results on data in remote cloud storage. Hive LLAP leverages a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is similar to Hive without LLAP, with worker tasks running inside LLAP daemons instead of containers.
+Apache Hive low-latency analytical processing (LLAP) uses persistent query servers and in-memory caching. This process delivers quick SQL query results on data in remote cloud storage. Hive LLAP uses a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is similar to Hive without LLAP, with worker tasks running inside LLAP daemons instead of containers.
 
 Benefits of Hive LLAP include:
 
-* Ability to perform deep SQL analytics, such as complex joins, subqueries, windowing functions, sorting, user-defined functions, and complex aggregations, without sacrificing performance and scalability.
+* Ability to do deep SQL analytics, such as complex joins, subqueries, windowing functions, sorting, user-defined functions, and complex aggregations, without sacrificing performance and adaptability.
 
 * Interactive queries against data in the same storage where data is prepared, eliminating the need to move data from storage to another engine for analytical processing.
 
-* Caching query results allows previously computed query results to be reused, which saves time and resources spent running the cluster tasks required for the query.
+* Caching query results allows previously computed query results to be reused. This cache saves time and resources spent running the cluster tasks required for the query.
 
 ### Hive dynamic materialized views
 
-Hive now supports dynamic materialized views, or pre-computation of relevant summaries, used to accelerate query processing in data warehouses. Materialized views can be stored natively in Hive, and can seamlessly use LLAP acceleration.
+Hive now supports dynamic materialized views, or pre-computation of relevant summaries. The views accelerate query processing in data warehouses. Materialized views can be stored natively in Hive, and can seamlessly use LLAP acceleration.
 
 ### Hive transactional tables
 
-HDI 4.0 includes Apache Hive 3, which requires atomicity, consistency, isolation, and durability (ACID) compliance for transactional tables that reside in the Hive warehouse. ACID-compliant tables and table data are accessed and managed by Hive. Data in create, retrieve, update, and delete (CRUD) tables must be in Optimized Row Column (ORC) file format, but insert-only tables support all file formats.
+HDI 4.0 includes Apache Hive 3. Hive 3 requires atomicity, consistency, isolation, and durability (ACID) compliance for transactional tables that live in the Hive warehouse. ACID-compliant tables and table data are accessed and managed by Hive. Data in create, retrieve, update, and delete (CRUD) tables must be in Optimized Row Column (ORC) file format. Insert-only tables support all file formats.
 
 * ACID v2 has performance improvements in both storage format and the execution engine.
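The materialized view and transactional table features described in the hunk above can be sketched in HiveQL. The table and view names are hypothetical; this is a sketch assuming Hive 3 defaults on HDInsight 4.0:

```sql
-- Full CRUD transactional table: data must be stored as ORC
CREATE TABLE sales (
    id INT,
    region STRING,
    amount DECIMAL(10, 2)
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Insert-only transactional table: any file format is allowed
CREATE TABLE raw_events (
    id INT,
    payload STRING
)
STORED AS TEXTFILE
TBLPROPERTIES (
    'transactional' = 'true',
    'transactional_properties' = 'insert_only'
);

-- Dynamic materialized view: pre-computes a summary that Hive
-- can use to rewrite and accelerate matching queries
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;
```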

@@ -51,7 +51,7 @@ Learn more about [Apache Hive 3](https://docs.hortonworks.com/HDPDocuments/HDP3/
 
 ### Apache Spark
 
-Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector. Hive Warehouse Connector allows you to register Hive transactional tables as external tables in Spark to access full transactional functionality. Previous versions only supported table partition manipulation. Hive Warehouse Connector also supports Streaming DataFrames for streaming reads and writes into transactional and streaming Hive tables from Spark.
+Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector. Hive Warehouse Connector allows you to register Hive transactional tables as external tables in Spark to access full transactional functionality. Previous versions only supported table partition manipulation. Hive Warehouse Connector also supports Streaming DataFrames, which provide streaming reads and writes into transactional and streaming Hive tables from Spark.
 
 Spark executors can connect directly to Hive LLAP daemons to retrieve and update data in a transactional manner, allowing Hive to keep control of the data.
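The Hive Warehouse Connector usage described above might look like the following PySpark sketch. It isn't runnable standalone: it assumes an HDInsight 4.0 Spark cluster with the Hive Warehouse Connector configured, and the table names are hypothetical:

```python
# Sketch only: requires an HDInsight 4.0 Spark cluster with the
# Hive Warehouse Connector JAR and pyspark_llap package configured.
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-example").getOrCreate()

# Build an HWC session; connection details come from cluster
# configuration (spark.sql.hive.hiveserver2.jdbc.url and related settings).
hive = HiveWarehouseSession.session(spark).build()

# Read a Hive transactional table through LLAP as a Spark DataFrame.
df = hive.executeQuery("SELECT region, amount FROM sales")

# Write the DataFrame back into a Hive transactional table.
df.write \
    .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector") \
    .option("table", "sales_copy") \
    .mode("append") \
    .save()
```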

@@ -62,7 +62,7 @@ Apache Spark on HDInsight 4.0 supports the following scenarios:
 
 * Run a Spark streaming job on the change feed from a Hive streaming table.
 * Create ORC files directly from a Spark Structured Streaming job.
 
-You no longer have to worry about accidentally trying to access Hive transactional tables directly from Spark, resulting in inconsistent results, duplicate data, or data corruption. In HDInsight 4.0, Spark tables and Hive tables are kept in separate Metastores. Use Hive Data Warehouse Connector to explicitly register Hive transactional tables as Spark external tables.
+You no longer have to worry about accidentally trying to access Hive transactional tables directly from Spark. Such access can result in inconsistent results, duplicate data, or data corruption. In HDInsight 4.0, Spark tables and Hive tables are kept in separate Metastores. Use Hive Warehouse Connector to explicitly register Hive transactional tables as Spark external tables.
 
 Learn more about [Apache Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/spark-overview/content/analyzing_data_with_apache_spark.html).
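One of the scenarios listed above, creating ORC files directly from a Spark Structured Streaming job, might look like the following sketch. The storage paths and schema are hypothetical, and the code assumes a running Spark cluster with a streaming input source:

```python
# Sketch only: requires a Spark cluster and a streaming input source.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-orc").getOrCreate()

# Read a stream; here, new JSON files arriving in a landing folder.
stream = spark.readStream \
    .schema("id INT, payload STRING") \
    .json("abfs://container@account.dfs.core.windows.net/landing/")

# Write the stream out as ORC files, with checkpointing for recovery.
query = stream.writeStream \
    .format("orc") \
    .option("path", "abfs://container@account.dfs.core.windows.net/orc-out/") \
    .option("checkpointLocation",
            "abfs://container@account.dfs.core.windows.net/checkpoints/") \
    .start()
```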

@@ -78,18 +78,18 @@ Learn more about [Apache Oozie](https://docs.hortonworks.com/HDPDocuments/HDP3/H
 
 ## How to upgrade to HDInsight 4.0
 
-As with any major release, it's important to thoroughly test your components before implementing the latest version in a production environment. HDInsight 4.0 is available for you to begin the upgrade process, but HDInsight 3.6 is the default option to prevent accidental mishaps.
+Thoroughly test your components before implementing the latest version in a production environment. HDInsight 4.0 is available for you to begin the upgrade process. HDInsight 3.6 is the default option to prevent accidental mishaps.
 
-There's no supported upgrade path from previous versions of HDInsight to HDInsight 4.0. Because Metastore and blob data formats have changed, HDInsight 4.0 isn't compatible with previous versions. It's important that you keep your new HDInsight 4.0 environment separate from your current production environment. If you deploy HDInsight 4.0 to your current environment, your Metastore will be upgraded and can't be reversed.
+There's no supported upgrade path from previous versions of HDInsight to HDInsight 4.0. Because Metastore and blob data formats have changed, 4.0 isn't compatible with previous versions. It's important that you keep your new HDInsight 4.0 environment separate from your current production environment. If you deploy HDInsight 4.0 to your current environment, your Metastore will be permanently upgraded.
 
 ## Limitations
 
 * HDInsight 4.0 doesn't support MapReduce for Apache Hive. Use Apache Tez instead. Learn more about [Apache Tez](https://tez.apache.org/).
 * HDInsight 4.0 doesn't support Apache Storm.
 * Hive View is no longer available in HDInsight 4.0.
 * Shell interpreter in Apache Zeppelin isn't supported in Spark and Interactive Query clusters.
-* You can't *disable* LLAP on a Spark-LLAP cluster. You can only turn LLAP off.
-* Azure Data Lake Storage Gen2 can't save Juypter notebooks in a Spark cluster.
+* You can't *disable* LLAP on a Spark-LLAP cluster. You can only turn off LLAP.
+* Azure Data Lake Storage Gen2 can't save Jupyter notebooks in a Spark cluster.
 
 ## Next steps
