ms.author: hrasheed
ms.reviewer: hrasheed
ms.service: hdinsight
ms.topic: conceptual
ms.date: 04/14/2020
---
# Azure HDInsight 4.0 overview

Azure HDInsight is one of the most popular services among enterprise customers for Apache Hadoop and Apache Spark. HDInsight 4.0 is a cloud distribution of Apache Hadoop components. This article provides information about the most recent Azure HDInsight release and how to upgrade.
## What's new in HDInsight 4.0?

### Apache Hive 3.0 and low-latency analytical processing

Apache Hive low-latency analytical processing (LLAP) uses persistent query servers and in-memory caching to deliver quick SQL query results on data in remote cloud storage. Hive LLAP uses a set of persistent daemons that execute fragments of Hive queries. Query execution on LLAP is similar to Hive without LLAP, with worker tasks running inside LLAP daemons instead of containers.
Benefits of Hive LLAP include:

* Ability to do deep SQL analytics, such as complex joins, subqueries, windowing functions, sorting, user-defined functions, and complex aggregations, without sacrificing performance or scalability.
* Interactive queries against data in the same storage where data is prepared, eliminating the need to move data from storage to another engine for analytical processing.

* Caching of query results, which allows previously computed results to be reused and saves the time and cluster resources otherwise spent rerunning the tasks required for the query.
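As an illustration of the kind of interactive analytics LLAP serves well, here's a sketch of a windowing query; the `sales` table and its columns are hypothetical, not from this article:

```sql
-- Hypothetical table: LLAP daemons keep hot data in memory, so
-- repeated interactive queries like this return quickly.
SELECT
  region,
  order_date,
  SUM(amount) OVER (PARTITION BY region ORDER BY order_date)  AS running_total,
  RANK()      OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
WHERE order_date >= '2019-01-01';
```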
### Hive dynamic materialized views

Hive now supports dynamic materialized views: the pre-computation of relevant summaries that accelerates query processing in data warehouses. Materialized views can be stored natively in Hive and can seamlessly use LLAP acceleration.
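As a sketch of the syntax (the table and view names are hypothetical), a materialized view is created over a transactional base table and rebuilt after the base data changes:

```sql
-- In Hive 3, the base table must be a transactional (ACID) table.
CREATE MATERIALIZED VIEW store_sales_summary AS
SELECT store_id,
       SUM(sales_amount) AS total_sales,
       COUNT(*)          AS txn_count
FROM store_sales
GROUP BY store_id;

-- Rebuild so the optimizer keeps rewriting queries against fresh data.
ALTER MATERIALIZED VIEW store_sales_summary REBUILD;
```

When a query matches the view's definition, Hive's optimizer can rewrite it to read the pre-computed summary instead of scanning the base table.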
### Hive transactional tables

HDInsight 4.0 includes Apache Hive 3, which requires atomicity, consistency, isolation, and durability (ACID) compliance for transactional tables that live in the Hive warehouse. ACID-compliant tables and table data are accessed and managed by Hive. Data in create, retrieve, update, and delete (CRUD) tables must be in Optimized Row Columnar (ORC) file format; insert-only tables support all file formats.
* ACID v2 has performance improvements in both storage format and the execution engine.
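As a sketch of the DDL involved (the tables and columns are hypothetical), a full-CRUD transactional table must be stored as ORC, while an insert-only table may use other formats:

```sql
-- Full CRUD ACID table: must be ORC.
CREATE TABLE customer_txn (
  id      INT,
  name    STRING,
  balance DECIMAL(10,2)
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Insert-only ACID table: any file format is allowed.
CREATE TABLE click_log (line STRING)
STORED AS TEXTFILE
TBLPROPERTIES ('transactional' = 'true',
               'transactional_properties' = 'insert_only');

-- Hive 3 manages row-level changes on ACID tables.
UPDATE customer_txn SET balance = balance - 10.00 WHERE id = 42;
DELETE FROM customer_txn WHERE id = 7;
```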
### Apache Spark

Apache Spark gets updatable tables and ACID transactions with Hive Warehouse Connector. Hive Warehouse Connector allows you to register Hive transactional tables as external tables in Spark to access full transactional functionality. Previous versions only supported table partition manipulation. Hive Warehouse Connector also supports Streaming DataFrames for streaming reads and writes into transactional and streaming Hive tables from Spark.
Spark executors can connect directly to Hive LLAP daemons to retrieve and update data in a transactional manner, allowing Hive to keep control of the data.
Apache Spark on HDInsight 4.0 supports the following scenarios:
* Run a Spark streaming job on the change feed from a Hive streaming table.
* Create ORC files directly from a Spark Structured Streaming job.

You no longer have to worry about accidentally trying to access Hive transactional tables directly from Spark, which could produce inconsistent results, duplicate data, or data corruption. In HDInsight 4.0, Spark tables and Hive tables are kept in separate metastores. Use Hive Warehouse Connector to explicitly register Hive transactional tables as Spark external tables.
Learn more about [Apache Spark](https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/spark-overview/content/analyzing_data_with_apache_spark.html).
## How to upgrade to HDInsight 4.0

As with any major release, thoroughly test your components before deploying the latest version to a production environment. HDInsight 4.0 is available for you to begin the upgrade process, but HDInsight 3.6 remains the default option to prevent accidental mishaps.

There's no supported upgrade path from previous versions of HDInsight to HDInsight 4.0. Because metastore and blob data formats have changed, HDInsight 4.0 isn't compatible with previous versions. Keep your new HDInsight 4.0 environment separate from your current production environment. If you deploy HDInsight 4.0 to your current environment, your metastore is upgraded permanently and can't be reverted.
## Limitations
* HDInsight 4.0 doesn't support MapReduce for Apache Hive. Use Apache Tez instead. Learn more about [Apache Tez](https://tez.apache.org/).
* HDInsight 4.0 doesn't support Apache Storm.
* Hive View is no longer available in HDInsight 4.0.
* Shell interpreter in Apache Zeppelin isn't supported in Spark and Interactive Query clusters.
* You can't *disable* LLAP on a Spark-LLAP cluster. You can only turn off LLAP.
* Azure Data Lake Storage Gen2 can't save Jupyter notebooks in a Spark cluster.