articles/hdinsight/spark/apache-spark-improve-performance-iocache.md
9 additions & 11 deletions
@@ -6,7 +6,7 @@ ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
-ms.date: 10/29/2019
+ms.date: 12/23/2019
 ---

 # Improve performance of Apache Spark workloads using Azure HDInsight IO Cache
@@ -17,7 +17,7 @@ Most SSDs provide more than 1 GByte per second of bandwidth. This bandwidth, com

 > [!Note]
 > IO Cache currently uses RubiX as a caching component, but this may change in future versions of the service. Please use IO Cache interfaces and don't take any dependencies directly on the RubiX implementation.
->IO Cache is only supported with Azure BLOB Storage at this time.
+>IO Cache is only supported with Azure BLOB Storage at this time.

 ## Benefits of Azure HDInsight IO Cache

@@ -27,21 +27,19 @@ You don't have to make any changes to your Spark jobs to see performance increas

 ## Getting started

-Azure HDInsight IO Cache is deactivated by default in preview. IO Cache is available on Azure HDInsight 3.6+ Spark clusters, which run Apache Spark 2.3. To activate IO Cache, do the following:
+Azure HDInsight IO Cache is deactivated by default in preview. IO Cache is available on Azure HDInsight 3.6+ Spark clusters, which run Apache Spark 2.3. To activate IO Cache on HDInsight 4.0, do the following steps:

-1. Select your HDInsight cluster in [the Azure portal](https://portal.azure.com).
-
-1. In the **Overview** page (opened by default when you select the cluster) select **Ambari Home** under **Cluster dashboards**.
+1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.

 1. Select the **IO Cache** service on the left.

-1. Select **Actions** and **Activate**.
+1. Select **Actions**(**Service Actions** in HDI 3.6) and **Activate**.



 1. Confirm restart of all the affected services on the cluster.

->[!NOTE]
+>[!NOTE]
 > Even though the progress bar shows activated, IO Cache isn't actually enabled until you restart the other affected services.

 ## Troubleshooting
@@ -66,12 +64,12 @@ You may get disk space errors running Spark jobs after enabling IO Cache. These

 1. Select **Restart** > **Restart All Affected**.

-
+

 1. Select **Confirm Restart All**.

-If that does not work, disable IO Cache.
+If that doesn't work, disable IO Cache.

 ## Next Steps

--Read more about IO Cache, including performance benchmarks in this blog post: [Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache](https://azure.microsoft.com/blog/apache-spark-speedup-with-hdinsight-io-cache/)
+Read more about IO Cache, including performance benchmarks in this blog post: [Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache](https://azure.microsoft.com/blog/apache-spark-speedup-with-hdinsight-io-cache/)