Commit 550862f

Merge pull request #99661 from dagiro/freshness134
freshness134
2 parents 15a8779 + 1c3f278 commit 550862f

1 file changed: +9 -11 lines changed

articles/hdinsight/spark/apache-spark-improve-performance-iocache.md

Lines changed: 9 additions & 11 deletions
@@ -6,7 +6,7 @@ ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
-ms.date: 10/29/2019
+ms.date: 12/23/2019
 ---
 
 # Improve performance of Apache Spark workloads using Azure HDInsight IO Cache
@@ -17,7 +17,7 @@ Most SSDs provide more than 1 GByte per second of bandwidth. This bandwidth, com
 
 > [!Note]
 > IO Cache currently uses RubiX as a caching component, but this may change in future versions of the service. Please use IO Cache interfaces and don't take any dependencies directly on the RubiX implementation.
->IO Cache is only supported with Azure BLOB Storage at this time.
+>IO Cache is only supported with Azure BLOB Storage at this time.
 
 ## Benefits of Azure HDInsight IO Cache
 
@@ -27,21 +27,19 @@ You don't have to make any changes to your Spark jobs to see performance increas
 
 ## Getting started
 
-Azure HDInsight IO Cache is deactivated by default in preview. IO Cache is available on Azure HDInsight 3.6+ Spark clusters, which run Apache Spark 2.3. To activate IO Cache, do the following:
+Azure HDInsight IO Cache is deactivated by default in preview. IO Cache is available on Azure HDInsight 3.6+ Spark clusters, which run Apache Spark 2.3. To activate IO Cache on HDInsight 4.0, do the following steps:
 
-1. Select your HDInsight cluster in [the Azure portal](https://portal.azure.com).
-
-1. In the **Overview** page (opened by default when you select the cluster) select **Ambari Home** under **Cluster dashboards**.
+1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.
 
 1. Select the **IO Cache** service on the left.
 
-1. Select **Actions** and **Activate**.
+1. Select **Actions** (**Service Actions** in HDI 3.6) and **Activate**.
 
 ![Enabling the IO Cache service in Ambari](./media/apache-spark-improve-performance-iocache/ambariui-enable-iocache.png "Enabling the IO Cache service in Ambari")
 
 1. Confirm restart of all the affected services on the cluster.
 
->[!NOTE]
+> [!NOTE]
 > Even though the progress bar shows activated, IO Cache isn't actually enabled until you restart the other affected services.
 
 ## Troubleshooting
@@ -66,12 +64,12 @@ You may get disk space errors running Spark jobs after enabling IO Cache. These
 
 1. Select **Restart** > **Restart All Affected**.
 
-![Apache Ambari restart all affected](./media/apache-spark-improve-performance-iocache/ambariui-restart-all-affected.png "Restart all affected")
+![Apache Ambari restarts all affected](./media/apache-spark-improve-performance-iocache/ambariui-restart-all-affected.png "Restart all affected")
 
 1. Select **Confirm Restart All**.
 
-If that does not work, disable IO Cache.
+If that doesn't work, disable IO Cache.
 
 ## Next Steps
 
-- Read more about IO Cache, including performance benchmarks in this blog post: [Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache](https://azure.microsoft.com/blog/apache-spark-speedup-with-hdinsight-io-cache/)
+Read more about IO Cache, including performance benchmarks in this blog post: [Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache](https://azure.microsoft.com/blog/apache-spark-speedup-with-hdinsight-io-cache/)
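
The revised Getting started step tells readers to open `https://CLUSTERNAME.azurehdinsight.net`. As a small illustration only, and not part of this commit, the substitution could be sketched in Python as follows. The helper name `ambari_url` is hypothetical, and the character check is a conservative assumption rather than the official HDInsight cluster naming rules.

```python
import re


def ambari_url(cluster_name: str) -> str:
    """Fill CLUSTERNAME into the URL pattern quoted in the article.

    The validation below is an assumption made for this sketch: it accepts
    only letters, digits, and hyphens so that the name cannot alter the host
    part of the URL. It is not the official naming rule set.
    """
    name = cluster_name.strip()
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    return f"https://{name}.azurehdinsight.net"


if __name__ == "__main__":
    # "mycluster" is a placeholder, not a real cluster.
    print(ambari_url("mycluster"))
```

Rejecting dots and slashes keeps a mistyped or pasted value from silently pointing the browser at a different host.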
