Commit 17d8ec3 (1 parent: 6c2f44a)

Added a new note

Spark 3.1.2 doesn’t support IO Cache in HDInsight 5.0

File tree: 1 file changed (+4 −1)

articles/hdinsight/spark/apache-spark-improve-performance-iocache.md
@@ -3,11 +3,14 @@ title: Apache Spark performance - Azure HDInsight IO Cache (Preview)
 description: Learn about Azure HDInsight IO Cache and how to use it to improve Apache Spark performance.
 ms.service: hdinsight
 ms.topic: how-to
-ms.date: 05/26/2022
+ms.date: 10/29/2022
 ---
 
 # Improve performance of Apache Spark workloads using Azure HDInsight IO Cache
 
+> [!NOTE]
+> Spark 3.1.2 doesn’t support IO Cache in HDInsight 5.0
+
 IO Cache is a data caching service for Azure HDInsight that improves the performance of Apache Spark jobs. IO Cache also works with [Apache TEZ](https://tez.apache.org/) and [Apache Hive](https://hive.apache.org/) workloads, which can be run on [Apache Spark](https://spark.apache.org/) clusters. IO Cache uses an open-source caching component called RubiX. RubiX is a local disk cache for use with big data analytics engines that access data from cloud storage systems. RubiX is unique among caching systems, because it uses Solid-State Drives (SSDs) rather than reserve operating memory for caching purposes. The IO Cache service launches and manages RubiX Metadata Servers on each worker node of the cluster. It also configures all services of the cluster for transparent use of RubiX cache.
 
 Most SSDs provide more than 1 GByte per second of bandwidth. This bandwidth, complemented by the operating system in-memory file cache, provides enough bandwidth to load big data compute processing engines, such as Apache Spark. The operating memory is left available for Apache Spark to process heavily memory-dependent tasks, such as shuffles. Having exclusive use of operating memory allows Apache Spark to achieve optimal resource usage.
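The read-through, local-disk caching pattern that RubiX implements can be sketched in a few lines. This is a minimal illustration of the idea only, not RubiX's actual API; the function name, `fetch` callback, and cache directory are hypothetical:

```python
import hashlib
import os
import tempfile

# Hypothetical cache directory standing in for the SSD-backed local disk
# that the caching service would use on each worker node.
CACHE_DIR = tempfile.mkdtemp(prefix="disk_cache_sketch_")


def cached_read(remote_path: str, fetch) -> bytes:
    """Read-through cache: serve bytes from local disk when present,
    otherwise call fetch(remote_path) (the 'cloud storage' read) and
    persist the result locally for subsequent reads."""
    key = hashlib.sha256(remote_path.encode()).hexdigest()
    local_path = os.path.join(CACHE_DIR, key)
    if os.path.exists(local_path):
        with open(local_path, "rb") as f:
            return f.read()  # cache hit: no remote access needed
    data = fetch(remote_path)  # cache miss: read from remote storage
    with open(local_path, "wb") as f:
        f.write(data)  # warm the local-disk cache
    return data
```

A second read of the same path is served from local disk instead of remote storage, leaving operating memory free for the compute engine, which is the trade-off the paragraph above describes.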

0 commit comments

Comments
 (0)
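The compatibility note this commit adds amounts to a simple version gate. A minimal sketch of that check (the helper `io_cache_supported` is hypothetical, not part of HDInsight or Spark):

```python
def io_cache_supported(hdinsight_version: str, spark_version: str) -> bool:
    """Return False for the combination the note calls out:
    IO Cache is not supported with Spark 3.1.2 on HDInsight 5.0."""
    return not (hdinsight_version == "5.0" and spark_version == "3.1.2")


print(io_cache_supported("5.0", "3.1.2"))  # False
print(io_cache_supported("4.0", "2.4.4"))  # True
```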