articles/hdinsight/spark/optimize-cluster-configuration.md (1 addition, 1 deletion)
@@ -9,7 +9,7 @@ ms.topic: conceptual
ms.custom: hdinsightactive,seomay2020
ms.date: 05/18/2020
---
-# Optimize cluster configuration
+# Cluster configuration optimization

Depending on your Spark cluster workload, you may determine a non-default Spark configuration would result in more optimized Spark job execution. Do benchmark testing with sample workloads to validate any non-default cluster configurations.
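To make that benchmarking advice concrete, here is a minimal PySpark sketch of trying one non-default configuration against a sample workload. The executor sizes, shuffle partition count, storage path, and column name are hypothetical placeholders, not values taken from the article.

```python
# Hypothetical benchmark of a non-default Spark configuration (example values only).
import time

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("config-benchmark")
    .config("spark.executor.memory", "8g")          # non-default executor heap (example value)
    .config("spark.executor.cores", "4")            # cores per executor (example value)
    .config("spark.sql.shuffle.partitions", "200")  # shuffle parallelism to compare across runs
    .getOrCreate()
)

# Time a representative sample workload; repeat with other settings and compare.
start = time.time()
sample = spark.read.parquet("wasbs://data@youraccount.blob.core.windows.net/sample/")  # placeholder path
sample.groupBy("some_column").count().collect()  # placeholder column
print(f"Elapsed: {time.time() - start:.1f} s")
```

Run the same workload under each candidate configuration and keep the non-default settings only if the measured improvement holds up.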
articles/hdinsight/spark/optimize-data-storage.md (11 additions, 7 deletions)
@@ -9,7 +9,17 @@ ms.topic: conceptual
ms.custom: hdinsightactive,seomay2020
ms.date: 05/18/2020
---
-# Optimize data storage
+# Data storage optimization
+
+This article discusses strategies to optimize data storage for efficient Apache Spark job execution.
+
+## Use optimal data format
+
+Spark supports many formats, such as csv, json, xml, parquet, orc, and avro. Spark can be extended to support many more formats with external data sources - for more information, see [Apache Spark packages](https://spark-packages.org).
+
+The best format for performance is parquet with *snappy compression*, which is the default in Spark 2.x. Parquet stores data in columnar format, and is highly optimized in Spark.
+
+## Choose data abstraction

Earlier Spark versions use RDDs to abstract data, Spark 1.3, and 1.6 introduced DataFrames and DataSets, respectively. Consider the following relative merits:
@@ -35,12 +45,6 @@ Earlier Spark versions use RDDs to abstract data, Spark 1.3, and 1.6 introduced
* High GC overhead.
* Must use Spark 1.x legacy APIs.

-## Use optimal data format
-
-Spark supports many formats, such as csv, json, xml, parquet, orc, and avro. Spark can be extended to support many more formats with external data sources - for more information, see [Apache Spark packages](https://spark-packages.org).
-
-The best format for performance is parquet with *snappy compression*, which is the default in Spark 2.x. Parquet stores data in columnar format, and is highly optimized in Spark.
-

## Select default storage

When you create a new Spark cluster, you can select Azure Blob Storage or Azure Data Lake Storage as your cluster's default storage. Both options give you the benefit of long-term storage for transient clusters. So your data doesn't get automatically deleted when you delete your cluster. You can recreate a transient cluster and still access your data.
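As a small illustration of the parquet-with-snappy guidance in the diff above, the following PySpark sketch converts a CSV dataset to snappy-compressed Parquet. The storage paths are hypothetical placeholders, and snappy is named explicitly even though it is already the Spark 2.x default.

```python
# Sketch: land raw CSV data as snappy-compressed Parquet (paths are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Read with the DataFrame API rather than RDDs.
events = spark.read.csv(
    "abfss://data@youraccount.dfs.core.windows.net/raw/events/",  # placeholder source
    header=True,
    inferSchema=True,
)

# Columnar Parquet with snappy compression is the format recommended for performance.
events.write.mode("overwrite").parquet(
    "abfss://data@youraccount.dfs.core.windows.net/curated/events/",  # placeholder target
    compression="snappy",
)
```

Downstream jobs that read the Parquet output benefit from its columnar layout through column pruning and predicate pushdown.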
articles/hdinsight/spark/optimize-memory-usage.md (2 additions, 2 deletions)
@@ -9,7 +9,7 @@ ms.topic: conceptual
ms.custom: hdinsightactive,seomay2020
ms.date: 05/18/2020
---
-# Optimize memory usage
+# Memory usage optimization

Spark operates by placing data in memory. So managing memory resources is a key aspect of optimizing the execution of Spark jobs. There are several techniques you can apply to use your cluster's memory efficiently.
@@ -20,7 +20,7 @@ Spark operates by placing data in memory. So managing memory resources is a key
For your reference, the Spark memory structure and some key executor memory parameters are shown in the next image.

-###Spark memory considerations
+## Spark memory considerations

If you're using Apache Hadoop YARN, then YARN controls the memory used by all containers on each Spark node. The following diagram shows the key objects and their relationships.
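To show roughly where those executor memory parameters are set, here is a hypothetical PySpark sketch. The sizes are example values only; under YARN, the executor heap plus its overhead must fit within the container size YARN allows on each node.

```python
# Example executor memory settings (hypothetical values, not recommendations).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("memory-tuning")
    .config("spark.executor.memory", "6g")           # executor JVM heap (example value)
    .config("spark.executor.memoryOverhead", "1g")   # off-heap overhead added to the YARN container
    .config("spark.memory.fraction", "0.6")          # heap share for execution + storage (Spark default)
    .config("spark.memory.storageFraction", "0.5")   # portion of that share protected for cached data
    .getOrCreate()
)

# Cache only data that is reused, and release it when done, to keep executor memory available.
df = spark.range(10_000_000)
df.cache()
df.count()      # materializes the cache
df.unpersist()
```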