Clarifying info on EMR cluster updates page, copyediting data lakes setup page

forstisabella · forstisabella · commit 515ad92dab7f · 2021-10-28T11:15:49.000-04:00
diff --git a/src/connections/storage/data-lakes/data-lakes-manual-setup.md b/src/connections/storage/data-lakes/data-lakes-manual-setup.md
@@ -13,7 +13,8 @@ The instructions below will guide you through the process required to configure
 In this step, you'll create the S3 bucket that will store both the intermediate and final data. For instructions on creating an S3 bucket, please see Amazon's documentation, [Create your first S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).
 
 > info ""
-> Take note of the S3 bucket name you set in this step, as the rest of the set up flow requires it. In these instructions, the name is `segment-data-lake`.
+> Take note of the S3 bucket name you set in this step, as the rest of the setup flow requires it.
+<!--- In these instructions, the name is `segment-data-lake`. --->
 
 After you create your S3 bucket, create a lifecycle rule for the bucket and set it to expire staging data after **14 days**. For help on setting lifecycle configurations, see Amazon's documentation, [Setting lifecycle configuration on a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html).
 
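The 14-day rule described above can also be written as the `LifecycleConfiguration` payload that S3's PutBucketLifecycleConfiguration API accepts (boto3 exposes it as `put_bucket_lifecycle_configuration`). This is a minimal sketch, not Segment's tooling. The rule ID and the `segment-stage/` prefix are hypothetical placeholders, because this guide does not name the prefix that holds staging data.

```python
import json


def staging_lifecycle_rule(prefix: str, days: int = 14) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    The shape follows the LifecycleConfiguration argument of
    S3 PutBucketLifecycleConfiguration.
    """
    if days < 1:
        raise ValueError("Expiration days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": "expire-segment-staging-data",  # hypothetical rule name
                "Filter": {"Prefix": prefix},         # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},         # expire objects `days` after creation
            }
        ]
    }


# Hypothetical staging prefix; replace it with the prefix your staging data uses.
config = staging_lifecycle_rule("segment-stage/")
print(json.dumps(config, indent=2))
```

With boto3 installed and credentials configured, you could apply it with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="<your-bucket>", LifecycleConfiguration=config)`. Scoping the rule with a prefix filter keeps it from expiring the final processed data, which lives in the same bucket.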
@@ -32,7 +33,7 @@ Segment requires access to an EMR cluster to perform necessary data processing.
 
 1. Select EMR from the AWS console by navigating to Services > Analytics > EMR.
 2. Click **Create Cluster**, and select **Go to advanced options**.
-3. In Advanced Options, on Step 1: Software and Steps, select the `emr-5.33.0` release and the following software libraries:
+3. In Advanced Options, on Step 1: Software and Steps, select the `emr-5.33.0` release and the following applications:
     - Hadoop 2.10.1
     - Hive 2.3.7
     - Hue 4.9.0
@@ -43,8 +44,7 @@ Segment requires access to an EMR cluster to perform necessary data processing.
     - Use for Spark table metadata
     <!--- ![Select to use for both Hive and Spark table metadata](images/02_hive-spark-table.png) --->
 5. Select **Next** to move to Step 2: Hardware.
-6. Under the Networking section, select a Network and EC2 Subnet for your EMR instance. You can create EMR instances in either a public or private subnet. Creating the cluster in a private subnet is more secure, but requires additional configuration, while creating a cluster in a public subnet makes it accessible from the Internet. You can configure strict security groups to prevent inbound access to the cluster. See Amazon's document, [Amazon VPC Options - Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-clusters-in-a-vpc.html) for more information. <br />
-As a best practice, Segment recommends that you consult with your network and security teams before you configure your EMR cluster.
+6. Under the Networking section, select a Network and EC2 Subnet for your EMR instance. You can create EMR instances in either a public or a private subnet. A cluster in a private subnet is more secure but requires additional configuration, while a cluster in a public subnet is accessible from the Internet. To block inbound access to a cluster in a public subnet, configure strict security groups. See Amazon's documentation, [Amazon VPC Options - Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-clusters-in-a-vpc.html), for more information. As a best practice, Segment recommends that you consult with your network and security teams before you configure your EMR cluster.
 
 7. In the Hardware Configuration section, create a cluster that includes the following on-demand nodes:
    - **1** master node
@@ -55,12 +55,12 @@ For more information about configuring cluster hardware and networking, see Amaz
 
 8. Select **Next** to proceed to Step 3: General Cluster Settings.
 
-
 ### Configure logging
 
-9. On Step 3: General Cluster Settings, configure logging to use the same S3 bucket you configured as the destination for the final data (`segment-data-lakes` in this case). Once configured, logs are given a new prefix, and separated from the final processed data.
+9. On Step 3: General Cluster Settings, configure logging to use the same S3 bucket you configured as the destination for the final data. Once configured, logs are assigned a new prefix and separated from the final processed data.
+<!--- (`segment-data-lake` in this case) --->
 
-10. Add a new key-value pair to the Tags section, a **vendor** key with a value of `segment`. The IAM policy uses this to provide Segment access to submit jobs in the EMR cluster.
+10. In the Tags section, add a new key-value pair with a **vendor** key and a value of **segment**. The IAM policy uses this tag to grant Segment access to submit jobs in the EMR cluster.
 
 11. Select **Next** to proceed to Step 4: Security.
 
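The console settings above (release, Glue metadata, subnet, logging, and the vendor tag) correspond to fields of the EMR RunJobFlow API (boto3's `run_job_flow`). The sketch below builds a partial request dict to show that mapping; it is an illustration under stated assumptions, not Segment's setup procedure. The cluster name, bucket name, subnet ID, and `logs/` log prefix are hypothetical placeholders, and only the applications visible in the list above are included.

```python
import json

# Hive client factory that points Hive and Spark at the AWS Glue Data Catalog.
GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def emr_cluster_request(name: str, bucket: str, subnet_id: str, applications: list) -> dict:
    """Build a PARTIAL RunJobFlow request mirroring the console steps above.

    Instance groups, IAM roles, and other required fields are omitted.
    """
    glue_props = {"hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY}
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.33.0",  # Step 1: Software and Steps release
        "Applications": [{"Name": app} for app in applications],
        # Use the Glue Data Catalog for Hive and Spark table metadata.
        "Configurations": [
            {"Classification": "hive-site", "Properties": dict(glue_props)},
            {"Classification": "spark-hive-site", "Properties": dict(glue_props)},
        ],
        # Networking: the public or private subnet you chose.
        "Instances": {"Ec2SubnetId": subnet_id},
        # Logging: same bucket as the final data, separate (hypothetical) prefix.
        "LogUri": f"s3://{bucket}/logs/",
        # Tags: the IAM policy relies on this vendor tag.
        "Tags": [{"Key": "vendor", "Value": "segment"}],
    }


# Hypothetical placeholder values; the application list is partial.
request = emr_cluster_request(
    name="segment-data-lakes-emr",
    bucket="my-data-lake-bucket",
    subnet_id="subnet-0123456789abcdef0",
    applications=["Hadoop", "Hive", "Hue"],
)
print(json.dumps(request, indent=2))
```

A real `run_job_flow` call also needs instance groups (the on-demand master node plus the other nodes the guide lists), `ServiceRole`, and `JobFlowRole`; they are left out here rather than guessed.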
@@ -77,7 +77,7 @@ For more information about configuring cluster hardware and networking, see Amaz
 The image uses the default settings. You can make these settings more restrictive, if required. --->
 
 > note ""
-> If you are updating your Data Lakes instance, take note of the EMR cluster ID. 
+> **NOTE:** If you are updating the EMR cluster for your Data Lakes instance, record the EMR cluster ID.
 
 ## Step 3 - Create an Access Management role and policy
 
diff --git a/src/connections/storage/data-lakes/upgrade-emr-cluster.md b/src/connections/storage/data-lakes/upgrade-emr-cluster.md
@@ -1,42 +1,25 @@
-# Upgrading Data Lakes
+---
+hidden: true
+title: Upgrading EMR Clusters
+---
+{% include content/plan-grid.md name="data-lakes" %}
 
+# Upgrading EMR Clusters
 This document contains the instructions to manually update an existing Segment
-Data Lake destination to use a new EMR cluster with version 5.33.0. The Segment Data Lake on the new version will continue to use the Glue data catalog you have previously configured. 
+Data Lake destination to use a new v5.33.0 EMR cluster. After the upgrade, your Segment Data Lake continues to use the Glue Data Catalog you previously configured.
 
-The Segment Data Lake does not need to be disabled during the upgrade process, and any ongoing syncs will complete on the old cluster. 
+Upgrading your EMR cluster from version 5.27.0 to 5.33.0 lets you use [AWS Lake Formation](https://aws.amazon.com/lake-formation/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc). Clusters running version 5.33.0 also support faster Parquet jobs and dynamic auto-scaling.
 
-<!--- Any existing EMR clusters will 
-
-What happens to the existing EMR cluster? If there’s an ongoing sync, what will
-happen to that?
-If there is an ongoing sync in the existing cluster, the sync will complete (success/
-fail) in the existing cluster. If the sync ends up failing and if the cluster setting has
-been updated to use the new cluster, the next retry will be performed in the new
-cluster.
-. Does one need to stop a sync or disable the Segment Data Lake when
-performing this update?
-No, on-going syncs don’t need not be stopped nor Segment Data Lake needs to be
-disabled. We will automatically restart any failed sync on the new cluster so there
-should not be any manual intervention required.
-
-. When can the customer safely delete the old EMR cluster?
-The old EMR cluster could be deleted after all the Segment Data Lakes have been
-updated to use the new cluster and the old EMR cluster doesn’t have any on-going
-syncs. General recommendation is
-Update EMR cluster setting in all the Segment Data Lakes
-Wait for the next sync to be started and completed in the new cluster
-Confirm new data is synced using the new cluster
-Confirm no on-going jobs in the old cluster
-Delete the old cluster --->
+> info ""
+> You don't need to disable your Segment Data Lake during the upgrade process. Any ongoing syncs complete on the old cluster, and any sync that fails after you update the EMR Cluster ID setting is automatically retried on the new cluster.
 
 ## Prerequisites
 * S3 bucket with a lifecycle rule of 14 days 
-* An EMR cluster version 5.33.0 (for instructions)
-* The ID of your EMR Cluster
+* An EMR cluster running version 5.33.0 (for help creating a v5.33.0 EMR cluster, see [Configure the Data Lakes AWS Environment](data-lakes-manual-setup.md))
 
 ## Procedure
 1. Open your Segment App workspace and select your Data Lakes destination. 
-2. On the Settings tab, select EMR Cluster ID field and enter your EMR ID. For more information about your EMR Cluster, please see Amazon's [View cluster status and details](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-clusters.html) documentation. <br/>
+2. On the Settings tab, select the **EMR Cluster ID** field and enter the ID of your new EMR cluster. For more information about your EMR cluster, see Amazon's [View cluster status and details](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-clusters.html) documentation. <br/>
 **Note:** Your Glue Catalog ID, IAM Role ARN, and Glue database name should remain the same.
 3. Select **Save**.
 4. You can delete your old EMR cluster from AWS when the following conditions have been met: