Commit 4a31e5f

Merge pull request #229214 from normesta/gen2
Refreshing Gen2 articles
2 parents 6a48678 + 4429b10 commit 4a31e5f

15 files changed: +248 −309 lines

articles/storage/blobs/data-lake-storage-events.md

Lines changed: 116 additions & 143 deletions
Large diffs are not rendered by default.

articles/storage/blobs/data-lake-storage-integrate-with-services-tutorials.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ author: normesta

 ms.topic: conceptual
 ms.author: normesta
-ms.date: 10/06/2021
+ms.date: 03/07/2023
 ms.service: storage
 ms.subservice: data-lake-storage-gen2
 ---

articles/storage/blobs/data-lake-storage-introduction.md

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ author: normesta

 ms.service: storage
 ms.topic: overview
-ms.date: 02/23/2022
+ms.date: 03/01/2023
 ms.author: normesta
 ms.reviewer: jamesbak
 ms.subservice: data-lake-storage-gen2
@@ -36,9 +36,9 @@ Also, Data Lake Storage Gen2 is very cost effective because it's built on top of

 ## Key features of Data Lake Storage Gen2

-- **Hadoop compatible access:** Data Lake Storage Gen2 allows you to manage and access data just as you would with a [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html). The new [ABFS driver](data-lake-storage-abfs-driver.md) (used to access data) is available within all Apache Hadoop environments. These environments include [Azure HDInsight](../../hdinsight/index.yml), [Azure Databricks](/azure/databricks/), and [Azure Synapse Analytics](../../synapse-analytics/index.yml).
+- **Hadoop compatible access:** Data Lake Storage Gen2 allows you to manage and access data just as you would with a [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html). The [ABFS driver](data-lake-storage-abfs-driver.md) (used to access data) is available within all Apache Hadoop environments. These environments include [Azure HDInsight](../../hdinsight/index.yml), [Azure Databricks](/azure/databricks/), and [Azure Synapse Analytics](../../synapse-analytics/index.yml).

-- **A superset of POSIX permissions:** The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings may be configured through Storage Explorer or through frameworks like Hive and Spark.
+- **A superset of POSIX permissions:** The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings can be configured by using Storage Explorer, the Azure portal, PowerShell, Azure CLI, REST APIs, Azure Storage SDKs, or by using frameworks like Hive and Spark.

 - **Cost-effective:** Data Lake Storage Gen2 offers low-cost storage capacity and transactions. Features such as [Azure Blob Storage lifecycle](./lifecycle-management-overview.md) optimize costs as data transitions through its lifecycle.

articles/storage/blobs/data-lake-storage-tutorial-extract-transform-load-hive.md

Lines changed: 85 additions & 82 deletions
@@ -7,7 +7,7 @@ author: normesta
 ms.subservice: data-lake-storage-gen2
 ms.service: storage
 ms.topic: tutorial
-ms.date: 11/19/2019
+ms.date: 03/07/2023
 ms.author: normesta
 ms.reviewer: jamesbak
 #Customer intent: As an analytics user, I want to perform an ETL operation so that I can work with my data in my preferred environment.
@@ -28,38 +28,41 @@ If you don't have an Azure subscription, [create a free account](https://azure.m

 ## Prerequisites

-- **An Azure Data Lake Storage Gen2 storage account that is configured for HDInsight**
+- A storage account that has a hierarchical namespace (Azure Data Lake Storage Gen2) that is configured for HDInsight

-  See [Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters](../../hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md).
+  See [Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters](../../hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md).

-- **A Linux-based Hadoop cluster on HDInsight**
+- A Linux-based Hadoop cluster on HDInsight
+
+  See [Quickstart: Get started with Apache Hadoop and Apache Hive in Azure HDInsight using the Azure portal](../../hdinsight/hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md).

-  See [Quickstart: Get started with Apache Hadoop and Apache Hive in Azure HDInsight using the Azure portal](../../hdinsight/hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md).
+- Azure SQL Database

-- **Azure SQL Database**: You use Azure SQL Database as a destination data store. If you don't have a database in SQL Database, see [Create a database in Azure SQL Database in the Azure portal](/azure/azure-sql/database/single-database-create-quickstart).
+  You use Azure SQL Database as a destination data store. If you don't have a database in SQL Database, see [Create a database in Azure SQL Database in the Azure portal](/azure/azure-sql/database/single-database-create-quickstart).

-- **Azure CLI**: If you haven't installed the Azure CLI, see [Install the Azure CLI](/cli/azure/install-azure-cli).
+- Azure CLI

-- **A Secure Shell (SSH) client**: For more information, see [Connect to HDInsight (Hadoop) by using SSH](../../hdinsight/hdinsight-hadoop-linux-use-ssh-unix.md).
+  If you haven't installed the Azure CLI, see [Install the Azure CLI](/cli/azure/install-azure-cli).

+- A Secure Shell (SSH) client
+
+  For more information, see [Connect to HDInsight (Hadoop) by using SSH](../../hdinsight/hdinsight-hadoop-linux-use-ssh-unix.md).

 ## Download, extract and then upload the data

-In this section, you'll download sample flight data. Then, you'll upload that data to your HDInsight cluster and then copy that data to your Data Lake Storage Gen2 account.
+In this section, you download sample flight data. Then, you upload that data to your HDInsight cluster and then copy that data to your Data Lake Storage Gen2 account.

 1. Download the [On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip](https://github.com/Azure-Samples/AzureStorageSnippets/blob/master/blobs/tutorials/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip) file. This file contains the flight data.

 2. Open a command prompt and use the following Secure Copy (scp) command to upload the .zip file to the HDInsight cluster head node:

    ```bash
-   scp <file-name>.zip <ssh-user-name>@<cluster-name>-ssh.azurehdinsight.net:<file-name.zip>
+   scp On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip <ssh-user-name>@<cluster-name>-ssh.azurehdinsight.net:
    ```

-   - Replace the `<file-name>` placeholder with the name of the .zip file.
-   - Replace the `<ssh-user-name>` placeholder with the SSH login for the HDInsight cluster.
+   - Replace the `<ssh-user-name>` placeholder with the SSH username for the HDInsight cluster.
    - Replace the `<cluster-name>` placeholder with the name of the HDInsight cluster.

-   If you use a password to authenticate your SSH login, you're prompted for the password.
+   If you use a password to authenticate your SSH username, you're prompted for the password.

    If you use a public key, you might need to use the `-i` parameter and specify the path to the matching private key. For example, `scp -i ~/.ssh/id_rsa <file_name>.zip <user-name>@<cluster-name>-ssh.azurehdinsight.net:`.
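As a local sketch of how the placeholders in the new `scp` command expand (the username and cluster name here are hypothetical, not from the tutorial):

```shell
# Hypothetical values -- substitute your own SSH username and cluster name.
ssh_user="sshuser"
cluster="mycluster"
zipfile="On_Time_Reporting_Carrier_On_Time_Performance_1987_present_2016_1.zip"

# The trailing ':' copies the file into the remote user's home directory.
cmd="scp $zipfile $ssh_user@$cluster-ssh.azurehdinsight.net:"
echo "$cmd"
```

Running the echoed command still requires network access to a real cluster; the sketch only shows how the placeholders expand.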

@@ -96,7 +99,7 @@ In this section, you'll download sample flight data. Then, you'll upload that da
 7. Use the following command to copy the *.csv* file to the directory:

    ```bash
-   hdfs dfs -put "<file-name>.csv" abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data/
+   hdfs dfs -put "On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2016_1.csv" abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data/
    ```

    Use quotes around the file name if the file name contains spaces or special characters.
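A minimal local illustration of that quoting note (the file is created as an empty placeholder in the current directory, not downloaded):

```shell
# The renamed .csv contains parentheses; quoting the name passes it to the
# command literally instead of letting the shell interpret those characters.
fname="On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2016_1.csv"

touch "$fname"   # quoted: creates one file with the literal name
ls "$fname"      # succeeds because the quoted name matches exactly
```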
@@ -113,71 +116,71 @@ As part of the Apache Hive job, you import the data from the .csv file into an A
    nano flightdelays.hql
    ```

-2. Modify the following text by replace the `<container-name>` and `<storage-account-name>` placeholders with your container and storage account name. Then copy and paste the text into the nano console by using pressing the SHIFT key along with the right-mouse click button.
+2. Modify the following text by replacing the `<container-name>` and `<storage-account-name>` placeholders with your container and storage account name. Then copy and paste the text into the nano console by pressing the SHIFT key along with the right mouse button.

    ```hiveql
-   DROP TABLE delays_raw;
-   -- Creates an external table over the csv file
-   CREATE EXTERNAL TABLE delays_raw (
-      YEAR string,
-      FL_DATE string,
-      UNIQUE_CARRIER string,
-      CARRIER string,
-      FL_NUM string,
-      ORIGIN_AIRPORT_ID string,
-      ORIGIN string,
-      ORIGIN_CITY_NAME string,
-      ORIGIN_CITY_NAME_TEMP string,
-      ORIGIN_STATE_ABR string,
-      DEST_AIRPORT_ID string,
-      DEST string,
-      DEST_CITY_NAME string,
-      DEST_CITY_NAME_TEMP string,
-      DEST_STATE_ABR string,
-      DEP_DELAY_NEW float,
-      ARR_DELAY_NEW float,
-      CARRIER_DELAY float,
-      WEATHER_DELAY float,
-      NAS_DELAY float,
-      SECURITY_DELAY float,
-      LATE_AIRCRAFT_DELAY float)
-   -- The following lines describe the format and location of the file
-   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
-   LINES TERMINATED BY '\n'
-   STORED AS TEXTFILE
-   LOCATION 'abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data';
-
-   -- Drop the delays table if it exists
-   DROP TABLE delays;
-   -- Create the delays table and populate it with data
-   -- pulled in from the CSV file (via the external table defined previously)
-   CREATE TABLE delays
-   LOCATION 'abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/processed'
-   AS
-   SELECT YEAR AS year,
-      FL_DATE AS flight_date,
-      substring(UNIQUE_CARRIER, 2, length(UNIQUE_CARRIER) -1) AS unique_carrier,
-      substring(CARRIER, 2, length(CARRIER) -1) AS carrier,
-      substring(FL_NUM, 2, length(FL_NUM) -1) AS flight_num,
-      ORIGIN_AIRPORT_ID AS origin_airport_id,
-      substring(ORIGIN, 2, length(ORIGIN) -1) AS origin_airport_code,
-      substring(ORIGIN_CITY_NAME, 2) AS origin_city_name,
-      substring(ORIGIN_STATE_ABR, 2, length(ORIGIN_STATE_ABR) -1) AS origin_state_abr,
-      DEST_AIRPORT_ID AS dest_airport_id,
-      substring(DEST, 2, length(DEST) -1) AS dest_airport_code,
-      substring(DEST_CITY_NAME,2) AS dest_city_name,
-      substring(DEST_STATE_ABR, 2, length(DEST_STATE_ABR) -1) AS dest_state_abr,
-      DEP_DELAY_NEW AS dep_delay_new,
-      ARR_DELAY_NEW AS arr_delay_new,
-      CARRIER_DELAY AS carrier_delay,
-      WEATHER_DELAY AS weather_delay,
-      NAS_DELAY AS nas_delay,
-      SECURITY_DELAY AS security_delay,
-      LATE_AIRCRAFT_DELAY AS late_aircraft_delay
-   FROM delays_raw;
+   DROP TABLE delays_raw;
+   -- Creates an external table over the csv file
+   CREATE EXTERNAL TABLE delays_raw (
+      YEAR string,
+      FL_DATE string,
+      UNIQUE_CARRIER string,
+      CARRIER string,
+      FL_NUM string,
+      ORIGIN_AIRPORT_ID string,
+      ORIGIN string,
+      ORIGIN_CITY_NAME string,
+      ORIGIN_CITY_NAME_TEMP string,
+      ORIGIN_STATE_ABR string,
+      DEST_AIRPORT_ID string,
+      DEST string,
+      DEST_CITY_NAME string,
+      DEST_CITY_NAME_TEMP string,
+      DEST_STATE_ABR string,
+      DEP_DELAY_NEW float,
+      ARR_DELAY_NEW float,
+      CARRIER_DELAY float,
+      WEATHER_DELAY float,
+      NAS_DELAY float,
+      SECURITY_DELAY float,
+      LATE_AIRCRAFT_DELAY float)
+   -- The following lines describe the format and location of the file
+   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+   LINES TERMINATED BY '\n'
+   STORED AS TEXTFILE
+   LOCATION 'abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/data';
+
+   -- Drop the delays table if it exists
+   DROP TABLE delays;
+   -- Create the delays table and populate it with data
+   -- pulled in from the CSV file (via the external table defined previously)
+   CREATE TABLE delays
+   LOCATION 'abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/processed'
+   AS
+   SELECT YEAR AS year,
+      FL_DATE AS FlightDate,
+      substring(UNIQUE_CARRIER, 2, length(UNIQUE_CARRIER) -1) AS IATA_CODE_Reporting_Airline,
+      substring(CARRIER, 2, length(CARRIER) -1) AS Reporting_Airline,
+      substring(FL_NUM, 2, length(FL_NUM) -1) AS Flight_Number_Reporting_Airline,
+      ORIGIN_AIRPORT_ID AS OriginAirportID,
+      substring(ORIGIN, 2, length(ORIGIN) -1) AS OriginAirportSeqID,
+      substring(ORIGIN_CITY_NAME, 2) AS OriginCityName,
+      substring(ORIGIN_STATE_ABR, 2, length(ORIGIN_STATE_ABR) -1) AS OriginState,
+      DEST_AIRPORT_ID AS DestAirportID,
+      substring(DEST, 2, length(DEST) -1) AS DestAirportSeqID,
+      substring(DEST_CITY_NAME,2) AS DestCityName,
+      substring(DEST_STATE_ABR, 2, length(DEST_STATE_ABR) -1) AS DestState,
+      DEP_DELAY_NEW AS DepDelay,
+      ARR_DELAY_NEW AS ArrDelay,
+      CARRIER_DELAY AS CarrierDelay,
+      WEATHER_DELAY AS WeatherDelay,
+      NAS_DELAY AS NASDelay,
+      SECURITY_DELAY AS SecurityDelay,
+      LATE_AIRCRAFT_DELAY AS LateAircraftDelay
+   FROM delays_raw;
    ```

-3. Save the file by using use CTRL+X and then type `Y` when prompted.
+3. Save the file by typing CTRL+X and then typing `Y` when prompted.

 4. To start Hive and run the `flightdelays.hql` file, use the following command:
@@ -196,11 +199,11 @@ As part of the Apache Hive job, you import the data from the .csv file into an A
    ```hiveql
    INSERT OVERWRITE DIRECTORY '/tutorials/flightdelays/output'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
-   SELECT regexp_replace(origin_city_name, '''', ''),
-      avg(weather_delay)
+   SELECT regexp_replace(OriginCityName, '''', ''),
+      avg(WeatherDelay)
    FROM delays
-   WHERE weather_delay IS NOT NULL
-   GROUP BY origin_city_name;
+   WHERE WeatherDelay IS NOT NULL
+   GROUP BY OriginCityName;
    ```

 This query retrieves a list of cities that experienced weather delays, along with the average delay time, and saves it to `abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/output`. Later, Sqoop reads the data from this location and exports it to Azure SQL Database.
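The rows that Sqoop later reads from that output directory are tab-delimited city and average-delay pairs. A rough local sketch of that shape, using invented sample values rather than real query output:

```shell
# Invented sample rows in the same tab-delimited layout that the Hive
# query writes to the /tutorials/flightdelays/output directory.
printf 'Chicago IL\t28.5\nDenver CO\t12.0\n' > sample_output.tsv

# Each line: origin city, then average weather delay in minutes.
awk -F'\t' '{ print $1 " -> " $2 " min" }' sample_output.tsv
```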
@@ -237,11 +240,11 @@ You need the server name from SQL Database for this operation. Complete these st

 - Replace the `<server-name>` placeholder with the logical SQL server name.

-- Replace the `<admin-login>` placeholder with the admin login for SQL Database.
+- Replace the `<admin-login>` placeholder with the admin username for SQL Database.

 - Replace the `<database-name>` placeholder with the database name

-When you're prompted, enter the password for the SQL Database admin login.
+When you're prompted, enter the password for the SQL Database admin username.

 You receive output similar to the following text:
