**articles/storage/blobs/data-lake-storage-introduction.md** (3 additions, 3 deletions)
```diff
@@ -6,7 +6,7 @@ author: normesta
 ms.service: storage
 ms.topic: overview
-ms.date: 02/23/2022
+ms.date: 03/01/2023
 ms.author: normesta
 ms.reviewer: jamesbak
 ms.subservice: data-lake-storage-gen2
```
```diff
@@ -36,9 +36,9 @@ Also, Data Lake Storage Gen2 is very cost effective because it's built on top of
 ## Key features of Data Lake Storage Gen2
 
-- **Hadoop compatible access:** Data Lake Storage Gen2 allows you to manage and access data just as you would with a [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html). The new [ABFS driver](data-lake-storage-abfs-driver.md) (used to access data) is available within all Apache Hadoop environments. These environments include [Azure HDInsight](../../hdinsight/index.yml), [Azure Databricks](/azure/databricks/), and [Azure Synapse Analytics](../../synapse-analytics/index.yml).
+- **Hadoop compatible access:** Data Lake Storage Gen2 allows you to manage and access data just as you would with a [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html). The [ABFS driver](data-lake-storage-abfs-driver.md) (used to access data) is available within all Apache Hadoop environments. These environments include [Azure HDInsight](../../hdinsight/index.yml), [Azure Databricks](/azure/databricks/), and [Azure Synapse Analytics](../../synapse-analytics/index.yml).
 
-- **A superset of POSIX permissions:** The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings may be configured through Storage Explorer or through frameworks like Hive and Spark.
+- **A superset of POSIX permissions:** The security model for Data Lake Gen2 supports ACL and POSIX permissions along with some extra granularity specific to Data Lake Storage Gen2. Settings can be configured by using Storage Explorer, the Azure portal, PowerShell, Azure CLI, REST APIs, Azure Storage SDKs, or by using frameworks like Hive and Spark.
 
 - **Cost-effective:** Data Lake Storage Gen2 offers low-cost storage capacity and transactions. Features such as [Azure Blob Storage lifecycle](./lifecycle-management-overview.md) optimize costs as data transitions through its lifecycle.
```
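A minimal sketch of the first two bullets from a shell, assuming an illustrative storage account `mystorageaccount`, a container (filesystem) `my-file-system`, a directory `my-directory`, and a placeholder Azure AD object ID:

```bash
# Hadoop-compatible access: list a directory through the ABFS driver,
# just as you would with HDFS (account and container names are illustrative).
hdfs dfs -ls abfs://my-file-system@mystorageaccount.dfs.core.windows.net/my-directory

# POSIX-style ACLs: set the ACL on that directory with Azure CLI, granting a
# specific Azure AD user read/execute (the object ID is a placeholder).
az storage fs access set \
    --acl "user::rwx,user:aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb:r-x,group::r-x,other::---" \
    --path my-directory \
    --file-system my-file-system \
    --account-name mystorageaccount \
    --auth-mode login
```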
**articles/storage/blobs/data-lake-storage-tutorial-extract-transform-load-hive.md** (78 additions, 75 deletions)
```diff
@@ -7,7 +7,7 @@ author: normesta
 ms.subservice: data-lake-storage-gen2
 ms.service: storage
 ms.topic: tutorial
-ms.date: 11/19/2019
+ms.date: 03/01/2023
 ms.author: normesta
 ms.reviewer: jamesbak
 #Customer intent: As an analytics user, I want to perform an ETL operation so that I can work with my data in my preferred environment.
```
```diff
@@ -28,20 +28,25 @@ If you don't have an Azure subscription, [create a free account](https://azure.m
 ## Prerequisites
 
-- **An Azure Data Lake Storage Gen2 storage account that is configured for HDInsight**
+- A storage account that has a hierarchical namespace (Azure Data Lake Storage Gen2) that is configured for HDInsight
 
-  See [Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters](../../hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md).
+  See [Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters](../../hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md).
 
-- **A Linux-based Hadoop cluster on HDInsight**
+- A Linux-based Hadoop cluster on HDInsight
+
+  See [Quickstart: Get started with Apache Hadoop and Apache Hive in Azure HDInsight using the Azure portal](../../hdinsight/hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md).
 
-  See [Quickstart: Get started with Apache Hadoop and Apache Hive in Azure HDInsight using the Azure portal](../../hdinsight/hadoop/apache-hadoop-linux-create-cluster-get-started-portal.md).
+- Azure SQL Database
 
-- **Azure SQL Database**: You use Azure SQL Database as a destination data store. If you don't have a database in SQL Database, see [Create a database in Azure SQL Database in the Azure portal](/azure/azure-sql/database/single-database-create-quickstart).
+  You'll use Azure SQL Database as a destination data store. If you don't have a database in SQL Database, see [Create a database in Azure SQL Database in the Azure portal](/azure/azure-sql/database/single-database-create-quickstart).
 
-- **Azure CLI**: If you haven't installed the Azure CLI, see [Install the Azure CLI](/cli/azure/install-azure-cli).
+- Azure CLI
 
-- **A Secure Shell (SSH) client**: For more information, see [Connect to HDInsight (Hadoop) by using SSH](../../hdinsight/hdinsight-hadoop-linux-use-ssh-unix.md).
+  If you haven't installed the Azure CLI, see [Install the Azure CLI](/cli/azure/install-azure-cli).
+
+- A Secure Shell (SSH) client
+
+  For more information, see [Connect to HDInsight (Hadoop) by using SSH](../../hdinsight/hdinsight-hadoop-linux-use-ssh-unix.md).
 
 ## Download, extract and then upload the data
```
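For the SSH client prerequisite above, connecting to the head node is a one-liner; a sketch, assuming the default `sshuser` account and an illustrative cluster name `mycluster`:

```bash
# Connect to the HDInsight cluster head node ("mycluster" is an illustrative name).
ssh sshuser@mycluster-ssh.azurehdinsight.net
```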
```diff
@@ -52,10 +57,8 @@ In this section, you'll download sample flight data. Then, you'll upload that da
 2. Open a command prompt and use the following Secure Copy (scp) command to upload the .zip file to the HDInsight cluster head node:
 
    Use quotes around the file name if the file name contains spaces or special characters.
```
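A minimal sketch of that upload, assuming the default `sshuser` account, an illustrative cluster name `mycluster`, and a local file named `flightdelays.zip`:

```bash
# Copy the zipped flight data to the cluster head node over SSH
# (cluster and file names are illustrative).
scp "flightdelays.zip" sshuser@mycluster-ssh.azurehdinsight.net:flightdelays.zip
```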
````diff
@@ -116,65 +119,65 @@ As part of the Apache Hive job, you import the data from the .csv file into an A
 2. Modify the following text by replacing the `<container-name>` and `<storage-account-name>` placeholders with your container and storage account names. Then copy and paste the text into the nano console by pressing the SHIFT key along with the right mouse button.
 
    ```hiveql
-   DROP TABLE delays_raw;
-   -- Creates an external table over the csv file
-   CREATE EXTERNAL TABLE delays_raw (
-      YEAR string,
-      FL_DATE string,
-      UNIQUE_CARRIER string,
-      CARRIER string,
-      FL_NUM string,
-      ORIGIN_AIRPORT_ID string,
-      ORIGIN string,
-      ORIGIN_CITY_NAME string,
-      ORIGIN_CITY_NAME_TEMP string,
-      ORIGIN_STATE_ABR string,
-      DEST_AIRPORT_ID string,
-      DEST string,
-      DEST_CITY_NAME string,
-      DEST_CITY_NAME_TEMP string,
-      DEST_STATE_ABR string,
-      DEP_DELAY_NEW float,
-      ARR_DELAY_NEW float,
-      CARRIER_DELAY float,
-      WEATHER_DELAY float,
-      NAS_DELAY float,
-      SECURITY_DELAY float,
-      LATE_AIRCRAFT_DELAY float)
-   -- The following lines describe the format and location of the file
````
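Once the full script is saved on the head node (for example as `flightdelays.hql`, an illustrative name), a typical way to run it on an HDInsight cluster is with Beeline; a sketch:

```bash
# Run the saved HiveQL script through Beeline on the cluster head node.
# The script name is illustrative; "headnodehost" resolves on the head node itself.
beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' -f flightdelays.hql
```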
The final query in the script retrieves a list of cities that experienced weather delays, along with the average delay time, and saves it to `abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/output`. Later, Sqoop reads the data from this location and exports it to Azure SQL Database.
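A sketch of that export step with Sqoop, reusing the placeholders above and assuming the Hive output is tab-delimited; the server, database, table, and credential values are illustrative:

```bash
# Export the Hive output from Data Lake Storage Gen2 into Azure SQL Database.
# Server, database, login, and table names are illustrative placeholders.
sqoop export \
    --connect "jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>" \
    --username <admin-login> --password <admin-password> \
    --table delays \
    --export-dir 'abfs://<container-name>@<storage-account-name>.dfs.core.windows.net/tutorials/flightdelays/output' \
    --fields-terminated-by '\t' \
    -m 1
```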