Commit 5c015d5

Merge pull request #78478 from dagiro/freshness106
freshness106
2 parents 3d71fbc + 5d6cab5 commit 5c015d5

File tree

5 files changed: +21 -28 lines

articles/hdinsight/hdinsight-hadoop-compare-storage-options.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -131,7 +131,7 @@ HDInsight provides access to the distributed file system that is locally attache
 
 Through HDInsight you can also access data in Azure Storage. The syntax is as follows:
 
-wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
+wasb://<containername>@<accountname>.blob.core.windows.net/<path>
 
 Consider the following principles when using an Azure Storage account with HDInsight clusters:
 
```

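For orientation, the syntax line above is what you would hand to any HDFS-compatible tool on the cluster. A minimal sketch of both schemes, using hypothetical account and container names (`mystorage`, `mycontainer`) that are not part of this PR:

```bash
# List a directory over the unencrypted wasb scheme (hypothetical names)
hadoop fs -ls wasb://mycontainer@mystorage.blob.core.windows.net/example/data/

# Same path over TLS; the wasbs form these articles recommend elsewhere
hadoop fs -ls wasbs://mycontainer@mystorage.blob.core.windows.net/example/data/
```
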
articles/hdinsight/hdinsight-hadoop-customize-cluster-linux.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -54,7 +54,7 @@ A script action is Bash script that runs on the nodes in an HDInsight cluster. C
 
 * For clusters with ESP:
 
-    * The wasb[s]:// or http[s]:// URIs are supported.
+    * The wasb:// or wasbs:// or http[s]:// URIs are supported.
 
 * Can be restricted to run on only certain node types. Examples are head nodes or worker nodes.
 
```

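The updated line spells out the accepted script URIs. As a hedged sketch of how such a URI is consumed, assuming the Azure CLI's `az hdinsight script-action execute` command; the resource group, cluster, script name, and URI below are placeholders:

```bash
# Hypothetical script action run; all names and the URI are placeholders
az hdinsight script-action execute \
    --resource-group myresourcegroup \
    --cluster-name mycluster \
    --name install-sample \
    --script-uri "wasbs://scripts@mystorage.blob.core.windows.net/install-sample.sh" \
    --roles headnode workernode \
    --persist-on-success
```
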
articles/hdinsight/hdinsight-hadoop-use-blob-storage.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -48,7 +48,7 @@ HDInsight provides access to the distributed file system that is locally attache
 
 In addition, HDInsight allows you to access data that is stored in Azure Storage. The syntax is:
 
-wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
+wasb://<containername>@<accountname>.blob.core.windows.net/<path>
 
 Here are some considerations when using Azure Storage account with HDInsight clusters.
 
```

````diff
@@ -91,7 +91,7 @@ Certain MapReduce jobs and packages may create intermediate results that you don
 The URI scheme for accessing files in Azure storage from HDInsight is:
 
 ```config
-wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>
+wasb://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>
 ```
 
 The URI scheme provides unencrypted access (with the *wasb:* prefix) and SSL encrypted access (with *wasbs*). We recommend using *wasbs* wherever possible, even when accessing data that lives inside the same region in Azure.
````

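Since the paragraph above recommends *wasbs* even within a region, a hedged illustration of keeping both ends of a copy on the encrypted scheme; account and container names are placeholders:

```bash
# Copy between two storage accounts; wasbs keeps both transfers on TLS
hadoop distcp \
    wasbs://src@mysourceaccount.blob.core.windows.net/example/data \
    wasbs://dst@mydestaccount.blob.core.windows.net/example/data
```
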

articles/hdinsight/hdinsight-upload-data.md

Lines changed: 16 additions & 23 deletions
```diff
@@ -1,27 +1,27 @@
 ---
 title: Upload data for Apache Hadoop jobs in HDInsight
 description: Learn how to upload and access data for Apache Hadoop jobs in HDInsight using the Azure classic CLI, Azure Storage Explorer, Azure PowerShell, the Hadoop command line, or Sqoop.
-keywords: etl hadoop, getting data into hadoop, hadoop load data
 author: hrasheed-msft
-ms.reviewer: jasonh
 ms.author: hrasheed
+ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive,hdiseo17may2017
+ms.custom: hdiseo17may2017
 ms.topic: conceptual
-ms.date: 02/08/2019
+ms.date: 06/03/2019
 ---
+
 # Upload data for Apache Hadoop jobs in HDInsight
 
-Azure HDInsight provides a full-featured Hadoop distributed file system (HDFS) over Azure Storage and Azure Data Lake Storage (Gen1 and Gen2). Azure Storage and Data Lake Storage Gen1 and Gen2 are designed as HDFS extensions to provide a seamless experience to customers. They enable the full set of components in the Hadoop ecosystem to operate directly on the data it manages. Azure Storage, Data Lake Storage Gen1, and Gen2 are distinct file systems that are optimized for storage of data and computations on that data. For information about the benefits of using Azure Storage, see [Use Azure Storage with HDInsight][hdinsight-storage], [Use Data Lake Storage Gen1 with HDInsight](hdinsight-hadoop-use-data-lake-store.md), and [Use Data Lake Storage Gen2 with HDInsight](hdinsight-hadoop-use-data-lake-storage-gen2.md).
+Azure HDInsight provides a full-featured Hadoop distributed file system (HDFS) over Azure Storage and Azure Data Lake Storage (Gen1 and Gen2). Azure Storage and Data Lake Storage Gen1 and Gen2 are designed as HDFS extensions to provide a seamless experience to customers. They enable the full set of components in the Hadoop ecosystem to operate directly on the data it manages. Azure Storage, Data Lake Storage Gen1, and Gen2 are distinct file systems that are optimized for storage of data and computations on that data. For information about the benefits of using Azure Storage, see [Use Azure Storage with HDInsight](hdinsight-hadoop-use-blob-storage.md), [Use Data Lake Storage Gen1 with HDInsight](hdinsight-hadoop-use-data-lake-store.md), and [Use Data Lake Storage Gen2 with HDInsight](hdinsight-hadoop-use-data-lake-storage-gen2.md).
 
 ## Prerequisites
 
 Note the following requirements before you begin:
 
-* An Azure HDInsight cluster. For instructions, see [Get started with Azure HDInsight][hdinsight-get-started] or [Create HDInsight clusters](hdinsight-hadoop-provision-linux-clusters.md).
+* An Azure HDInsight cluster. For instructions, see [Get started with Azure HDInsight](hadoop/apache-hadoop-linux-tutorial-get-started.md) or [Create HDInsight clusters](hdinsight-hadoop-provision-linux-clusters.md).
 * Knowledge of the following articles:
 
-    - [Use Azure Storage with HDInsight][hdinsight-storage]
+    - [Use Azure Storage with HDInsight](hdinsight-hadoop-use-blob-storage.md)
     - [Use Data Lake Storage Gen1 with HDInsight](hdinsight-hadoop-use-data-lake-store.md)
     - [Use Data Lake Storage Gen2 with HDInsight](hdinsight-hadoop-use-data-lake-storage-gen2.md)
 
```

```diff
@@ -58,11 +58,11 @@ For example, `hadoop fs -copyFromLocal data.txt /example/data/data.txt`
 
 Because the default file system for HDInsight is in Azure Storage, /example/data.txt is actually in Azure Storage. You can also refer to the file as:
 
-wasb:///example/data/data.txt
+wasbs:///example/data/data.txt
 
 or
 
-wasb://<ContainerName>@<StorageAccountName>.blob.core.windows.net/example/data/davinci.txt
+wasbs://<ContainerName>@<StorageAccountName>.blob.core.windows.net/example/data/davinci.txt
 
 For a list of other Hadoop commands that work with files, see [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html)
 
```

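The two changed lines are alternate spellings of the same blob when it lives in the cluster's default container. A short sketch (container and account names are placeholders):

```bash
# Shorthand: resolves against the cluster's default storage container
hadoop fs -ls wasbs:///example/data/data.txt

# Fully qualified form of the same path
hadoop fs -ls wasbs://mycontainer@mystorage.blob.core.windows.net/example/data/data.txt
```
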
```diff
@@ -98,7 +98,7 @@ The Azure Data Factory service is a fully managed service for composing data sto
 ### <a id="sqoop"></a>Apache Sqoop
 Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use it to import data from a relational database management system (RDBMS), such as SQL Server, MySQL, or Oracle into the Hadoop distributed file system (HDFS), transform the data in Hadoop with MapReduce or Hive, and then export the data back into an RDBMS.
 
-For more information, see [Use Sqoop with HDInsight][hdinsight-use-sqoop].
+For more information, see [Use Sqoop with HDInsight](hadoop/hdinsight-use-sqoop.md).
 
 ### Development SDKs
 Azure Storage can also be accessed using an Azure SDK from the following programming languages:
```

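For readers who want the shape of the Sqoop flow the paragraph describes, a hypothetical import landing in Azure Storage; the server, database, table, and paths below are placeholders, not from the linked article:

```bash
# Hypothetical RDBMS-to-HDInsight import writing to a wasbs path
sqoop import \
    --connect "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb" \
    --username sqluser \
    --password-file /user/sqluser/.sqlpassword \
    --table Customers \
    --target-dir wasbs://mycontainer@mystorage.blob.core.windows.net/example/data/customers \
    --num-mappers 4
```
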
```diff
@@ -146,28 +146,21 @@ hadoop -fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file
 
 You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. The following steps can be used to change the value in the Ambari Web UI:
 
-1. In your browser, go to the Ambari Web UI for your cluster. This is https://CLUSTERNAME.azurehdinsight.net, where **CLUSTERNAME** is the name of your cluster.
+1. In your browser, go to the Ambari Web UI for your cluster. This is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.
 
     When prompted, enter the admin name and password for the cluster.
 2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
 3. In the **Filter...** field, enter `fs.azure.write.request.size`. This displays the field and current value in the middle of the page.
 4. Change the value from 262144 (256 KB) to the new value. For example, 4194304 (4 MB).
 
-    ![Image of changing the value through Ambari Web UI](./media/hdinsight-upload-data/hbase-change-block-write-size.png)
+    ![Image of changing the value through Ambari Web UI](./media/hdinsight-upload-data/hbase-change-block-write-size.png)
 
 For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
 
 ## Next steps
 Now that you understand how to get data into HDInsight, read the following articles to learn how to perform analysis:
 
-* [Get started with Azure HDInsight][hdinsight-get-started]
-* [Submit Apache Hadoop jobs programmatically][hdinsight-submit-jobs]
-* [Use Apache Hive with HDInsight][hdinsight-use-hive]
-* [Use Apache Pig with HDInsight][hdinsight-use-pig]
-
-[hdinsight-use-sqoop]:hadoop/hdinsight-use-sqoop.md
-[hdinsight-storage]: hdinsight-hadoop-use-blob-storage.md
-[hdinsight-submit-jobs]:hadoop/submit-apache-hadoop-jobs-programmatically.md
-[hdinsight-get-started]:hadoop/apache-hadoop-linux-tutorial-get-started.md
-[hdinsight-use-hive]:hadoop/hdinsight-use-hive.md
-[hdinsight-use-pig]:hadoop/hdinsight-use-pig.md
+* [Get started with Azure HDInsight](hadoop/apache-hadoop-linux-tutorial-get-started.md)
+* [Submit Apache Hadoop jobs programmatically](hadoop/submit-apache-hadoop-jobs-programmatically.md)
+* [Use Apache Hive with HDInsight](hadoop/hdinsight-use-hive.md)
+* [Use Apache Pig with HDInsight](hadoop/hdinsight-use-pig.md)
```

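The hunk above also touches the `fs.azure.write.request.size` walkthrough. For context, the per-job override the article pairs with the Ambari steps looks like this; the file name is a placeholder, and `hdfs getconf` is an assumed way to read back the client-side value on a cluster node once the new config has been pushed:

```bash
# One-off override for a single copy (4 MB write request size)
hadoop fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data

# Assumed check: print the value the client configuration now carries
hdfs getconf -confKey fs.azure.write.request.size
```
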

articles/hdinsight/spark/apache-spark-perf.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -54,7 +54,7 @@ When you create a new Spark cluster, you have the option to select Azure Blob St
 
 | Store Type | File System | Speed | Transient | Use Cases |
 | --- | --- | --- | --- | --- |
-| Azure Blob Storage | **wasb[s]:**//url/ | **Standard** | Yes | Transient cluster |
+| Azure Blob Storage | **wasb:**//url/ | **Standard** | Yes | Transient cluster |
 | Azure Data Lake Storage Gen 2| **abfs[s]:**//url/ | **Faster** | Yes | Transient cluster |
 | Azure Data Lake Storage Gen 1| **adl:**//url/ | **Faster** | Yes | Transient cluster |
 | Local HDFS | **hdfs:**//url/ | **Fastest** | No | Interactive 24/7 cluster |
```

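The table's `//url/` placeholders compress the real authority formats. A hedged expansion with placeholder account, container, and file-system names, matching the usual shape of each scheme:

```bash
# Azure Blob Storage (wasb/wasbs)
hadoop fs -ls wasbs://mycontainer@mystorage.blob.core.windows.net/data/
# Azure Data Lake Storage Gen2 (abfs/abfss)
hadoop fs -ls abfss://myfilesystem@mystorage.dfs.core.windows.net/data/
# Azure Data Lake Storage Gen1 (adl)
hadoop fs -ls adl://myadlsaccount.azuredatalakestore.net/data/
# Local HDFS
hadoop fs -ls hdfs://mycluster/data/
```
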
