Commit 9d6f757

freshness106
1 parent cbb5782 commit 9d6f757

File tree

1 file changed: +14 -20 lines


articles/hdinsight/hdinsight-upload-data.md

Lines changed: 14 additions & 20 deletions
@@ -8,20 +8,21 @@ ms.author: hrasheed
 ms.service: hdinsight
 ms.custom: hdinsightactive,hdiseo17may2017
 ms.topic: conceptual
-ms.date: 02/08/2019
+ms.date: 05/31/2019
 ---
+
 # Upload data for Apache Hadoop jobs in HDInsight

-Azure HDInsight provides a full-featured Hadoop distributed file system (HDFS) over Azure Storage and Azure Data Lake Storage (Gen1 and Gen2). Azure Storage and Data Lake Storage Gen1 and Gen2 are designed as HDFS extensions to provide a seamless experience to customers. They enable the full set of components in the Hadoop ecosystem to operate directly on the data it manages. Azure Storage, Data Lake Storage Gen1, and Gen2 are distinct file systems that are optimized for storage of data and computations on that data. For information about the benefits of using Azure Storage, see [Use Azure Storage with HDInsight][hdinsight-storage], [Use Data Lake Storage Gen1 with HDInsight](hdinsight-hadoop-use-data-lake-store.md), and [Use Data Lake Storage Gen2 with HDInsight](hdinsight-hadoop-use-data-lake-storage-gen2.md).
+Azure HDInsight provides a full-featured Hadoop distributed file system (HDFS) over Azure Storage and Azure Data Lake Storage (Gen1 and Gen2). Azure Storage and Data Lake Storage Gen1 and Gen2 are designed as HDFS extensions to provide a seamless experience to customers. They enable the full set of components in the Hadoop ecosystem to operate directly on the data it manages. Azure Storage, Data Lake Storage Gen1, and Gen2 are distinct file systems that are optimized for storage of data and computations on that data. For information about the benefits of using Azure Storage, see [Use Azure Storage with HDInsight](hdinsight-hadoop-use-blob-storage.md), [Use Data Lake Storage Gen1 with HDInsight](hdinsight-hadoop-use-data-lake-store.md), and [Use Data Lake Storage Gen2 with HDInsight](hdinsight-hadoop-use-data-lake-storage-gen2.md).

 ## Prerequisites

 Note the following requirements before you begin:

-* An Azure HDInsight cluster. For instructions, see [Get started with Azure HDInsight][hdinsight-get-started] or [Create HDInsight clusters](hdinsight-hadoop-provision-linux-clusters.md).
+* An Azure HDInsight cluster. For instructions, see [Get started with Azure HDInsight](hadoop/apache-hadoop-linux-tutorial-get-started.md) or [Create HDInsight clusters](hdinsight-hadoop-provision-linux-clusters.md).
 * Knowledge of the following articles:

-    - [Use Azure Storage with HDInsight][hdinsight-storage]
+    - [Use Azure Storage with HDInsight](hdinsight-hadoop-use-blob-storage.md)
     - [Use Data Lake Storage Gen1 with HDInsight](hdinsight-hadoop-use-data-lake-store.md)
     - [Use Data Lake Storage Gen2 with HDInsight](hdinsight-hadoop-use-data-lake-storage-gen2.md)

@@ -58,11 +59,11 @@ For example, `hadoop fs -copyFromLocal data.txt /example/data/data.txt`

 Because the default file system for HDInsight is in Azure Storage, /example/data.txt is actually in Azure Storage. You can also refer to the file as:

-    wasb:///example/data/data.txt
+    wasb[s]:///example/data/data.txt

 or

-    wasb://<ContainerName>@<StorageAccountName>.blob.core.windows.net/example/data/davinci.txt
+    wasb[s]://<ContainerName>@<StorageAccountName>.blob.core.windows.net/example/data/davinci.txt

 For a list of other Hadoop commands that work with files, see [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html)
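As context for the `wasb[s]` change in this hunk: the URI has the shape `wasb[s]://<ContainerName>@<StorageAccountName>.blob.core.windows.net/<path>`, where `wasbs` is the TLS-encrypted form. A minimal sketch of how such a URI is assembled; the helper name and the sample container/account names are hypothetical, not part of the article:

```python
def wasb_uri(container: str, account: str, path: str, secure: bool = True) -> str:
    """Build a wasb/wasbs URI for a blob in Azure Storage.

    wasbs (TLS) is generally preferred; wasb is the unencrypted scheme.
    """
    scheme = "wasbs" if secure else "wasb"
    # A leading slash on the blob path is optional in the shorthand form,
    # so normalize it away before joining.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"

print(wasb_uri("mycontainer", "mystorageaccount", "/example/data/davinci.txt"))
# → wasbs://mycontainer@mystorageaccount.blob.core.windows.net/example/data/davinci.txt
```

The bracketed `[s]` in the article denotes exactly this optional TLS variant of the scheme.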

@@ -98,7 +99,7 @@ The Azure Data Factory service is a fully managed service for composing data sto
 ### <a id="sqoop"></a>Apache Sqoop
 Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use it to import data from a relational database management system (RDBMS), such as SQL Server, MySQL, or Oracle into the Hadoop distributed file system (HDFS), transform the data in Hadoop with MapReduce or Hive, and then export the data back into an RDBMS.

-For more information, see [Use Sqoop with HDInsight][hdinsight-use-sqoop].
+For more information, see [Use Sqoop with HDInsight](hadoop/hdinsight-use-sqoop.md).
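The Sqoop import described above boils down to a `sqoop import` command line. A sketch that assembles one as an argv list without executing it (a real run needs a cluster with Sqoop installed; the server, database, table, and target directory below are hypothetical placeholders):

```python
def sqoop_import_cmd(jdbc_url: str, table: str, target_dir: str, num_mappers: int = 4) -> list:
    """Assemble a `sqoop import` command line as an argv list.

    --connect, --table, --target-dir, and --num-mappers are standard
    Sqoop import options; credentials are deliberately omitted here.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,      # e.g. a wasbs:// path on HDInsight
        "--num-mappers", str(num_mappers),
    ]

cmd = sqoop_import_cmd(
    "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb",  # hypothetical
    "Customers",
    "wasbs://mycontainer@myaccount.blob.core.windows.net/sqoop/customers",
)
print(" ".join(cmd))
```

Because HDInsight exposes Azure Storage as its default file system, the `--target-dir` can point directly at a `wasbs://` path, which is what makes Sqoop a convenient upload path here.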

 ### Development SDKs
 Azure Storage can also be accessed using an Azure SDK from the following programming languages:
@@ -146,28 +147,21 @@ hadoop -fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file

 You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. The following steps can be used to change the value in the Ambari Web UI:

-1. In your browser, go to the Ambari Web UI for your cluster. This is https://CLUSTERNAME.azurehdinsight.net, where **CLUSTERNAME** is the name of your cluster.
+1. In your browser, go to the Ambari Web UI for your cluster. This is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster.

    When prompted, enter the admin name and password for the cluster.
 2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
 3. In the **Filter...** field, enter `fs.azure.write.request.size`. This displays the field and current value in the middle of the page.
 4. Change the value from 262144 (256 KB) to the new value. For example, 4194304 (4 MB).

-![Image of changing the value through Ambari Web UI](./media/hdinsight-upload-data/hbase-change-block-write-size.png)
+![Image of changing the value through Ambari Web UI](./media/hdinsight-upload-data/hbase-change-block-write-size.png)

 For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
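The byte values in step 4 are easy to misread, so a small sketch verifying the arithmetic and composing the equivalent per-invocation `-D` flag shown in the hunk header (the variable names are illustrative, not from the article):

```python
KIB = 1024
MIB = 1024 * KIB

default_write_size = 256 * KIB  # 262144 bytes: the default write request size
tuned_write_size = 4 * MIB      # 4194304 bytes: the larger value from the example

assert default_write_size == 262144
assert tuned_write_size == 4194304

# The same value passed on the hadoop command line instead of via Ambari:
flag = f"-D fs.azure.write.request.size={tuned_write_size}"
print(flag)  # -D fs.azure.write.request.size=4194304
```

Setting the value through Ambari applies it cluster-wide, while the `-D` form overrides it for a single `hadoop` invocation only.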
 ## Next steps
 Now that you understand how to get data into HDInsight, read the following articles to learn how to perform analysis:

-* [Get started with Azure HDInsight][hdinsight-get-started]
-* [Submit Apache Hadoop jobs programmatically][hdinsight-submit-jobs]
-* [Use Apache Hive with HDInsight][hdinsight-use-hive]
-* [Use Apache Pig with HDInsight][hdinsight-use-pig]
-
-[hdinsight-use-sqoop]:hadoop/hdinsight-use-sqoop.md
-[hdinsight-storage]: hdinsight-hadoop-use-blob-storage.md
-[hdinsight-submit-jobs]:hadoop/submit-apache-hadoop-jobs-programmatically.md
-[hdinsight-get-started]:hadoop/apache-hadoop-linux-tutorial-get-started.md
-[hdinsight-use-hive]:hadoop/hdinsight-use-hive.md
-[hdinsight-use-pig]:hadoop/hdinsight-use-pig.md
+* [Get started with Azure HDInsight](hadoop/apache-hadoop-linux-tutorial-get-started.md)
+* [Submit Apache Hadoop jobs programmatically](hadoop/submit-apache-hadoop-jobs-programmatically.md)
+* [Use Apache Hive with HDInsight](hadoop/hdinsight-use-hive.md)
+* [Use Apache Pig with HDInsight](hadoop/hdinsight-use-pig.md)
