
Commit b9b4650

Merge pull request #79422 from hrasheed-msft/hdi_hdp_cleanup2
HDInsight remove hortonworks mentions
2 parents 8fbca6a + 8fbd4df

23 files changed (+28, −328 lines)

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -39543,6 +39543,11 @@
   "redirect_url": "/azure/hdinsight/",
   "redirect_document_id": true
 },
+{
+  "source_path": "articles/hdinsight/hdinsight-migrate-from-windows-to-linux.md",
+  "redirect_url": "/azure/hdinsight/",
+  "redirect_document_id": true
+},
 {
   "source_path" : "articles/active-directory-domain-services/active-directory-ds-troubleshoot-service-principals.md",
   "redirect_url" : "/azure/active-directory-domain-services/alert-service-principal",

articles/hdinsight/domain-joined/apache-domain-joined-introduction.md

Lines changed: 3 additions & 3 deletions
@@ -16,7 +16,7 @@ In the past, Azure HDInsight supported only a single user: local admin. This wor
 
 You can create an HDInsight cluster with Enterprise Security Package (ESP) that's joined to an Active Directory domain. You can then configure a list of employees from the enterprise who can authenticate through Azure Active Directory to sign in to the HDInsight cluster. No one from outside the enterprise can sign in or access the HDInsight cluster.
 
-The enterprise admin can configure role-based access control (RBAC) for Apache Hive security by using [Apache Ranger](https://hortonworks.com/apache/ranger/). Configuring RBAC restricts data access to only what's needed. Finally, the admin can audit the data access by employees and any changes done to access control policies. The admin can then achieve a high degree of governance of their corporate resources.
+The enterprise admin can configure role-based access control (RBAC) for Apache Hive security by using [Apache Ranger](https://ranger.apache.org/). Configuring RBAC restricts data access to only what's needed. Finally, the admin can audit the data access by employees and any changes done to access control policies. The admin can then achieve a high degree of governance of their corporate resources.
 
 > [!NOTE]
 > Apache Oozie is now enabled on ESP clusters. To access the Oozie web UI, users should enable [tunneling](../hdinsight-linux-ambari-ssh-tunnel.md).
@@ -38,14 +38,14 @@ With this setup, enterprise employees can sign in to the cluster nodes by using
 ## Authorization
 A best practice that most enterprises follow is making sure that not every employee has access to all enterprise resources. Likewise, the admin can define role-based access control policies for the cluster resources.
 
-For example, the admin can configure [Apache Ranger](https://hortonworks.com/apache/ranger/) to set access control policies for Hive. This functionality ensures that employees can access only as much data as they need to be successful in their jobs. SSH access to the cluster is also restricted to only the administrator.
+For example, the admin can configure [Apache Ranger](https://ranger.apache.org/) to set access control policies for Hive. This functionality ensures that employees can access only as much data as they need to be successful in their jobs. SSH access to the cluster is also restricted to only the administrator.
 
 ## Auditing
 Auditing of all access to the cluster resources, and the data, is necessary to track unauthorized or unintentional access of the resources. It's as important as protecting the HDInsight cluster resources from unauthorized users and securing the data.
 
 The admin can view and report all access to the HDInsight cluster resources and data. The admin can also view and report all changes to the access control policies created in Apache Ranger supported endpoints.
 
-A HDInsight cluster with ESP uses the familiar Apache Ranger UI to search audit logs. On the back end, Ranger uses [Apache Solr](https://hortonworks.com/apache/solr/) for storing and searching the logs.
+A HDInsight cluster with ESP uses the familiar Apache Ranger UI to search audit logs. On the back end, Ranger uses [Apache Solr](http://lucene.apache.org/solr/) for storing and searching the logs.
 
 ## Encryption
 Protecting data is important for meeting organizational security and compliance requirements. Along with restricting access to data from unauthorized employees, you should encrypt it.

articles/hdinsight/hadoop/TOC.yml

Lines changed: 0 additions & 3 deletions
@@ -471,9 +471,6 @@
   items:
   - name: Windows clusters
     items:
-    - name: Migrate Windows clusters to Linux clusters
-      href: ../hdinsight-migrate-from-windows-to-linux.md
-      maintainContext: true
     - name: Migrate .NET solutions to Linux clusters
       href: ../hdinsight-hadoop-migrate-dotnet-to-linux.md
       maintainContext: true

articles/hdinsight/hadoop/apache-hadoop-introduction.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ This article provides an introduction to Apache Hadoop on Azure HDInsight. Azure
 [Apache Hadoop](https://hadoop.apache.org/) was the original open-source framework for distributed processing and analysis of big data sets on clusters. The Hadoop technology stack includes related software and utilities, including Apache Hive, Apache HBase, Spark, Kafka, and many others.
 
 
-Azure HDInsight is a cloud distribution of the Hadoop components from the [Hortonworks Data Platform (HDP)](https://hortonworks.com/products/data-center/hdp/). Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. With these frameworks, you can enable a broad range of scenarios such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT.
+Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. You can use the most popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, R, and more. With these frameworks, you can enable a broad range of scenarios such as extract, transform, and load (ETL), data warehousing, machine learning, and IoT.
 
 To see available Hadoop technology stack components on HDInsight, see [Components and versions available with HDInsight](../hdinsight-component-versioning.md). To read more about Hadoop in HDInsight, see the [Azure features page for HDInsight](https://azure.microsoft.com/services/hdinsight/).

articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-infrastructure.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ See [Default node configuration and virtual machine sizes for clusters](../hdins
 
 ## Check Hadoop components availability in HDInsight
 
-Each HDInsight version is a cloud distribution of a version of Hortonworks Data Platform (HDP) and consists of a set of Hadoop eco-system components. See [HDInsight Component Versioning](../hdinsight-component-versioning.md) for details on all HDInsight components and their current versions.
+Each HDInsight version is a cloud distribution of a set of Hadoop eco-system components. See [HDInsight Component Versioning](../hdinsight-component-versioning.md) for details on all HDInsight components and their current versions.
 
 You can also use Apache Ambari UI or Ambari REST API to check the Hadoop components and versions in HDInsight.

articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-motivation.md

Lines changed: 4 additions & 6 deletions
@@ -15,7 +15,7 @@ This article is the first in a series on best-practices for migrating on-premise
 
 ## Why to migrate to Azure HDInsight
 
-Azure HDInsight is a cloud distribution of the Hadoop components from the [Hortonworks Data Platform(HDP)](https://hortonworks.com/products/data-center/hdp/). Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. HDInsight includes the most popular open-source frameworks such as:
+Azure HDInsight is a cloud distribution of Hadoop components. Azure HDInsight makes it easy, fast, and cost-effective to process massive amounts of data. HDInsight includes the most popular open-source frameworks such as:
 
 - Apache Hadoop
 - Apache Spark
@@ -84,14 +84,13 @@ This section provides template questionnaires to help gather important informati
 | **Question** | **Example** | **Answer** |
 |---|---|---|
 |**Topic**: **Environment**|||
-|Cluster Distribution type|Hortonworks, Cloudera, MapR| |
 |Cluster Distribution version|HDP 2.6.5, CDH 5.7|
 |Big Data eco-system components|HDFS, Yarn, Hive, LLAP, Impala, Kudu, HBase, Spark, MapReduce, Kafka, Zookeeper, Solr, Sqoop, Oozie, Ranger, Atlas, Falcon, Zeppelin, R|
 |Cluster types|Hadoop, Spark, Confluent Kafka, Storm, Solr|
 |Number of clusters|4|
-|Number of Master Nodes|2|
-|Number of Worker Nodes|100|
-|Number of Edge Nodes| 5|
+|Number of master nodes|2|
+|Number of worker nodes|100|
+|Number of edge nodes| 5|
 |Total Disk space|100 TB|
 |Master Node configuration|m/y, cpu, disk, etc.|
 |Data Nodes configuration|m/y, cpu, disk, etc.|
@@ -193,7 +192,6 @@ This section provides template questionnaires to help gather important informati
 |Share metastores between different clusters?|Yes||
 |Deconstruct workloads?|Replace Hive jobs with Spark jobs||
 |Use ADF for data orchestration?|No||
-|HDInsight vs Hortonworks Data Platform on IaaS?|HDInsight||
 
 ## Next steps
 

articles/hdinsight/hbase/TOC.yml

Lines changed: 0 additions & 2 deletions
@@ -264,7 +264,5 @@
   items:
   - name: Windows clusters
     items:
-    - name: Migrate Windows clusters to Linux clusters
-      href: ../hdinsight-migrate-from-windows-to-linux.md
     - name: Migrate .NET solutions to Linux clusters
      href: ../hdinsight-hadoop-migrate-dotnet-to-linux.md

articles/hdinsight/hdinsight-component-versioning.md

Lines changed: 1 addition & 3 deletions
@@ -15,8 +15,6 @@ ms.date: 06/07/2019
 
 Learn about the [Apache Hadoop](https://hadoop.apache.org/) ecosystem components and versions in Microsoft Azure HDInsight, as well as the Enterprise Security Package. Also, learn how to check Hadoop component versions in HDInsight.
 
-Each HDInsight version is a cloud distribution of a version of Hortonworks Data Platform (HDP).
-
 ## Apache Hadoop components available with different HDInsight versions
 
 Azure HDInsight supports multiple Hadoop cluster versions that can be deployed at any time. Each version choice creates a specific version of the HDP distribution and a set of components that are contained within that distribution. As of April 4, 2017, the default cluster version used by Azure HDInsight is 3.6 and is based on HDP 2.6.
@@ -143,7 +141,7 @@ For information on pricing and SLA for the Enterprise Security Package, see [HDI
 
 ## Service level agreement for HDInsight cluster versions
 
-The service level agreement (SLA) is defined in terms of a _support window_. The support window is the period of time that an HDInsight cluster version is supported by Microsoft Customer Service and Support. If the version has a _support expiration date_ that has passed, the HDInsight cluster is outside the support window. For more information about supported versions, see the list of [supported HDInsight cluster versions](hdinsight-migrate-from-windows-to-linux.md). The support expiration date for a specified HDInsight version X (after a newer X+1 version is available) is calculated as the later of:
+The service level agreement (SLA) is defined in terms of a _support window_. The support window is the period of time that an HDInsight cluster version is supported by Microsoft Customer Service and Support. If the version has a _support expiration date_ that has passed, the HDInsight cluster is outside the support window. The support expiration date for a specified HDInsight version X (after a newer X+1 version is available) is calculated as the later of:
 
 * Formula 1: Add 180 days to the date when the HDInsight cluster version X was released.
 * Formula 2: Add 90 days to the date when the HDInsight cluster version X+1 is made available in Azure portal.
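The two formulas in the retained text can be sketched as a small date computation (the helper name and example dates are ours, for illustration only):

```python
# Sketch of the SLA support-window calculation described above:
# support expires at the LATER of release+180 days or next-version GA+90 days.
# Dates below are made up for illustration.
from datetime import date, timedelta

def support_expiration(version_release: date, next_version_ga: date) -> date:
    formula1 = version_release + timedelta(days=180)  # Formula 1
    formula2 = next_version_ga + timedelta(days=90)   # Formula 2
    return max(formula1, formula2)

# Example: version X released 2017-04-04, version X+1 available 2017-08-01
print(support_expiration(date(2017, 4, 4), date(2017, 8, 1)))  # 2017-10-30
```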

articles/hdinsight/hdinsight-hadoop-access-yarn-app-logs-linux.md

Lines changed: 2 additions & 4 deletions
@@ -30,9 +30,9 @@ YARN Timeline Server includes the following type of data:
 
 YARN supports multiple programming models ([Apache Hadoop MapReduce](https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html) being one of them) by decoupling resource management from application scheduling/monitoring. YARN uses a global *ResourceManager* (RM), per-worker-node *NodeManagers* (NMs), and per-application *ApplicationMasters* (AMs). The per-application AM negotiates resources (CPU, memory, disk, network) for running your application with the RM. The RM works with NMs to grant these resources, which are granted as *containers*. The AM is responsible for tracking the progress of the containers assigned to it by the RM. An application may require many containers depending on the nature of the application.
 
-Each application may consist of multiple *application attempts*. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for basic unit of work performed by a YARN application. All work that is done within the context of a container is performed on the single worker node on which the container was allocated. See [Apache Hadoop YARN Concepts][YARN-concepts] for further reference.
+Each application may consist of multiple *application attempts*. If an application fails, it may be retried as a new attempt. Each attempt runs in a container. In a sense, a container provides the context for basic unit of work performed by a YARN application. All work that is done within the context of a container is performed on the single worker node on which the container was allocated. See [Apache Hadoop YARN Concepts](https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html) for further reference.
 
-Application logs (and the associated container logs) are critical in debugging problematic Hadoop applications. YARN provides a nice framework for collecting, aggregating, and storing application logs with the [Log Aggregation][log-aggregation] feature. The Log Aggregation feature makes accessing application logs more deterministic. It aggregates logs across all containers on a worker node and stores them as one aggregated log file per worker node. The log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but logs for all containers run on a single worker node are always aggregated to a single file. So there is only 1 log per worker node used by your application. Log Aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in default storage for the cluster. The following path is the HDFS path to the logs:
+Application logs (and the associated container logs) are critical in debugging problematic Hadoop applications. YARN provides a nice framework for collecting, aggregating, and storing application logs with the [Log Aggregation](https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/) feature. The Log Aggregation feature makes accessing application logs more deterministic. It aggregates logs across all containers on a worker node and stores them as one aggregated log file per worker node. The log is stored on the default file system after an application finishes. Your application may use hundreds or thousands of containers, but logs for all containers run on a single worker node are always aggregated to a single file. So there is only 1 log per worker node used by your application. Log Aggregation is enabled by default on HDInsight clusters version 3.0 and above. Aggregated logs are located in default storage for the cluster. The following path is the HDFS path to the logs:
 
     /app-logs/<user>/logs/<applicationId>
 
@@ -66,7 +66,5 @@ The YARN ResourceManager UI runs on the cluster headnode. It is accessed through
 You are presented with a list of links to YARN logs.
 
 [YARN-timeline-server]:https://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/TimelineServer.html
-[log-aggregation]:https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
 [T-file]:https://issues.apache.org/jira/secure/attachment/12396286/TFile%20Specification%2020081217.pdf
 [binary-format]:https://issues.apache.org/jira/browse/HADOOP-3315
-[YARN-concepts]:https://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
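The aggregated-log path described in the text above can be built programmatically. A minimal sketch (the helper name and the example application ID are ours, not an HDInsight API):

```python
# Sketch: build the HDFS path where YARN log aggregation stores an
# application's logs, following the /app-logs/<user>/logs/<applicationId>
# layout described in the article above.

def aggregated_log_path(user: str, application_id: str) -> str:
    return f"/app-logs/{user}/logs/{application_id}"

print(aggregated_log_path("sshuser", "application_1555957345637_0008"))
# /app-logs/sshuser/logs/application_1555957345637_0008
```

In practice the same logs are usually fetched with the YARN CLI, e.g. `yarn logs -applicationId <applicationId>`, rather than by reading the HDFS path directly.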

articles/hdinsight/hdinsight-hadoop-linux-information.md

Lines changed: 0 additions & 1 deletion
@@ -280,7 +280,6 @@ To use a different version of a component, upload the version you need and use i
 
 ## Next steps
 
-* [Migrate from Windows-based HDInsight to Linux-based](hdinsight-migrate-from-windows-to-linux.md)
 * [Manage HDInsight clusters by using the Apache Ambari REST API](./hdinsight-hadoop-manage-ambari-rest-api.md)
 * [Use Apache Hive with HDInsight](hadoop/hdinsight-use-hive.md)
 * [Use Apache Pig with HDInsight](hadoop/hdinsight-use-pig.md)
