Commit e90d53b

Merge pull request #260711 from MicrosoftDocs/main
Publish to live, Friday 4 AM PST, 12/8
2 parents 204431f + 42b2ad6 commit e90d53b

35 files changed: +145, -109 lines

articles/ai-services/openai/reference.md

Lines changed: 1 addition & 1 deletion
@@ -615,7 +615,7 @@ POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deploymen

  **Supported versions**

- - `2023-12-01-preview`
+ - `2023-12-01-preview` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2023-12-01-preview/inference.json)

  **Request body**

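The API version added above is selected through the `api-version` query string on every data-plane request. A minimal sketch of how such a URL is assembled; the resource name, deployment id, and operation path below are placeholders, not values from this commit:

```python
# Sketch: forming an Azure OpenAI data-plane request URL.
# Only the api-version string comes from the supported-versions list above;
# resource, deployment, and operation are hypothetical placeholders.
def openai_url(resource: str, deployment: str, operation: str,
               api_version: str = "2023-12-01-preview") -> str:
    return (f"https://{resource}.openai.azure.com/openai/deployments/"
            f"{deployment}/{operation}?api-version={api_version}")

print(openai_url("my-resource", "my-deployment", "chat/completions"))
```

Requests omitting `api-version`, or naming a retired preview version, are rejected by the service, which is why the reference page tracks supported versions per operation.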
articles/aks/concepts-network.md

Lines changed: 4 additions & 0 deletions
@@ -108,6 +108,10 @@ For more information, see [Configure kubenet networking for an AKS cluster][aks-

  With Azure CNI, every pod gets an IP address from the subnet and can be accessed directly. These IP addresses must be planned in advance and be unique across your network space. Each node has a configuration parameter for the maximum number of pods it supports, and the equivalent number of IP addresses per node is reserved up front. This approach can lead to IP address exhaustion, or to rebuilding clusters in a larger subnet as application demand grows, so it's important to plan properly. To avoid these planning challenges, you can enable [Azure CNI networking for dynamic allocation of IPs and enhanced subnet support][configure-azure-cni-dynamic-ip-allocation].

+ > [!NOTE]
+ > Due to Kubernetes limitations, the resource group name, the virtual network name, and the subnet name must each be 63 characters or less.

  Unlike kubenet, traffic to endpoints in the same virtual network isn't NAT'd to the node's primary IP. The source address for traffic inside the virtual network is the pod IP. Traffic that's external to the virtual network still NATs to the node's primary IP.

  Nodes use the [Azure CNI][cni-networking] Kubernetes plugin.

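The up-front IP reservation described in this change can be made concrete with a rough capacity check. This is a hedged sketch: the 63-character limit is the one stated in the new note, but the default of 30 pods per node and the single surge node during upgrades are illustrative assumptions, not fixed AKS values:

```python
def azure_cni_ips_required(node_count: int, max_pods_per_node: int = 30,
                           upgrade_surge_nodes: int = 1) -> int:
    """Subnet IPs reserved up front with Azure CNI: one IP per node plus
    max_pods IPs per node, including nodes added temporarily during an
    upgrade. Defaults are assumptions for illustration."""
    nodes = node_count + upgrade_surge_nodes
    return nodes * (1 + max_pods_per_node)

def name_within_kubernetes_limit(name: str) -> bool:
    """Check the 63-character limit from the note above (resource group,
    virtual network, and subnet names)."""
    return len(name) <= 63

print(azure_cni_ips_required(50))              # 51 nodes * 31 IPs = 1581
print(name_within_kubernetes_limit("a" * 64))  # False
```

A check like this shows why a /24 subnet (251 usable IPs) exhausts quickly under Azure CNI even for small clusters.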
articles/data-factory/connector-microsoft-fabric-lakehouse.md

Lines changed: 9 additions & 3 deletions
@@ -8,7 +8,7 @@ ms.service: data-factory

  ms.subservice: data-movement
  ms.topic: conceptual
  ms.custom: synapse
- ms.date: 11/03/2023
+ ms.date: 12/08/2023
  ---

  # Copy and transform data in Microsoft Fabric Lakehouse (Preview) using Azure Data Factory or Azure Synapse Analytics
@@ -29,7 +29,7 @@ This Microsoft Fabric Lakehouse connector is supported for the following capabil

  | Supported capabilities|IR | Managed private endpoint|
  |---------| --------| --------|
  |[Copy activity](copy-activity-overview.md) (source/sink)|① ②||
- |[Mapping data flow](concepts-data-flow-overview.md) (-/sink)|① |- |
+ |[Mapping data flow](concepts-data-flow-overview.md) (source/sink)|① |- |

  *① Azure integration runtime ② Self-hosted integration runtime*

@@ -514,7 +514,7 @@ For more information, see the [source transformation](data-flow-source.md) and [

  To use Microsoft Fabric Lakehouse Files dataset as a source or sink dataset in mapping data flow, go to the following sections for the detailed configurations.

- #### Microsoft Fabric Lakehouse Files as a sink type
+ #### Microsoft Fabric Lakehouse Files as a source or sink type

  Microsoft Fabric Lakehouse connector supports the following file formats. Refer to each article for format-based settings.

@@ -528,6 +528,12 @@ Microsoft Fabric Lakehouse connector supports the following file formats. Refer

  To use Microsoft Fabric Lakehouse Table dataset as a source or sink dataset in mapping data flow, go to the following sections for the detailed configurations.

+ #### Microsoft Fabric Lakehouse Table as a source type
+
+ There are no configurable properties under source options.
+
+ > [!NOTE]
+ > CDC support for Lakehouse table source is currently not available.

  #### Microsoft Fabric Lakehouse Table as a sink type

  The following properties are supported in the Mapping Data Flows **sink** section:

articles/hdinsight/hadoop/apache-hadoop-on-premises-migration-motivation.md

Lines changed: 11 additions & 11 deletions
@@ -4,7 +4,7 @@ description: Learn the motivation and benefits for migrating on-premises Hadoop

  ms.service: hdinsight
  ms.custom: ignite-2022
  ms.topic: how-to
- ms.date: 11/30/2022
+ ms.date: 12/08/2023
  ---

  # Migrate on-premises Apache Hadoop clusters to Azure HDInsight - motivation and benefits
@@ -93,7 +93,7 @@ This section provides template questionnaires to help gather important informati

  |Edge Nodes configuration|m/y, cpu, disk, etc.|
  |HDFS Encryption?|Yes|
  |High Availability|HDFS HA, Metastore HA|
- |Disaster Recovery / Backup|Backup cluster?|
+ |Disaster Recovery / Back up|Backup cluster?|
  |Systems that are dependent on Cluster|SQL Server, Teradata, Power BI, MongoDB|
  |Third-party integrations|Tableau, GridGain, Qubole, Informatica, Splunk|
  |**Topic**: **Security**|||
@@ -102,21 +102,21 @@ This section provides template questionnaires to help gather important informati

  |HDFS Access Control| Manual, ssh users|
  |Hive authentication & authorization|Sentry, LDAP, AD with Kerberos, Ranger|
  |Auditing|Ambari, Cloudera Navigator, Ranger|
- |Monitoring|Graphite, collectd, statsd, Telegraf, InfluxDB|
- |Alerting|Kapacitor, Prometheus, Datadog|
- |Data Retention duration| 3 years, 5 years|
+ |Monitoring|Graphite, collectd, `statsd`, Telegraf, InfluxDB|
+ |Alerting|`Kapacitor`, Prometheus, Datadog|
+ |Data Retention duration| Three years, five years|
  |Cluster Administrators|Single Administrator, Multiple Administrators|

  ### Project details questionnaire

  |**Question**|**Example**|**Answer**|
  |---|---|---|
  |**Topic**: **Workloads and Frequency**|||
- |MapReduce jobs|10 jobs -- twice daily||
- |Hive jobs|100 jobs -- every hour||
- |Spark batch jobs|50 jobs -- every 15 minutes||
- |Spark Streaming jobs|5 jobs -- every 3 minutes||
- |Structured Streaming jobs|5 jobs -- every minute||
+ |MapReduce jobs|10 jobs--twice daily||
+ |Hive jobs|100 jobs--every hour||
+ |Spark batch jobs|50 jobs--every 15 minutes||
+ |Spark Streaming jobs|5 jobs--every 3 minutes||
+ |Structured Streaming jobs|5 jobs--every minute||
  |Programming Languages|Python, Scala, Java||
  |Scripting|Shell, Python||
  |**Topic**: **Data**|||
@@ -164,7 +164,7 @@ This section provides template questionnaires to help gather important informati

  |Data transfer delta|DistCp, AzCopy||
  |Ongoing incremental data transfer|DistCp, Sqoop||
  |**Topic**: **Monitoring & Alerting** |||
- |Use Azure Monitoring & Alerting Vs Integrate third-party monitoring|Use Azure Monitoring & Alerting||
+ |Use Azure Monitoring & Alerting vs Integrate third-party monitoring|Use Azure Monitoring & Alerting||
  |**Topic**: **Security preferences** |||
  |Private and protected data pipeline?|Yes||
  |Domain Joined cluster (ESP)?| Yes||

articles/hdinsight/hdinsight-apps-install-applications.md

Lines changed: 8 additions & 8 deletions
@@ -4,7 +4,7 @@ description: Learn how to install third-party Apache Hadoop applications on Azur

  ms.service: hdinsight
  ms.custom: hdinsightactive
  ms.topic: how-to
- ms.date: 11/17/2022
+ ms.date: 12/08/2023

  ---
  # Install third-party Apache Hadoop applications on Azure HDInsight
@@ -20,14 +20,14 @@ The following list shows the published applications:

  |[AtScale Intelligence Platform](https://aws.amazon.com/marketplace/pp/AtScale-AtScale-Intelligence-Platform/B07BWWHH18) |Hadoop |AtScale turns your HDInsight cluster into a scale-out OLAP server, allowing you to query billions of rows of data interactively using the BI tools you already know, own, and love – from Microsoft Excel, Power BI, Tableau Software to QlikView. |
  |[Datameer](https://azuremarketplace.microsoft.com/marketplace/apps/datameer.datameer) |Hadoop |Datameer's self-service scalable platform for preparing, exploring, and governing your data for analytics accelerates turning complex multisource data into valuable business-ready information, delivering faster, smarter insights at an enterprise scale. |
  |[Dataiku DSS on HDInsight](https://azuremarketplace.microsoft.com/marketplace/apps/dataiku.dataiku-data-science-studio) |Hadoop, Spark |Dataiku DSS is an enterprise data science platform that lets data scientists and data analysts collaborate to design and run new data products and services more efficiently, turning raw data into impactful predictions. |
- |[WANdisco Fusion HDI App](https://community.wandisco.com/s/article/Use-WANdisco-Fusion-for-parallel-operation-of-ADLS-Gen1-and-Gen2) |Hadoop, Spark,HBase,Kafka |Keeping data consistent in a distributed environment is a massive data operations challenge. WANdisco Fusion, an enterprise-class software platform, solves this problem by enabling unstructured data consistency across any environment. |
+ |[WANdisco Fusion HDI App](https://community.wandisco.com/s/article/Use-WANdisco-Fusion-for-parallel-operation-of-ADLS-Gen1-and-Gen2) |Hadoop, Spark, HBase, Kafka |Keeping data consistent in a distributed environment is a massive data operations challenge. WANdisco Fusion, an enterprise-class software platform, solves this problem by enabling unstructured data consistency across any environment. |
  |H2O SparklingWater for HDInsight |Spark |H2O Sparkling Water supports the following distributed algorithms: GLM, Naïve Bayes, Distributed Random Forest, Gradient Boosting Machine, Deep Neural Networks, Deep learning, K-means, PCA, Generalized Low Rank Models, Anomaly Detection, Autoencoders. |
- |[Striim for Real-Time Data Integration to HDInsight](https://azuremarketplace.microsoft.com/marketplace/apps/striim.striimbyol) |Hadoop,HBase,Spark,Kafka |Striim (pronounced "stream") is an end-to-end streaming data integration + intelligence platform, enabling continuous ingestion, processing, and analytics of disparate data streams. |
+ |[Striim for Real-Time Data Integration to HDInsight](https://azuremarketplace.microsoft.com/marketplace/apps/striim.striimbyol) |Hadoop, HBase, Spark, Kafka |Striim (pronounced "stream") is an end-to-end streaming data integration + intelligence platform, enabling continuous ingestion, processing, and analytics of disparate data streams. |
  |[Jumbune Enterprise-Accelerating BigData Analytics](https://azuremarketplace.microsoft.com/marketplace/apps/impetus-infotech-india-pvt-ltd.impetus_jumbune) |Hadoop, Spark |At a high level, Jumbune assists enterprises by: 1. Accelerating Tez, MapReduce, and Spark engine based Hive, Java, and Scala workload performance. 2. Proactively monitoring the Hadoop cluster. 3. Establishing data quality management on the distributed file system. |
- |[Kyligence Enterprise](https://azuremarketplace.microsoft.com/marketplace/apps/kyligence.kyligence-cloud-saas) |Hadoop,HBase,Spark |Powered by Apache Kylin, Kyligence Enterprise Enables BI on Big Data. As an enterprise OLAP engine on Hadoop, Kyligence Enterprise empowers business analyst to architect BI on Hadoop with industry-standard data warehouse and BI methodology. |
- |[StreamSets Data Collector for HDInsight Cloud](https://azuremarketplace.microsoft.com/marketplace/apps/streamsets.streamsets-data-collector-hdinsight) |Hadoop,HBase,Spark,Kafka |StreamSets Data Collector is a lightweight, powerful engine that streams data in real time. Use Data Collector to route and process data in your data streams. It comes with a 30 day trial license. |
- |[Trifacta Wrangler Enterprise](https://azuremarketplace.microsoft.com/marketplace/apps/trifactainc1587522950142.trifactaazure) |Hadoop, Spark,HBase |Trifacta Wrangler Enterprise for HDInsight supports enterprise-wide data wrangling for any scale of data. The cost of running Trifacta on Azure is a combination of Trifacta subscription costs plus the Azure infrastructure costs for the virtual machines. |
- |[Unifi Data Platform](https://www.crunchbase.com/organization/unifi-software) |Hadoop,HBase,Spark |The Unifi Data Platform is a seamlessly integrated suite of self-service data tools designed to empower the business user to tackle data challenges that drive incremental revenue, reduce costs or operational complexity. |
+ |[Kyligence Enterprise](https://azuremarketplace.microsoft.com/marketplace/apps/kyligence.kyligence-cloud-saas) | Hadoop, HBase, Spark | Powered by Apache `Kylin`, Kyligence Enterprise enables BI on Big Data. As an enterprise OLAP engine on Hadoop, Kyligence Enterprise empowers business analysts to architect BI on Hadoop with industry-standard data warehouse and BI methodology. |
+ |[StreamSets Data Collector for HDInsight Cloud](https://azuremarketplace.microsoft.com/marketplace/apps/streamsets.streamsets-data-collector-hdinsight) |Hadoop, HBase, Spark, Kafka |StreamSets Data Collector is a lightweight, powerful engine that streams data in real time. Use Data Collector to route and process data in your data streams. It comes with a 30-day trial license. |
+ |[Trifacta Wrangler Enterprise](https://azuremarketplace.microsoft.com/marketplace/apps/trifactainc1587522950142.trifactaazure) |Hadoop, Spark, HBase |Trifacta Wrangler Enterprise for HDInsight supports enterprise-wide data wrangling for any scale of data. The cost of running Trifacta on Azure is a combination of Trifacta subscription costs plus the Azure infrastructure costs for the virtual machines. |
+ |[Unifi Data Platform](https://www.crunchbase.com/organization/unifi-software) |Hadoop, HBase, Spark |The `Unifi Data Platform` is a seamlessly integrated suite of self-service data tools designed to empower the business user to tackle data challenges that drive incremental revenue, reduce costs or operational complexity. |

  The instructions provided in this article use the Azure portal. You can also export the Azure Resource Manager template from the portal, or obtain a copy of the Resource Manager template from vendors, and use Azure PowerShell and Azure Classic CLI to deploy the template. See [Create Apache Hadoop clusters on HDInsight using Resource Manager templates](hdinsight-hadoop-create-linux-clusters-arm-templates.md).

@@ -84,7 +84,7 @@ The portal shows a list of the installed HDInsight applications for a cluster, a

  ## Connect to the edge node
  You can connect to the edge node using HTTP and SSH. The endpoint information can be found in the [portal](#list-installed-hdinsight-apps-and-properties). For more information, see [Use SSH with HDInsight](hdinsight-hadoop-linux-use-ssh-unix.md).

- The HTTP endpoint credentials are the HTTP user credentials that you have configured for the HDInsight cluster; the SSH endpoint credentials are the SSH credentials that you have configured for the HDInsight cluster.
+ The HTTP endpoint credentials are the HTTP user credentials configured for the HDInsight cluster. The SSH endpoint credentials are the SSH credentials configured for the HDInsight cluster.

  ## Troubleshoot
  See [Troubleshoot the installation](hdinsight-apps-install-custom-applications.md#troubleshoot-the-installation).

articles/hdinsight/hdinsight-hadoop-customize-cluster-bootstrap.md

Lines changed: 5 additions & 5 deletions
@@ -4,7 +4,7 @@ description: Learn how to customize HDInsight cluster configuration programmatic

  ms.service: hdinsight
  ms.topic: how-to
  ms.custom: hdinsightactive, devx-track-azurepowershell, devx-track-dotnet
- ms.date: 11/17/2022
+ ms.date: 12/08/2023
  ---

  # Customize HDInsight clusters using Bootstrap
@@ -35,18 +35,18 @@ For example, using these programmatic methods, you can configure options in thes

  * yarn-site.xml
  * server.properties (kafka-broker configuration)

- For information on installing additional components on HDInsight cluster during the creation time, see [Customize HDInsight clusters using Script Action (Linux)](hdinsight-hadoop-customize-cluster-linux.md).
+ For information on installing more components on an HDInsight cluster at creation time, see [Customize HDInsight clusters using Script Action (Linux)](hdinsight-hadoop-customize-cluster-linux.md).

  ## Prerequisites

- * If using PowerShell, you'll need the [Az Module](/powershell/azure/).
+ * If using PowerShell, you need the [Az Module](/powershell/azure/).

  ## Use Azure PowerShell

  The following PowerShell code customizes an [Apache Hive](https://hive.apache.org/) configuration:

  > [!IMPORTANT]
- > The parameter `Spark2Defaults` may need to be used with [Add-AzHDInsightConfigValue](/powershell/module/az.hdinsight/add-azhdinsightconfigvalue). You can pass empty values to the parameter as shown in the code example below.
+ > The parameter `Spark2Defaults` may need to be used with [Add-AzHDInsightConfigValue](/powershell/module/az.hdinsight/add-azhdinsightconfigvalue). You can pass empty values to the parameter as shown in the following code example.

  ```powershell
  # hive-site.xml configuration
@@ -117,7 +117,7 @@ You can use bootstrap in Resource Manager template:

  :::image type="content" source="./media/hdinsight-hadoop-customize-cluster-bootstrap/hdinsight-customize-cluster-bootstrap-arm.png" alt-text="Hadoop customizes cluster bootstrap Azure Resource Manager template":::

- Sample Resource Manager template snippet to switch configuration in spark2-defaults to periodically clean up event logs from storage.
+ Sample Resource Manager template snippet to switch configuration in spark2-defaults to periodically clean up event logs from storage:

  ```json
  "configurations": {

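The `configurations` ARM fragment referenced in this change pairs a configuration file name (such as `spark2-defaults`) with key/value overrides. As a hedged sketch of what an event-log cleanup override might contain — the `spark.history.fs.cleaner.*` keys are standard Spark history-server properties, but the values and this helper are illustrative assumptions, not taken from this commit:

```python
import json

def spark2_cleanup_config(interval: str = "1d", max_age: str = "7d") -> dict:
    # Illustrative "configurations" fragment for an HDInsight ARM template.
    # The spark.history.fs.cleaner.* keys are standard Spark properties;
    # treat the values (and this helper itself) as hypothetical.
    return {
        "configurations": {
            "spark2-defaults": {
                "spark.history.fs.cleaner.enabled": "true",
                "spark.history.fs.cleaner.interval": interval,
                "spark.history.fs.cleaner.maxAge": max_age,
            }
        }
    }

print(json.dumps(spark2_cleanup_config(), indent=2))
```

The same dictionary shape generalizes to the other files listed earlier (yarn-site.xml, server.properties): one top-level key per configuration file, with overrides nested beneath it.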
articles/hdinsight/hdinsight-virtual-network-architecture.md

Lines changed: 4 additions & 4 deletions
@@ -3,12 +3,12 @@ title: Azure HDInsight virtual network architecture

  description: Learn the resources available when you create an HDInsight cluster in an Azure Virtual Network.
  ms.service: hdinsight
  ms.topic: conceptual
- ms.date: 11/17/2022
+ ms.date: 12/05/2023
  ---

  # Azure HDInsight virtual network architecture

- This article explains the resources that are present when you deploy an HDInsight cluster into a custom Azure Virtual Network. This information will help you to connect on-premises resources to your HDInsight cluster in Azure. For more information on Azure Virtual Networks, see [What is Azure Virtual Network?](../virtual-network/virtual-networks-overview.md).
+ This article explains the resources that are present when you deploy an HDInsight cluster into a custom Azure Virtual Network. This information helps you connect on-premises resources to your HDInsight cluster in Azure. For more information on Azure Virtual Networks, see [What is Azure Virtual Network?](../virtual-network/virtual-networks-overview.md).

  ## Resource types in Azure HDInsight clusters

@@ -26,7 +26,7 @@ Use Fully Qualified Domain Names (FQDNs) when addressing nodes in your cluster.

  These FQDNs are of the form `<node-type-prefix><instance-number>-<abbreviated-clustername>.<unique-identifier>.cx.internal.cloudapp.net`.

- The `<node-type-prefix>` will be *hn* for headnodes, *wn* for worker nodes and *zn* for zookeeper nodes.
+ The `<node-type-prefix>` is `hn` for headnodes, `wn` for worker nodes, and `zn` for ZooKeeper nodes.

  If you need just the host name, use only the first part of the FQDN: `<node-type-prefix><instance-number>-<abbreviated-clustername>`

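The FQDN scheme described in this hunk can be checked mechanically. A minimal sketch; only the pattern comes from the doc, and the sample host names below are made up:

```python
import re

# Pattern from the doc:
# <node-type-prefix><instance-number>-<abbreviated-clustername>.<unique-identifier>.cx.internal.cloudapp.net
FQDN_RE = re.compile(
    r"^(?P<prefix>hn|wn|zn)(?P<instance>\d+)-(?P<cluster>[^.]+)\."
    r"(?P<uid>[^.]+)\.cx\.internal\.cloudapp\.net$"
)

def parse_node_fqdn(fqdn: str) -> dict:
    """Split an internal HDInsight node FQDN into its documented parts."""
    m = FQDN_RE.match(fqdn)
    if m is None:
        raise ValueError(f"not an HDInsight node FQDN: {fqdn}")
    return m.groupdict()

def host_name(fqdn: str) -> str:
    """Just the first label of the FQDN, as the doc suggests."""
    return fqdn.split(".", 1)[0]

parts = parse_node_fqdn("wn2-mycluster.abc123.cx.internal.cloudapp.net")
print(parts["prefix"], parts["instance"], parts["cluster"])
```

Here `wn2-mycluster` is the bare host name, and the `wn` prefix identifies it as a worker node per the mapping in the changed line.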
@@ -61,7 +61,7 @@ You can access your HDInsight cluster in three ways:

  - An HTTPS endpoint outside of the virtual network at `CLUSTERNAME.azurehdinsight.net`.
  - An SSH endpoint for directly connecting to the headnode at `CLUSTERNAME-ssh.azurehdinsight.net`.
- - An HTTPS endpoint within the virtual network `CLUSTERNAME-int.azurehdinsight.net`. Notice the "`-int`" in this URL. This endpoint will resolve to a private IP in that virtual network and isn't accessible from the public internet.
+ - An HTTPS endpoint within the virtual network at `CLUSTERNAME-int.azurehdinsight.net`. Notice the "`-int`" in this URL. This endpoint resolves to a private IP in that virtual network and isn't accessible from the public internet.

  These three endpoints are each assigned a load balancer.
