Commit 6cf73af

Merge branch 'master' of https://github.com/MicrosoftDocs/azure-docs-pr into freshness199
2 parents d79e5b7 + 732410b commit 6cf73af

2 files changed: 19 additions and 16 deletions

articles/hdinsight/spark/apache-spark-overview.md

Lines changed: 16 additions & 13 deletions
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive,mvc
 ms.topic: overview
-ms.date: 10/01/2019
+ms.date: 02/25/2020

 #customer intent: As a developer new to Apache Spark and Apache Spark in Azure HDInsight, I want to have a basic understanding of Microsoft's implementation of Apache Spark in Azure HDInsight so I can decide if I want to use it rather than build my own cluster.
 ---
@@ -29,17 +29,17 @@ Spark clusters in HDInsight offer a fully managed Spark service. Benefits of cre
 | Feature | Description |
 | --- | --- |
 | Ease creation |You can create a new Spark cluster in HDInsight in minutes using the Azure portal, Azure PowerShell, or the HDInsight .NET SDK. See [Get started with Apache Spark cluster in HDInsight](apache-spark-jupyter-spark-sql-use-portal.md). |
-| Ease of use |Spark cluster in HDInsight include Jupyter and Apache Zeppelin notebooks. You can use these notebooks for interactive data processing and visualization.|
+| Ease of use |Spark cluster in HDInsight include Jupyter and Apache Zeppelin notebooks. You can use these notebooks for interactive data processing and visualization. See [Use Apache Zeppelin notebooks with Apache Spark](apache-spark-zeppelin-notebook.md) and [Load data and run queries on an Apache Spark cluster](apache-spark-load-data-run-query.md).|
 | REST APIs |Spark clusters in HDInsight include [Apache Livy](https://github.com/cloudera/hue/tree/master/apps/spark/java#welcome-to-livy-the-rest-spark-server), a REST API-based Spark job server to remotely submit and monitor jobs. See [Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster](apache-spark-livy-rest-interface.md).|
 | Support for Azure Data Lake Storage | Spark clusters in HDInsight can use Azure Data Lake Storage as both the primary storage or additional storage. For more information on Data Lake Storage, see [Overview of Azure Data Lake Storage](../../data-lake-store/data-lake-store-overview.md). |
 | Integration with Azure services |Spark cluster in HDInsight comes with a connector to Azure Event Hubs. You can build streaming applications using the Event Hubs, in addition to [Apache Kafka](https://kafka.apache.org/), which is already available as part of Spark. |
 | Support for ML Server | Support for ML Server in HDInsight is provided as the **ML Services** cluster type. You can set up an ML Services cluster to run distributed R computations with the speeds promised with a Spark cluster. For more information, see [What is ML Services in Azure HDInsight](../r-server/r-server-overview.md). |
 | Integration with third-party IDEs | HDInsight provides several IDE plugins that are useful to create and submit applications to an HDInsight Spark cluster. For more information, see [Use Azure Toolkit for IntelliJ IDEA](apache-spark-intellij-tool-plugin.md), [Use Spark & Hive Tools for VSCode](../hdinsight-for-vscode.md), and [Use Azure Toolkit for Eclipse](apache-spark-eclipse-tool-plugin.md).|
 | Concurrent Queries |Spark clusters in HDInsight support concurrent queries. This capability enables multiple queries from one user or multiple queries from various users and applications to share the same cluster resources. |
-| Caching on SSDs |You can choose to cache data either in memory or in SSDs attached to the cluster nodes. Caching in memory provides the best query performance but could be expensive. Caching in SSDs provides a great option for improving query performance without the need to create a cluster of a size that is required to fit the entire dataset in memory. |
+| Caching on SSDs |You can choose to cache data either in memory or in SSDs attached to the cluster nodes. Caching in memory provides the best query performance but could be expensive. Caching in SSDs provides a great option for improving query performance without the need to create a cluster of a size that is required to fit the entire dataset in memory. See [Improve performance of Apache Spark workloads using Azure HDInsight IO Cache](apache-spark-improve-performance-iocache.md). |
 | Integration with BI Tools |Spark clusters in HDInsight provide connectors for BI tools such as [Power BI](https://www.powerbi.com/) for data analytics. |
 | Pre-loaded Anaconda libraries |Spark clusters in HDInsight come with Anaconda libraries pre-installed. [Anaconda](https://docs.continuum.io/anaconda/) provides close to 200 libraries for machine learning, data analysis, visualization, and so on. |
-| Scalability | HDInsight allows you to change the number of cluster nodes. Also, Spark clusters can be dropped with no loss of data since all the data is stored in Azure Storage or Data Lake Storage. |
+| Scalability | HDInsight allows you to change the number of cluster nodes dynamically with the Autoscale feature. See [Automatically scale Azure HDInsight clusters](../hdinsight-autoscale-clusters.md). Also, Spark clusters can be dropped with no loss of data since all the data is stored in Azure Storage or Data Lake Storage. |
 | SLA |Spark clusters in HDInsight come with 24/7 support and an SLA of 99.9% up-time. |

 Apache Spark clusters in HDInsight include the following components that are available on the clusters by default.
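
Reviewer note on the "REST APIs" row in the hunk above: the row describes Apache Livy as a REST job server for remote submission. As a minimal sketch that is not part of this commit, the Python below builds (but does not send) a Livy `POST /batches` request. The cluster name, credentials, and storage path are hypothetical placeholders, and the endpoint pattern `https://<cluster>.azurehdinsight.net/livy/batches` is an assumption taken from the HDInsight Livy article the row links to, not from this diff.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(cluster_name, user, password, app_file,
                             class_name=None, args=None):
    """Build, without sending, a Livy batch-submission request.

    All argument values are supplied by the caller; nothing here contacts
    a cluster. The URL pattern is an assumption for HDInsight clusters.
    """
    url = f"https://{cluster_name}.azurehdinsight.net/livy/batches"

    # Livy's batch API requires "file"; "className" and "args" are optional.
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)

    # HTTP basic authentication with the cluster login (assumed auth scheme).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        # Livy servers with CSRF protection enabled expect this header.
        "X-Requested-By": user,
        "Authorization": f"Basic {token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical values, for illustration only.
example = build_livy_batch_request(
    cluster_name="mycluster",
    user="admin",
    password="not-a-real-password",
    app_file="wasbs:///example/jars/app.jar",
    class_name="com.example.App",
)
print(example.get_method(), example.full_url)
print(example.data.decode("utf-8"))
```

To submit the job, the request object would be passed to `urllib.request.urlopen`. The sketch stops short of that so it can be inspected without a live cluster.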
@@ -70,22 +70,25 @@ The SparkContext connects to the Spark master and is responsible for converting

 Spark clusters in HDInsight enable the following key scenarios:

-* Interactive data analysis and BI
+### Interactive data analysis and BI

-Apache Spark in HDInsight stores data in Azure Storage or Azure Data Lake Storage. Business experts and key decision makers can analyze and build reports over that data and use Microsoft Power BI to build interactive reports from the analyzed data. Analysts can start from unstructured/semi structured data in cluster storage, define a schema for the data using notebooks, and then build data models using Microsoft Power BI. Spark clusters in HDInsight also support a number of third-party BI tools such as Tableau making it easier for data analysts, business experts, and key decision makers.
+Apache Spark in HDInsight stores data in Azure Storage or Azure Data Lake Storage. Business experts and key decision makers can analyze and build reports over that data and use Microsoft Power BI to build interactive reports from the analyzed data. Analysts can start from unstructured/semi structured data in cluster storage, define a schema for the data using notebooks, and then build data models using Microsoft Power BI. Spark clusters in HDInsight also support a number of third-party BI tools such as Tableau making it easier for data analysts, business experts, and key decision makers.

-[Tutorial: Visualize Spark data using Power BI](apache-spark-use-bi-tools.md)
+* [Tutorial: Visualize Spark data using Power BI](apache-spark-use-bi-tools.md)

-* Spark Machine Learning
+### Spark Machine Learning

-Apache Spark comes with [MLlib](https://spark.apache.org/mllib/), a machine learning library built on top of Spark that you can use from a Spark cluster in HDInsight. Spark cluster in HDInsight also includes Anaconda, a Python distribution with different kinds of packages for machine learning. Couple this with a built-in support for Jupyter and Zeppelin notebooks, and you have an environment for creating machine learning applications.
+Apache Spark comes with [MLlib](https://spark.apache.org/mllib/), a machine learning library built on top of Spark that you can use from a Spark cluster in HDInsight. Spark cluster in HDInsight also includes Anaconda, a Python distribution with different kinds of packages for machine learning. Couple this with a built-in support for Jupyter and Zeppelin notebooks, and you have an environment for creating machine learning applications.

-[Tutorial: Predict building temperatures using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
-[Tutorial: Predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
+* [Tutorial: Predict building temperatures using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
+* [Tutorial: Predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)

-* Spark streaming and real-time data analysis
+### Spark streaming and real-time data analysis

-Spark clusters in HDInsight offer a rich support for building real-time analytics solutions. While Spark already has connectors to ingest data from many sources like Kafka, Flume, Twitter, ZeroMQ, or TCP sockets, Spark in HDInsight adds first-class support for ingesting data from Azure Event Hubs. Event Hubs is the most widely used queuing service on Azure. Having an out-of-the-box support for Event Hubs makes Spark clusters in HDInsight an ideal platform for building real-time analytics pipeline.
+Spark clusters in HDInsight offer a rich support for building real-time analytics solutions. While Spark already has connectors to ingest data from many sources like Kafka, Flume, Twitter, ZeroMQ, or TCP sockets, Spark in HDInsight adds first-class support for ingesting data from Azure Event Hubs. Event Hubs is the most widely used queuing service on Azure. Having an out-of-the-box support for Event Hubs makes Spark clusters in HDInsight an ideal platform for building real-time analytics pipeline.
+
+* [Overview of Apache Spark Streaming](apache-spark-streaming-overview.md)
+* [Overview of Apache Spark Structured Streaming](apache-spark-structured-streaming-overview.md)

 ## Where do I start?

articles/media-services/video-indexer/compare-video-indexer-with-media-services-presets.md

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@ ms.workload: media
 ms.tgt_pltfrm: na
 ms.devlang: na
 ms.topic: article
-ms.date: 05/15/2019
+ms.date: 02/24/2020
 ms.author: juliako

 ---
@@ -31,9 +31,9 @@ Currently, there is an overlap between features offered by the [Video Indexer AP
 |Media Insights|[Enhanced](video-indexer-output-json-v2.md) |[Fundamentals](../latest/intelligence-concept.md)|
 |Experiences|See the full list of supported features: <br/> [Overview](video-indexer-overview.md)|Returns video insights only|
 |Billing|[Media Services pricing](https://azure.microsoft.com/pricing/details/media-services/#analytics)|[Media Services pricing](https://azure.microsoft.com/pricing/details/media-services/#analytics)|
-|Compliance|[ISO 27001](https://www.microsoft.com/TrustCenter/Compliance/ISO-IEC-27001), [ISO 27018](https://www.microsoft.com/trustcenter/Compliance/ISO-IEC-27018), [SOC 1,2,3](https://www.microsoft.com/TrustCenter/Compliance/SOC), [HIPAA](https://www.microsoft.com/trustcenter/compliance/hipaa), [FedRAMP](https://www.microsoft.com/TrustCenter/Compliance/fedramp), [PCI](https://www.microsoft.com/trustcenter/compliance/pci), and [HITRUST](https://www.microsoft.com/TrustCenter/Compliance/hitrust) certified. For the most current updates, visit [current certifications status of Video Indexer](https://gallery.technet.microsoft.com/Overview-of-Azure-c1be3942).|Media Services is compliant with many certifications. Check out [Azure Compliance Offerings.pdf](https://gallery.technet.microsoft.com/Overview-of-Azure-c1be3942/file/178110/23/Microsoft%20Azure%20Compliance%20Offerings.pdf) and search for "Media Services" to see if it complies with a certificate of interest.|
+|Compliance|For the most current compliance updates, visit [Azure Compliance Offerings.pdf](https://gallery.technet.microsoft.com/Overview-of-Azure-c1be3942/file/178110/23/Microsoft%20Azure%20Compliance%20Offerings.pdf) and search for "Video Indexer" to see if it complies with a certificate of interest.|For the most current compliance updates, visit [Azure Compliance Offerings.pdf](https://gallery.technet.microsoft.com/Overview-of-Azure-c1be3942/file/178110/23/Microsoft%20Azure%20Compliance%20Offerings.pdf) and search for "Media Services" to see if it complies with a certificate of interest.|
 |Free Trial|East US|Not available|
-|Region availability|East US 2, South Central US, West US 2, North Europe, West Europe, Southeast Asia, East Asia, and Australia East. For the most current updates, visit the [products by region](https://azure.microsoft.com/global-infrastructure/services/?products=cognitive-services) page.|See [Azure status](https://azure.microsoft.com/global-infrastructure/services/?products=media-services).|
+|Region availability|See [Cognitive Services availability by region](https://azure.microsoft.com/global-infrastructure/services/?products=cognitive-services)|See [Media Services availability by region](https://azure.microsoft.com/global-infrastructure/services/?products=media-services).|

 ## Next steps
