Commit 402fd8d

lower-case big data, updates for synapseml, formatting fixes
1 parent 3f53942 commit 402fd8d

10 files changed, +92 -76 lines changed

articles/cognitive-services/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@
  href: ./security-controls-policy.md
  - name: Use with big data
  items:
- - name: Cognitive Services for Big Data
+ - name: Cognitive Services for big data
  href: ./big-data/cognitive-services-for-big-data.md
  - name: Getting started
  href: ./big-data/getting-started.md

articles/cognitive-services/big-data/cognitive-services-for-big-data.md

Lines changed: 15 additions & 15 deletions
@@ -1,6 +1,6 @@
  ---
- title: "Cognitive Services for Big Data"
- description: Learn how to leverage Azure Cognitive Services on large datasets using Python, Java, and Scala. With Cognitive Services for Big Data you can embed continuously improving, intelligent models directly into Apache Spark™ and SQL computations.
+ title: "Cognitive Services for big data"
+ description: Learn how to leverage Azure Cognitive Services on large datasets using Python, Java, and Scala. With Cognitive Services for big data you can embed continuously improving, intelligent models directly into Apache Spark™ and SQL computations.
  services: cognitive-services
  author: mhamilton723
  manager: nitinme
@@ -10,17 +10,17 @@ ms.date: 10/28/2021
  ms.author: marhamil
  ---

- # Azure Cognitive Services for Big Data
+ # Azure Cognitive Services for big data

- ![Azure Cognitive Services for Big Data](media/cognitive-services-big-data-overview.svg)
+ ![Azure Cognitive Services for big data](media/cognitive-services-big-data-overview.svg)

- The Azure Cognitive Services for Big Data lets users channel terabytes of data through Cognitive Services using [Apache Spark™](/dotnet/spark/what-is-spark). With the Cognitive Services for Big Data, it's easy to create large-scale intelligent applications with any datastore.
+ The Azure Cognitive Services for big data lets users channel terabytes of data through Cognitive Services using [Apache Spark™](/dotnet/spark/what-is-spark). With the Cognitive Services for big data, it's easy to create large-scale intelligent applications with any datastore.

- With Cognitive Services for Big Data you can embed continuously improving, intelligent models directly into Apache Spark™ and SQL computations. These tools liberate developers from low-level networking details, so that they can focus on creating smart, distributed applications.
+ With Cognitive Services for big data you can embed continuously improving, intelligent models directly into Apache Spark™ and SQL computations. These tools liberate developers from low-level networking details, so that they can focus on creating smart, distributed applications.

  ## Features and benefits

- Cognitive Services for Big Data can use services from any region in the world, as well as [containerized Cognitive Services](../cognitive-services-container-support.md). Containers support low or no connectivity deployments with ultra-low latency responses. Containerized Cognitive Services can be run locally, directly on the worker nodes of your Spark cluster, or on an external orchestrator like Kubernetes.
+ Cognitive Services for big data can use services from any region in the world, as well as [containerized Cognitive Services](../cognitive-services-container-support.md). Containers support low or no connectivity deployments with ultra-low latency responses. Containerized Cognitive Services can be run locally, directly on the worker nodes of your Spark cluster, or on an external orchestrator like Kubernetes.

  ## Supported services

@@ -57,9 +57,9 @@ Cognitive Services for Big Data can use services from any region in the world, a
  |:-----------|:------------------|
  |[Bing Image Search](/azure/cognitive-services/bing-image-search "Bing Image Search")|The Bing Image Search service returns a display of images determined to be relevant to the user's query.|

- ## Supported programming languages for Cognitive Services for Big Data
+ ## Supported programming languages for Cognitive Services for big data

- The Cognitive Services for Big Data are built on Apache Spark. Apache Spark is a distributed computing library that supports Java, Scala, Python, R, and many other languages. These languages are currently supported.
+ The Cognitive Services for big data are built on Apache Spark. Apache Spark is a distributed computing library that supports Java, Scala, Python, R, and many other languages. These languages are currently supported.

  ### Python

@@ -71,7 +71,7 @@ We provide a Scala and Java-based Spark API in the `com.microsoft.ml.spark.cogni

  ## Supported platforms and connectors

- The Cognitive Services for Big Data requires Apache Spark. There are several Apache Spark platforms that support the Cognitive Services for Big Data.
+ The Cognitive Services for big data requires Apache Spark. There are several Apache Spark platforms that support the Cognitive Services for big data.

  ### Azure Databricks

@@ -100,15 +100,15 @@ The basis of Spark is the DataFrame: a tabular collection of data distributed ac
  - Do SQL-style computations such as join and filter tables.
  - Apply functions to large datasets using MapReduce style parallelism.
  - Apply Distributed Machine Learning using Microsoft Machine Learning for Apache Spark.
- - Use the Cognitive Services for Big Data to enrich your data with ready-to-use intelligent services.
+ - Use the Cognitive Services for big data to enrich your data with ready-to-use intelligent services.

  ### Microsoft Machine Learning for Apache Spark (MMLSpark)

- [Microsoft Machine Learning for Apache Spark](https://mmlspark.blob.core.windows.net/website/index.html#install) (MMLSpark) is an open-source, distributed machine learning library (ML) built on Apache Spark. The Cognitive Services for Big Data is included in this package. Additionally, MMLSpark contains several other ML tools for Apache Spark, such as LightGBM, Vowpal Wabbit, OpenCV, LIME, and more. With MMLSpark, you can build powerful predictive and analytical models from any Spark datasource.
+ [Microsoft Machine Learning for Apache Spark](https://mmlspark.blob.core.windows.net/website/index.html#install) (MMLSpark) is an open-source, distributed machine learning library (ML) built on Apache Spark. The Cognitive Services for big data is included in this package. Additionally, MMLSpark contains several other ML tools for Apache Spark, such as LightGBM, Vowpal Wabbit, OpenCV, LIME, and more. With MMLSpark, you can build powerful predictive and analytical models from any Spark datasource.

  ### HTTP on Spark

- Cognitive Services for Big Data is an example of how we can integrate intelligent web services with big data. Web services power many applications across the globe and most services communicate through the Hypertext Transfer Protocol (HTTP). To work with *arbitrary* web services at large scales, we provide HTTP on Spark. With HTTP on Spark, you can pass terabytes of data through any web service. Under the hood, we use this technology to power Cognitive Services for Big Data.
+ Cognitive Services for big data is an example of how we can integrate intelligent web services with big data. Web services power many applications across the globe and most services communicate through the Hypertext Transfer Protocol (HTTP). To work with *arbitrary* web services at large scales, we provide HTTP on Spark. With HTTP on Spark, you can pass terabytes of data through any web service. Under the hood, we use this technology to power Cognitive Services for big data.

  ## Developer samples

@@ -126,11 +126,11 @@ Cognitive Services for Big Data is an example of how we can integrate intelligen

  - [The Azure Cognitive Services on Spark: Clusters with Embedded Intelligent Services](https://databricks.com/session/the-azure-cognitive-services-on-spark-clusters-with-embedded-intelligent-services)
  - [Spark Summit Keynote: Scalable AI for Good](https://databricks.com/session_eu19/scalable-ai-for-good)
- - [The Cognitive Services for Big Data in Cosmos DB](https://medius.studios.ms/Embed/Video-nc/B19-BRK3004?latestplayer=true&l=2571.208093)
+ - [The Cognitive Services for big data in Cosmos DB](https://medius.studios.ms/Embed/Video-nc/B19-BRK3004?latestplayer=true&l=2571.208093)
  - [Lightning Talk on Large Scale Intelligent Microservices](https://www.youtube.com/watch?v=BtuhmdIy9Fk&t=6s)

  ## Next steps

- - [Getting Started with the Cognitive Services for Big Data](getting-started.md)
+ - [Getting Started with the Cognitive Services for big data](getting-started.md)
  - [Simple Python Examples](samples-python.md)
  - [Simple Scala Examples](samples-scala.md)
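
Note on the hunks for this file: every change is the same substitution of "Big Data" with "big data". A substitution like this could be scripted instead of edited by hand. The following is a minimal sketch, assuming a whole-word phrase replacement is acceptable for the text being edited; the helper name `lowercase_big_data` is hypothetical and not part of the commit.

```python
import re

# Hypothetical helper (not part of the commit): matches the phrase
# "Big Data" as whole words so that longer tokens are left alone.
_BIG_DATA = re.compile(r"\bBig Data\b")


def lowercase_big_data(text: str) -> str:
    """Return `text` with every whole-word "Big Data" replaced by "big data"."""
    return _BIG_DATA.sub("big data", text)


if __name__ == "__main__":
    before = "# Azure Cognitive Services for Big Data"
    print(lowercase_big_data(before))
    # prints "# Azure Cognitive Services for big data"
```

A blanket replacement also touches link text and any sentence-initial occurrence, so the resulting diff still needs a manual review before committing.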

articles/cognitive-services/big-data/getting-started.md

Lines changed: 57 additions & 41 deletions
@@ -1,12 +1,12 @@
  ---
- title: "Get started with Cognitive Services for Big Data"
- description: Set up your MMLSpark pipeline with Cognitive Services in Azure Databricks and run a sample.
+ title: "Get started with Cognitive Services for big data"
+ description: Set up your SynapseML or MMLSpark pipeline with Cognitive Services in Azure Databricks and run a sample.
  services: cognitive-services
  author: mhamilton723
  manager: nitinme
  ms.service: cognitive-services
  ms.topic: how-to
- ms.date: 10/28/2021
+ ms.date: 08/16/2022
  ms.author: marhamil
  ms.devlang: python
  ms.custom: mode-other
@@ -16,15 +16,16 @@ ms.custom: mode-other

  Setting up your environment is the first step to building a pipeline for your data. After your environment is ready, running a sample is quick and easy.

- In this article, we'll perform these steps to get you started:
+ In this article, you'll perform these steps to get started:

- 1. [Create a Cognitive Services resource](#create-a-cognitive-services-resource)
- 1. [Create an Apache Spark Cluster](#create-an-apache-spark-cluster)
- 1. [Try a sample](#try-a-sample)
+ > [!div class="checklist"]
+ > * [Create a Cognitive Services resource](#create-a-cognitive-services-resource)
+ > * [Create an Apache Spark cluster](#create-an-apache-spark-cluster)
+ > * [Try a sample](#try-a-sample)

  ## Create a Cognitive Services resource

- To use the Big Data Cognitive Services, you must first create a Cognitive Service for your workflow. There are two main types of Cognitive Services: cloud services hosted in Azure and containerized services managed by users. We recommend starting with the simpler cloud-based Cognitive Services.
+ To work with big data in Cognitive Services, first create a Cognitive Services resource for your workflow. There are two main types of Cognitive Services: cloud services hosted in Azure and containerized services managed by users. We recommend starting with the simpler cloud-based Cognitive Services.

  ### Cloud services

@@ -46,21 +47,30 @@ Follow [this guide](../cognitive-services-container-support.md?tabs=luis) to cre

  ## Create an Apache Spark cluster

- [Apache Spark™](http://spark.apache.org/) is a distributed computing framework designed for big-data data processing. Users can work with Apache Spark in Azure with services like Azure Databricks, Azure Synapse Analytics, HDInsight, and Azure Kubernetes Services. To use the Big Data Cognitive Services, you must first create a cluster. If you already have a Spark cluster, feel free to try an example.
+ [Apache Spark™](http://spark.apache.org/) is a distributed computing framework designed for big-data data processing. Users can work with Apache Spark in Azure with services like Azure Databricks, Azure Synapse Analytics, HDInsight, and Azure Kubernetes Services. To use the big data Cognitive Services, you must first create a cluster. If you already have a Spark cluster, feel free to try an example.

  ### Azure Databricks

- Azure Databricks is an Apache Spark-based analytics platform with a one-click setup, streamlined workflows, and an interactive workspace. It's often used to collaborate between data scientists, engineers, and business analysts. To use the Big Data Cognitive Services on Azure Databricks, follow these steps:
+ Azure Databricks is an Apache Spark-based analytics platform with a one-click setup, streamlined workflows, and an interactive workspace. It's often used to collaborate between data scientists, engineers, and business analysts. To use the big data Cognitive Services on Azure Databricks, follow these steps:

  1. [Create an Azure Databricks workspace](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal#create-an-azure-databricks-workspace)
+
  1. [Create a Spark cluster in Databricks](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal#create-a-spark-cluster-in-databricks)
- 1. Install the Big Data Cognitive Services
+
+ 1. Install the SynapseML open-source library (or MMLSpark library if you're supporting a legacy application):
+
  * Create a new library in your databricks workspace
  <img src="media/create-library.png" alt="Create library" width="50%"/>
- * Input the following maven coordinates
+
+ * For SynapseML: input the following maven coordinates
+ Coordinates: `com.microsoft.azure:synapseml_2.12:0.10.0`
+ Repository: default
+
+ * For MMLSpark (legacy): input the following maven coordinates
  Coordinates: `com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3`
  Repository: `https://mmlspark.azureedge.net/maven`
  <img src="media/library-coordinates.png" alt="Library Coordinates" width="50%"/>
+
  * Install the library onto a cluster
  <img src="media/install-library.png" alt="Install Library on Cluster" width="50%"/>
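
Note on the install step in this hunk: both library entries use the `groupId:artifactId:version` form of Maven coordinates, and the `_2.12` or `_2.11` suffix on the artifact name follows the usual Scala convention of naming the Scala binary version the library was built for, which should match the cluster runtime. The following is a minimal sketch that splits the two coordinates quoted above; `MavenCoordinate` and `parse_coordinate` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str
    scala_version: Optional[str]  # e.g. "2.12", read from the artifact suffix


def parse_coordinate(coordinate: str) -> MavenCoordinate:
    """Split a `groupId:artifactId:version` string into its parts."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    group_id, artifact_id, version = parts
    # Scala libraries conventionally end the artifact name with
    # "_<scala binary version>"; anything else is reported as None.
    _, sep, suffix = artifact_id.rpartition("_")
    is_scala_suffix = bool(sep) and suffix.replace(".", "").isdigit()
    return MavenCoordinate(group_id, artifact_id, version,
                           suffix if is_scala_suffix else None)


if __name__ == "__main__":
    for text in ("com.microsoft.azure:synapseml_2.12:0.10.0",
                 "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc3"):
        print(parse_coordinate(text))
```

Comparing `scala_version` with the Scala version of the Databricks runtime before installing is one way to catch a mismatched library early.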

@@ -69,9 +79,10 @@ Azure Databricks is an Apache Spark-based analytics platform with a one-click se
  Optionally, you can use Synapse Analytics to create a spark cluster. Azure Synapse Analytics brings together enterprise data warehousing and big data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources at scale. To get started using Azure Synapse Analytics, follow these steps:

  1. [Create a Synapse Workspace (preview)](../../synapse-analytics/quickstart-create-workspace.md).
+
  1. [Create a new serverless Apache Spark pool (preview) using the Azure portal](../../synapse-analytics/quickstart-create-apache-spark-pool-portal.md).

- In Azure Synapse Analytics, Big Data for Cognitive Services is installed by default.
+ In Azure Synapse Analytics, big data for Cognitive Services is installed by default.

  ### Azure Kubernetes Service

@@ -80,12 +91,14 @@ If you're using containerized Cognitive Services, one popular option for deployi
  To get started on Azure Kubernetes Service, follow these steps:

  1. [Deploy an Azure Kubernetes Service (AKS) cluster using the Azure portal](../../aks/learn/quick-kubernetes-deploy-portal.md)
+
  1. [Install the Apache Spark 2.4.0 helm chart](https://hub.helm.sh/charts/microsoft/spark)
+
  1. [Install a cognitive service container using Helm](../computer-vision/deploy-computer-vision-on-premises.md)

  ## Try a sample

- After you set up your Spark cluster and environment, you can run a short sample. This section demonstrates how to use the Big Data for Cognitive Services in Azure Databricks.
+ After you set up your Spark cluster and environment, you can run a short sample. This sample assumes Azure Databricks and the `mmlspark.cognitive` package.

  First, you can create a notebook in Azure Databricks. For other Spark cluster providers, use their notebooks or Spark Submit.

@@ -101,36 +114,39 @@ First, you can create a notebook in Azure Databricks. For other Spark cluster pr

  1. Paste this code snippet into your new notebook.

- ```python
- from mmlspark.cognitive import *
- from pyspark.sql.functions import col
-
- # Add your subscription key from the Language service (or a general Cognitive Service key)
- service_key = "ADD-SUBSCRIPTION-KEY-HERE"
-
- df = spark.createDataFrame([
- ("I am so happy today, its sunny!", "en-US"),
- ("I am frustrated by this rush hour traffic", "en-US"),
- ("The cognitive services on spark aint bad", "en-US"),
- ], ["text", "language"])
-
- sentiment = (TextSentiment()
- .setTextCol("text")
- .setLocation("eastus")
- .setSubscriptionKey(service_key)
- .setOutputCol("sentiment")
- .setErrorCol("error")
- .setLanguageCol("language"))
-
- results = sentiment.transform(df)
-
- # Show the results in a table
- display(results.select("text", col("sentiment")[0].getItem("score").alias("sentiment")))
-
- ```
+ ```python
+ from mmlspark.cognitive import *
+ from pyspark.sql.functions import col
+
+ # Add your region and subscription key from the Language service (or a general Cognitive Service key)
+ # If using a multi-region Cognitive Services resource, delete the placeholder text: service_region = ""
+ service_key = "ADD-SUBSCRIPTION-KEY-HERE"
+ service_region = "ADD-SERVICE-REGION-HERE"
+
+ df = spark.createDataFrame([
+ ("I am so happy today, its sunny!", "en-US"),
+ ("I am frustrated by this rush hour traffic", "en-US"),
+ ("The cognitive services on spark aint bad", "en-US"),
+ ], ["text", "language"])
+
+ sentiment = (TextSentiment()
+ .setTextCol("text")
+ .setLocation(service_region)
+ .setSubscriptionKey(service_key)
+ .setOutputCol("sentiment")
+ .setErrorCol("error")
+ .setLanguageCol("language"))
+
+ results = sentiment.transform(df)
+
+ # Show the results in a table
+ display(results.select("text", col("sentiment")[0].getItem("score").alias("sentiment")))
+ ```

  1. Get your subscription key from the **Keys and Endpoint** menu from your Language resource in the Azure portal.
+
  1. Replace the subscription key placeholder in your Databricks notebook code with your subscription key.
+
  1. Select the play, or triangle, symbol in the upper right of your notebook cell to run the sample. Optionally, select **Run All** at the top of your notebook to run all cells. The answers will display below the cell in a table.

  ### Expected results
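
Note on the sample in this hunk: it still imports from `mmlspark.cognitive`, while the install step in the same file now lists SynapseML first. The following is a hedged sketch of detecting which of the two packages is available before running the sample. It assumes the SynapseML Python package exposes its Cognitive Services transformers under `synapse.ml.cognitive`, which should be verified against the installed version; `find_cognitive_module` is a hypothetical helper, not part of either library.

```python
import importlib
import importlib.util
from typing import Optional, Sequence

# Assumed module paths: the first is the SynapseML location (an assumption to
# verify), the second is the legacy path used by the sample in this commit.
CANDIDATE_MODULES = ("synapse.ml.cognitive", "mmlspark.cognitive")


def find_cognitive_module(
    candidates: Sequence[str] = CANDIDATE_MODULES,
) -> Optional[str]:
    """Return the first importable module name from `candidates`, or None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except ImportError:
            # Raised when a parent package (e.g. `synapse`) is not installed.
            continue
    return None


if __name__ == "__main__":
    module_name = find_cognitive_module()
    if module_name is None:
        print("Neither SynapseML nor MMLSpark is installed in this environment.")
    else:
        cognitive = importlib.import_module(module_name)
        print(f"Cognitive Services transformers will be loaded from {module_name}")
```

In a notebook, the returned module could then replace the wildcard import in the sample, so the same cell runs on clusters with either library installed.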

articles/cognitive-services/big-data/recipes/anomaly-detection.md

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
  ---
- title: "Recipe: Predictive maintenance with the Cognitive Services for Big Data"
+ title: "Recipe: Predictive maintenance with the Cognitive Services for big data"
  titleSuffix: Azure Cognitive Services
- description: This quickstart shows how to perform distributed anomaly detection with the Cognitive Services for Big Data
+ description: This quickstart shows how to perform distributed anomaly detection with the Cognitive Services for big data
  services: cognitive-services
  author: mhamilton723
  manager: nitinme
@@ -14,7 +14,7 @@ ms.devlang: python
  ms.custom: devx-track-python
  ---

- # Recipe: Predictive maintenance with the Cognitive Services for Big Data
+ # Recipe: Predictive maintenance with the Cognitive Services for big data

  This recipe shows how you can use Azure Synapse Analytics and Cognitive Services on Apache Spark for predictive maintenance of IoT devices. We'll follow along with the [CosmosDB and Synapse Link](https://github.com/Azure-Samples/cosmosdb-synapse-link-samples) sample. To keep things simple, in this recipe we'll read the data straight from a CSV file rather than getting streamed data through CosmosDB and Synapse Link. We strongly encourage you to look over the Synapse Link sample.
