Commit 20b9475

committed
edits
1 parent c440db4 commit 20b9475

9 files changed

+173
-142
lines changed

articles/stream-analytics/TOC.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -63,8 +63,8 @@
6363
href: stream-analytics-custom-path-patterns-blob-storage-output.md
6464
- name: User-defined functions
6565
items:
66-
- name: Azure ML UDF
67-
href: stream-analytics-define-outputs.md
66+
- name: Machine learning UDF
67+
href: machine-learning-udf.md
6868
- name: C# UDF
6969
href: stream-analytics-edge-csharp-udf-methods.md
7070
- name: Optimize your Stream Analytics job

articles/stream-analytics/azuremlservice-udf.md

Lines changed: 0 additions & 134 deletions
This file was deleted.
Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
1+
---
2+
title: Integrate Azure Stream Analytics with Azure Machine Learning
3+
description: This article describes how to integrate an Azure Stream Analytics job with Azure Machine Learning models.
4+
author: sidram
5+
ms.author: sidram
6+
ms.reviewer: mamccrea
7+
ms.service: stream-analytics
8+
ms.topic: conceptual
9+
ms.date: 03/19/2020
10+
---
11+
# Integrate Azure Stream Analytics with Azure Machine Learning
12+
13+
You can implement machine learning models as a user-defined function (UDF) in your Azure Stream Analytics jobs to do real-time scoring and predictions on your streaming input data. [Azure Machine Learning](../machine-learning/overview-what-is-azure-ml.md) allows you to use any popular open-source tool, such as TensorFlow, scikit-learn, or PyTorch, to prep, train, and deploy models.
14+
15+
## Prerequisites
16+
17+
Complete the following steps before you add a machine learning model as a function to your Stream Analytics job:
18+
19+
1. Use Azure Machine Learning to [deploy your model as a web service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where).
20+
21+
2. Make sure your scoring script has [sample inputs and outputs](../machine-learning/how-to-deploy-and-where.md#example-entry-script), which Azure Machine Learning uses to generate a schema specification. Stream Analytics uses the schema to understand the function signature of your web service.
22+
23+
3. Make sure your web service accepts and returns JSON serialized data.
24+
25+
4. Deploy your model on [Azure Kubernetes Service](../machine-learning/how-to-deploy-and-where.md#choose-a-compute-target) for high-scale production deployments. If the web service is not able to handle the number of requests coming from your job, the performance of your Stream Analytics job will be degraded, which impacts latency.
26+
27+
## Add a machine learning model to your job
28+
29+
You can add Azure Machine Learning functions to your Stream Analytics job directly from the Azure portal.
30+
31+
1. Navigate to your Stream Analytics job in the Azure portal, and select **Functions** under **Job topology**. Then, select **Azure ML Service** from the **+ Add** dropdown menu.
32+
33+
![Add Azure ML UDF](./media/machine-learning-udf/add-azureml-udf.png)
34+
35+
2. Fill in the **Azure Machine Learning Service function** form with the following property values:
36+
37+
![Configure Azure ML UDF](./media/machine-learning-udf/configure-azureml-udf.png)
38+
39+
The following table describes each property of Azure ML Service functions in Stream Analytics.
40+
41+
|Property|Description|
42+
|--------|-----------|
43+
|Function alias|Enter a name to invoke the function in your query.|
44+
|Subscription|Your Azure subscription.|
45+
|Azure ML workspace|The Azure Machine Learning workspace you used to deploy your model as a web service.|
46+
|Deployments|The web service hosting your model.|
47+
|Function signature|The signature of your web service inferred from the API's schema specification. If your signature fails to load, check that you have provided sample input and output in your scoring script to automatically generate the schema.|
48+
|Number of parallel requests per partition|This is an advanced configuration to optimize high-scale throughput. This number represents the concurrent requests sent from each partition of your job to the web service. Jobs with six streaming units (SU) and lower have one partition. Jobs with 12 SUs have two partitions, 18 SUs have three partitions and so on.<br><br> For example, if your job has two partitions and you set this parameter to four, there will be eight concurrent requests from your job to your web service.|
49+
|Max batch count|This is an advanced configuration for optimizing high-scale throughput. This number represents the maximum number of events that can be batched together in a single request sent to your web service.|
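
The arithmetic behind **Number of parallel requests per partition** can be sketched as follows. The function names are hypothetical, and the partition formula is an assumption extrapolated from the pattern in the table above (six SUs or fewer give one partition, then one partition per six SUs); it is not an official API.

```javascript
'use strict';

// Assumption: partition count follows the pattern described in the table
// (6 SUs or fewer -> 1 partition, 12 SUs -> 2, 18 SUs -> 3, and so on).
function partitionCount(streamingUnits) {
    return Math.max(1, Math.floor(streamingUnits / 6));
}

// Total concurrent requests = partitions x parallel requests per partition.
function concurrentRequests(streamingUnits, parallelRequestsPerPartition) {
    return partitionCount(streamingUnits) * parallelRequestsPerPartition;
}

console.log(concurrentRequests(12, 4)); // 8, matching the example in the table
```

Your web service needs enough capacity to handle this many simultaneous requests from the job.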
50+
51+
## Supported input parameters
52+
53+
When your Stream Analytics query invokes an Azure Machine Learning UDF, the job creates a JSON serialized request to the web service. The request is based on a model-specific schema. You have to provide a sample input and output in your scoring script to [automatically generate a schema](../machine-learning/how-to-deploy-and-where.md#optional-automatic-schema-generation). The schema allows Stream Analytics to construct the JSON serialized request for any of the supported data types such as numpy, pandas and PySpark. Multiple input events can be batched together in a single request.
54+
55+
The following Stream Analytics query is an example of how to invoke an Azure Machine Learning UDF:
56+
57+
```SQL
58+
SELECT udf.score(<model-specific-data-structure>)
59+
INTO output
60+
FROM input
61+
```
62+
63+
Stream Analytics supports passing only one parameter to an Azure Machine Learning function. You may need to prepare your data before passing it as input to the machine learning UDF.
64+
65+
## Pass multiple input parameters to the UDF
66+
67+
The most common inputs to machine learning models are numpy arrays and DataFrames. You can create an array using a JavaScript UDF, and create a JSON-serialized DataFrame using the `WITH` clause.
68+
69+
### Create an input array
70+
71+
You can create a JavaScript UDF that accepts *N* inputs and returns an array that can be used as input to your Azure Machine Learning UDF.
72+
73+
```javascript
74+
function createArray(vendorid, weekday, pickuphour, passenger, distance) {
75+
'use strict';
76+
var array = [vendorid, weekday, pickuphour, passenger, distance];
77+
return array;
78+
}
79+
```
80+
81+
Once you have added the JavaScript UDF to your job, you can invoke your Azure Machine Learning UDF using the following query:
82+
83+
```SQL
84+
SELECT udf.score(
85+
udf.createArray(vendorid, weekday, pickuphour, passenger, distance)
86+
)
87+
INTO output
88+
FROM input
89+
```
90+
91+
The following JSON is an example request:
92+
93+
```JSON
94+
{
95+
"data": [
96+
["1","Mon","12","1","5.8"],
97+
["2","Wed","10","2","10"]
98+
]
99+
}
100+
```
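
To make the request shape concrete, the following sketch batches two input events into the body shown above. It reuses `createArray` from earlier in this article; `buildArrayRequest` and the sample events are hypothetical and for illustration only. The code mimics the shape of the request, not how Stream Analytics builds it internally.

```javascript
'use strict';

// Same UDF as defined earlier in this article.
function createArray(vendorid, weekday, pickuphour, passenger, distance) {
    return [vendorid, weekday, pickuphour, passenger, distance];
}

// Hypothetical helper: batches several events into one request body.
function buildArrayRequest(events) {
    return {
        data: events.map(function (e) {
            return createArray(e.vendorid, e.weekday, e.pickuphour, e.passenger, e.distance);
        })
    };
}

// Sample events (illustrative data only).
var events = [
    { vendorid: '1', weekday: 'Mon', pickuphour: '12', passenger: '1', distance: '5.8' },
    { vendorid: '2', weekday: 'Wed', pickuphour: '10', passenger: '2', distance: '10' }
];

console.log(JSON.stringify(buildArrayRequest(events)));
```

Each event becomes one inner array, so the number of inner arrays in a request never exceeds the max batch count.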
101+
102+
### Create a pandas or PySpark DataFrame
103+
104+
You can use the `WITH` clause to create a JSON serialized DataFrame that can be passed as input to your Azure Machine Learning UDF as shown below.
105+
106+
The following query creates a DataFrame by selecting the necessary fields and uses the DataFrame as input to the Azure Machine Learning UDF.
107+
108+
```SQL
109+
WITH
110+
Dataframe AS (
111+
SELECT vendorid, weekday, pickuphour, passenger, distance
112+
FROM input
113+
)
114+
115+
SELECT udf.score(Dataframe)
116+
INTO output
117+
FROM Dataframe
118+
```
119+
120+
The following JSON is an example request from the previous query:
121+
122+
```JSON
123+
{
124+
"data": [{
125+
"vendorid": "1",
126+
"weekday": "Mon",
127+
"pickuphour": "12",
128+
"passenger": "1",
129+
"distance": "5.8"
130+
}, {
131+
"vendorid": "2",
132+
"weekday": "Tue",
133+
"pickuphour": "10",
134+
"passenger": "2",
135+
"distance": "10"
136+
}
137+
]
138+
}
139+
```
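
The record-oriented shape above can be sketched in the same way. `buildDataFrameRequest` is a hypothetical helper, not part of Stream Analytics; it keeps only the selected columns of each event, mirroring the `SELECT` list inside the `WITH` clause.

```javascript
'use strict';

// Hypothetical helper: projects each event onto the selected columns and
// wraps the resulting records in a "data" array.
function buildDataFrameRequest(events, columns) {
    return {
        data: events.map(function (e) {
            var record = {};
            columns.forEach(function (c) { record[c] = e[c]; });
            return record;
        })
    };
}

var columns = ['vendorid', 'weekday', 'pickuphour', 'passenger', 'distance'];

// The extra "tip" field is dropped because it is not in the column list.
var body = buildDataFrameRequest(
    [{ vendorid: '1', weekday: 'Mon', pickuphour: '12', passenger: '1', distance: '5.8', tip: '2' }],
    columns
);

console.log(JSON.stringify(body, null, 2));
```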
140+
141+
## Optimize the performance for Azure Machine Learning UDFs
142+
143+
When you deploy your model to Azure Kubernetes Service, you can [profile your model to determine resource utilization](../machine-learning/how-to-deploy-and-where.md#profilemodel). You can also [enable App Insights for your deployments](../machine-learning/how-to-enable-app-insights.md) to understand request rates, response times, and failure rates.
144+
145+
If you have a scenario with high event throughput, you may need to change the following parameters in Stream Analytics to achieve optimal performance with low end-to-end latencies:
146+
147+
1. Max batch count.
148+
2. Number of parallel requests per partition.
149+
150+
### Determine the right batch size
151+
152+
After you have deployed your web service, send sample requests with varying batch sizes, starting from 50 and increasing in the order of hundreds (for example, 200, 500, 1000, 2000, and so on). You'll notice that after a certain batch size, the latency of the response increases. The batch size at which the response latency starts to increase should be the max batch count for your job.
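
One way to turn such measurements into a setting is sketched below. The helper name, the `tolerance` parameter, and the latency numbers are all assumptions made for illustration; substitute the latencies you actually measure against your own web service.

```javascript
'use strict';

// Hypothetical helper.
// measurements: [{ batchSize, latencyMs }], sorted by ascending batchSize.
// tolerance: relative latency growth still treated as "flat" (an assumption).
function pickMaxBatchCount(measurements, tolerance) {
    var best = measurements[0];
    for (var i = 1; i < measurements.length; i++) {
        var allowed = best.latencyMs * (1 + tolerance);
        if (measurements[i].latencyMs > allowed) {
            break; // latency started to climb; keep the previous batch size
        }
        best = measurements[i];
    }
    return best.batchSize;
}

// Made-up latencies, for illustration only.
var samples = [
    { batchSize: 50, latencyMs: 40 },
    { batchSize: 200, latencyMs: 42 },
    { batchSize: 500, latencyMs: 45 },
    { batchSize: 1000, latencyMs: 90 },
    { batchSize: 2000, latencyMs: 210 }
];

console.log(pickMaxBatchCount(samples, 0.2)); // 500
```

With these sample numbers, latency stays roughly flat up to a batch size of 500 and doubles at 1000, so 500 is the candidate max batch count.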
153+
154+
### Determine the number of parallel requests per partition
155+
156+
At optimal scaling, your Stream Analytics job should be able to send multiple parallel requests to your web service and get a response within a few milliseconds. The latency of the web service's response can directly impact the latency and performance of your Stream Analytics job. If the call from your job to the web service takes a long time, you will likely see an increase in watermark delay and may also see an increase in the number of backlogged input events.
157+
158+
To prevent such latency, ensure that your Azure Kubernetes Service (AKS) cluster has been provisioned with the [right number of nodes and replicas](../machine-learning/how-to-deploy-azure-kubernetes-service.md#using-the-cli). It's critical that your web service is highly available and returns successful responses. If your job receives a service unavailable response (503) from your web service, it will continuously retry with exponential backoff. Any response other than success (200) and service unavailable (503) will cause your job to go to a failed state.
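
The response handling described above can be summarized in a small sketch. Both function names are hypothetical, and the base delay and cap are illustrative assumptions: this article does not state the actual backoff values that Stream Analytics uses.

```javascript
'use strict';

// Maps an HTTP status from the web service to the job behavior described above.
function classifyResponse(status) {
    if (status === 200) { return 'success'; }
    if (status === 503) { return 'retry'; }
    return 'fail'; // any other status puts the job in a failed state
}

// Exponential backoff: the delay doubles with each attempt, up to a cap.
// baseMs and capMs are illustrative assumptions, not documented values.
function backoffDelayMs(attempt, baseMs, capMs) {
    return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

console.log(classifyResponse(503), backoffDelayMs(3, 100, 5000)); // retry 800
```

Because any status other than 200 or 503 fails the job, it is worth returning 503 from your web service only for transient overload conditions.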
159+
160+
## Next steps
161+
162+
* [Tutorial: Azure Stream Analytics JavaScript user-defined functions](stream-analytics-javascript-user-defined-functions.md)
163+
* [Scale your Stream Analytics job with Azure Machine Learning Studio (classic) function](stream-analytics-scale-with-machine-learning-functions.md)
164+
Binary file not shown.
Binary file not shown.

articles/stream-analytics/stream-analytics-machine-learning-integration-tutorial.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,11 +11,12 @@ ms.custom: seodec18
1111
---
1212

1313
# Perform sentiment analysis with Azure Stream Analytics and Azure Machine Learning Studio (classic)
14-
> [!TIP]
15-
> It is highly recommended to use [Azure Machine Learning Service UDFs](azuremlservice-udf.md) instead of Azure Machine Learning Studio (classic) UDF for improved performance and reliability.
1614

1715
This article describes how to quickly set up a simple Azure Stream Analytics job that integrates Azure Machine Learning Studio (classic). You use a Machine Learning sentiment analytics model from the Cortana Intelligence Gallery to analyze streaming text data and determine the sentiment score in real time. Using the Cortana Intelligence Suite lets you accomplish this task without worrying about the intricacies of building a sentiment analytics model.
1816

17+
> [!TIP]
18+
> It is highly recommended to use [Azure Machine Learning UDFs](machine-learning-udf.md) instead of Azure Machine Learning Studio (classic) UDF for improved performance and reliability.
19+
1920
You can apply what you learn from this article to scenarios such as these:
2021

2122
* Analyzing real-time sentiment on streaming Twitter data.
