Commit 9fc0be9

Merge pull request #107916 from sidramadoss/patch-63
Azure ML UDFs
2 parents e409216 + 90e569c

File tree

6 files changed: +186 -8 lines changed

articles/stream-analytics/TOC.yml

Lines changed: 6 additions & 2 deletions
```diff
@@ -61,6 +61,12 @@
       href: stream-analytics-sql-output-perf.md
     - name: Blob custom path patterns
       href: stream-analytics-custom-path-patterns-blob-storage-output.md
+    - name: User-defined functions
+      items:
+      - name: Machine learning UDF
+        href: machine-learning-udf.md
+      - name: C# UDF
+        href: stream-analytics-edge-csharp-udf-methods.md
     - name: Optimize your Stream Analytics job
       items:
       - name: Understand and adjust Streaming Units
@@ -187,8 +193,6 @@
       href: stream-analytics-tools-for-visual-studio-edge-jobs.md
     - name: Set up CI/CD pipeline
       href: stream-analytics-tools-for-visual-studio-cicd.md
-    - name: Write .NET UDF
-      href: stream-analytics-edge-csharp-udf-methods.md
     - name: Visual Studio Code
       items:
       - name: Test locally with sample data
```
articles/stream-analytics/machine-learning-udf.md

Lines changed: 167 additions & 0 deletions
---
title: Integrate Azure Stream Analytics with Azure Machine Learning
description: This article describes how to integrate an Azure Stream Analytics job with Azure Machine Learning models.
author: sidram
ms.author: sidram
ms.reviewer: mamccrea
ms.service: stream-analytics
ms.topic: conceptual
ms.date: 03/19/2020
---
# Integrate Azure Stream Analytics with Azure Machine Learning (Preview)

You can implement machine learning models as a user-defined function (UDF) in your Azure Stream Analytics jobs to do real-time scoring and predictions on your streaming input data. [Azure Machine Learning](../machine-learning/overview-what-is-azure-ml.md) allows you to use any popular open-source tool, such as TensorFlow, scikit-learn, or PyTorch, to prep, train, and deploy models.

> [!NOTE]
> This functionality is in public preview. You can access this feature on the Azure portal only by using the [Stream Analytics portal preview link](https://aka.ms/asaportalpreview). This functionality is also available in the latest version of [Stream Analytics tools for Visual Studio](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-tools-for-visual-studio-install).
## Prerequisites

Complete the following steps before you add a machine learning model as a function to your Stream Analytics job:

1. Use Azure Machine Learning to [deploy your model as a web service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where).

2. Your scoring script should have [sample inputs and outputs](../machine-learning/how-to-deploy-and-where.md#example-entry-script), which Azure Machine Learning uses to generate a schema specification. Stream Analytics uses the schema to understand the function signature of your web service.

3. Make sure your web service accepts and returns JSON-serialized data.

4. Deploy your model on [Azure Kubernetes Service](../machine-learning/how-to-deploy-and-where.md#choose-a-compute-target) for high-scale production deployments. If the web service can't handle the number of requests coming from your job, the performance of your Stream Analytics job degrades, which impacts latency.
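To illustrate the JSON requirement, the entry script's `run` function ultimately accepts a JSON-serialized request and must return a JSON-serializable response. The following is a minimal sketch, not the actual Azure Machine Learning entry-script template; `fake_model` is a placeholder standing in for your deployed model:

```python
import json

def fake_model(rows):
    # Placeholder for your trained model's predict() call.
    return [sum(row) for row in rows]

def run(raw_data):
    """Accept a JSON-serialized request and return a JSON-serialized response."""
    data = json.loads(raw_data)["data"]   # parse the incoming batch
    predictions = fake_model(data)        # score every event in the batch
    return json.dumps(predictions)        # serialize the result back to JSON
```

For example, `run('{"data": [[1, 2, 3], [4, 5]]}')` returns the JSON string `"[6, 9]"` under this stand-in model.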
## Add a machine learning model to your job

You can add Azure Machine Learning functions to your Stream Analytics job directly from the Azure portal.

1. Navigate to your Stream Analytics job in the Azure portal, and select **Functions** under **Job topology**. Then, select **Azure ML Service** from the **+ Add** dropdown menu.

![Add Azure ML UDF](./media/machine-learning-udf/add-azureml-udf.png)

2. Fill in the **Azure Machine Learning Service function** form with the following property values:

![Configure Azure ML UDF](./media/machine-learning-udf/configure-azureml-udf.png)

The following table describes each property of Azure ML Service functions in Stream Analytics.

|Property|Description|
|--------|-----------|
|Function alias|Enter a name to invoke the function in your query.|
|Subscription|Your Azure subscription.|
|Azure ML workspace|The Azure Machine Learning workspace you used to deploy your model as a web service.|
|Deployments|The web service hosting your model.|
|Function signature|The signature of your web service inferred from the API's schema specification. If your signature fails to load, check that you have provided sample input and output in your scoring script to automatically generate the schema.|
|Number of parallel requests per partition|This is an advanced configuration to optimize high-scale throughput. This number represents the concurrent requests sent from each partition of your job to the web service. Jobs with six streaming units (SUs) or fewer have one partition. Jobs with 12 SUs have two partitions, jobs with 18 SUs have three partitions, and so on.<br><br>For example, if your job has two partitions and you set this parameter to four, there will be eight concurrent requests from your job to your web service.|
|Max batch count|This is an advanced configuration for optimizing high-scale throughput. This number represents the maximum number of events to be batched together in a single request sent to your web service.|
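The partition math described above can be sketched as follows. This is an illustrative helper, not part of any Stream Analytics API:

```python
def concurrent_requests(streaming_units, parallel_requests_per_partition):
    """Estimate total concurrent requests from the job to the web service.

    Jobs with six SUs or fewer have one partition; every additional
    six SUs adds another partition.
    """
    partitions = max(1, streaming_units // 6)
    return partitions * parallel_requests_per_partition

# A 12-SU job has 2 partitions; with 4 parallel requests per partition,
# up to 8 concurrent requests reach the web service.
```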
## Supported input parameters

When your Stream Analytics query invokes an Azure Machine Learning UDF, the job creates a JSON-serialized request to the web service. The request is based on a model-specific schema. You have to provide a sample input and output in your scoring script to [automatically generate a schema](../machine-learning/how-to-deploy-and-where.md#optional-automatic-schema-generation). The schema allows Stream Analytics to construct the JSON-serialized request for any of the supported data types, such as NumPy, pandas, and PySpark. Multiple input events can be batched together in a single request.

The following Stream Analytics query is an example of how to invoke an Azure Machine Learning UDF:

```SQL
SELECT udf.score(<model-specific-data-structure>)
INTO output
FROM input
```

Stream Analytics only supports passing one parameter to Azure Machine Learning functions. You may need to prepare your data before passing it as an input to the machine learning UDF.
## Pass multiple input parameters to the UDF

The most common examples of inputs to machine learning models are NumPy arrays and DataFrames. You can create an array using a JavaScript UDF, and create a JSON-serialized DataFrame using the `WITH` clause.

### Create an input array

You can create a JavaScript UDF which accepts *N* inputs and creates an array that can be used as input to your Azure Machine Learning UDF.

```javascript
function createArray(vendorid, weekday, pickuphour, passenger, distance) {
    'use strict';
    var array = [vendorid, weekday, pickuphour, passenger, distance];
    return array;
}
```
84+
Once you have added the JavaScript UDF to your job, you can invoke your Azure Machine Learning UDF using the following query:
85+
86+
```SQL
87+
SELECT udf.score(
88+
udf.createArray(vendorid, weekday, pickuphour, passenger, distance)
89+
)
90+
INTO output
91+
FROM input
92+
```
93+
94+
The following JSON is an example request:
95+
96+
```JSON
97+
{
98+
"data": [
99+
["1","Mon","12","1","5.8"],
100+
["2","Wed","10","2","10"]
101+
]
102+
}
103+
```
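As a sketch of how such a request body is assembled (illustrative only; Stream Analytics performs this batching internally), each event's array becomes one entry in the `data` list:

```python
import json

def build_request(events):
    """Batch several UDF inputs into one JSON-serialized request body
    matching the {"data": [...]} shape shown above."""
    return json.dumps({"data": events})

payload = build_request([
    ["1", "Mon", "12", "1", "5.8"],
    ["2", "Wed", "10", "2", "10"],
])
```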
### Create a pandas or PySpark DataFrame

You can use the `WITH` clause to create a JSON-serialized DataFrame that can be passed as input to your Azure Machine Learning UDF, as shown below.

The following query creates a DataFrame by selecting the necessary fields and uses the DataFrame as input to the Azure Machine Learning UDF.

```SQL
WITH
Dataframe AS (
    SELECT vendorid, weekday, pickuphour, passenger, distance
    FROM input
)

SELECT udf.score(Dataframe)
INTO output
FROM input
```

The following JSON is an example request from the previous query:

```JSON
{
    "data": [{
        "vendorid": "1",
        "weekday": "Mon",
        "pickuphour": "12",
        "passenger": "1",
        "distance": "5.8"
    }, {
        "vendorid": "2",
        "weekday": "Tue",
        "pickuphour": "10",
        "passenger": "2",
        "distance": "10"
    }]
}
```
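On the web service side, the scoring script can rebuild a pandas DataFrame from the records under `data`. This is a sketch assuming pandas is available in your scoring environment; your entry script's details will differ:

```python
import json

import pandas as pd

request_body = json.dumps({
    "data": [
        {"vendorid": "1", "weekday": "Mon", "pickuphour": "12",
         "passenger": "1", "distance": "5.8"},
        {"vendorid": "2", "weekday": "Tue", "pickuphour": "10",
         "passenger": "2", "distance": "10"},
    ]
})

# Each record becomes one DataFrame row; keys become column names.
df = pd.DataFrame(json.loads(request_body)["data"])
```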
## Optimize the performance for Azure Machine Learning UDFs

When you deploy your model to Azure Kubernetes Service, you can [profile your model to determine resource utilization](../machine-learning/how-to-deploy-and-where.md#profilemodel). You can also [enable App Insights for your deployments](../machine-learning/how-to-enable-app-insights.md) to understand request rates, response times, and failure rates.

If you have a scenario with high event throughput, you may need to change the following parameters in Stream Analytics to achieve optimal performance with low end-to-end latencies:

1. Max batch count.
2. Number of parallel requests per partition.

### Determine the right batch size

After you have deployed your web service, send sample requests with varying batch sizes, starting from 50 and increasing in increments of hundreds (for example, 200, 500, 1000, and 2000). You'll notice that after a certain batch size, the latency of the response increases. The batch size after which latency increases should be the max batch count for your job.
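This probing procedure can be sketched as follows; `send` is a hypothetical helper that issues one scoring request of the given batch size and returns its latency in seconds:

```python
def find_max_batch_count(send, batch_sizes, latency_budget):
    """Return the largest probed batch size whose request latency
    stays within latency_budget seconds."""
    best = batch_sizes[0]
    for size in batch_sizes:
        if send(size) <= latency_budget:
            best = size
        else:
            break  # latency started climbing; stop probing
    return best

def simulated_send(size):
    # Simulated service whose latency jumps once batches exceed 1000 events.
    return 0.05 if size <= 1000 else 0.5

max_batch_count = find_max_batch_count(
    simulated_send, [50, 200, 500, 1000, 2000], latency_budget=0.1)
```

Under this simulation, `max_batch_count` comes out to 1000, which you would then configure as the job's max batch count.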
### Determine the number of parallel requests per partition

At optimal scaling, your Stream Analytics job should be able to send multiple parallel requests to your web service and get a response within a few milliseconds. The latency of the web service's response can directly impact the latency and performance of your Stream Analytics job. If the call from your job to the web service takes a long time, you will likely see an increase in watermark delay and may also see an increase in the number of backlogged input events.

To prevent such latency, ensure that your Azure Kubernetes Service (AKS) cluster has been provisioned with the [right number of nodes and replicas](../machine-learning/how-to-deploy-azure-kubernetes-service.md#using-the-cli). It's critical that your web service is highly available and returns successful responses. If your job receives a service unavailable response (503) from your web service, it will continuously retry with exponential back-off. Any response other than success (200) and service unavailable (503) will cause your job to go to a failed state.
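The retry behavior described above can be sketched like this. It's illustrative only; the actual Stream Analytics retry logic is internal, and `send` is a hypothetical helper returning an HTTP status code:

```python
import time

def call_with_backoff(send, max_attempts=5, base_delay=0.1):
    """Retry on 503 with exponential back-off; any status other than
    200 or 503 is treated as fatal, mirroring the job's failed state."""
    for attempt in range(max_attempts):
        status = send()
        if status == 200:
            return status
        if status != 503:
            raise RuntimeError(f"Unexpected response {status}; job would fail")
        time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise TimeoutError("Web service still unavailable after retries")
```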
## Next steps

* [Tutorial: Azure Stream Analytics JavaScript user-defined functions](stream-analytics-javascript-user-defined-functions.md)
* [Scale your Stream Analytics job with Azure Machine Learning Studio (classic) functions](stream-analytics-scale-with-machine-learning-functions.md)
Two image files (36.3 KB and 49.4 KB) were added under ./media/machine-learning-udf/.

articles/stream-analytics/stream-analytics-machine-learning-integration-tutorial.md

Lines changed: 6 additions & 2 deletions
```diff
@@ -6,13 +6,17 @@ ms.author: mamccrea
 ms.reviewer: mamccrea
 ms.service: stream-analytics
 ms.topic: conceptual
-ms.date: 06/11/2019
+ms.date: 03/19/2020
 ms.custom: seodec18
 ---
 
-# Perform sentiment analysis with Azure Stream Analytics and Azure Machine Learning Studio (classic) (Preview)
+# Perform sentiment analysis with Azure Stream Analytics and Azure Machine Learning Studio (classic)
+
 This article describes how to quickly set up a simple Azure Stream Analytics job that integrates Azure Machine Learning Studio (classic). You use a Machine Learning sentiment analytics model from the Cortana Intelligence Gallery to analyze streaming text data and determine the sentiment score in real time. Using the Cortana Intelligence Suite lets you accomplish this task without worrying about the intricacies of building a sentiment analytics model.
 
+> [!TIP]
+> It is highly recommended to use [Azure Machine Learning UDFs](machine-learning-udf.md) instead of Azure Machine Learning Studio (classic) UDF for improved performance and reliability.
+
 You can apply what you learn from this article to scenarios such as these:
 
 * Analyzing real-time sentiment on streaming Twitter data.
```

articles/stream-analytics/stream-analytics-scale-with-machine-learning-functions.md

Lines changed: 7 additions & 4 deletions
```diff
@@ -6,10 +6,13 @@ ms.author: jeanb
 ms.reviewer: mamccrea
 ms.service: stream-analytics
 ms.topic: conceptual
-ms.date: 06/21/2019
+ms.date: 03/16/2020
 ---
 # Scale your Stream Analytics job with Azure Machine Learning Studio (classic) functions
 
+> [!TIP]
+> It is highly recommended to use [Azure Machine Learning UDFs](machine-learning-udf.md) instead of Azure Machine Learning Studio (classic) UDF for improved performance and reliability.
+
 This article discusses how to efficiently scale Azure Stream Analytics jobs that use Azure Machine Learning functions. For information on how to scale Stream Analytics jobs in general see the article [Scaling jobs](stream-analytics-scale-jobs.md).
 
 ## What is an Azure Machine Learning function in Stream Analytics?
@@ -48,7 +51,7 @@ In general, ***B*** for batch size, ***L*** for the web service latency at batch
 
 ![Scale Stream Analytics with Machine Learning Functions Formula](./media/stream-analytics-scale-with-ml-functions/stream-analytics-scale-with-ml-functions-02.png "Scale Stream Analytics with Machine Learning Functions Formula")
 
-You can also configure the 'max concurrent calls' on the Machine Learning web service. Its recommended to set this parameter to the maximum value (200 currently).
+You can also configure the 'max concurrent calls' on the Machine Learning web service. It's recommended to set this parameter to the maximum value (200 currently).
 
 For more information on this setting, review the [Scaling article for Machine Learning Web Services](../machine-learning/studio/scaling-webservice.md).
 
@@ -71,7 +74,7 @@ Let's examine the configuration necessary to create a Stream Analytics job, whic
 
 Using 1 SU, could this Stream Analytics job handle the traffic? The job can keep up with the input using the default batch size of 1000. The default latency of the sentiment analysis Machine Learning web service (with a default batch size of 1000) creates no more than a second of latency.
 
-The Stream Analytics jobs **overall** or end-to-end latency would typically be a few seconds. Take a more detailed look into this Stream Analytics job, *especially* the Machine Learning function calls. With a batch size of 1000, a throughput of 10,000 events takes about 10 requests to the web service. Even with one SU, there are enough concurrent connections to accommodate this input traffic.
+The Stream Analytics job's **overall** or end-to-end latency would typically be a few seconds. Take a more detailed look into this Stream Analytics job, *especially* the Machine Learning function calls. With a batch size of 1000, a throughput of 10,000 events takes about 10 requests to the web service. Even with one SU, there are enough concurrent connections to accommodate this input traffic.
 
 If the input event rate increases by 100x, then the Stream Analytics job needs to process 1,000,000 tweets per second. There are two options to accomplish the increased scale:
 
@@ -109,7 +112,7 @@ Below is a table for the throughput of the Stream Analytics job for different SU
 
 By now, you should already have a good understanding of how Machine Learning functions in Stream Analytics work. You likely also understand that Stream Analytics jobs "pull" data from data sources and each "pull" returns a batch of events for the Stream Analytics job to process. How does this pull model impact the Machine Learning web service requests?
 
-Normally, the batch size we set for Machine Learning functions wont exactly be divisible by the number of events returned by each Stream Analytics job "pull". When this occurs, the Machine Learning web service is called with "partial" batches. Using partial batches avoids incurring additional job latency overhead in coalescing events from pull to pull.
+Normally, the batch size we set for Machine Learning functions won't exactly be divisible by the number of events returned by each Stream Analytics job "pull". When this occurs, the Machine Learning web service is called with "partial" batches. Using partial batches avoids incurring additional job latency overhead in coalescing events from pull to pull.
 
 ## New function-related monitoring metrics
 In the Monitor area of a Stream Analytics job, three additional function-related metrics have been added. They are **FUNCTION REQUESTS**, **FUNCTION EVENTS** and **FAILED FUNCTION REQUESTS**, as shown in the graphic below.
```
