|
| 1 | +--- |
| 2 | +title: Integrate Azure Stream Analytics with Azure Machine Learning |
| 3 | +description: This article describes how to integrate an Azure Stream Analytics job with Azure Machine Learning models. |
| 4 | +author: sidram |
| 5 | +ms.author: sidram |
| 6 | +ms.reviewer: mamccrea |
| 7 | +ms.service: stream-analytics |
| 8 | +ms.topic: conceptual |
| 9 | +ms.date: 03/19/2020 |
| 10 | +--- |
| 11 | +# Integrate Azure Stream Analytics with Azure Machine Learning (Preview) |
| 12 | + |
| 13 | +You can implement machine learning models as a user-defined function (UDF) in your Azure Stream Analytics jobs to do real-time scoring and predictions on your streaming input data. [Azure Machine Learning](../machine-learning/overview-what-is-azure-ml.md) allows you to use any popular open-source tool, such as Tensorflow, scikit-learn, or PyTorch, to prep, train, and deploy models. |
| 14 | + |
| 15 | +> [!NOTE] |
| 16 | +> This functionality is in public preview. You can access this feature on the Azure portal only by using the [Stream Analytics portal preview link](https://aka.ms/asaportalpreview). This functionality is also available in the latest version of [Stream Analytics tools for Visual Studio](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-tools-for-visual-studio-install). |
| 17 | +
|
| 18 | +## Prerequisites |
| 19 | + |
| 20 | +Complete the following steps before you add a machine learning model as a function to your Stream Analytics job: |
| 21 | + |
| 22 | +1. Use Azure Machine Learning to [deploy your model as a web service](https://docs.microsoft.com/azure/machine-learning/how-to-deploy-and-where). |
| 23 | + |
| 24 | +2. Your scoring script should have [sample inputs and outputs](../machine-learning/how-to-deploy-and-where.md#example-entry-script) which is used by Azure Machine Learning to generate a schema specification. Stream Analytics uses the schema to understand the function signature of your web service. |
| 25 | + |
| 26 | +3. Make sure your web service accepts and returns JSON serialized data. |
| 27 | + |
| 28 | +4. Deploy your model on [Azure Kubernetes Service](../machine-learning/how-to-deploy-and-where.md#choose-a-compute-target) for high-scale production deployments. If the web service is not able to handle the number of requests coming from your job, the performance of your Stream Analytics job will be degraded, which impacts latency. |
| 29 | + |
| 30 | +## Add a machine learning model to your job |
| 31 | + |
| 32 | +You can add Azure Machine Learning functions to your Stream Analytics job directly from the Azure portal. |
| 33 | + |
| 34 | +1. Navigate to your Stream Analytics job in the Azure portal, and select **Functions** under **Job topology**. Then, select **Azure ML Service** from the **+ Add** dropdown menu. |
| 35 | + |
| 36 | +  |
| 37 | + |
| 38 | +2. Fill in the **Azure Machine Learning Service function** form with the following property values: |
| 39 | + |
| 40 | +  |
| 41 | + |
| 42 | +The following table describes each property of Azure ML Service functions in Stream Analytics. |
| 43 | + |
| 44 | +|Property|Description| |
| 45 | +|--------|-----------| |
| 46 | +|Function alias|Enter a name to invoke the function in your query.| |
| 47 | +|Subscription|Your Azure subscription..| |
| 48 | +|Azure ML workspace|The Azure Machine Learning workspace you used to deploy your model as a web service.| |
| 49 | +|Deployments|The web service hosting your model.| |
| 50 | +|Function signature|The signature of your web service inferred from the API's schema specification. If your signature fails to load, check that you have provided sample input and output in your scoring script to automatically generate the schema.| |
| 51 | +|Number of parallel requests per partition|This is an advanced configuration to optimize high-scale throughput. This number represents the concurrent requests sent from each partition of your job to the web service. Jobs with six streaming units (SU) and lower have one partition. Jobs with 12 SUs have two partitions, 18 SUs have three partitions and so on.<br><br> For example, if your job has two partitions and you set this parameter to four, there will be eight concurrent requests from your job to your web service.| |
| 52 | +|Max batch count|This is an advanced configuration for optimizing high-scale throughput. This number represents the maximum number of events be batched together in a single request sent to your web service.| |
| 53 | + |
| 54 | +## Supported input parameters |
| 55 | + |
| 56 | +When your Stream Analytics query invokes an Azure Machine Learning UDF, the job creates a JSON serialized request to the web service. The request is based on a model-specific schema. You have to provide a sample input and output in your scoring script to [automatically generate a schema](../machine-learning/how-to-deploy-and-where.md#optional-automatic-schema-generation). The schema allows Stream Analytics to construct the JSON serialized request for any of the supported data types such as numpy, pandas and PySpark. Multiple input events can be batched together in a single request. |
| 57 | + |
| 58 | +The following Stream Analytics query is an example of how to invoke an Azure Machine Learning UDF: |
| 59 | + |
| 60 | +```SQL |
| 61 | +SELECT udf.score(<model-specific-data-structure>) |
| 62 | +INTO output |
| 63 | +FROM input |
| 64 | +``` |
| 65 | + |
| 66 | +Stream Analytics only supports passing one parameter for Azure Machine Learning functions. You may need to prepare your data before passing it as an input to machine learning UDF. |
| 67 | + |
| 68 | +## Pass multiple input parameters to the UDF |
| 69 | + |
| 70 | +Most common examples of inputs to machine learning models are numpy arrays and DataFrames. You can create an array using a JavaScript UDF, and create a JSON-serialized DataFrame using the `WITH` clause. |
| 71 | + |
| 72 | +### Create an input array |
| 73 | + |
| 74 | +You can create a JavaScript UDF which accepts *N* number of inputs and creates an array that can be used as input to your Azure Machine Learning UDF. |
| 75 | + |
| 76 | +```javascript |
| 77 | +function createArray(vendorid, weekday, pickuphour, passenger, distance) { |
| 78 | + 'use strict'; |
| 79 | + var array = [vendorid, weekday, pickuphour, passenger, distance] |
| 80 | + return array; |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +Once you have added the JavaScript UDF to your job, you can invoke your Azure Machine Learning UDF using the following query: |
| 85 | + |
| 86 | +```SQL |
| 87 | +SELECT udf.score( |
| 88 | +udf.createArray(vendorid, weekday, pickuphour, passenger, distance) |
| 89 | +) |
| 90 | +INTO output |
| 91 | +FROM input |
| 92 | +``` |
| 93 | + |
| 94 | +The following JSON is an example request: |
| 95 | + |
| 96 | +```JSON |
| 97 | +{ |
| 98 | + "data": [ |
| 99 | + ["1","Mon","12","1","5.8"], |
| 100 | + ["2","Wed","10","2","10"] |
| 101 | + ] |
| 102 | +} |
| 103 | +``` |
| 104 | + |
| 105 | +### Create a pandas or PySpark DataFrame |
| 106 | + |
| 107 | +You can use the `WITH` clause to create a JSON serialized DataFrame that can be passed as input to your Azure Machine Learning UDF as shown below. |
| 108 | + |
| 109 | +The following query creates a DataFrame by selecting the necessary fields and uses the DataFrame as input to the Azure Machine Learning UDF. |
| 110 | + |
| 111 | +```SQL |
| 112 | +WITH |
| 113 | +Dataframe AS ( |
| 114 | +SELECT vendorid, weekday, pickuphour, passenger, distance |
| 115 | +FROM input |
| 116 | +) |
| 117 | + |
| 118 | +SELECT udf.score(Dataframe) |
| 119 | +INTO output |
| 120 | +FROM input |
| 121 | +``` |
| 122 | + |
| 123 | +The following JSON is an example request from the previous query: |
| 124 | + |
| 125 | +```JSON |
| 126 | +{ |
| 127 | + "data": [{ |
| 128 | + "vendorid": "1", |
| 129 | + "weekday": "Mon", |
| 130 | + "pickuphour": "12", |
| 131 | + "passenger": "1", |
| 132 | + "distance": "5.8" |
| 133 | + }, { |
| 134 | + "vendorid": "2", |
| 135 | + "weekday": "Tue", |
| 136 | + "pickuphour": "10", |
| 137 | + "passenger": "2", |
| 138 | + "distance": "10" |
| 139 | + } |
| 140 | + ] |
| 141 | +} |
| 142 | +``` |
| 143 | + |
| 144 | +## Optimize the performance for Azure Machine Learning UDFs |
| 145 | + |
| 146 | +When you deploy your model to Azure Kubernetes Service, you can [profile your model to determine resource utilization](../machine-learning/how-to-deploy-and-where.md#profilemodel). You can also [enable App Insights for your deployments](../machine-learning/how-to-enable-app-insights.md) to understand request rates, response times, and failure rates. |
| 147 | + |
| 148 | +If you have a scenario with high event throughput, you may need to change the following parameters in Stream Analytics to achieve optimal performance with low end-to-end latencies: |
| 149 | + |
| 150 | +1. Max batch count. |
| 151 | +2. Number of parallel requests per partition. |
| 152 | + |
| 153 | +### Determine the right batch size |
| 154 | + |
| 155 | +After you have deployed your web service, you send sample request with varying batch sizes starting from 50 and increasing it in order of hundreds. For example, 200, 500, 1000, 2000 and so on. You'll notice that after a certain batch size, the latency of the response increases. The point after which latency of response increases should be the max batch count for your job. |
| 156 | + |
| 157 | +### Determine the number of parallel requests per partition |
| 158 | + |
| 159 | +At optimal scaling, your Stream Analytics job should be able to send multiple parallel requests to your web service and get a response within few milliseconds. The latency of the web service's response can directly impact the latency and performance of your Stream Analytics job. If the call from your job to the web service takes a long time, you will likely see an increase in watermark delay and may also see an increase in the number of backlogged input events. |
| 160 | + |
| 161 | +To prevent such latency, ensure that your Azure Kubernetes Service (AKS) cluster has been provisioned with the [right number of nodes and replicas](../machine-learning/how-to-deploy-azure-kubernetes-service.md#using-the-cli). It's critical that your web service is highly available and returns successful responses. If your job receives a service unavailable response (503) from your web service, it will continuously retry with exponential back off. Any response other than success (200) and service unavailable (503) will cause your job to go to a failed state. |
| 162 | + |
| 163 | +## Next steps |
| 164 | + |
| 165 | +* [Tutorial: Azure Stream Analytics JavaScript user-defined functions](stream-analytics-javascript-user-defined-functions.md) |
| 166 | +* [Scale your Stream Analytics job with Azure Machine Learning Studio (classic) function](stream-analytics-scale-with-machine-learning-functions.md) |
| 167 | + |
0 commit comments