articles/stream-analytics/stream-analytics-scale-with-machine-learning-functions.md
3 additions & 3 deletions
@@ -14,7 +14,7 @@ ms.date: 03/28/2017
It is straightforward to set up a Stream Analytics job and run sample data through it. What should we do when we need to run the same job with a higher data volume? Doing so requires understanding how to configure the Stream Analytics job so that it scales. This document focuses on the special aspects of scaling Stream Analytics jobs with Machine Learning functions. For information on how to scale Stream Analytics jobs in general, see the article [Scaling jobs](stream-analytics-scale-jobs.md).
## What is an Azure Machine Learning function in Stream Analytics?
-A Machine Learning function in Stream Analytics can be used like a regular function call in the Stream Analytics query language. However, behind the scenes, the function calls are actually Azure Machine Learning Web Service requests. Machine Learning web services support "batching" multiple rows, which is called a mini-batch, in the same web service API call, to improve overall throughput. See the following articles for more details: [Azure Machine Learning functions in Stream Analytics](https://blogs.technet.microsoft.com/machinelearning/2015/12/10/azure-ml-now-available-as-a-function-in-azure-stream-analytics/) and [Azure Machine Learning Web Services](../machine-learning/studio/consume-web-services.md).
+A Machine Learning function in Stream Analytics can be used like a regular function call in the Stream Analytics query language. However, behind the scenes, the function calls are actually Azure Machine Learning Web Service requests. Machine Learning web services support "batching" multiple rows, which is called a mini-batch, in the same web service API call, to improve overall throughput. For more information, see [Azure Machine Learning functions in Stream Analytics](https://blogs.technet.microsoft.com/machinelearning/2015/12/10/azure-ml-now-available-as-a-function-in-azure-stream-analytics/) and [Azure Machine Learning Web Services](../machine-learning/studio/consume-web-services.md).
## Configure a Stream Analytics job with Machine Learning functions
When configuring a Machine Learning function for a Stream Analytics job, there are two parameters to consider: the batch size of the Machine Learning function calls, and the streaming units (SUs) provisioned for the Stream Analytics job. To determine the appropriate values for these, first decide on the trade-off between latency and throughput, that is, the latency of the Stream Analytics job and the throughput of each SU. SUs may always be added to a job to increase the throughput of a well-partitioned Stream Analytics query, although additional SUs increase the cost of running the job.
@@ -50,7 +50,7 @@ The query is a simple fully partitioned query followed by the **sentiment** func
Consider the following scenario: with a throughput of 10,000 tweets per second, a Stream Analytics job must be created to perform sentiment analysis of the tweets (events). Using 1 SU, could this Stream Analytics job handle the traffic? Using the default batch size of 1,000, the job should be able to keep up with the input. Further, the added Machine Learning function should generate no more than a second of latency, which is the general default latency of the sentiment analysis Machine Learning web service (with a default batch size of 1,000). The Stream Analytics job's **overall** or end-to-end latency would typically be a few seconds. Take a more detailed look into this Stream Analytics job, *especially* the Machine Learning function calls. With a batch size of 1,000, a throughput of 10,000 events per second takes about 10 requests to the web service. Even with 1 SU, there are enough concurrent connections to accommodate this input traffic.
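The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. The event rate and batch size are the figures from this scenario, not parameters of any Stream Analytics API:

```python
import math

def web_service_requests_per_second(event_rate: int, batch_size: int) -> int:
    """Number of mini-batch web service calls needed each second
    to keep up with the input event rate."""
    return math.ceil(event_rate / batch_size)

# Scenario from the text: 10,000 tweets/sec, default batch size of 1,000.
print(web_service_requests_per_second(10_000, 1_000))  # 10 requests/sec
```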
-But what if the input event rate increases by 100x and now the Stream Analytics job needs to process 1,000,000 tweets per second? There are two options:
+If the input event rate increases by 100x, then the Stream Analytics job needs to process 1,000,000 tweets per second. There are two options to accomplish the increased scale:
1. Increase the batch size, or
2. Partition the input stream to process the events in parallel
@@ -61,7 +61,7 @@ With the second option, more SUs would need to be provisioned and therefore gene
Assume the latency of the sentiment analysis Machine Learning web service is 200 ms for batches of 1,000 events or fewer, 250 ms for 5,000-event batches, 300 ms for 10,000-event batches, and 500 ms for 25,000-event batches.
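These assumed latencies imply that larger batches raise the throughput of each web service connection, because batch size grows faster than latency does. A sketch under that assumption (the latency table is copied from the sentence above, not measured):

```python
# Assumed latencies (seconds) per mini-batch size, from the text above.
LATENCY_BY_BATCH = {1_000: 0.200, 5_000: 0.250, 10_000: 0.300, 25_000: 0.500}

def events_per_second_per_connection(batch_size: int) -> float:
    """Throughput of one concurrent connection that issues one
    mini-batch request, waits for the response, then repeats."""
    return batch_size / LATENCY_BY_BATCH[batch_size]

for size in sorted(LATENCY_BY_BATCH):
    print(size, events_per_second_per_connection(size))
# 1,000-event batches:   5,000 events/sec per connection
# 25,000-event batches: 50,000 events/sec per connection
```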
-1. Using the first option, (**not** provisioning more SUs), the batch size could be increased to **25,000**. This in turn would allow the job to process 1,000,000 events per second with 20 concurrent connections to the Machine Learning web service (with a latency of 500 ms per call). So the additional latency of the Stream Analytics job due to the sentiment function requests against the Machine Learning web service would increase from **200 ms** to **500 ms**. However, batch size **cannot** be increased infinitely, as the Machine Learning web service requires the payload size of a request to be 4 MB or smaller, and web service requests time out after 100 seconds of operation.
+1. Using the first option (**not** provisioning more SUs), the batch size could be increased to **25,000**. This in turn would allow the job to process 1,000,000 events per second with 20 concurrent connections to the Machine Learning web service (with a latency of 500 ms per call). So the additional latency of the Stream Analytics job due to the sentiment function requests against the Machine Learning web service would increase from **200 ms** to **500 ms**. However, batch size **cannot** be increased infinitely, as the Machine Learning web service requires the payload size of a request to be 4 MB or smaller, and web service requests time out after 100 seconds of operation.
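Under the latency assumptions above, the 20-connection figure for option 1 falls out of the same arithmetic. This is a sketch of the reasoning, not an Azure API:

```python
import math

def connections_needed(event_rate: int, batch_size: int, latency_s: float) -> int:
    """Concurrent web service connections needed to sustain event_rate,
    given that each connection completes one mini-batch per latency_s."""
    per_connection = batch_size / latency_s  # events/sec one connection handles
    return math.ceil(event_rate / per_connection)

# Option 1: 1,000,000 events/sec, 25,000-event batches at 500 ms per call.
print(connections_needed(1_000_000, 25_000, 0.5))  # 20
```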
2. Using the second option, the batch size is left at 1,000, with 200 ms web service latency. Every 20 concurrent connections to the web service would be able to process 1000 * 20 * 5 events = 100,000 per second. So to process 1,000,000 events per second, the job would need 60 SUs. Compared to the first option, the Stream Analytics job would make more web service batch requests, in turn generating an increased cost.
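The 60-SU figure for option 2 can be reproduced the same way. The 6-SUs-per-group ratio below is inferred from the text's own numbers (100,000 events/sec per 20-connection group, 60 SUs total), not an official Stream Analytics sizing rule:

```python
# Option 2 arithmetic from the text: with 1,000-event batches at 200 ms
# latency, one connection completes 5 calls/sec, so a group of 20
# connections handles 1,000 * 20 * 5 = 100,000 events/sec.
EVENTS_PER_GROUP = 1_000 * 20 * 5  # 100,000 events/sec
SUS_PER_GROUP = 6                  # inferred: 60 SUs / 10 groups, per the text

def sus_needed(event_rate: int) -> int:
    """SUs needed under the inferred 6-SUs-per-20-connections grouping."""
    groups = -(-event_rate // EVENTS_PER_GROUP)  # ceiling division
    return groups * SUS_PER_GROUP

print(sus_needed(1_000_000))  # 60 SUs, matching the text
```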
The following table shows the throughput of the Stream Analytics job (in number of events per second) for different SUs and batch sizes.