## `articles/stream-analytics/no-code-stream-processing.md` (1 addition, 1 deletion)
The no-code editor now supports two reference data sources:
Reference data is modeled as a sequence of blobs in ascending order of the date/time combination specified in the blob name. You can append blobs only to the end of the sequence, by using a date/time greater than the one specified by the last blob in the sequence. Blobs are defined in the input configuration.
First, under the **Inputs** section on the ribbon, select **Reference ADLS Gen2**. To see details about each field, see the section about Azure Blob Storage in [Use reference data for lookups in Stream Analytics](stream-analytics-use-reference-data.md#azure-blob-storage-or-azure-data-lake-storage-gen-2).
## `articles/stream-analytics/repartition.md` (25 additions, 11 deletions)
description: This article describes how to use repartitioning to optimize Azure …
ms.service: stream-analytics
author: ahartoon
ms.author: anboisve
ms.date: 02/26/2024
ms.topic: conceptual
ms.custom: mvc
---
You can repartition your input in two ways:
### Creating a separate Stream Analytics job to repartition input
You can create a job that reads input and writes to an event hub output using a partition key. This event hub can then serve as input for another Stream Analytics job where you implement your analytics logic. When configuring this event hub output in your job, you must specify the partition key by which Stream Analytics will repartition your data.
```sql
-- For compat level 1.2 or higher
SELECT *
...
FROM input PARTITION BY PartitionId
```
### Repartition input within a single Stream Analytics job
You can also introduce a step in your query that first repartitions the input, which can then be used by other steps in your query. For example, if you want to repartition input based on **DeviceId**, your query would be:
```sql
WITH RepartitionedInput AS
(
    SELECT *
    FROM input PARTITION BY DeviceID
)

SELECT DeviceID, AVG(Reading) as AvgNormalReading
FROM RepartitionedInput
GROUP BY DeviceId, TumblingWindow(minute, 1)
```
The following example query joins two streams of repartitioned data. When you join two such streams, they must have the same partition key and count. The outcome is a stream that has the same partition scheme.
```sql
WITH step1 AS
(
    SELECT * FROM input1
    PARTITION BY DeviceID
),
step2 AS
(
    SELECT * FROM input2
    PARTITION BY DeviceID
)

SELECT * INTO output
FROM step1 PARTITION BY DeviceID
UNION step2 PARTITION BY DeviceID
```
The output scheme should match the stream scheme key and count so that each substream can be flushed independently. The stream could also be merged and repartitioned again by a different scheme before flushing, but you should avoid that method because it adds to the general latency of the processing and increases resource utilization.
Experiment and observe the resource usage of your job to determine the exact num…
## Repartitions for SQL output
When your job uses SQL database for output, use explicit repartitioning to match the optimal partition count to maximize throughput. Since SQL works best with eight writers, repartitioning the flow to eight before flushing, or somewhere further upstream, might benefit job performance.
When there are more than eight input partitions, inheriting the input partitioning scheme might not be an appropriate choice. Consider using [INTO](/stream-analytics-query/into-azure-stream-analytics#into-shard-count) in your query to explicitly specify the number of output writers.
The following example reads from the input, regardless of whether it's naturally partitioned, repartitions the stream tenfold according to the DeviceID dimension, and flushes the data to the output.
```sql
SELECT * INTO [output]
FROM [input]
PARTITION BY DeviceID INTO 10
```
For more information, see [Azure Stream Analytics output to Azure SQL Database](stream-analytics-sql-output-perf.md).
## Next steps
* [Get started with Azure Stream Analytics](stream-analytics-introduction.md)
* [Use query parallelization in Azure Stream Analytics](stream-analytics-parallelization.md)
## `articles/stream-analytics/sql-database-upsert.md` (17 additions, 17 deletions)
title: Update or merge records in Azure SQL Database with Azure Functions
description: This article describes how to use Azure Functions to update or merge records from Azure Stream Analytics to Azure SQL Database
ms.service: stream-analytics
ms.topic: how-to
ms.date: 02/27/2024
---
# Update or merge records in Azure SQL Database with Azure Functions
Currently, [Azure Stream Analytics](./index.yml) (ASA) supports only inserting (appending) rows to SQL outputs ([Azure SQL Databases](./sql-database-output.md) and [Azure Synapse Analytics](./azure-synapse-analytics-output.md)). This article discusses workarounds to enable UPDATE, UPSERT, or MERGE on SQL databases, with Azure Functions as the intermediary layer.
Alternative options to Azure Functions are presented at the end.
Writing data in a table can generally be done in the following manner:
|Accumulate|MERGE (UPSERT) with compound assignment [operator](/sql/t-sql/queries/update-transact-sql#arguments) (`+=`, `-=`...)|Unique key and accumulator|
To illustrate the differences, look at what happens when ingesting the following two records:
|Arrival_Time|Device_Id|Measure_Value|
|-|-|-|
|10:00|A|1|
|10:05|A|20|

In **append** mode, we insert both records. The equivalent T-SQL statement is:
```SQL
INSERT INTO [target] VALUES (...);
```

Resulting in:
|Arrival_Time|Device_Id|Measure_Value|
|-|-|-|
|10:00|A|1|
|10:05|A|20|

In **replace** mode, we get only the last value by key. Here we use **Device_Id** as the key. The equivalent T-SQL statement is:
```SQL
MERGE INTO [target] t
...
```

Resulting in:
|Arrival_Time|Device_Id|Measure_Value|
|-|-|-|
|10:05|A|20|
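The replace-mode MERGE statement is abbreviated in this excerpt. As a minimal sketch (not the article's original sample), a replace-mode upsert keyed on **Device_Id** could look like the following, assuming the incoming rows arrive in a hypothetical table variable named `@source`:

```SQL
-- Hedged sketch: replace-mode upsert keyed on Device_Id.
-- @source is a hypothetical table variable holding the incoming rows.
MERGE INTO [target] t
USING @source s
    ON t.Device_Id = s.Device_Id
WHEN MATCHED THEN
    UPDATE SET
        t.Arrival_Time  = s.Arrival_Time,
        t.Measure_Value = s.Measure_Value
WHEN NOT MATCHED THEN
    INSERT (Arrival_Time, Device_Id, Measure_Value)
    VALUES (s.Arrival_Time, s.Device_Id, s.Measure_Value);
```

With the two sample records, the second row matches on `Device_Id = A` and overwrites the first, which is what produces the single `|10:05|A|20|` row.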
Finally, in **accumulate** mode we sum `Value` with a compound assignment operator (`+=`). Here also we use **Device_Id** as the key:
```SQL
MERGE INTO [target] t
...
```

Resulting in:

|Arrival_Time|Device_Id|Measure_Value|
|-|-|-|
|10:05|A|21|
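The accumulate-mode MERGE is likewise abbreviated in this excerpt. A hedged sketch using the compound assignment operator, with the same hypothetical `@source` rowset as above:

```SQL
-- Hedged sketch: accumulate-mode upsert keyed on Device_Id.
-- += adds the incoming value to the stored one (compound assignment).
MERGE INTO [target] t
USING @source s
    ON t.Device_Id = s.Device_Id
WHEN MATCHED THEN
    UPDATE SET
        t.Arrival_Time   = s.Arrival_Time,
        t.Measure_Value += s.Measure_Value
WHEN NOT MATCHED THEN
    INSERT (Arrival_Time, Device_Id, Measure_Value)
    VALUES (s.Arrival_Time, s.Device_Id, s.Measure_Value);
```

With the two sample records, the second row matches and accumulates `1 += 20`, yielding `21`.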
For **performance** considerations, the ASA SQL database output adapters currently only support append mode natively. These adapters use bulk insert to maximize throughput and limit back pressure.
This article shows how to use Azure Functions to implement Replace and Accumulate modes for ASA. When you use a function as an intermediary layer, the potential write performance won't affect the streaming job. In this regard, using Azure Functions works best with Azure SQL. With Synapse SQL, switching from bulk to row-by-row statements might create greater performance issues.
## Azure Functions Output
In our job, we replace the ASA SQL output with the [ASA Azure Functions output](./azure-functions-output.md). The UPDATE, UPSERT, or MERGE capabilities are implemented in the function.
There are currently two options to access a SQL Database in a function. First is the [Azure SQL output binding](../azure-functions/functions-bindings-azure-sql.md). It's currently limited to C#, and only offers replace mode. Second is to compose a SQL query to be submitted via the appropriate [SQL driver](/sql/connect/sql-connection-libraries) ([Microsoft.Data.SqlClient](https://github.com/dotnet/SqlClient) for .NET).
For both of the following samples, we assume this table schema. The binding option requires **a primary key** to be set on the target table. It's not necessary, but recommended, when using a SQL driver.
```SQL
CREATE TABLE [dbo].[device_updated](
    ...
)
```
To better understand the binding approach, it's recommended to follow [this tutorial](https://github.com/Azure/azure-functions-sql-extension#quick-start).
First, create a default HttpTrigger function app by following this [tutorial](../azure-functions/create-first-function-vs-code-csharp.md?tabs=in-process). The following information is used:
- Language: `C#`
- Runtime: `.NET 6` (under function/runtime v4)
Update the `Device` class and mapping section to match your own schema:

```csharp
public DateTime Timestamp { get; set; }
```
You can now test the wiring between the local function and the database by debugging (F5 in Visual Studio Code). The SQL database needs to be reachable from your machine. [SSMS](/sql/ssms/sql-server-management-studio-ssms) can be used to check connectivity. Then a tool like [Postman](https://www.postman.com/) can be used to issue POST requests to the local endpoint. A request with an empty body should return HTTP 204. A request with an actual payload should be persisted in the destination table (in replace/update mode). Here's a sample payload corresponding to the schema used in this sample:
First, create a default HttpTrigger function app by following this [tutorial](../azure-functions/create-first-function-vs-code-csharp.md?tabs=in-process). The following information is used:
- Language: `C#`
- Runtime: `.NET 6` (under function/runtime v4)
The function can then be defined as an output in the ASA job, and used to replace…
## Alternatives
Outside of Azure Functions, there are multiple ways to achieve the expected result. This section covers the most likely ones.
### Post-processing in the target SQL Database
A background task operates once the data is inserted in the database via the standard ASA SQL outputs.
For Azure SQL, `INSTEAD OF` [DML triggers](/sql/relational-databases/triggers/dml-triggers?view=azuresqldb-current&preserve-view=true) can be used to intercept the INSERT commands issued by ASA.
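The trigger body from the original article is elided in this excerpt. A minimal hedged sketch of such an `INSTEAD OF INSERT` trigger, assuming the `device_updated` table is keyed on `DeviceId` (the column names here are assumptions, not from the article):

```SQL
-- Hedged sketch, column names assumed: convert ASA's INSERTs into upserts.
CREATE TRIGGER dbo.trg_device_updated_upsert
ON dbo.device_updated
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- 'inserted' is the pseudo-table holding the rows ASA tried to insert.
    MERGE INTO dbo.device_updated t
    USING inserted s
        ON t.DeviceId = s.DeviceId
    WHEN MATCHED THEN
        UPDATE SET t.Value = s.Value, t.Timestamp = s.Timestamp
    WHEN NOT MATCHED THEN
        INSERT (DeviceId, Value, Timestamp)
        VALUES (s.DeviceId, s.Value, s.Timestamp);
END;
```

Note that MERGE fails if a single bulk insert carries the same key twice, so the `inserted` rowset may need de-duplication first when a batch can contain repeated devices.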
For Synapse SQL, ASA can insert into a [staging table](../synapse-analytics/sql/data-loading-best-practices.md#load-to-a-staging-table). A recurring task can then transform the data as needed into an intermediary table. Finally, the [data is moved](../synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition.md#partition-switching) to the production table.
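That recurring task could be as simple as a scheduled consolidation from staging into the intermediary table. A hedged sketch, with hypothetical table and column names (and noting that MERGE availability on dedicated SQL pools varies, so an INSERT/UPDATE pair is an alternative):

```SQL
-- Hedged sketch: periodic consolidation from a staging table (names assumed).
MERGE INTO dbo.device_intermediary t
USING (
    -- Keep only the latest row per device from the staged batch.
    SELECT DeviceId, Value, Timestamp
    FROM (
        SELECT DeviceId, Value, Timestamp,
               ROW_NUMBER() OVER (PARTITION BY DeviceId ORDER BY Timestamp DESC) AS rn
        FROM dbo.device_staging
    ) x
    WHERE rn = 1
) s
    ON t.DeviceId = s.DeviceId
WHEN MATCHED THEN
    UPDATE SET t.Value = s.Value, t.Timestamp = s.Timestamp
WHEN NOT MATCHED THEN
    INSERT (DeviceId, Value, Timestamp)
    VALUES (s.DeviceId, s.Value, s.Timestamp);

-- Caution: rows landing between the MERGE and the TRUNCATE would be lost;
-- a real job would delete only the processed batch.
TRUNCATE TABLE dbo.device_staging;
```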
### Preprocessing in Azure Cosmos DB
Azure Cosmos DB [supports UPSERT natively](./stream-analytics-documentdb-output.md#upserts-from-stream-analytics). Here only append/replace is possible. Accumulations must be managed client-side in Azure Cosmos DB.
If the requirements match, an option is to replace the target SQL database with an Azure Cosmos DB instance. Doing so requires a significant change in the overall solution architecture.
For Synapse SQL, Azure Cosmos DB can be used as an intermediary layer via [Azure Synapse Link for Azure Cosmos DB](../cosmos-db/synapse-link.md). Azure Synapse Link can be used to create an [analytical store](../cosmos-db/analytical-store-introduction.md). This data store can then be queried directly in Synapse SQL.
### Comparison of the alternatives
Each approach offers a different value proposition and capabilities.
## `articles/stream-analytics/stream-analytics-add-inputs.md` (4 additions, 4 deletions)
ms.service: stream-analytics
author: enkrumah
ms.author: ebnkruma
ms.topic: conceptual
ms.date: 02/26/2024
---
# Understand inputs for Azure Stream Analytics
As data is pushed to a data source, it's consumed by the Stream Analytics job and…
- Reference data inputs.
### Data stream input
A data stream is an unbounded sequence of events over time. Stream Analytics jobs must include at least one data stream input. Event Hubs, IoT Hub, Azure Data Lake Storage Gen2, and Blob storage are supported as data stream input sources. Event Hubs is used to collect event streams from multiple devices and services. These streams might include social media activity feeds, stock trade information, or data from sensors. IoT Hubs are optimized to collect data from connected devices in Internet of Things (IoT) scenarios. Blob storage can be used as an input source for ingesting bulk data as a stream, such as log files.
For more information about streaming data inputs, see [Stream data as input into Stream Analytics](stream-analytics-define-inputs.md).
### Reference data input
Stream Analytics also supports input known as *reference data*. Reference data is either completely static or changes slowly. It's typically used to perform correlation and lookups. For example, you might join data in the data stream input to data in the reference data, much as you would perform a SQL join to look up static values. Azure Blob storage, Azure Data Lake Storage Gen2, and Azure SQL Database are currently supported as input sources for reference data. Reference data source blobs have a limit of up to 300 MB in size, depending on the query complexity and allocated Streaming Units. For more information, see the [Size limitation](stream-analytics-use-reference-data.md#size-limitation) section of the reference data documentation.
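A sketch of such a lookup in the Stream Analytics query language, assuming a streaming input named `orders` and a reference input named `products` (both hypothetical names, not from the article):

```sql
-- Hedged sketch: enrich a stream with a reference-data lookup.
-- orders is a data stream input; products is a reference data input.
SELECT
    o.OrderId,
    o.ProductId,
    p.ProductName -- value looked up from the reference data
INTO [output]
FROM orders o
JOIN products p -- a join against reference data needs no time window
    ON o.ProductId = p.ProductId
```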
For more information about reference data inputs, see [Using reference data for lookups in Stream Analytics](stream-analytics-use-reference-data.md).