Commit 7bd9a34

Merge branch 'MicrosoftDocs:main' into Artifact-streaming
2 parents 5b397bc + 2bfd129 commit 7bd9a34

File tree: 6 files changed (+57, -43 lines)

articles/stream-analytics/no-code-stream-processing.md

Lines changed: 1 addition & 1 deletion
@@ -165,7 +165,7 @@ The no-code editor now supports two reference data sources:

Reference data is modeled as a sequence of blobs in ascending order of the date/time combination specified in the blob name. You can add blobs to the end of the sequence only by using a date/time greater than the one that the last blob specified in the sequence. Blobs are defined in the input configuration.

- First, under the **Inputs** section on the ribbon, select **Reference ADLS Gen2**. To see details about each field, see the section about Azure Blob Storage in [Use reference data for lookups in Stream Analytics](stream-analytics-use-reference-data.md#azure-blob-storage).
+ First, under the **Inputs** section on the ribbon, select **Reference ADLS Gen2**. To see details about each field, see the section about Azure Blob Storage in [Use reference data for lookups in Stream Analytics](stream-analytics-use-reference-data.md#azure-blob-storage-or-azure-data-lake-storage-gen-2).

![Screenshot that shows fields for configuring Azure Data Lake Storage Gen2 as input in the no-code editor.](./media/no-code-stream-processing/blob-referencedata-nocode.png)

articles/stream-analytics/repartition.md

Lines changed: 25 additions & 11 deletions
@@ -4,7 +4,7 @@ description: This article describes how to use repartitioning to optimize Azure
ms.service: stream-analytics
author: ahartoon
ms.author: anboisve
- ms.date: 12/21/2022
+ ms.date: 02/26/2024
ms.topic: conceptual
ms.custom: mvc
---
@@ -27,6 +27,7 @@ You can repartition your input in two ways:

### Creating a separate Stream Analytics job to repartition input
You can create a job that reads input and writes to an event hub output using a partition key. This event hub can then serve as input for another Stream Analytics job where you implement your analytics logic. When configuring this event hub output in your job, you must specify the partition key by which Stream Analytics will repartition your data.
+
```sql
-- For compat level 1.2 or higher
SELECT *
@@ -40,12 +41,13 @@ FROM input PARTITION BY PartitionId
```

### Repartition input within a single Stream Analytics job
- You can also introduce a step in your query that first repartitions the input and this can then be used by other steps in your query. For example, if you want to repartition input based on **DeviceId**, your query would be:
+ You can also introduce a step in your query that first repartitions the input, which can then be used by other steps in your query. For example, if you want to repartition input based on **DeviceId**, your query would be:
+
```sql
WITH RepartitionedInput AS
(
- SELECT *
- FROM input PARTITION BY DeviceID
+     SELECT *
+     FROM input PARTITION BY DeviceID
)

SELECT DeviceID, AVG(Reading) as AvgNormalReading
@@ -54,13 +56,23 @@ FROM RepartitionedInput
GROUP BY DeviceId, TumblingWindow(minute, 1)
```

- The following example query joins two streams of repartitioned data. When joining two streams of repartitioned data, the streams must have the same partition key and count. The outcome is a stream that has the same partition scheme.
+ The following example query joins two streams of repartitioned data. When you join two streams of repartitioned data, the streams must have the same partition key and count. The outcome is a stream that has the same partition scheme.

```sql
- WITH step1 AS (SELECT * FROM input1 PARTITION BY DeviceID),
- step2 AS (SELECT * FROM input2 PARTITION BY DeviceID)
+ WITH step1 AS
+ (
+     SELECT * FROM input1
+     PARTITION BY DeviceID
+ ),
+ step2 AS
+ (
+     SELECT * FROM input2
+     PARTITION BY DeviceID
+ )

- SELECT * INTO output FROM step1 PARTITION BY DeviceID UNION step2 PARTITION BY DeviceID
+ SELECT * INTO output
+ FROM step1 PARTITION BY DeviceID
+ UNION step2 PARTITION BY DeviceID
```

The output scheme should match the stream scheme key and count so that each substream can be flushed independently. The stream could also be merged and repartitioned again by a different scheme before flushing, but you should avoid that method because it adds to the general latency of the processing and increases resource utilization.
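The same-key-and-same-count requirement in the passage above can be sketched outside ASA. The following Python sketch (hypothetical event dictionaries, not ASA runtime code) hashes both streams by `DeviceID` into the same number of partitions, so every matching pair of events lands in one partition and the join never crosses substreams:

```python
import zlib
from collections import defaultdict

def repartition(events, key, count):
    """Group events into `count` substreams by a stable hash of the key."""
    parts = defaultdict(list)
    for e in events:
        parts[zlib.crc32(str(e[key]).encode()) % count].append(e)
    return parts

stream1 = [{"DeviceID": d, "Reading": r} for d, r in [("a", 1), ("b", 2), ("a", 3)]]
stream2 = [{"DeviceID": d, "Status": s} for d, s in [("a", "ok"), ("b", "warn")]]

COUNT = 4  # both streams must use the same partition count
p1 = repartition(stream1, "DeviceID", COUNT)
p2 = repartition(stream2, "DeviceID", COUNT)

# Matching keys always share a partition index, so the join is partition-local.
joined = [
    (e1, e2)
    for i in range(COUNT)
    for e1 in p1.get(i, [])
    for e2 in p2.get(i, [])
    if e1["DeviceID"] == e2["DeviceID"]
]
print(len(joined))  # 3: the partition-local join finds every matching pair
```

If the two streams used different partition counts, the same `DeviceID` could map to different partition indexes in each stream, and a partition-local join would miss pairs.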
@@ -71,14 +83,16 @@ Experiment and observe the resource usage of your job to determine the exact num

## Repartitions for SQL output

- When your job uses SQL database for output, use explicit repartitioning to match the optimal partition count to maximize throughput. Since SQL works best with eight writers, repartitioning the flow to eight before flushing, or somewhere further upstream, may benefit job performance.
+ When your job uses SQL database for output, use explicit repartitioning to match the optimal partition count to maximize throughput. Since SQL works best with eight writers, repartitioning the flow to eight before flushing, or somewhere further upstream, might benefit job performance.

When there are more than eight input partitions, inheriting the input partitioning scheme might not be an appropriate choice. Consider using [INTO](/stream-analytics-query/into-azure-stream-analytics#into-shard-count) in your query to explicitly specify the number of output writers.

The following example reads from the input, regardless of it being naturally partitioned, and repartitions the stream tenfold according to the DeviceID dimension and flushes the data to output.

```sql
- SELECT * INTO [output] FROM [input] PARTITION BY DeviceID INTO 10
+ SELECT * INTO [output]
+ FROM [input]
+ PARTITION BY DeviceID INTO 10
```

For more information, see [Azure Stream Analytics output to Azure SQL Database](stream-analytics-sql-output-perf.md).
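The writer-count guidance above can be illustrated with a short Python sketch (a hypothetical model of the output stage, not the ASA runtime): events are assigned to a fixed number of writer buckets by a stable hash of `DeviceID`, so each writer can bulk-flush its own batch independently.

```python
import zlib
from collections import defaultdict

WRITERS = 8  # match the writer count SQL output handles best

def assign_writer(device_id, writers=WRITERS):
    """Stable key-to-bucket assignment, analogous to PARTITION BY DeviceID INTO n."""
    return zlib.crc32(str(device_id).encode()) % writers

batches = defaultdict(list)
for i in range(1000):
    event = {"DeviceID": f"dev-{i % 50}", "Reading": i}
    batches[assign_writer(event["DeviceID"])].append(event)

# Every event lands in exactly one batch, and a given DeviceID never spans writers.
print(sum(len(b) for b in batches.values()))  # 1000
```

Raising the bucket count (for example, `INTO 10` in the query above) trades fewer events per batch for more parallel writers; the right number depends on the target database.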
@@ -87,4 +101,4 @@ For more information, see [Azure Stream Analytics output to Azure SQL Database](

## Next steps

* [Get started with Azure Stream Analytics](stream-analytics-introduction.md)
- * [Leverage query parallelization in Azure Stream Analytics](stream-analytics-parallelization.md)
+ * [Use query parallelization in Azure Stream Analytics](stream-analytics-parallelization.md)

articles/stream-analytics/sql-database-upsert.md

Lines changed: 17 additions & 17 deletions
@@ -3,12 +3,12 @@ title: Update or merge records in Azure SQL Database with Azure Functions
description: This article describes how to use Azure Functions to update or merge records from Azure Stream Analytics to Azure SQL Database
ms.service: stream-analytics
ms.topic: how-to
- ms.date: 12/03/2021
+ ms.date: 02/27/2024
---

# Update or merge records in Azure SQL Database with Azure Functions

- Currently, [Azure Stream Analytics](./index.yml) (ASA) only supports inserting (appending) rows to SQL outputs ([Azure SQL Databases](./sql-database-output.md), and [Azure Synapse Analytics](./azure-synapse-analytics-output.md)). This article discusses workarounds to enable UPDATE, UPSERT, or MERGE on SQL databases, with Azure Functions as the intermediary layer.
+ Currently, [Azure Stream Analytics](./index.yml) (ASA) supports only inserting (appending) rows to SQL outputs ([Azure SQL Databases](./sql-database-output.md), and [Azure Synapse Analytics](./azure-synapse-analytics-output.md)). This article discusses workarounds to enable UPDATE, UPSERT, or MERGE on SQL databases, with Azure Functions as the intermediary layer.

Alternative options to Azure Functions are presented at the end.
@@ -22,14 +22,14 @@ Writing data in a table can generally be done in the following manner:
|Replace|[MERGE](/sql/t-sql/statements/merge-transact-sql) (UPSERT)|Unique key|
|Accumulate|MERGE (UPSERT) with compound assignment [operator](/sql/t-sql/queries/update-transact-sql#arguments) (`+=`, `-=`...)|Unique key and accumulator|

- To illustrate the differences, we can look at what happens when ingesting the following two records:
+ To illustrate the differences, look at what happens when ingesting the following two records:

|Arrival_Time|Device_Id|Measure_Value|
|-|-|-|
|10:00|A|1|
|10:05|A|20|

- In **append** mode, we insert the two records. The equivalent T-SQL statement is:
+ In the **append** mode, we insert two records. The equivalent T-SQL statement is:

```SQL
INSERT INTO [target] VALUES (...);
@@ -42,7 +42,7 @@ Resulting in:
|10:00|A|1|
|10:05|A|20|

- In **replace** mode, we get only the last value by key. Here we will use **Device_Id as the key.** The equivalent T-SQL statement is:
+ In **replace** mode, we get only the last value by key. Here we use **Device_Id as the key.** The equivalent T-SQL statement is:

```SQL
MERGE INTO [target] t
@@ -65,7 +65,7 @@ Resulting in:
|-|-|-|
|10:05|A|20|

- Finally, in **accumulate** mode we sum `Value` with a compound assignment operator (`+=`). Here also we will use Device_Id as the key:
+ Finally, in **accumulate** mode we sum `Value` with a compound assignment operator (`+=`). Here also we use Device_Id as the key:

```SQL
MERGE INTO [target] t
@@ -90,15 +90,15 @@ Resulting in:

For **performance** considerations, the ASA SQL database output adapters currently only support append mode natively. These adapters use bulk insert to maximize throughput and limit back pressure.

- This article shows how to use Azure Functions to implement Replace and Accumulate modes for ASA. By using a function as an intermediary layer, the potential write performance won't affect the streaming job. In this regard, using Azure Functions will work best with Azure SQL. With Synapse SQL, switching from bulk to row-by-row statements may create greater performance issues.
+ This article shows how to use Azure Functions to implement Replace and Accumulate modes for ASA. When you use a function as an intermediary layer, the potential write performance won't affect the streaming job. In this regard, using Azure Functions works best with Azure SQL. With Synapse SQL, switching from bulk to row-by-row statements might create greater performance issues.

## Azure Functions Output

- In our job, we'll replace the ASA SQL output by the [ASA Azure Functions output](./azure-functions-output.md). The UPDATE, UPSERT, or MERGE capabilities will be implemented in the function.
+ In our job, we replace the ASA SQL output by the [ASA Azure Functions output](./azure-functions-output.md). The UPDATE, UPSERT, or MERGE capabilities are implemented in the function.

There are currently two options to access a SQL Database in a function. First is the [Azure SQL output binding](../azure-functions/functions-bindings-azure-sql.md). It's currently limited to C#, and only offers replace mode. Second is to compose a SQL query to be submitted via the appropriate [SQL driver](/sql/connect/sql-connection-libraries) ([Microsoft.Data.SqlClient](https://github.com/dotnet/SqlClient) for .NET).

- For both samples below, we'll assume the following table schema. The binding option requires **a primary key** to be set on the target table. It's not necessary, but recommended, when using a SQL driver.
+ For both the following samples, we assume the following table schema. The binding option requires **a primary key** to be set on the target table. It's not necessary, but recommended, when using a SQL driver.

```SQL
CREATE TABLE [dbo].[device_updated](
@@ -130,7 +130,7 @@ This sample was built on:

To better understand the binding approach, it's recommended to follow [this tutorial](https://github.com/Azure/azure-functions-sql-extension#quick-start).

- First, create a default HttpTrigger function app by following this [tutorial](../azure-functions/create-first-function-vs-code-csharp.md?tabs=in-process). The following information will be used:
+ First, create a default HttpTrigger function app by following this [tutorial](../azure-functions/create-first-function-vs-code-csharp.md?tabs=in-process). The following information is used:

- Language: `C#`
- Runtime: `.NET 6` (under function/runtime v4)
@@ -233,7 +233,7 @@ Update the `Device` class and mapping section to match your own schema:
public DateTime Timestamp { get; set; }
```

- You can now test the wiring between the local function and the database by debugging (F5 in VS Code). The SQL database needs to be reachable from your machine. [SSMS](/sql/ssms/sql-server-management-studio-ssms) can be used to check connectivity. Then a tool like [Postman](https://www.postman.com/) can be used to issue POST requests to the local endpoint. A request with an empty body should return http 204. A request with an actual payload should be persisted in the destination table (in replace / update mode). Here's a sample payload corresponding to the schema used in this sample:
+ You can now test the wiring between the local function and the database by debugging (F5 in Visual Studio Code). The SQL database needs to be reachable from your machine. [SSMS](/sql/ssms/sql-server-management-studio-ssms) can be used to check connectivity. Then a tool like [Postman](https://www.postman.com/) can be used to issue POST requests to the local endpoint. A request with an empty body should return http 204. A request with an actual payload should be persisted in the destination table (in replace / update mode). Here's a sample payload corresponding to the schema used in this sample:

```JSON
[{"DeviceId":3,"Value":13.4,"Timestamp":"2021-11-30T03:22:12.991Z"},{"DeviceId":4,"Value":41.4,"Timestamp":"2021-11-30T03:22:12.991Z"}]
@@ -256,7 +256,7 @@ This sample was built on:
- [.NET 6.0](/dotnet/core/whats-new/dotnet-6)
- Microsoft.Data.SqlClient [4.0.0](https://www.nuget.org/packages/Microsoft.Data.SqlClient/)

- First, create a default HttpTrigger function app by following this [tutorial](../azure-functions/create-first-function-vs-code-csharp.md?tabs=in-process). The following information will be used:
+ First, create a default HttpTrigger function app by following this [tutorial](../azure-functions/create-first-function-vs-code-csharp.md?tabs=in-process). The following information is used:

- Language: `C#`
- Runtime: `.NET 6` (under function/runtime v4)
@@ -371,11 +371,11 @@ The function can then be defined as an output in the ASA job, and used to replac

## Alternatives

- Outside of Azure Functions, there are multiple ways to achieve the expected result. We'll mention the most likely solutions below.
+ Outside of Azure Functions, there are multiple ways to achieve the expected result. This section provides some of them.

### Post-processing in the target SQL Database

- A background task will operate once the data is inserted in the database via the standard ASA SQL outputs.
+ A background task operates once the data is inserted in the database via the standard ASA SQL outputs.

For Azure SQL, `INSTEAD OF` [DML triggers](/sql/relational-databases/triggers/dml-triggers?view=azuresqldb-current&preserve-view=true) can be used to intercept the INSERT commands issued by ASA:
@@ -402,13 +402,13 @@ END;

For Synapse SQL, ASA can insert into a [staging table](../synapse-analytics/sql/data-loading-best-practices.md#load-to-a-staging-table). A recurring task can then transform the data as needed into an intermediary table. Finally the [data is moved](../synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition.md#partition-switching) to the production table.

- ### Pre-processing in Azure Cosmos DB
+ ### Preprocessing in Azure Cosmos DB

Azure Cosmos DB [supports UPSERT natively](./stream-analytics-documentdb-output.md#upserts-from-stream-analytics). Here only append/replace is possible. Accumulations must be managed client-side in Azure Cosmos DB.

If the requirements match, an option is to replace the target SQL database by an Azure Cosmos DB instance. Doing so requires an important change in the overall solution architecture.

- For Synapse SQL, Azure Cosmos DB can be used as an intermediary layer via [Azure Synapse Link for Azure Cosmos DB](../cosmos-db/synapse-link.md). Synapse Link can be used to create an [analytical store](../cosmos-db/analytical-store-introduction.md). This data store can then be queried directly in Synapse SQL.
+ For Synapse SQL, Azure Cosmos DB can be used as an intermediary layer via [Azure Synapse Link for Azure Cosmos DB](../cosmos-db/synapse-link.md). Azure Synapse Link can be used to create an [analytical store](../cosmos-db/analytical-store-introduction.md). This data store can then be queried directly in Synapse SQL.

### Comparison of the alternatives

@@ -422,7 +422,7 @@ Each approach offers different value proposition and capabilities:
|Pre-Processing|||||
||Azure Functions|Replace, Accumulate|+|- (row-by-row performance)|
||Azure Cosmos DB replacement|Replace|N/A|N/A|
- ||Azure Cosmos DB Synapse Link|Replace|N/A|+|
+ ||Azure Cosmos DB Azure Synapse Link|Replace|N/A|+|

## Get support

articles/stream-analytics/stream-analytics-add-inputs.md

Lines changed: 4 additions & 4 deletions
@@ -5,7 +5,7 @@ ms.service: stream-analytics
author: enkrumah
ms.author: ebnkruma
ms.topic: conceptual
- ms.date: 02/28/2023
+ ms.date: 02/26/2024
---
# Understand inputs for Azure Stream Analytics

@@ -31,14 +31,14 @@ As data is pushed to a data source, it's consumed by the Stream Analytics job an
- Reference data inputs.

### Data stream input
- A data stream is an unbounded sequence of events over time. Stream Analytics jobs must include at least one data stream input. Event Hubs, IoT Hub, Azure Data Lake Storage Gen2 and Blob storage are supported as data stream input sources. Event Hubs is used to collect event streams from multiple devices and services. These streams might include social media activity feeds, stock trade information, or data from sensors. IoT Hubs are optimized to collect data from connected devices in Internet of Things (IoT) scenarios. Blob storage can be used as an input source for ingesting bulk data as a stream, such as log files.
+ A data stream is an unbounded sequence of events over time. Stream Analytics jobs must include at least one data stream input. Event Hubs, IoT Hub, Azure Data Lake Storage Gen2, and Blob storage are supported as data stream input sources. Event Hubs is used to collect event streams from multiple devices and services. These streams might include social media activity feeds, stock trade information, or data from sensors. IoT Hubs are optimized to collect data from connected devices in Internet of Things (IoT) scenarios. Blob storage can be used as an input source for ingesting bulk data as a stream, such as log files.

- For more information about streaming data inputs, see [Stream data as input into Stream Analytics](stream-analytics-define-inputs.md)
+ For more information about streaming data inputs, see [Stream data as input into Stream Analytics](stream-analytics-define-inputs.md).

### Reference data input
Stream Analytics also supports input known as *reference data*. Reference data is either completely static or changes slowly. It's typically used to perform correlation and lookups. For example, you might join data in the data stream input to data in the reference data, much as you would perform a SQL join to look up static values. Azure Blob storage, Azure Data Lake Storage Gen2, and Azure SQL Database are currently supported as input sources for reference data. Reference data source blobs have a limit of up to 300 MB in size, depending on the query complexity and allocated Streaming Units. For more information, see the [Size limitation](stream-analytics-use-reference-data.md#size-limitation) section of the reference data documentation.

- For more information about reference data inputs, see [Using reference data for lookups in Stream Analytics](stream-analytics-use-reference-data.md)
+ For more information about reference data inputs, see [Using reference data for lookups in Stream Analytics](stream-analytics-use-reference-data.md).

## Next steps
> [!div class="nextstepaction"]
