
Commit 37699fd

authored
initial commit
1 parent 5169f27 commit 37699fd

File tree

1 file changed: +13 −13 lines changed


articles/stream-analytics/stream-analytics-time-handling.md

Lines changed: 13 additions & 13 deletions
@@ -1,17 +1,17 @@
 ---
 title: Understand time handling in Azure Stream Analytics
-description: Learn how time handling works in Azure Stream Analytics, like how to choose the best start time, how to handle late and early events, and time handling metrics.
+description: Learn how to choose the best start time, handle late and early events, and about time handling metrics in Azure Stream Analytics.
 author: mamccrea
 ms.author: mamccrea
 ms.reviewer: mamccrea
 ms.service: stream-analytics
 ms.topic: conceptual
-ms.date: 03/05/2018
+ms.date: 04/09/2020
 ---
 
 # Understand time handling in Azure Stream Analytics
 
-In this article, we discuss how you can make design choices to solve practical time handling problems in the Azure Stream Analytics service. Time handling design decisions are closely related to event ordering factors.
+In this article, you learn how to make design choices to solve practical time handling problems in Azure Stream Analytics jobs. Time handling design decisions are closely related to event ordering factors.
 
 ## Background time concepts

@@ -41,7 +41,7 @@ Stream Analytics gives users two choices for picking event time:
 
 Application time is assigned when the event is generated, and it's part of the event payload. To process events by application time, use the **Timestamp by** clause in the select query. If the **Timestamp by** clause is absent, events are processed by arrival time.
 
-It’s important to use a timestamp in the payload when temporal logic is involved. That way, delays in the source system or in the network can be accounted for.
+It’s important to use a timestamp in the payload when temporal logic is involved. That way, delays in the source system or in the network can be accounted for. The time assigned to an event is available in [SYSTEM.TIMESTAMP](https://docs.microsoft.com/stream-analytics-query/system-timestamp-stream-analytics).
 
 ## How time progresses in Azure Stream Analytics

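The **Timestamp by** usage described in this hunk can be sketched as a Stream Analytics query. This is a minimal sketch, not part of the article: the `Input`/`Output` names and the `EventTime` payload field are assumptions.

```sql
-- Process events by the application time carried in the payload.
-- If TIMESTAMP BY were omitted, events would be processed by arrival time.
SELECT
    deviceId,
    COUNT(*) AS eventCount
INTO Output
FROM Input TIMESTAMP BY EventTime
GROUP BY deviceId, TumblingWindow(second, 30)
```

With this clause in place, windowing and temporal joins operate on the payload timestamp, so source-system or network delays don't skew the results.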
@@ -83,15 +83,17 @@ The heuristic watermark generation mechanism described here works well in most o
 
 Instead of using a watermark global to all events in an input partition, Stream Analytics has another mechanism called substreams to help you. You can utilize substreams in your job by writing a job query that uses the [**TIMESTAMP BY**](/stream-analytics-query/timestamp-by-azure-stream-analytics) clause and the keyword **OVER**. To designate the substream, provide a key column name after the **OVER** keyword, such as `deviceid`, so that the system applies time policies by that column. Each substream gets its own independent watermark. This mechanism is useful to allow timely output generation, when dealing with large clock skews or network delays among event senders.
 
-Substreams are a unique solution provided by Azure Stream Analytics, and are not offered by other streaming data processing systems. Stream Analytics applies the late arrival tolerance window to incoming events when substreams are used. The default setting (5 seconds) is likely too small for devices with divergent timestamps. We recommend that you start with 5 minutes, and make adjustments according to their device clock skew pattern.
+Substreams are a unique solution provided by Azure Stream Analytics, and are not offered by other streaming data processing systems.
+
+Stream Analytics applies the late arrival tolerance window to incoming events when substreams are used. The late arrival tolerance decides the maximum amount by which different substreams can be apart from each other (if device 1 is at timestamp TS1 and device 2 is at TS2, TS2 − TS1 is at most the late arrival tolerance). The default setting (5 seconds) is likely too small for devices with divergent timestamps. We recommend that you start with 5 minutes, and make adjustments according to your devices' clock skew pattern.
 
 ## Early arriving events
 
-You may have noticed another concept called early arrival window, that looks like the opposite of late arrival tolerance window. This window is fixed at 5 minutes, and serves a different purpose from late arrival one.
+You may have noticed another concept called the early arrival window, which looks like the opposite of the late arrival tolerance window. This window is fixed at 5 minutes, and serves a different purpose from the late arrival tolerance window.
 
-Because Azure Stream Analytics guarantees it always generates complete results, you can only specify **job start time** as the first output time of the job, not the input time. The job start time is required so that the complete window is processed, not just from the middle of the window.
+Because Azure Stream Analytics guarantees complete results, you can only specify **job start time** as the first output time of the job, not the input time. The job start time is required so that the complete window is processed, not just from the middle of the window.
 
-Stream Analytics then derives the starting time from the query specification. However, because input event broker is only indexed by arrival time, the system has to translate the starting event time to arrival time. The system can start processing events from that point in the input event broker. With the early arriving window limit, the translation is straightforward. It’s starting event time minus the 5-minute early arriving window. This calculation also means that the system drops all events that are seen having event time 5 minutes greater than arrival time.
+Stream Analytics then derives the starting time from the query specification. However, because the input event broker is only indexed by arrival time, the system has to translate the starting event time to arrival time. The system can start processing events from that point in the input event broker. With the early arriving window limit, the translation is straightforward: it’s the starting event time minus the 5-minute early arriving window. This calculation also means that the system drops all events whose event time is more than 5 minutes greater than their arrival time. When events are dropped this way, the [Early Input Events metric](https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-monitoring) is incremented.
 
 This concept is used to ensure the processing is repeatable no matter where you start to output from. Without such a mechanism, it would not be possible to guarantee repeatability, as many other streaming systems claim they do.

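The substream pattern described in the hunk above can be sketched as follows. This is a hypothetical query: the `deviceId` key column, the `EventTime` field, and the `Input`/`Output` names are assumptions.

```sql
-- Each deviceId value forms its own substream with an independent watermark,
-- so one slow or clock-skewed device does not hold back the others.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature
INTO Output
FROM Input TIMESTAMP BY EventTime OVER deviceId
GROUP BY deviceId, TumblingWindow(minute, 5)
```

Because the late arrival tolerance bounds how far substreams may drift apart, a larger tolerance (for example, the 5 minutes recommended above) keeps divergent devices from having their events adjusted or dropped.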
@@ -117,7 +119,7 @@ Stream Analytics jobs have several **Event ordering** options. Two can be config
 
 5. **System.Timestamp** value is different from the time in the **event time** field.
 
-As described previously, the system adjusts event time by the out-of-order tolerance or late arrival tolerance windows. The **System.Timestamp** value of the event is adjusted, but not the **event time** field.
+As described previously, the system adjusts event time by the out-of-order tolerance or late arrival tolerance windows. The **System.Timestamp** value of the event is adjusted, but not the **event time** field. Comparing the two values lets you identify events whose timestamps were adjusted: normally they are the same, unless the system changed the timestamp because of one of the tolerances.
 
 ## Metrics to observe

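The comparison described in the changed paragraph can be sketched as a pass-through query that surfaces both values side by side (a sketch; the `EventTime` field and `Input`/`Output` names are assumptions):

```sql
-- System.Timestamp() returns the time assigned by the service after any
-- tolerance adjustments; EventTime is the unmodified payload field.
-- Rows where the two differ are events whose timestamps were adjusted.
SELECT
    deviceId,
    EventTime,
    System.Timestamp() AS systemTimestamp
INTO Output
FROM Input TIMESTAMP BY EventTime
```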
@@ -130,7 +132,7 @@ You can observe a number of the Event ordering time tolerance effects through [S
 
 | **Early Input Events** | Indicates the number of events arriving early from the source that have either been dropped, or their timestamp has been adjusted if they are beyond 5 minutes early. |
 | **Watermark Delay** | Indicates the delay of the streaming data processing job. See more information in the following section.|
 
-## Watermark Delay details
+## Watermark delay details
 
 The **Watermark delay** metric is computed as the wall clock time of the processing node minus the largest watermark it has seen so far. For more information, see the [watermark delay blog post](https://azure.microsoft.com/blog/new-metric-in-azure-stream-analytics-tracks-latency-of-your-streaming-pipeline/).

@@ -154,9 +156,7 @@ There are a number of other resource constraints that can cause the streaming pi
 
 ## Output event frequency
 
-Azure Stream Analytics uses watermark progress as the only trigger to produce output events. Because the watermark is derived from input data, it is repeatable during failure recovery and also in user initiated reprocessing.
-
-When using [windowed aggregates](stream-analytics-window-functions.md), the service only produces outputs at the end of the windows. In some cases, users may want to see partial aggregates generated from the windows. Partial aggregates are not supported currently in Azure Stream Analytics.
+Azure Stream Analytics uses watermark progress as the only trigger to produce output events. Because the watermark is derived from input data, it is repeatable during failure recovery and also in user-initiated reprocessing. When using [windowed aggregates](stream-analytics-window-functions.md), the service only produces outputs at the end of the windows. In some cases, users may want to see partial aggregates generated from the windows. Partial aggregates are not supported currently in Azure Stream Analytics.
 
 In other streaming solutions, output events could be materialized at various trigger points, depending on external circumstances. It's possible in some solutions that the output events for a given time window could be generated multiple times. As the input values are refined, the aggregate results become more accurate. Events could be speculated at first, and revised over time. For example, when a certain device is offline from the network, an estimated value could be used by a system. Later on, the same device comes online to the network. Then the actual event data could be included in the input stream. The output results from processing that time window produces more accurate output.
