You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+94-10Lines changed: 94 additions & 10 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -151,7 +151,7 @@ func main() {
151
151
// Step 1: Create StreamSQL Instance
152
152
// StreamSQL is the core component of the stream SQL processing engine, managing the entire stream processing lifecycle
153
153
ssql:= streamsql.New()
154
-
154
+
defer ssql.Stop()
155
155
// Step 2: Define Stream SQL Query Statement
156
156
// This SQL statement showcases StreamSQL's core capabilities:
157
157
// - SELECT: Choose output fields and aggregation functions
@@ -267,8 +267,7 @@ func main() {
267
267
window_end() as end
268
268
FROM stream
269
269
WHERE device.info.type = 'temperature'
270
-
GROUP BY device.location, TumblingWindow('5s')
271
-
WITH (TIMESTAMP='timestamp', TIMEUNIT='ss')`
270
+
GROUP BY device.location, TumblingWindow('5s')`
272
271
273
272
err:= ssql.Execute(rsql)
274
273
if err != nil {
@@ -335,6 +334,11 @@ Since stream data is unbounded, it cannot be processed as a whole. Windows provi
335
334
-**Characteristics**: The size of the window is not related to time but is divided based on the volume of data. It is suitable for segmenting data based on the amount of data.
336
335
-**Application Scenario**: In industrial IoT, an aggregation calculation is performed every time 100 device status data records are processed.
337
336
337
+
-**Session Window**
338
+
-**Definition**: A dynamic window based on data activity. When the interval between data exceeds a specified timeout, the current session ends and triggers the window.
339
+
-**Characteristics**: Window size changes dynamically, automatically dividing sessions based on data arrival intervals. When data arrives continuously, the session continues; when the data interval exceeds the timeout, the session ends and triggers the window.
340
+
-**Application Scenario**: In user behavior analysis, maintain a session when users operate continuously, and close the session and count operations within that session when users stop operating for more than 5 minutes.
341
+
338
342
### Stream
339
343
340
344
-**Definition**: A continuous sequence of data that is generated in an unbounded manner, typically from sensors, log systems, user behaviors, etc.
@@ -343,17 +347,97 @@ Since stream data is unbounded, it cannot be processed as a whole. Windows provi
343
347
344
348
### Time Semantics
345
349
346
-
-**Event Time**
347
-
-**Definition**: The actual time when the data occurred, usually represented by a timestamp generated by the data source.
348
-
349
-
-**Processing Time**
350
-
-**Definition**: The time when the data arrives at the processing system.
350
+
StreamSQL supports two time concepts that determine how windows are divided and triggered:
351
+
352
+
#### Event Time
353
+
354
+
-**Definition**: Event time refers to the actual time when data was generated, usually recorded in a field within the data itself (such as `event_time`, `timestamp`, `order_time`, etc.).
355
+
-**Characteristics**:
356
+
- Windows are divided based on timestamp field values in the data
357
+
- Even if data arrives late, it can be correctly counted into the corresponding window based on event time
358
+
- Uses Watermark mechanism to handle out-of-order and late data
359
+
- Results are accurate but may have delays (need to wait for late data)
360
+
-**Use Cases**:
361
+
- Scenarios requiring precise temporal analysis
362
+
- Scenarios where data may arrive out of order or delayed
363
+
- Historical data replay and analysis
364
+
-**Configuration**: Use `WITH (TIMESTAMP='field_name', TIMEUNIT='ms')` to specify the event time field
365
+
-**Example**:
366
+
```sql
367
+
SELECT deviceId, COUNT(*) as cnt
368
+
FROM stream
369
+
GROUP BY deviceId, TumblingWindow('1m')
370
+
WITH (TIMESTAMP='eventTime', TIMEUNIT='ms')
371
+
```
372
+
373
+
#### Processing Time
374
+
375
+
-**Definition**: Processing time refers to the time when data arrives at the StreamSQL processing system, i.e., the current time when the system receives the data.
376
+
-**Characteristics**:
377
+
- Windows are divided based on the time data arrives at the system (`time.Now()`)
378
+
- Regardless of the time field value in the data, it is counted into the current window based on arrival time
379
+
- Uses system clock (Timer) to trigger windows
380
+
- Low latency but results may be inaccurate (cannot handle out-of-order and late data)
381
+
-**Use Cases**:
382
+
- Real-time monitoring and alerting scenarios
383
+
- Scenarios with high latency requirements and relatively low accuracy requirements
384
+
- Scenarios where data arrives in order and delay is controllable
385
+
-**Configuration**: Default when `WITH (TIMESTAMP=...)` is not specified
386
+
-**Example**:
387
+
```sql
388
+
SELECT deviceId, COUNT(*) as cnt
389
+
FROM stream
390
+
GROUP BY deviceId, TumblingWindow('1m')
391
+
-- No WITH clause specified, defaults to processing time
392
+
```
393
+
394
+
#### Event Time vs Processing Time Comparison
395
+
396
+
| Feature | Event Time | Processing Time |
397
+
|---------|------------|-----------------|
398
+
|**Time Source**| Timestamp field in data | System current time |
399
+
|**Window Division**| Based on event timestamp | Based on data arrival time |
400
+
|**Late Data Handling**| Supported (Watermark mechanism) | Not supported |
401
+
|**Out-of-Order Handling**| Supported (Watermark mechanism) | Not supported |
402
+
|**Result Accuracy**| Accurate | May be inaccurate |
403
+
|**Processing Latency**| Higher (need to wait for late data) | Low (real-time trigger) |
404
+
|**Configuration**|`WITH (TIMESTAMP='field')`| Default (no WITH clause) |
-**Definition**: The starting time point of the window based on event time. For example, for a sliding window based on event time, the window start time is the timestamp of the earliest event within the window.
410
+
-**Event Time Windows**: The starting time point of the window, aligned to window boundaries based on event time (e.g., aligned to minute or hour boundaries).
411
+
-**Processing Time Windows**: The starting time point of the window, based on the time data arrives at the system.
412
+
-**Example**: For an event-time-based tumbling window `TumblingWindow('5m')`, the window start time aligns to multiples of 5 minutes (e.g., 10:00, 10:05, 10:10).
354
413
355
414
-**Window End Time**
356
-
-**Definition**: The ending time point of the window based on event time. Typically, the window end time is the window start time plus the duration of the window. For example, if the duration of a sliding window is 1 minute, then the window end time is the window start time plus 1 minute.
415
+
-**Event Time Windows**: The ending time point of the window, usually the window start time plus the window duration. Windows trigger when `watermark >= window_end`.
416
+
-**Processing Time Windows**: The ending time point of the window, based on the time data arrives at the system plus the window duration. Windows trigger when the system clock reaches the end time.
417
+
-**Example**: For a tumbling window with a duration of 1 minute, if the window start time is 10:00, then the window end time is 10:01.
418
+
419
+
#### Watermark Mechanism (Event Time Windows Only)
420
+
421
+
-**Definition**: Watermark indicates "events with timestamps less than this time should not arrive anymore", used to determine when windows can trigger.
0 commit comments