Commit 970d367

Merge pull request #6579 from ktalmor/wi-364896-batch6
KQL-consistency-batch6
2 parents 333b81a + 3c6290f commit 970d367

10 files changed, +244 -255 lines changed

data-explorer/kusto/query/join-rightouter.md

Lines changed: 3 additions & 1 deletion
@@ -3,7 +3,7 @@ title: rightouter join
 description: Learn how to use the rightouter join flavor to merge the rows of two tables.
 ms.reviewer: alexans
 ms.topic: reference
-ms.date: 08/11/2024
+ms.date: 01/21/2025
 ---

 # rightouter join
@@ -29,6 +29,8 @@ The `rightouter` join flavor returns all the records from the right side and onl

 ## Example

+This query returns all rows from table Y and any matching rows from table X, filling in NULL values where there is no match from X.
+
 :::moniker range="azure-data-explorer"
 > [!div class="nextstepaction"]
 > <a href="https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA8tJLVGIULBVSEksAcKknFQN79RKq+KSosy8dB2FsMSc0lRDq5z8vHRNrmguBSBQT1TXMdSBMJPUdYwQTGMoM1ldx4Qr1porB2h0JH6jjVCNBhpiaIAwxQiJbQxjpwBNNwAZH6FQo5CVn5mnkJ2Zl2JblJmeUZJfWpJaBLQzP08BaBUAPvRgAtsAAAA=" target="_blank">Run the query</a>

data-explorer/kusto/query/join-rightsemi.md

Lines changed: 7 additions & 1 deletion
@@ -3,7 +3,7 @@ title: rightsemi join
 description: Learn how to use the rightsemi join flavor to merge the rows of two tables.
 ms.reviewer: alexans
 ms.topic: reference
-ms.date: 08/11/2024
+ms.date: 01/21/2025
 ---

 # rightsemi join
@@ -29,6 +29,8 @@ The `rightsemi` join flavor returns all records from the right side that match a

 ## Example

+This query filters and returns only those rows from table Y that have a matching key in table X.
+
 :::moniker range="azure-data-explorer"
 > [!div class="nextstepaction"]
 > <a href="https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA8tJLVGIULBVSEksAcKknFQN79RKq+KSosy8dB2FsMSc0lRDq5z8vHRNrmguBSBQT1TXMdSBMJPUdYwQTGMoM1ldx4Qr1porB2h0JH6jjVCNBhpiaIAwxQiJbQxjpwBNNwAZH6FQo5CVn5mnkJ2Zl2JblJmeUVKcmpsJtDI/TwFoEwCXFUWa2gAAAA==" target="_blank">Run the query</a>
@@ -59,3 +61,7 @@ X | join kind=rightsemi Y on Key
 | b | 10 |
 | c | 20 |
 | c | 30 |
+
+## Related content
+
+* Learn about other [join flavors](join-operator.md#returns)
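
To contrast the two right-side flavors touched in this commit, here is a minimal Python sketch that models `rightouter` and `rightsemi` on in-memory rows. It is not KQL and does not reflect how the engine executes a join. The function names `right_outer_join` and `right_semi_join` and the sample rows in `X` and `Y` are assumptions chosen for illustration; the rows are picked so that the semi-join result matches the `b`/`c` output rows shown in the hunk above.

```python
from collections import defaultdict


def right_outer_join(left, right):
    """Every right row appears; unmatched ones get None for the left columns."""
    # Index left values by key so each right row can find its matches.
    left_by_key = defaultdict(list)
    for key, value in left:
        left_by_key[key].append(value)

    rows = []
    for key, right_value in right:
        left_values = left_by_key.get(key)
        if left_values:
            # One output row per matching left row.
            for left_value in left_values:
                rows.append((key, left_value, key, right_value))
        else:
            # No left match: keep the right row and null out the left side.
            rows.append((None, None, key, right_value))
    return rows


def right_semi_join(left, right):
    """Keep only right rows whose key exists on the left; no left columns."""
    left_keys = {key for key, _ in left}
    return [row for row in right if row[0] in left_keys]


# Hypothetical sample tables as (Key, Value) pairs.
X = [("a", 1), ("b", 2), ("b", 3), ("c", 4)]
Y = [("b", 10), ("c", 20), ("c", 30), ("d", 40)]

outer_rows = right_outer_join(X, Y)
semi_rows = right_semi_join(X, Y)
```

In this sketch the semi join returns each matching right row once, even when the key appears several times on the left, while the outer join emits one row per left match and also keeps the unmatched `d` row.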

data-explorer/kusto/query/join-time-window.md

Lines changed: 36 additions & 79 deletions
@@ -3,28 +3,41 @@ title: Joining within time window
 description: Learn how to perform a time window join operation to match between two large datasets.
 ms.reviewer: alexans
 ms.topic: reference
-ms.date: 08/11/2024
+ms.date: 01/28/2025
 ---
 # Time window join

 > [!INCLUDE [applies](../includes/applies-to-version/applies.md)] [!INCLUDE [fabric](../includes/applies-to-version/fabric.md)] [!INCLUDE [azure-data-explorer](../includes/applies-to-version/azure-data-explorer.md)] [!INCLUDE [monitor](../includes/applies-to-version/monitor.md)] [!INCLUDE [sentinel](../includes/applies-to-version/sentinel.md)]

 It's often useful to join between two large datasets on some high-cardinality key, such as an operation ID or a session ID, and further limit the right-hand-side ($right) records that need to match up with each left-hand-side ($left) record by adding a restriction on the "time-distance" between `datetime` columns on the left and on the right.

-The above operation differs from the usual Kusto join operation, since for the `equi-join` part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.
+The above operation differs from the usual join operation, since for the `equi-join` part of matching the high-cardinality key between the left and right datasets, the system can also apply a distance function and use it to considerably speed up the join.

 > [!NOTE]
-> A distance function doesn't behave like equality (that is, when both dist(x,y) and dist(y,z) are true it doesn't follow that dist(x,z) is also true.) Internally, we sometimes refer to this as "diagonal join".
+> A distance function doesn't behave like equality (that is, when both dist(x,y) and dist(y,z) are true it doesn't follow that dist(x,z) is also true.) This is sometimes referred to as a "diagonal join".

-For example, if you want to identify event sequences within a relatively small time window, assume that you have a table `T` with the following schema:
+## Example to identify event sequences without time window
+
+To identify event sequences within a relatively small time window, this example uses a table `T` with the following schema:

 * `SessionId`: A column of type `string` with correlation IDs.
 * `EventType`: A column of type `string` that identifies the event type of the record.
 * `Timestamp`: A column of type `datetime` indicates when the event described by the record happened.

+| SessionId | EventType | Timestamp |
+|--|--|--|
+| 0 | A | 2017-10-01T00:00:00Z |
+| 0 | B | 2017-10-01T00:01:00Z |
+| 1 | B | 2017-10-01T00:02:00Z |
+| 1 | A | 2017-10-01T00:03:00Z |
+| 3 | A | 2017-10-01T00:04:00Z |
+| 3 | B | 2017-10-01T00:10:00Z |
+
+The following query creates the dataset and then identifies all the session IDs in which event type `A` was followed by an event type `B` within a `1min` time window.
+
 :::moniker range="azure-data-explorer"
 > [!div class="nextstepaction"]
-> <a href="https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA8tJLVEIUbBVSEksAcKknFSN4NTi4sz8PM8Uq+KSosy8dB0F17LUvJKQyoJUuEhIZm5qcUliboEVUF9qCZCnycsVzculAATqBuo6CuqOQAImp2FkYGiua2iga2CoYGBgBUaaOsiqnfCoNkRWbUhItRGGanwuMUZWbUxItQmGajwuMYT5MtaalysEAKb/JupnAQAA" target="_blank">Run the query</a>
+> <a href="https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA4WQTWvDMAyG74H8B91iQ1LsdjDI8GGFHnZubmOHdBGdu8YJjlgZ7MdPbsgHtKS2sbD12O8rnZGgAANVSTwPZxR77DrbuLcq78hbd0xh94OOit8Wx5vC1thRWbc5v0Pik4yj9zgCHolKUkheeRtyYq30c6ZVpjQolV+XTOf0doHWc1o/otc39JKTzZzePKKfbugFJ3qo8uMljgqIoz+4fKHHqZtgTNALmdY3J/wkGHufwp5KT2ZsdKBOjXXwbV1lrHPoeyOiD0EhxPsq22TI3lHa8YczncBJaNyETN4Fs5D13iQckC6IDoSq2dhqBZqjXKrnKvYPNlcRxHMCAAA=" target="_blank">Run the query</a>
 ::: moniker-end

 ```kusto
@@ -38,38 +51,6 @@ let T = datatable(SessionId:string, EventType:string, Timestamp:datetime)
 '3', 'B', datetime(2017-10-01 00:10:00),
 ];
 T
-```
-
-**Output**
-
-|SessionId|EventType|Timestamp|
-|---|---|---|
-|0|A|2017-10-01 00:00:00.0000000|
-|0|B|2017-10-01 00:01:00.0000000|
-|1|B|2017-10-01 00:02:00.0000000|
-|1|A|2017-10-01 00:03:00.0000000|
-|3|A|2017-10-01 00:04:00.0000000|
-|3|B|2017-10-01 00:10:00.0000000|
-
-**Problem statement**
-
-Our query should answer the following question:
-
-Find all the session IDs in which event type `A` was followed by an
-event type `B` within a `1min` time window.
-
-> [!NOTE]
-> In the sample data above, the only such session ID is `0`.
-
-Semantically, the following query answers this question, albeit inefficiently.
-
-:::moniker range="azure-data-explorer"
-> [!div class="nextstepaction"]
-> <a href="https://dataexplorer.azure.com/clusters/help/databases/Samples?query=H4sIAAAAAAAAA4WQTWvDMAyG74H8B91iQ1LsdjDI8GGFHnZubmOHdBGdu8YJjlgZ7MdPbsgHtKS2sbD12O8rnZGgAANVSTwPZxR77DrbuLcq78hbd0xh94OOit8Wx5vC1thRWbc5v0Pik4yj9zgCHolKUkheeRtyYq30c6ZVpjQolV+XTOf0doHWc1o/otc39JKTzZzePKKfbugFJ3qo8uMljgqIoz+4fKHHqZtgTNALmdY3J/wkGHufwp5KT2ZsdKBOjXXwbV1lrHPoeyOiD0EhxPsq22TI3lHa8YczncBJaNyETN4Fs5D13iQckC6IDoSq2dhqBZqjXKrnKvYPNlcRxHMCAAA=" target="_blank">Run the query</a>
-::: moniker-end
-
-```kusto
-T
 | where EventType == 'A'
 | project SessionId, Start=Timestamp
 | join kind=inner
@@ -84,43 +65,15 @@ T

 **Output**

-|SessionId|Start|End|
-|---|---|---|
-|0|2017-10-01 00:00:00.0000000|2017-10-01 00:01:00.0000000|
+| SessionId | Start | End |
+|--|--|--|
+| 0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |

-To optimize this query, we can rewrite it as described below
-so that the time window is expressed as a join key.
+## Example optimized with time window

-**Rewrite the query to account for the time window**
+To optimize this query, we can rewrite it to account for the time window. The time window is expressed as a join key. Rewrite the query so that the `datetime` values are "discretized" into buckets whose size is half the size of the time window. Use *`equi-join`* to compare the bucket IDs.

-Rewrite the query so that the `datetime` values are "discretized" into buckets whose size is half the size of the time window. Use Kusto's *`equi-join`* to compare those bucket IDs.
-
-```kusto
-let lookupWindow = 1min;
-let lookupBin = lookupWindow / 2.0; // lookup bin = equal to 1/2 of the lookup window
-T
-| where EventType == 'A'
-| project SessionId, Start=Timestamp,
-// TimeKey on the left side of the join is mapped to a discrete time axis for the join purpose
-TimeKey = bin(Timestamp, lookupBin)
-| join kind=inner
-(
-T
-| where EventType == 'B'
-| project SessionId, End=Timestamp,
-// TimeKey on the right side of the join - emulates event 'B' appearing several times
-// as if it was 'replicated'
-TimeKey = range(bin(Timestamp-lookupWindow, lookupBin),
-bin(Timestamp, lookupBin),
-lookupBin)
-// 'mv-expand' translates the TimeKey array range into a column
-| mv-expand TimeKey to typeof(datetime)
-) on SessionId, TimeKey
-| where (End - Start) between (0min .. lookupWindow)
-| project SessionId, Start, End
-```
-
-**Runnable query reference (with table inlined)**
+The query finds pairs of events within the same session (*SessionId*) where an 'A' event is followed by a 'B' event within 1 minute. It projects the session ID, the start time of the 'A' event, and the end time of the 'B' event.

 :::moniker range="azure-data-explorer"
 > [!div class="nextstepaction"]
@@ -158,13 +111,13 @@ T

 **Output**

-|SessionId|Start|End|
-|---|---|---|
-|0|2017-10-01 00:00:00.0000000|2017-10-01 00:01:00.0000000|
+| SessionId | Start | End |
+|--|--|--|
+| 0 | 2017-10-01 00:00:00.0000000 | 2017-10-01 00:01:00.0000000 |

-**5M data query**
+## 5 million data query

-The next query emulates a dataset of 5M records and ~1M IDs and runs the query with the technique described above.
+The next query emulates an extensive dataset of 5M records and approximately 1M Session IDs and runs the query with the time window technique.

 :::moniker range="azure-data-explorer"
 > [!div class="nextstepaction"]
@@ -199,6 +152,10 @@ T

 **Output**

-|Count|
-|---|
-|3344|
+| Count |
+|--|
+| 3344 |
+
+## Related content
+
+* [join operator](join-operator.md)
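
The bucketing idea behind the time window join (discretize timestamps into half-window buckets, replicate each right-side event across the buckets its window can touch, then match on equal bucket IDs) can be sketched in Python as follows. This is an illustration of the technique under stated assumptions, not the engine's implementation: `bin_floor` only approximates the KQL `bin()` function by rounding down relative to an assumed epoch, and the helper names `bin_floor` and `time_window_join` are hypothetical. The sample events are the six rows of table `T` shown in the diff above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment origin for the buckets


def bin_floor(ts, size):
    """Round ts down to a multiple of size (roughly what bin() does)."""
    return EPOCH + ((ts - EPOCH) // size) * size


def time_window_join(events, window):
    """Return (session, start, end) where an 'A' is followed by a 'B' within window."""
    half = window / 2  # bucket size is half the time window

    # Left side: each 'A' event lands in exactly one (session, bucket) cell.
    starts = defaultdict(list)
    for session, kind, ts in events:
        if kind == "A":
            starts[(session, bin_floor(ts, half))].append(ts)

    matches = []
    for session, kind, end in events:
        if kind != "B":
            continue
        # Right side: "replicate" the 'B' event into every bucket between
        # bin(end - window) and bin(end), then match on equal bucket IDs.
        bucket = bin_floor(end - window, half)
        last = bin_floor(end, half)
        while bucket <= last:
            for start in starts.get((session, bucket), []):
                # Exact distance check removes false positives from bucketing.
                if timedelta(0) <= end - start <= window:
                    matches.append((session, start, end))
            bucket += half
    return matches


T = [
    ("0", "A", datetime(2017, 10, 1, 0, 0)),
    ("0", "B", datetime(2017, 10, 1, 0, 1)),
    ("1", "B", datetime(2017, 10, 1, 0, 2)),
    ("1", "A", datetime(2017, 10, 1, 0, 3)),
    ("3", "A", datetime(2017, 10, 1, 0, 4)),
    ("3", "B", datetime(2017, 10, 1, 0, 10)),
]

result = time_window_join(T, timedelta(minutes=1))
```

With the sample rows, only session `0` qualifies, which agrees with the **Output** table in the diff. The final distance check is still needed because sharing a bucket only narrows the candidates; it does not guarantee the pair is inside the window.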

data-explorer/kusto/query/let-statement.md

Lines changed: 39 additions & 37 deletions
@@ -3,7 +3,7 @@ title: Let statement
 description: Learn how to use the Let statement to set a variable name to define an expression or a function.
 ms.reviewer: alexans
 ms.topic: reference
-ms.date: 08/11/2024
+ms.date: 01/28/2025
 ms.localizationpriority: high
 ---
 # Let statement
@@ -67,6 +67,8 @@ To optimize multiple uses of the `let` statement within a single query, see [Opt

 [!INCLUDE [help-cluster](../includes/help-cluster-note.md)]

+The query examples show the syntax and example usage of the operator, statement, or function.
+
 ### Define scalar values

 The following example uses a scalar expression statement.
@@ -95,13 +97,13 @@ range y from 0 to ['some number'] step 5

 **Output**

-|y|
-|---|
-|0|
-|5|
-|10|
-|15|
-|20|
+| y |
+|--|
+| 0 |
+| 5 |
+| 10 |
+| 15 |
+| 20 |

 ### Create a user defined function with scalar calculation

@@ -120,13 +122,13 @@ range x from 1 to 5 step 1

 **Output**

-|x|result|
-|---|---|
-|1|5|
-|2|10|
-|3|15|
-|4|20|
-|5|25|
+| x | result |
+|--|--|
+| 1 | 5 |
+| 2 | 10 |
+| 3 | 15 |
+| 4 | 20 |
+| 5 | 25 |

 ### Create a user defined function that trims input

@@ -145,14 +147,14 @@ range x from 10 to 15 step 1

 **Output**

-|x|result|
-|---|---|
-|10|0|
-|11||
-|12|2|
-|13|3|
-|14|4|
-|15|5|
+| x | result |
+|--|--|
+| 10 | 0 |
+| 11 | |
+| 12 | 2 |
+| 13 | 3 |
+| 14 | 4 |
+| 15 | 5 |

 ### Use multiple let statements

@@ -171,9 +173,9 @@ foo2(2) | count

 **Output**

-|result|
-|---|
-|50|
+| result |
+|--|
+| 50 |

 ### Create a view or virtual table

@@ -192,10 +194,10 @@ search MyColumn == 5

 **Output**

-|$table|MyColumn|
-|---|---|
-|Range10|5|
-|Range20|5|
+| $table | MyColumn |
+|--|--|
+| Range10 | 5 |
+| Range20 | 5 |

 ### Use a materialize function

@@ -226,11 +228,11 @@ on $left.Day1 == $right.Day

 **Output**

-|Day1|Day2|Percentage|
-|---|---|---|
-|2016-05-01 00:00:00.0000000|2016-05-02 00:00:00.0000000|34.0645725975255|
-|2016-05-01 00:00:00.0000000|2016-05-03 00:00:00.0000000|16.618368960101|
-|2016-05-02 00:00:00.0000000|2016-05-03 00:00:00.0000000|14.6291376489636|
+| Day1 | Day2 | Percentage |
+|--|--|--|
+| 2016-05-01 00:00:00.0000000 | 2016-05-02 00:00:00.0000000 | 34.0645725975255 |
+| 2016-05-01 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 16.618368960101 |
+| 2016-05-02 00:00:00.0000000 | 2016-05-03 00:00:00.0000000 | 14.6291376489636 |

 ### Using nested let statements

@@ -261,13 +263,13 @@ StormEvents
 **Output**

 | State | s_s |
-|---|---|
+|--|--|
 | ATLANTIC SOUTH | ATLANTIC SOUTHATLANTIC SOUTH |
 | FLORIDA | FLORIDAFLORIDA |
 | FLORIDA | FLORIDAFLORIDA |
 | GEORGIA | GEORGIAGEORGIA |
 | MISSISSIPPI | MISSISSIPPIMISSISSIPPI |
-|...|...|
+| ... | ... |

 ### Tabular argument with wildcard

data-explorer/kusto/query/lookup-operator.md

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ title: lookup operator
 description: Learn how to use the lookup operator to extend columns of a fact table.
 ms.reviewer: alexans
 ms.topic: reference
-ms.date: 12/04/2024
+ms.date: 01/20/2025
 ---
 # lookup operator

@@ -75,7 +75,7 @@ A table with:
 * If `kind` is unspecified or `kind=leftouter`, then in addition to the inner matches, there's a row for every row on the left (and/or right), even if it has no match. In that case, the unmatched output cells contain nulls.
 * If `kind=inner`, then there's a row in the output for every combination of matching rows from left and right.

-## Examples
+## Example

 The following example shows how to perform a left outer join between the `FactTable` and `DimTable`, based on matching values in the `Personal` and `Family` columns.
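
The difference between the two `kind` values described in the bullets above can be modeled with a short Python sketch. It is only an approximation of the documented behavior, not the operator's implementation. The function name `lookup`, the `Row` and `Alias` columns, and the sample rows are hypothetical; only the table names and the `Personal` and `Family` key columns come from the text.

```python
def lookup(fact, dim, keys, kind="leftouter"):
    """Extend fact rows with the non-key columns of matching dim rows."""
    # Columns that the dimension table contributes beyond the join keys.
    extra_cols = [c for c in (dim[0] if dim else {}) if c not in keys]

    # Index dimension rows by their key values.
    index = {}
    for d in dim:
        index.setdefault(tuple(d[k] for k in keys), []).append(d)

    out = []
    for f in fact:
        matches = index.get(tuple(f[k] for k in keys), [])
        for d in matches:
            out.append({**f, **{c: d[c] for c in extra_cols}})
        if not matches and kind == "leftouter":
            # Unmatched fact row: keep it and fill the added cells with nulls.
            out.append({**f, **{c: None for c in extra_cols}})
    return out


# Hypothetical sample data.
FactTable = [
    {"Row": "1", "Personal": "Rowan", "Family": "Murphy"},
    {"Row": "2", "Personal": "Ellis", "Family": "Turner"},
]
DimTable = [
    {"Personal": "Rowan", "Family": "Murphy", "Alias": "rowanm"},
]

left_outer = lookup(FactTable, DimTable, ["Personal", "Family"])
inner = lookup(FactTable, DimTable, ["Personal", "Family"], kind="inner")
```

Here `left_outer` keeps both fact rows, with `Alias` set to `None` for the unmatched one, while `inner` keeps only the matched row.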
