This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit 7540594

Merge pull request #185 from microsoft/mjmelone-patch-32-1
Create Episode 2 - Joins.csl
2 parents 663df1e + 5930c41 commit 7540594

1 file changed

Lines changed: 325 additions & 0 deletions
@@ -0,0 +1,325 @@
print Series = 'Tracking the Adversary with MTP Advanced Hunting', EpisodeNumber = 2, Topic = 'Joins', Presenter = 'Michael Melone, Tali Ash', Company = 'Microsoft'

// Language Reference: https://docs.microsoft.com/en-us/azure/kusto/query/
// Advanced Hunting Reference: https://docs.microsoft.com/en-us/microsoft-365/security/mtp/advanced-hunting-schema-tables?view=o365-worldwide
// ---------------

// Joins
// - Links two datasets together based on a common key
// - Can heavily impact performance depending on how datasets are joined
// - If datasets being joined are too large you may get an error

// ---------------

// The Join Statement
// In the example below, we will find users in the Finance department and determine where they have logged on.
// We'll accomplish this using the IdentityInfo table (user information) and the IdentityLogonEvents
// table.

IdentityLogonEvents
| take 100

// IdentityLogonEvents
// - Authentications performed against an on-prem DC or to Microsoft online services.
// - Contains success \ fail information, logon type, application, identity information, and client information

IdentityInfo
| where Department == 'Finance'
| join IdentityLogonEvents on AccountObjectId

// Note that we now have duplicate columns.
// The duplicates have a '1' at the end of the column name to
// avoid errors.
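
// A minimal sketch of one way to avoid the duplicate columns: project only the
// columns you need on the right side before joining, then drop the duplicated
// join key. The choice of Application and LogonType, and the LogonTimestamp
// name, are assumptions made for illustration; pick whatever your hunt needs.

IdentityInfo
| where Department == 'Finance'
| join (
    IdentityLogonEvents
    | project AccountObjectId, LogonTimestamp = Timestamp, Application, LogonType
) on AccountObjectId
| project-away AccountObjectId1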

// This example uses two datasets, identified as "left" and "right"
// based on their location relative to the join statement.

// Left table:
IdentityInfo
| where Department == 'Finance'

// Right table:
IdentityLogonEvents
| take 100

// As long as the join column names match this should
// work nicely. If the column names do not match, we
// need to specify which columns to join on.
// We accomplish this by using $left. and $right.

IdentityInfo
| where Department == 'Finance'
| project-rename objid = AccountObjectId
| join IdentityLogonEvents on $left.objid == $right.AccountObjectId

// --------------------------------------------------------

// JOIN TYPES
// Now comes the fun part - understanding the default Kusto join.

let LeftTable = datatable (key:int, value:string)
[
    0, "Hello",
    0, "Hola",
    1, "Salut",
    1, "Ciao",
    2, "Hallo"
];
let RightTable = datatable (key:int, value:string)
[
    0, "World",
    0, "Mundo",
    1, "Monde",
    1, "Mondo",
    2, "Welt"
];
LeftTable
| join RightTable on key

// As you can see we are missing data. The default Kusto join (kind=innerunique)
// deduplicates the left table based on the join column before
// joining the datasets together. Only one left row per key survives,
// so in this example we lose "Hola" and "Ciao".

// This is important since it can directly result in missed
// detections! If you want to join data together using the
// standard inner join (the default in SQL) you need to specify
// kind = inner!
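
// To make the difference concrete, here is a small sketch that counts the rows
// each join kind returns for the same sample data. The Rows and JoinKind column
// names are made up for this illustration.

let LeftTable = datatable (key:int, value:string)
[
    0, "Hello",
    0, "Hola",
    1, "Salut",
    1, "Ciao",
    2, "Hallo"
];
let RightTable = datatable (key:int, value:string)
[
    0, "World",
    0, "Mundo",
    1, "Monde",
    1, "Mondo",
    2, "Welt"
];
union
    (LeftTable | join RightTable on key | summarize Rows = count() | extend JoinKind = 'default (innerunique)'),
    (LeftTable | join kind=inner RightTable on key | summarize Rows = count() | extend JoinKind = 'inner')

// With this data the default join should return 5 rows (one left row per key,
// matched to every right row with that key) and kind=inner should return 9
// (every left row matched to every right row with the same key).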

// The default join can be handy from a performance perspective. For
// example, let's say we wanted to produce a list of users who logged
// on to Windows 10 devices. The DeviceInfo table has duplicates (one
// row for each checkin), but we don't need them represented.

DeviceInfo
| where OSPlatform == 'Windows10'
| join DeviceLogonEvents on DeviceId
| distinct DeviceId, DeviceName, AccountDomain, AccountName, AccountSid

// Specifying kind=inner returns every matching combination of rows from both tables,
// without deduplicating the left side first

let LeftTable = datatable (key:int, value:string)
[
    0, "Hello",
    0, "Hola",
    1, "Salut",
    1, "Ciao",
    2, "Hallo"
];
let RightTable = datatable (key:int, value:string)
[
    0, "World",
    0, "Mundo",
    1, "Monde",
    1, "Mondo",
    2, "Welt"
];
LeftTable
| join kind=inner RightTable on key

// This comes in handy when you want to see every network communication within 5 minutes
// of an alert event on the device

AlertEvidence
| where isnotempty(DeviceId)
| project-rename AlertTimestamp = Timestamp
| join kind=inner DeviceNetworkEvents on DeviceId
| where Timestamp between (datetime_add('minute', -5, AlertTimestamp) .. datetime_add('minute', 5, AlertTimestamp))

// Other types of joins
// - left outer: all rows from the left table regardless of whether they match on the right
// - right outer: all rows from the right table regardless of whether they match on the left

let LeftTable = datatable (key:int, value:string)
[
    0, "Foo",
    1, "Bar",
    2, "Baz",
    3, "Qux",
    4, "Quux"
];
let RightTable = datatable (key:int, value:string)
[
    0, "Wibble",
    1, "Wobble",
    2, "Wubble"
];
LeftTable
| join kind=leftouter RightTable on key

// For example, let’s say we wanted a list of all emails that the phish
// filter detected as phishing paired with details about their attachments.

// EmailEvents
// ref: https://docs.microsoft.com/en-us/microsoft-365/security/mtp/advanced-hunting-emailevents-table?view=o365-worldwide
// Contains information about e-mails processed through Office ATP, including
// - Standard email metadata
// - Whether phish or malware detection identified the e-mail as malicious upon receipt
// - Actions taken by Office ATP on the e-mail upon receipt

// EmailAttachmentInfo
// ref: https://docs.microsoft.com/en-us/microsoft-365/security/mtp/advanced-hunting-emailattachmentinfo-table?view=o365-worldwide
// Contains information about e-mail attachments

EmailEvents
| where PhishFilterVerdict == "Phish"
| join kind=leftouter EmailAttachmentInfo on NetworkMessageId, RecipientObjectId
| take 100

// EmailEvents can tell us which e-mails were picked up as phishing, but we won’t
// have an entry in EmailAttachmentInfo for each since many are unlikely to have
// an attachment. Using a left outer join keeps those e-mails in the results.

// ------------------------------------------
// - full outer: all rows of both tables regardless of whether they match each other

let LeftTable = datatable (key:int, value:string)
[
    0, "Foo",
    1, "Bar",
    2, "Baz",
    3, "Qux",
    4, "Quux"
];
let RightTable = datatable (key:int, value:string)
[
    2, "Wibble",
    3, "Wobble",
    16, "Wubble"
];
LeftTable
| join kind=fullouter RightTable on key

// I use this in a query that reports on antimalware signature, engine, and platform versions.

let StartDate = ago(30d);
DeviceFileEvents
| where Timestamp > StartDate
// Find signature \ engine update activity
| where InitiatingProcessFileName =~ 'MpSigStub.exe' and InitiatingProcessCommandLine contains '/stub' and InitiatingProcessCommandLine contains '/payload'
| summarize Timestamp = arg_max(Timestamp, InitiatingProcessCommandLine) by DeviceId, DeviceName
| extend SplitCommand = split(InitiatingProcessCommandLine, ' ')
// Locate stub and payload versions
| extend EngineVersionLocation = array_index_of(SplitCommand, "/stub") + 1, DefinitionVersionLocation = array_index_of(SplitCommand, "/payload") + 1
| project Timestamp, DeviceName, DeviceId, AMEngineVersion = SplitCommand[EngineVersionLocation], AntivirusSignatureVersion = SplitCommand[DefinitionVersionLocation]
| join kind=fullouter (
    DeviceProcessEvents
    | where Timestamp > StartDate
    // Find process creations for MsMpEng from the platform folder
    | where FileName =~ 'MsMpEng.exe' and FolderPath contains @"\Microsoft\Windows Defender\Platform\"
    | summarize arg_max(Timestamp, FolderPath) by DeviceId, DeviceName
    // Go up two levels
    | project DeviceId, DeviceName, AMServiceVersion = split(FolderPath, '\\')[-2]
) on DeviceId
// Re-projecting to make the UI happy
| project DeviceId, DeviceName, AMEngineVersion, AntivirusSignatureVersion, AMServiceVersion

// There are also anti joins and semi joins which are designed to quickly reduce datasets

// Anti joins remove any matching rows and return columns from only the left or the right table
// - leftanti: returns only the rows from the left table that have no match in the right table

let LeftTable = datatable (key:int, value:string)
[
    0, "Foo",
    1, "Bar",
    2, "Baz",
    3, "Qux",
    4, "Quux"
];
let RightTable = datatable (key:int, value:string)
[
    2, "Wibble",
    3, "Wobble",
    16, "Wubble"
];
LeftTable
| join kind=leftanti RightTable on key

// rightanti - you guessed it. It returns only the rows from the right table that have no match in the left table

let LeftTable = datatable (key:int, value:string)
[
    0, "Foo",
    1, "Bar",
    2, "Baz",
    3, "Qux",
    4, "Quux"
];
let RightTable = datatable (key:int, value:string)
[
    2, "Wibble",
    3, "Wobble",
    16, "Wubble"
];
LeftTable
| join kind=rightanti RightTable on key
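
// Semi joins are the mirror image of anti joins: they keep only the rows that DO
// have a match. A small sketch with the same sample tables; leftsemi returns
// columns from the left table only.

let LeftTable = datatable (key:int, value:string)
[
    0, "Foo",
    1, "Bar",
    2, "Baz",
    3, "Qux",
    4, "Quux"
];
let RightTable = datatable (key:int, value:string)
[
    2, "Wibble",
    3, "Wobble",
    16, "Wubble"
];
LeftTable
| join kind=leftsemi RightTable on key

// Keys 2 and 3 exist in both tables, so this should return only "Baz" and "Qux".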

// Let’s say you wanted to see e-mails which were identified as either phishing
// or malware and which are likely still in users’ mailboxes. To achieve this, we
// will use EmailEvents to identify the suspicious e-mails and filter the results
// using the EmailPostDeliveryEvents table.

// EmailPostDeliveryEvents
// ref: https://docs.microsoft.com/en-us/microsoft-365/security/mtp/advanced-hunting-emailpostdeliveryevents-table?view=o365-worldwide
// Contains information about post-delivery remediation actions such as manual administrator
// remediation, phish zap, or malware zap

// The parentheses matter: 'and' binds tighter than 'or', so without them the
// FinalEmailAction filter would only apply to the malware verdict.
EmailEvents
| where (PhishFilterVerdict == 'Phish' or MalwareFilterVerdict == 'Malware') and FinalEmailAction !in ('Replace attachment', 'Send to quarantine')
| join kind=leftanti EmailPostDeliveryEvents on InternetMessageId

// For all of the joins, check out: https://docs.microsoft.com/en-us/azure/kusto/query/joinoperator

// ---------------------------

// union
// Sometimes you want to "link" two queries together into one result instead of joining them based on a key.
// To accomplish this you would use the union operator. A union returns all rows from each query. Columns whose
// name and data type match are merged into one column; a column that exists in only one query is left empty
// for rows that come from the other.

let LeftTable = datatable (key:int, value:string)
[
    0, "Foo",
    1, "Bar",
    2, "Baz",
    3, "Qux",
    4, "Quux"
];
let RightTable = datatable (key:int, value:string)
[
    2, "Wibble",
    3, "Wobble",
    16, "Wubble"
];
LeftTable
| union RightTable
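
// A small sketch of what happens when the schemas only partly overlap. The table
// and column names below (TableA, TableB, other) are made up for illustration.

let TableA = datatable (key:int, value:string)
[
    0, "Foo"
];
let TableB = datatable (key:int, other:string)
[
    1, "Bar"
];
TableA
| union TableB

// The result should have three columns (key, value, other): "other" is empty on
// the row from TableA and "value" is empty on the row from TableB.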

// Notice we no longer have the extra columns from a join. This might be useful if you want to track
// logon activity with devices (the DeviceLogonEvents table) and Active Directory \ Azure Active Directory
// (the IdentityLogonEvents table) in one query.

DeviceLogonEvents
| extend Table = 'DeviceLogonEvents'
| take 100
| union (
    IdentityLogonEvents
    | extend Table = 'IdentityLogonEvents'
    | take 100
)
| project-reorder Timestamp, Table, AccountDomain, AccountName, AccountUpn, AccountSid
| order by Timestamp asc

// --------------------------------------

// Functions are a special sort of join which lets you pull more static data about a file (more are
// planned in the future, stay tuned!). This is really helpful when you want to get information about
// file prevalence or antimalware detections.

// Let's say we wanted information about rare files involved in a process creation event

DeviceProcessEvents
| invoke FileProfile() // Call the FileProfile function
| where isnotempty(GlobalPrevalence) and GlobalPrevalence < 1000 // Note that in the real world you might want to include empty GlobalPrevalence
| project-reorder DeviceName, FileName, ProcessCommandLine, FileSize, GlobalPrevalence, GlobalFirstSeen, GlobalLastSeen, ThreatName, Publisher, SoftwareName
| top 100 by GlobalPrevalence asc
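
// FileProfile only enriches a limited number of rows per call, so one option is
// to reduce the input to distinct hashes first. A hedged sketch: the one-day
// window and the Devices column are assumptions for illustration, and the two
// arguments (the hash column to look up and the maximum number of rows to
// enrich) follow the FileProfile(x, y) form as I understand it from the
// Advanced Hunting documentation. This version also keeps files with an empty
// GlobalPrevalence, as the note above suggests.

DeviceProcessEvents
| where Timestamp > ago(1d)
| where isnotempty(SHA1)
| summarize FileName = take_any(FileName), Devices = dcount(DeviceId) by SHA1
| invoke FileProfile('SHA1', 1000)
| where isempty(GlobalPrevalence) or GlobalPrevalence < 1000
| project SHA1, FileName, Devices, GlobalPrevalence, GlobalFirstSeen, GlobalLastSeen, ThreatName, Publisher
| top 100 by GlobalPrevalence asc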
