
Commit dc945d5

Azure Monitor query optimization headers
1 parent e4534d2 commit dc945d5

File tree

1 file changed

+25
-8
lines changed


articles/azure-monitor/log-query/query-optimization.md

@@ -55,6 +55,8 @@ Query processing time is spent on:

Other than the time spent in the query processing nodes, additional time is spent by Azure Monitor Logs to authenticate the user and verify that they are permitted to access this data, locate the data store, parse the query, and allocate the query processing nodes. This time is not included in the query total CPU time.

### Early filtering of records before using high CPU functions

Some query commands and functions are heavy in their CPU consumption. This is especially true for commands that parse JSON and XML or extract complex regular expressions. Such parsing can happen explicitly via the [parse_json()](/azure/kusto/query/parsejsonfunction) or [parse_xml()](/azure/kusto/query/parse-xmlfunction) functions or implicitly when referring to dynamic columns.

These functions consume CPU in proportion to the number of rows they process. The most efficient optimization is to add where conditions early in the query that filter out as many records as possible before the CPU-intensive function is executed.
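For example, the following sketch applies a cheap filter on a built-in column before the expensive parsing step (the event ID and column names are illustrative):

```Kusto
SecurityEvent
| where EventID == 4688                   // cheap built-in column filter first
| extend Details = parse_json(EventData)  // CPU-heavy parsing now runs on far fewer rows
| project TimeGenerated, Computer, Details
```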
@@ -82,6 +84,8 @@ SecurityEvent
| where FileHash != "" // No need to filter out %SYSTEM32 here as it was removed before
```

### Avoid using evaluated where clauses

Queries that contain [where](/azure/kusto/query/whereoperator) clauses on an evaluated column rather than on columns that are physically present in the dataset lose efficiency. Filtering on evaluated columns prevents some system optimizations when large sets of data are handled.

For example, the following queries produce exactly the same result, but the second one is more efficient because the [where](/azure/kusto/query/whereoperator) condition refers to a built-in column:
8791

@@ -100,6 +104,8 @@ Heartbeat
| summarize count() by Computer
```

### Use effective aggregation commands and dimensions in summarize and join

While some aggregation commands like [max()](/azure/kusto/query/max-aggfunction), [sum()](/azure/kusto/query/sum-aggfunction), [count()](/azure/kusto/query/count-aggfunction), and [avg()](/azure/kusto/query/avg-aggfunction) have low CPU impact due to their logic, others are more complex and include heuristics and estimations that allow them to be executed efficiently. For example, [dcount()](/azure/kusto/query/dcount-aggfunction) uses the HyperLogLog algorithm to provide a close estimation of the distinct count of large sets of data without actually counting each value; the percentile functions make similar approximations using the nearest-rank percentile algorithm. Several of the commands include optional parameters to reduce their impact. For example, the [makeset()](/azure/kusto/query/makeset-aggfunction) function has an optional parameter to define the maximum set size, which significantly affects the CPU and memory.
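As a sketch of the set-size parameter (the table and column names are illustrative, and `make_set()` is the current name of the makeset() function):

```Kusto
Perf
| where TimeGenerated > ago(1h)
| summarize CounterNames = make_set(CounterName, 32) by Computer  // cap each set at 32 items
```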

[Join](/azure/kusto/query/joinoperator?pivots=azuremonitor) and [summarize](/azure/kusto/query/summarizeoperator) commands may cause high CPU utilization when they process a large set of data. Their complexity is directly related to the number of possible values, referred to as *cardinality*, of the columns used as the `by` keys in summarize or as the join attributes. For explanation and optimization of join and summarize, see their documentation articles and optimization tips.
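One way to lower cardinality is to group by a binned value rather than a raw high-cardinality column; a sketch (table and column names are illustrative):

```Kusto
Perf
| where TimeGenerated > ago(1d)
| summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)  // 1h bins keep the by-key cardinality low
```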
@@ -150,9 +156,12 @@ Heartbeat

## Data used for processed query

A critical factor in the processing of the query is the volume of data that is scanned and used for the query processing. Azure Data Explorer uses aggressive optimizations that dramatically reduce the data volume compared to other data platforms. Still, there are critical factors in the query that can impact the data volume that is used.

In Azure Monitor Logs, the **TimeGenerated** column is used as a way to index the data. Restricting the **TimeGenerated** values to as narrow a range as possible will significantly improve query performance by limiting the amount of data that has to be processed.
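A minimal sketch of the pattern (the one-hour window is illustrative):

```Kusto
Heartbeat
| where TimeGenerated > ago(1h)   // a narrow time range limits the data that is scanned
| summarize count() by Computer
```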

### Avoid unnecessary use of search and union operators

Another factor that increases the data that is processed is the use of a large number of tables. This usually happens when `search *` and `union *` commands are used. These commands force the system to evaluate and scan data from all tables in the workspace. In some cases, there might be hundreds of tables in the workspace. Try to avoid as much as possible using "search *" or any search without scoping it to a specific table.

For example, the following queries produce exactly the same result but the last one is by far the most efficient:
158167

@@ -174,6 +183,8 @@ Perf
| summarize count(), avg(CounterValue) by Computer
```

### Add early filters to the query

Another method to reduce the data volume is to have [where](/azure/kusto/query/whereoperator) conditions early in the query. The Azure Data Explorer platform includes a cache that lets it know which partitions include data that is relevant for a specific where condition. For example, if a query contains `where EventID == 4624`, then it distributes the query only to nodes that handle partitions with matching events.

The following example queries produce exactly the same result, but the second one is more efficient:
@@ -190,7 +201,9 @@ SecurityEvent
| summarize LoginSessions = dcount(LogonGuid) by Account
```

### Reduce the number of columns that are retrieved

Since Azure Data Explorer is a columnar data store, retrieval of every column is independent of the others. The number of columns that are retrieved directly influences the overall data volume. You should only include the columns in the output that are needed, by [summarizing](/azure/kusto/query/summarizeoperator) the results or [projecting](/azure/kusto/query/projectoperator) the specific columns. Azure Data Explorer has several optimizations to reduce the number of retrieved columns. If it determines that a column isn't needed, for example if it's not referenced in the [summarize](/azure/kusto/query/summarizeoperator) command, it won't retrieve it.

For example, the second query may process three times more data since it needs to fetch not one column but three:
@@ -214,6 +227,8 @@ The time range can be set using the time range selector in the Log Analytics scr

An alternative method is to explicitly include a [where](/azure/kusto/query/whereoperator) condition on **TimeGenerated** in the query. You should use this method because it assures that the time span is fixed, even when the query is used from a different interface.

You should ensure that all parts of the query have **TimeGenerated** filters. When a query has sub-queries fetching data from various tables or the same table, each has to include its own [where](/azure/kusto/query/whereoperator) condition.

### Make sure all sub-queries have a TimeGenerated filter

For example, in the following query, while the **Perf** table will be scanned only for the last day, the **Heartbeat** table will be scanned for all of its history, which might be up to two years:

```Kusto
@@ -283,6 +298,8 @@ Heartbeat
| summarize min(TimeGenerated) by Computer
```

### Time span measurement limitations

The measurement is always larger than the actual time specified. For example, if the filter on the query is 7 days, the system might scan 7.5 or 8.1 days. This is because the system partitions the data into chunks of variable size. To assure that all relevant records are scanned, it scans the entire partition, which might cover several hours and even more than a day.

There are several cases where the system cannot provide an accurate measurement of the time range. This happens in most cases where the query spans less than a day or in multi-workspace queries.
@@ -298,9 +315,9 @@ While some queries require usage of old data, there are cases where old data is

Such cases can be, for example:

- Not setting the time range in Log Analytics with a sub-query that isn't limited. See the example above.
- Using the API without the optional time range parameters.
- Using a client that doesn't force a time range, such as the Power BI connector.

See the examples and notes in the previous section as they are also relevant in this case.

@@ -337,10 +354,10 @@ To efficiently execute a query, it is partitioned and distributed to compute nod

Query behaviors that can reduce parallelism include:

- Use of serialization and window functions such as the [serialize operator](/azure/kusto/query/serializeoperator), [next()](/azure/kusto/query/nextfunction), [prev()](/azure/kusto/query/prevfunction), and the [row](/azure/kusto/query/rowcumsumfunction) functions. Time series and user analytics functions can be used in some of these cases. Inefficient serialization may also happen if the following operators are used anywhere but at the end of the query: [range](/azure/kusto/query/rangeoperator), [sort](/azure/kusto/query/sortoperator), [order](/azure/kusto/query/orderoperator), [top](/azure/kusto/query/topoperator), [top-hitters](/azure/kusto/query/tophittersoperator), [getschema](/azure/kusto/query/getschemaoperator).
- Use of the [dcount()](/azure/kusto/query/dcount-aggfunction) aggregation function forces the system to keep a central copy of the distinct values. When the scale of data is high, consider using the dcount function's optional parameters to reduce accuracy.
- In many cases, the [join](/azure/kusto/query/joinoperator?pivots=azuremonitor) operator lowers overall parallelism. Examine shuffle join as an alternative when performance is problematic.
- In resource-scope queries, the pre-execution RBAC checks may linger in situations where there is a very large number of RBAC assignments. This may lead to longer checks that result in lower parallelism. For example, a query might be executed on a subscription where there are thousands of resources and each resource has many role assignments at the resource level, not on the subscription or resource group.
- If a query is processing small chunks of data, its parallelism will be low because the system will not spread it across many compute nodes.
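The shuffle strategy mentioned above can be requested with a query hint; a sketch (the tables and join key are illustrative):

```Kusto
SecurityEvent
| where TimeGenerated > ago(1d)
| join hint.strategy=shuffle (
    Heartbeat
    | where TimeGenerated > ago(1d)
  ) on Computer
| summarize count() by Computer
```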