Best practices for enhanced search performance #5284
Merged
11 commits
- 504612e Best practices for enhanced search performance (JV0812)
- f63cbfc minor fixes (JV0812)
- 2ec8206 Update docs/search/optimize-search-performance.md (JV0812)
- 64703eb Update docs/search/optimize-search-performance.md (JV0812)
- 2f2a7cc Update docs/search/optimize-search-performance.md (JV0812)
- df43835 Update docs/search/optimize-search-performance.md (JV0812)
- b58a567 Update docs/search/optimize-search-performance.md (JV0812)
- 8a78488 minor fixes (JV0812)
- 772deeb Merge branch 'best-practices-search' of https://github.com/SumoLogic/… (JV0812)
- ee96149 Replace slanted quotation marks with straight ones (jpipkin1)
- 011b535 Merge branch 'main' into best-practices-search (JV0812)
```diff
@@ -29,7 +29,7 @@ Log data may not be kept when sent via HTTP Sources or Cloud Syslog Sources, as
 * Sumo Logic accounts can be upgraded at any time to allow for additional quota. Contact [Sumo Logic Sales](mailto:[email protected]) to customize your account to meet your organization's needs.
 
 :::important
-Compressed files are decompressed before they are ingested, so they are ingested at the decompressed file size rate.
+[Compressed files](/docs/send-data/hosted-collectors/http-source/logs-metrics/#compressed-data) are decompressed before they are ingested, so they are ingested at the decompressed file size rate.
 :::
 
 ## Log Throttling
```

`@@ -70,3 +70,191 @@ Here's a quick look at how to choose the right indexed search optimization tool.`

As data enters Sumo Logic, it is first routed to any Partitions for indexing. It is then checked against Scheduled Views, and any data that matches the Scheduled Views is indexed.

Data can be in both a Partition and a Scheduled View because the two tools are used differently (and are indexed separately). Although Partitions are indexed first, the process does not slow the indexing of Scheduled Views.

## Additional methods to optimize Search performance

### Use the smallest Time Range

Always set the search time range to the minimum duration your use case requires. A shorter range reduces the volume of data scanned and improves query efficiency. When working with long time ranges, build and test your search on a shorter range first. Once the search is finalized and validated, extend it to cover the entire period needed for your analysis.

### Use fields extracted by FERs

Instead of filtering with the `where` operator, filter in the source expression on fields that Field Extraction Rules (FERs) have already extracted. This approach is more efficient and improves query performance.

**Recommended approach:**

```
_sourceCategory=foo and field_a=value_a
```

**Not recommended approach:**

```
_sourceCategory=foo
| where field_a="value_a"
```
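
To see why filtering in the source expression can help, here is a toy Python model. It is a sketch under simplifying assumptions, not Sumo Logic's search engine: it assumes an FER-extracted field can be matched through an ingest-time lookup structure (the hypothetical `index` dictionary), whereas `where` only runs after every message in the source category has been retrieved. The records, `field_a`, and `value_a` are illustrative placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Toy data: 1,000 messages in one source category; every tenth message has
# field_a=value_a. field_a stands in for a field extracted by an FER.
records: List[Record] = [
    {"_sourceCategory": "foo", "field_a": "value_a" if i % 10 == 0 else "other"}
    for i in range(1000)
]

# Hypothetical ingest-time structure keyed on (category, field value).
index: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
for rec in records:
    index[(rec["_sourceCategory"], rec["field_a"])].append(rec)


def filter_in_source_expression(category: str, value: str) -> Tuple[List[Record], int]:
    """Model of `_sourceCategory=foo and field_a=value_a`."""
    hits = index.get((category, value), [])
    return hits, len(hits)  # only matching records are examined


def filter_with_where(category: str, value: str) -> Tuple[List[Record], int]:
    """Model of `_sourceCategory=foo | where field_a="value_a"`."""
    scanned = [r for r in records if r["_sourceCategory"] == category]
    hits = [r for r in scanned if r["field_a"] == value]
    return hits, len(scanned)  # every record in the category is examined


hits_a, examined_a = filter_in_source_expression("foo", "value_a")
hits_b, examined_b = filter_with_where("foo", "value_a")
print(len(hits_a), examined_a)  # 100 100
print(len(hits_b), examined_b)  # 100 1000
```

Both versions return the same 100 matches, but in this model the `where` version examines all 1,000 records in the category.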

### Move terms from parse statement to source expression

Adding the parse anchor terms to the source expression improves search performance. A parse statement without `nodrop` drops any log in which the desired field cannot be parsed. For example, `parse "completed * action" as actionName` removes logs that do not contain the terms **completed** and **action**. Because those logs would be dropped anyway, adding the same terms to the source expression lets the search exclude them before the parse step runs.

**Recommended approach:**

```
_sourceCategory=Prod/User/Eventlog completed action
| parse "completed * action" as actionName
| count by actionName
```

**Not recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action" as actionName
| count by actionName
```
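
The following Python sketch models the effect with a simple keyword prefilter. It assumes a substring check is a good enough stand-in for keyword matching in the source expression (real keyword search is token-based and case-insensitive), and it uses a non-greedy regular expression as a rough stand-in for `parse "completed * action"`. The messages are invented for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Invented sample messages; only some contain both "completed" and "action".
messages: List[str] = [
    "user completed login action in 120 ms",
    "user completed logout action in 80 ms",
    "heartbeat ok",
    "disk usage at 42 percent",
    "user completed login action in 95 ms",
]

# Rough stand-in for: parse "completed * action" as actionName
PARSE = re.compile(r"completed (.*?) action")


def count_actions(msgs: List[str], prefilter: bool) -> Tuple[Counter, int]:
    counts: Counter = Counter()
    parse_calls = 0
    for msg in msgs:
        # Keyword terms in the source expression: skip messages lacking them.
        if prefilter and not ("completed" in msg and "action" in msg):
            continue
        parse_calls += 1
        match = PARSE.search(msg)
        if match is None:
            continue  # parse without nodrop drops messages that do not match
        counts[match.group(1)] += 1
    return counts, parse_calls


with_terms = count_actions(messages, prefilter=True)
without_terms = count_actions(messages, prefilter=False)
print(with_terms)     # (Counter({'login': 2, 'logout': 1}), 3)
print(without_terms)  # (Counter({'login': 2, 'logout': 1}), 5)
```

Both runs return the same counts, but with the prefilter only 3 of the 5 messages reach the parse step.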

### Filter data before aggregation

When filtering data, reduce the result set to the smallest possible size before performing aggregate operations such as sum, min, max, and average. Also, use a subquery in the source expression instead of the `if` or `where` search operators.

**Recommended approach:**

```
_sourceCategory=Prod/User/Eventlog userName
| parse "userName: *, " as user
| where user="john"
| count by user
```

**Not recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "userName: *, " as user
| count by user
| where user="john"
```
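
A toy Python comparison of the two orders, using made-up user names, shows the difference. Both produce the same result for `john`, but filtering first means the aggregation sees fewer rows and builds fewer groups. This models only row and group counts, not Sumo Logic's actual execution.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Invented parsed `user` values, one per log.
users: List[str] = ["john", "mary", "john", "alice", "bob", "john", "mary"]


def filter_then_count(values: List[str], target: str) -> Tuple[Dict[str, int], int, int]:
    """Model of `| where user="john" | count by user`."""
    kept = [u for u in values if u == target]
    groups = Counter(kept)
    return dict(groups), len(kept), len(groups)


def count_then_filter(values: List[str], target: str) -> Tuple[Dict[str, int], int, int]:
    """Model of `| count by user | where user="john"`."""
    groups = Counter(values)
    result = {u: n for u, n in groups.items() if u == target}
    return result, len(values), len(groups)


# Each call returns (result, rows aggregated, groups built).
print(filter_then_count(users, "john"))  # ({'john': 3}, 3, 1)
print(count_then_filter(users, "john"))  # ({'john': 3}, 7, 4)
```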

### Remove redundant operators

Remove any search operators that are not referenced later in the query or are not required for the desired results.

For example, a `sort` operator placed before an aggregation makes no difference to the aggregated results; it only adds processing and reduces performance. Similarly, in the following example, the second `parse` extracts an `event` field that the query never uses.

**Recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "userName: *, " as user
| count by user
```

**Not recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "userName: *, " as user
| parse "eventName: *, " as event
| count by user
```

### Merge operators

If the same operator is used multiple times at different stages of a query, merge those uses where possible. Likewise, avoid calling the same operator repeatedly to compute the same value. This reduces the number of passes over the data and improves search performance.

**Example 1:**

**Recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action in * ms" as actionName, duration
| pct(duration, 95) by actionName
```

**Not recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action" as actionName
| parse "action in * ms" as duration
| pct(duration, 95) by actionName
```

**Example 2:**

**Recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action" as actionName
| toLowerCase(actionName) as actionNameLowered
| where actionNameLowered = "login" or actionNameLowered matches "abc*" or actionNameLowered contains "xyz"
```

**Not recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action" as actionName
| where toLowerCase(actionName) = "login" or toLowerCase(actionName) matches "abc*" or toLowerCase(actionName) contains "xyz"
```
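
Example 2 can be modeled in Python by counting how often the lowercasing step runs. This is an illustrative sketch: `to_lower` is a hypothetical instrumented stand-in for `toLowerCase`, and `fnmatch.fnmatchcase` approximates the `matches` wildcard. Because the value is lowercased before it is compared, the literal it is compared against should also be lowercase.

```python
import fnmatch
from typing import Callable, List, Tuple


def make_counted_lower() -> Tuple[Callable[[str], str], dict]:
    """Return a lowercasing function plus a dict that counts its calls."""
    stats = {"calls": 0}

    def to_lower(s: str) -> str:
        stats["calls"] += 1
        return s.lower()

    return to_lower, stats


def repeated(name: str, to_lower: Callable[[str], str]) -> bool:
    # Mirrors calling toLowerCase(actionName) separately in each condition.
    return (
        to_lower(name) == "login"
        or fnmatch.fnmatchcase(to_lower(name), "abc*")
        or "xyz" in to_lower(name)
    )


def merged(name: str, to_lower: Callable[[str], str]) -> bool:
    # Mirrors computing actionNameLowered once and reusing it.
    lowered = to_lower(name)
    return lowered == "login" or fnmatch.fnmatchcase(lowered, "abc*") or "xyz" in lowered


def run(check, names: List[str]) -> Tuple[List[bool], int]:
    """Apply a check to each name; return the results and the lowercase call count."""
    to_lower, stats = make_counted_lower()
    results = [check(n, to_lower) for n in names]
    return results, stats["calls"]


names = ["LogIn", "ABCdef", "fooXYZ", "other"]
print(run(repeated, names))  # ([True, True, True, False], 9)
print(run(merged, names))    # ([True, True, True, False], 4)
```

Both forms give identical results, but the repeated form lowercases the value up to three times per row, while the merged form does it exactly once.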

### Use lookup on the lowest possible dataset

Because `lookup` is an expensive operation, minimize the amount of data it processes. There are two ways to do this:

- Place the lookup as late as possible in the query, so that the clauses before it have already filtered the data.
- Move the lookup after an aggregation to drastically reduce the data it processes, because aggregated data is generally far smaller than non-aggregated data.

**Not recommended approach:**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action in * ms" as actionName, duration
| lookup actionType from path://"/Library/Users/[email protected]/actionTypes" on actionName
| where actionName in ("login", "logout")
| count by actionName, actionType
```

**Recommended approach (Option 1):**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action in * ms" as actionName, duration
| where actionName in ("login", "logout")
| count by actionName
| lookup actionType from path://"/Library/Users/[email protected]/actionTypes" on actionName
```

**Recommended approach (Option 2):**

```
_sourceCategory=Prod/User/Eventlog
| parse "completed * action in * ms" as actionName, duration
| where actionName in ("login", "logout")
| lookup actionType from path://"/Library/Users/[email protected]/actionTypes" on actionName
| count by actionName, actionType
```
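
A toy Python model of the three variants counts how many lookup calls each one makes. `ACTION_TYPES` is a hypothetical in-memory dictionary standing in for the `actionTypes` lookup file, and the action mix is invented; the sketch shows only how placement changes the number of rows the lookup touches.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-in for the actionTypes lookup table.
ACTION_TYPES: Dict[str, str] = {"login": "auth", "logout": "auth", "search": "query"}
WANTED = ("login", "logout")

Result = Tuple[Counter, int]  # (counts keyed by (action, type), lookup calls)


def lookup_first(actions: List[str]) -> Result:
    """Not recommended: look up every row, then filter and count."""
    rows = [(a, ACTION_TYPES.get(a)) for a in actions]
    kept = [r for r in rows if r[0] in WANTED]
    return Counter(kept), len(actions)


def filter_then_lookup(actions: List[str]) -> Result:
    """Option 2: filter first, look up each remaining row, then count."""
    kept = [a for a in actions if a in WANTED]
    rows = [(a, ACTION_TYPES.get(a)) for a in kept]
    return Counter(rows), len(kept)


def filter_count_then_lookup(actions: List[str]) -> Result:
    """Option 1: filter, aggregate, then look up one row per group."""
    counts = Counter(a for a in actions if a in WANTED)
    result = Counter({(a, ACTION_TYPES.get(a)): n for a, n in counts.items()})
    return result, len(counts)


actions = ["login"] * 500 + ["logout"] * 300 + ["search"] * 200
for variant in (lookup_first, filter_then_lookup, filter_count_then_lookup):
    counts, lookups = variant(actions)
    print(variant.__name__, lookups, dict(counts))
```

All three return the same counts; in this model the lookup runs 1,000 times in the not-recommended order, 800 times with Option 2, and twice with Option 1.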

### Avoid multiple parse multi statements

A `parse multi` statement can produce multiple results from a single log. If it is followed by more `parse multi` statements, the results multiply, which can lead to a data explosion in which the query may never finish. Even if the query does finish, the results may not be what you expect.

For example, consider the following query, which assumes that a single log line contains multiple users and multiple event names:

```
_sourceCategory=Prod/User/Eventlog
| parse regex "userName: (?<user>[a-zA-Z]+), " multi
| parse regex "eventName: (?<event>[a-zA-Z]+), " multi
```

This query generates a result for every combination of `userName` and `eventName` values. If you then count by event name (the `event` field), the counts are wrong, because each event name is duplicated once for every `userName` in the same log. A better query extracts each user and event pair with a single `parse regex ... multi`:

```
_sourceCategory=Prod/User/Eventlog
| parse regex "userName: (?<user>[a-zA-Z]+), eventName: (?<event>[a-zA-Z]+), " multi
```
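
The following Python sketch reproduces the effect on one sample log line. Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, and `finditer` is used here as a stand-in for `parse regex ... multi`; the log line itself is invented for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# Invented log line with three user/event pairs.
log = (
    "userName: alice, eventName: login, "
    "userName: bob, eventName: logout, "
    "userName: carol, eventName: login, "
)

USER = re.compile(r"userName: (?P<user>[a-zA-Z]+), ")
EVENT = re.compile(r"eventName: (?P<event>[a-zA-Z]+), ")
PAIR = re.compile(r"userName: (?P<user>[a-zA-Z]+), eventName: (?P<event>[a-zA-Z]+), ")


def chained_multi(line: str) -> List[Tuple[str, str]]:
    """Two separate multi parses: every user is paired with every event."""
    users = [m.group("user") for m in USER.finditer(line)]
    events = [m.group("event") for m in EVENT.finditer(line)]
    return [(u, e) for u in users for e in events]


def single_multi(line: str) -> List[Tuple[str, str]]:
    """One multi parse over the combined pattern: one row per actual pair."""
    return [(m.group("user"), m.group("event")) for m in PAIR.finditer(line)]


bad = chained_multi(log)
good = single_multi(log)
print(len(bad), Counter(e for _, e in bad))    # 9 Counter({'login': 6, 'logout': 3})
print(len(good), Counter(e for _, e in good))  # 3 Counter({'login': 2, 'logout': 1})
```

With three users and three events in the line, the chained version yields 9 rows and counts `login` 6 times; the single combined pattern yields 3 rows and counts it twice.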