System Limits Visibility #5996
Open: JV0812 wants to merge 3 commits into main from health-events-doc-update
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| --- | ||
| title: System Limits Visibility (Manage) | ||
| image: https://assets-www.sumologic.com/company-logos/_800x418_crop_center-center_82_none/SumoLogic_Preview_600x600.jpg?mtime=1617040082 | ||
| keywords: | ||
| - system-limits-visibility | ||
| - manage | ||
| - health-events | ||
| hide_table_of_contents: true | ||
| --- | ||
|
|
||
| We’re excited to announce that Health Events are now automatically generated when the 90% credit usage threshold is exceeded for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs). You can also configure a scheduled search on these health events so that all designated recipients are notified each time a threshold breach occurs. [Learn more](/docs/manage/health-events). |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,89 +1,35 @@ | ||||||
| --- | ||||||
| id: health-events | ||||||
| title: Health Events | ||||||
| description: Monitor the health of your Collectors and Sources. | ||||||
| description: Monitor the health of your Collectors, Sources, and Log data. | ||||||
| --- | ||||||
|
|
||||||
| ## Availability | ||||||
|
|
||||||
| | Account Type | Account Level | | ||||||
| |:--------------|:---------------------------------------------------------------------------------| | ||||||
| | CloudFlex | Professional, Enterprise | | ||||||
| | Credits | Trial, Essentials, Enterprise Operations, Enterprise Security, Enterprise Suite | | ||||||
|
|
||||||
| Health events allow you to keep track of the health of your Collectors, Sources, and Ingest Budgets. You can use them to find and investigate common errors and warnings that are known to cause collection issues. | ||||||
|
|
||||||
| This framework includes the following: | ||||||
|
|
||||||
| * Health event logs indexed in the [System Event Index](/docs/manage/security/audit-indexes/system-event-index). | ||||||
| * A [health events table](#health-events-table) on the Alerts page. | ||||||
| * A health status column on the [Collection page](#collection-page). | ||||||
|
|
||||||
| Health events are sent from Installed Collectors on version 19.308-2 and | ||||||
| later. | ||||||
|
|
||||||
| ## Alerts | ||||||
|
|
||||||
| Alerts for specific health events are easy to create in the Health Events Table. The details pane of an event provides a **Create Scheduled Search** button to automatically generate the required query. | ||||||
|
|
||||||
| ## Health events | ||||||
|
|
||||||
| Health events are created when an issue is detected with a Collector or Source. Events are indexed and searchable in a separate partition named **sumologic_system_events** in the [System Event Index](/docs/manage/security/audit-indexes/system-event-index). For details on what information is available in a health event, see the [common parameters](#common-parameters) table. | ||||||
|
|
||||||
| ### Health events table | ||||||
| import useBaseUrl from '@docusaurus/useBaseUrl'; | ||||||
|
|
||||||
| The health events table allows you to easily view and investigate problems getting your data to Sumo. | ||||||
| System Health Events are generated automatically when the system detects an issue within a Collector or Source, or when a credit usage threshold is exceeded for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs). | ||||||
|
|
||||||
| On the health events table, you can search, filter, and sort incidents by key aspects like severity, resource name, event name, resource type, and opened since date. | ||||||
| These events provide visibility into the operational health of Collectors, Sources, and Ingest Budgets, enabling administrators to monitor performance and identify potential issues proactively. Health events also help in investigating common errors and warnings known to affect data collection and processing. | ||||||
|
|
||||||
| [**New UI**](/docs/get-started/sumo-logic-ui/). To access the health events table, in the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Health Events**. You can also click the **Go To...** menu at the top of the screen and select **Health Events**. | ||||||
| Additionally, a health event is triggered when any credit limit associated with Lookup Tables, Partitions, Fields, or FERs reaches or exceeds 90% of the allocated capacity, allowing timely action to prevent service disruption. This health event will auto-resolve when the usage falls back below the 90% threshold limit. | ||||||
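The trigger and auto-resolve rule described above can be sketched in a few lines of Python. This only illustrates the 90% rule as the text states it and is not Sumo Logic's implementation; the function name `health_status` and its arguments are hypothetical, and the return values reuse the `Healthy`/`Unhealthy` strings from the parameters table.

```python
THRESHOLD_PERCENT = 90  # threshold stated in the doc text


def health_status(usage: int, limit: int) -> str:
    """Return "Unhealthy" when usage reaches or exceeds 90% of limit, else "Healthy".

    Hypothetical helper: `usage` and `limit` are counts of the same resource
    (for example, Lookup Tables in use versus the allocated capacity).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Integer comparison avoids floating-point rounding at the boundary.
    breached = usage * 100 >= limit * THRESHOLD_PERCENT
    return "Unhealthy" if breached else "Healthy"
```

Calling the helper again after usage drops below the threshold returns `Healthy`, which mirrors the auto-resolve behavior described in the paragraph.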
|
|
||||||
| [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). To access the health events table, in the main Sumo Logic menu select **Manage Data > Monitoring > Health Events**. | ||||||
|
|
||||||
|  | ||||||
|
|
||||||
| Click on a row to view the details of a health event. | ||||||
|
|
||||||
|  | ||||||
|
|
||||||
| Click the **Create Scheduled Search** button on the details pane to get alerts for specific health events. The unique identifier of the resource, such as the Source or Collector, is used in the query. See [Schedule a Search](../alerts/scheduled-searches/schedule-search.md) for details. | ||||||
|
|
||||||
| Under the **More Actions** menu you can select: | ||||||
|
|
||||||
| * **Event History** to run a search against the **sumologic_system_events** partition to view all of the related event logs. | ||||||
| * **View Object** to view the Collector or Source in the Collection page related to the event. | ||||||
|
|
||||||
| ### Health events severity | ||||||
| :::note | ||||||
| Health events are sent from Installed Collectors of version `19.308-2` and later. | ||||||
| ::: | ||||||
|
|
||||||
| Events are categorized by two severity levels, warning and error. The severity column has color-coded error and warning events so you can quickly determine the severity of a given issue. | ||||||
| ## Availability | ||||||
|
|
||||||
| *  A warning indicates the Collector or Source has a configuration issue or is operating in a degraded state. | ||||||
| *  An error indicates the Collector or Source is unable to collect data as expected. | ||||||
| | Account Type | Account Level | | ||||||
| |:--------------|:---------------------------------------------------------------------------------| | ||||||
| | CloudFlex | Professional, Enterprise | | ||||||
| | Credits | Trial, Essentials, Enterprise Operations, Enterprise Security, Enterprise Suite | | ||||||
|
|
||||||
| ### Common parameters | ||||||
| ## Event schema | ||||||
|
|
||||||
| Each health event log has common keys that categorize it to a product | ||||||
| area and provide details of the event. The following table shows the | ||||||
| common parameters in the order that they are found in health event logs. | ||||||
| This section defines the structure of System Health Events, including all key parameters and their descriptions. The example below illustrates a sample health event in JSON format, followed by a parameter table explaining each field for better understanding and analysis. | ||||||
|
|
||||||
| | Parameter | Description | Data Type | | ||||||
| |:--|:--|:--| | ||||||
| | status | Either `Healthy` or `Unhealthy` based on the event. | String | | ||||||
| | details | The details of the event include the type as `trackerId`, the `name` of the event, and a `description`. | JSON object of Strings | | ||||||
| | eventType | Health events have a value of `Health-Change`. | String | | ||||||
| | severityLevel | Either `Error` or `Warning` based on the event. | String | | ||||||
| | accountId | The unique identifier of the organization. | String | | ||||||
| | eventId | The unique identifier of the event. | String | | ||||||
| | eventName | The name of the event. | String | | ||||||
| | eventTime | The event timestamp in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. | String | | ||||||
| | eventFormatVersion | The event log format version. | String | | ||||||
| | operator | Information on who did the operation. If it's missing, the Sumo service was the operator. | JSON object of Strings | | ||||||
| | subsystem | The product area of the event. | String | | ||||||
| | resourceIdentity | This includes any unique identifiers, names, and the type of the object associated with the event. | JSON object of Strings | | ||||||
| ### JSON example | ||||||
|
|
||||||
| ### Health event log example | ||||||
|
|
||||||
| ```json | ||||||
| ```json title="Sample Health Event" | ||||||
| { | ||||||
| "status": "UnHealthy", | ||||||
| "details": { | ||||||
|
|
@@ -109,10 +55,94 @@ common parameters in the order that they are found in health event logs. | |||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| ### Parameters table | ||||||
|
|
||||||
| Each health event log has common keys that categorize it to a product area and provide details of the event. The following table shows the common parameters in the order that they are found in health event logs. | ||||||
|
|
||||||
| | Parameter | Description | Data type | | ||||||
| |:--|:--|:--| | ||||||
| | status | Either `Healthy` or `Unhealthy` based on the event. | String | | ||||||
| | details | The details of the event include the type as `trackerId`, the `name` of the event, and a `description`. | JSON object of Strings | | ||||||
| | eventType | Health events have a value of `Health-Change`. | String | | ||||||
| | severityLevel | Either `Error` or `Warning` based on the event. | String | | ||||||
| | accountId | The unique identifier of the organization. | String | | ||||||
| | eventId | The unique identifier of the event. | String | | ||||||
| | eventName | The name of the event. | String | | ||||||
| | eventTime | The event timestamp in [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. | String | | ||||||
| | eventFormatVersion | The event log format version. | String | | ||||||
| | operator | Information on who did the operation. If it's missing, the Sumo service was the operator. | JSON object of Strings | | ||||||
| | subsystem | The product area of the event. | String | | ||||||
| | resourceIdentity | This includes any unique identifiers, names, and the type of the object associated with the event. | JSON object of Strings | | ||||||
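As a rough sketch of how the parameters above might be consumed downstream, the Python snippet below parses an event and extracts a few fields. The sample event is hypothetical: its keys follow the parameters table, but every value is invented for illustration, and `summarize` is not part of any Sumo Logic API.

```python
import json

# Hypothetical event: keys follow the parameters table; all values are made up.
SAMPLE_EVENT = """
{
  "status": "Unhealthy",
  "eventType": "Health-Change",
  "severityLevel": "Warning",
  "eventName": "ExampleEvent",
  "subsystem": "example-subsystem",
  "resourceIdentity": {"id": "0000000000000000", "name": "example-resource"}
}
"""


def summarize(raw: str) -> dict:
    """Extract the fields most useful for alerting from one health event log."""
    event = json.loads(raw)
    identity = event.get("resourceIdentity", {})
    return {
        "status": event.get("status"),
        "severity": event.get("severityLevel"),
        "event_name": event.get("eventName"),
        "resource_id": identity.get("id"),
    }
```

Using `.get` keeps the helper tolerant of events that omit optional keys such as `operator`.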
|
|
||||||
| ## Configure Scheduled Search | ||||||
|
|
||||||
| Configuring a scheduled search for a selected health event sends a timely alert to all recipients each time the health event is triggered. To configure one, follow these steps: | ||||||
|
|
||||||
| 1. [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). Go to **Manage Data > Monitoring > Health Events**.<br/>[**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Health Events**. <br/><img src={useBaseUrl('/img/health-events/health-events-table.png')} alt="health-events-table" style={{border: '1px solid gray'}} width="800"/> | ||||||
| 1. Click on the required row to view the details of a health event. <br/><img src={useBaseUrl('/img/health-events/health-event-detail.png')} alt="health-events-detail" style={{border: '1px solid gray'}} width="400"/> | ||||||
| 1. Click the **Create Scheduled Search** button and configure it based on your requirement. For more details, refer to [Create a Scheduled Search](/docs/alerts/scheduled-searches/schedule-search/). | ||||||
| :::info | ||||||
| The query is auto-generated for the selected health event. | ||||||
| ::: | ||||||
|
|
||||||
| Use the following scheduled search query to get an alert when the 90% threshold is exceeded for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs). | ||||||
|
|
||||||
| ```sql | ||||||
| _index=sumologic_system_events "0000000007063B25" | ||||||
| | json "eventType", "resourceIdentity.id" as eventType , resourceId | ||||||
| | where eventType = "Health-Change" AND resourceId = "0000000007063B25" | ||||||
| ``` | ||||||
|
|
||||||
| For specific `eventType`, `resourceId`, `eventName`: | ||||||
|
|
||||||
| ```sql | ||||||
| _index=sumologic_system_events "0000000007063B25" | ||||||
| | json "eventType", "resourceIdentity.id","eventName" as eventType, resourceId, eventName | ||||||
| | where eventType = "Health-Change" AND resourceId = "0000000007063B25" AND eventName = "LookupsLimitApproaching" | ||||||
| ``` | ||||||
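If these queries need to be generated for many resources, a small helper can assemble them. The Python sketch below simply reproduces the two query shapes shown above through string formatting; `build_health_query` is a hypothetical name, not a Sumo Logic function, and it assumes double-quoted string values throughout.

```python
from typing import Optional


def build_health_query(resource_id: str, event_name: Optional[str] = None) -> str:
    """Build a health event search query for one resource (hypothetical helper)."""
    # Keyword scope on the resource ID narrows the search before parsing.
    lines = [f'_index=sumologic_system_events "{resource_id}"']
    if event_name is None:
        lines.append('| json "eventType", "resourceIdentity.id" as eventType, resourceId')
        lines.append(
            f'| where eventType = "Health-Change" AND resourceId = "{resource_id}"'
        )
    else:
        lines.append(
            '| json "eventType", "resourceIdentity.id", "eventName" '
            "as eventType, resourceId, eventName"
        )
        lines.append(
            f'| where eventType = "Health-Change" AND resourceId = "{resource_id}" '
            f'AND eventName = "{event_name}"'
        )
    return "\n".join(lines)
```

For example, `build_health_query("0000000007063B25", "LookupsLimitApproaching")` yields a three-line query equivalent to the second example above.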
|
|
||||||
| ## View Health Events | ||||||
|
|
||||||
| The health events table allows you to easily view and investigate problems that occur while ingesting data into Sumo Logic. On the health events table, you can search, filter, and sort incidents by key aspects like severity, resource name, event name, resource type, and opened since date. | ||||||
|
|
||||||
| :::info | ||||||
| It may take up to 15 minutes for a 90% usage breach for Lookup Tables, Partitions, Fields, or Field Extraction Rules (FERs) to reflect on the Health Events page after detection. | ||||||
| ::: | ||||||
|
|
||||||
| 1. [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). Go to **Manage Data > Monitoring > Health Events**.<br/>[**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Health Events**. <br/><img src={useBaseUrl('/img/health-events/health-events-table.png')} alt="health-events-table" style={{border: '1px solid gray'}} width="800"/> | ||||||
| 1. Click on the required row to view the details of a health event. <br/><img src={useBaseUrl('/img/health-events/health-event-detail.png')} alt="health-events-detail" style={{border: '1px solid gray'}} width="400"/> | ||||||
| - **Create Scheduled Search**. Click this button to get alerts for specific health events. The unique identifier of the resource type is used in the query. See [Schedule a Search](../alerts/scheduled-searches/schedule-search.md) for details. | ||||||
| - Under the **More Actions** menu you can select: | ||||||
| * **Event History** to run a search against the **sumologic_system_events** partition to view all of the related event logs. | ||||||
| * **View Object** to view the resource in detail related to the event. | ||||||
| - **Description**. Provides information about the health event's error or warning. | ||||||
| - **Severity**. Events are categorized by two severity levels: warning and error. The severity column color-codes error and warning events so you can quickly determine the severity of a given issue. | ||||||
| *  A warning indicates the Collector or Source has a configuration issue or is operating in a degraded state. | ||||||
| *  An error indicates the Collector or Source is unable to collect data as expected. | ||||||
| - **Event Name**. The name or type of the health event that occurred. This identifies what kind of issue or status change was detected. | ||||||
| - **Resource Type**. The category or class of resource affected by the event. For example, Collectors, Sources, or Organizations. | ||||||
| - **Resource ID**. A unique identifier for the affected resource. | ||||||
| - **Created At**. The timestamp indicating when the event was generated by the monitoring system. | ||||||
| - **Collector ID**. The unique identifier of the collector that detected and reported the event. This field is only available for *Source* resource type. | ||||||
| - **Collector Name**. The name of the collector associated with the event. This field is only available for *Source* resource type. | ||||||
| - **Error**. A brief summary or title of the detected issue. | ||||||
| - **Service**. Displays the specific resource or service affected by the event. | ||||||
| - **Error Code**. A numeric code associated with the error that provides a quick reference for troubleshooting or mapping to known issue types. | ||||||
| - **Error Info**. Detailed information about the event. This may include error context and suggested corrective actions. | ||||||
| - **Minutes Since Last Heartbeat**. The number of minutes that have elapsed since the system last received a heartbeat signal from the resource. A higher number may indicate the resource is offline or unresponsive. This field is only available for *Collector* resource type. | ||||||
|
|
||||||
| ## View Health Events in Collection page | ||||||
|
|
||||||
| A **Health** column on the Collection page shows color-coded healthy, error, and warning states for Collectors and Sources so you can quickly determine their health.<br/><img src={useBaseUrl('/img/health-events/Collection-health-column.png')} alt="Collection-health-column" style={{border: '1px solid gray'}} width="800"/> | ||||||
|
|
||||||
| To view the number of health events associated with the Collector or Source, perform the following steps: | ||||||
|
|
||||||
| 1. Hover over a **Health** status to view a tooltip that provides the number of health events detected on the selected Collector or Source. <br/><img src={useBaseUrl('/img/health-events/health_tooltip.png')} alt="health_tooltip" style={{border: '1px solid gray'}} width="200"/> | ||||||
| 1. Click on the **Health** status of a Collector or Source to view a pop-up displaying a list of related events. <br/><img src={useBaseUrl('/img/health-events/object_event_details.png')} alt="object_event_details" style={{border: '1px solid gray'}} width="500"/> | ||||||
|
|
||||||
| ## Search health events | ||||||
|
|
||||||
| To search all health events run a query against the internal partition | ||||||
| named **sumologic_system_events**. For example, | ||||||
| Events are indexed and searchable in a separate partition named **sumologic_system_events** in the [System Event Index](/docs/manage/security/audit-indexes/system-event-index). To search all health events, run a query against this partition. For example: | ||||||
|
|
||||||
|
|
||||||
| ```sql | ||||||
| _index=sumologic_system_events "Health-Change" | ||||||
|
|
@@ -128,22 +158,6 @@ Creating a query that defines built-in metadata field values in the scope can he | |||||
|
|
||||||
| | **Metadata Field** | **Assignment Description** | | ||||||
| |:--|:--| | ||||||
| | _sourceCategory | Value of the [common parameter](#common-parameters), `subsystem`. | | ||||||
| | _sourceName | Value of the [common parameter](#common-parameters), `eventName`. | | ||||||
| | _sourceHost | The remote IP address of the host that made the request. If not available the value will be `no_sourceHost`. | | ||||||
|
|
||||||
| ### Collection page | ||||||
|
|
||||||
| A **Health** column on the Collection page shows color-coded healthy, error, and warning states for Collectors and Sources so you can quickly determine the health of your Collectors and Sources. | ||||||
|
|
||||||
| The **status** column now shows the status of Sources manually paused by users. | ||||||
|
|
||||||
|  | ||||||
|
|
||||||
| * Hover your mouse over a Collector or Source to view a tooltip that provides the number of health events detected on the Collector or Source. | ||||||
|
|
||||||
|  | ||||||
|
|
||||||
| * Click on the **Health** status in a row to view a pop-up displaying a list of related events. | ||||||
|
|
||||||
|  | ||||||
| | _sourceCategory | Value of the [common parameter](#parameters-table), `subsystem`. | | ||||||
| | _sourceName | Value of the [common parameter](#parameters-table), `eventName`. | | ||||||
| | _sourceHost | The remote IP address of the host that made the request. If not available the value will be `no_sourceHost`. | | ||||||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -97,7 +97,7 @@ cluster=cluster-1 node=node-1 cpu=cpu-1 metric=cpu_idle 97.29 1460061337 | |||||||||
|
|
||||||||||
| ### Mandatory metric name | ||||||||||
|
|
||||||||||
| Unlike Prometheus, Carbon 2.0 format doesn't enforce the presence of a metric name. It also cannot be reliably inferred automatically. Therefore, Sumo Logic requires a `metric` key to be present among `intrinsic_tags`. All metrics without a `metric` key specified will not be ingested to Sumo Logic and a `MetricsMetricNameMissing` Health Event for the associated Metric Source will be triggered (for more information on Health Events, see [About Health Events](/docs/manage/health-events#health-events)). | ||||||||||
| Unlike Prometheus, Carbon 2.0 format doesn't enforce the presence of a metric name. It also cannot be reliably inferred automatically. Therefore, Sumo Logic requires a `metric` key to be present among `intrinsic_tags`. All metrics without a `metric` key specified will not be ingested to Sumo Logic and a `MetricsMetricNameMissing` Health Event for the associated Metric Source will be triggered (for more information on Health Events, see [About Health Events](/docs/manage/health-events)). | ||||||||||
|
|
||||||||||
|
|
||||||||||
| For example, the following metric will be correctly ingested to Sumo Logic: | ||||||||||
| ``` | ||||||||||
|
|
||||||||||