Merged
22 commits
fbf5007
SUMO-251500: Adding monitor's information to OTEL Apps Set2
chetanchoudhary-sumo Dec 13, 2024
0a44606
Adding Oracle and IIS10 Otel monitor info
chetanchoudhary-sumo Dec 16, 2024
0cbcbd1
Update docs/integrations/databases/opentelemetry/mariadb-opentelemetr…
chetanchoudhary-sumo Dec 16, 2024
12da77f
Update docs/integrations/microsoft-azure/opentelemetry/sql-server-lin…
chetanchoudhary-sumo Dec 16, 2024
ad216f3
Update docs/integrations/databases/opentelemetry/couchbase-openteleme…
chetanchoudhary-sumo Dec 16, 2024
eea0cd3
Update docs/integrations/web-servers/opentelemetry/squid-proxy-opente…
chetanchoudhary-sumo Dec 16, 2024
b5d8d2a
Update docs/integrations/microsoft-azure/opentelemetry/sql-server-lin…
chetanchoudhary-sumo Dec 16, 2024
23f64ef
Update docs/integrations/microsoft-azure/opentelemetry/sql-server-lin…
chetanchoudhary-sumo Dec 16, 2024
c7086de
Update docs/integrations/microsoft-azure/opentelemetry/sql-server-lin…
chetanchoudhary-sumo Dec 16, 2024
986c99d
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
85569ec
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
b7833b7
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
5381153
Update docs/integrations/microsoft-azure/opentelemetry/sql-server-lin…
chetanchoudhary-sumo Dec 17, 2024
5e0a5e0
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
3a606bf
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
144e7d7
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
9845ad1
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
bcb7d3f
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
f995140
Update docs/integrations/databases/opentelemetry/oracle-opentelemetry.md
chetanchoudhary-sumo Dec 17, 2024
35ca4ac
Update sql-server-linux-opentelemetry.md
amee-sumo Dec 17, 2024
7862084
Update iis-10-opentelemetry.md
amee-sumo Dec 17, 2024
b8c5838
Merge branch 'main' into monitors_section_otel_apps_set2
amee-sumo Dec 17, 2024
@@ -226,3 +226,20 @@ Use this dashboard to:
- Understand the behavior of users accessing clusters and servers through the REST API.

<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Couchbase-OpenTelemetry/Couchbase-HTTP-Access.png' alt="Access" />

## Create monitors for Couchbase app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### Couchbase alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `Couchbase - Bucket Not Ready` | This alert is triggered when a bucket in the Couchbase cluster is not ready. | Count `>` 0 | Count `<=` 0 |
| `Couchbase - High Latency HTTP Requests` | This alert is triggered on high average latency for HTTP requests to Couchbase. | Count `>` 1000 | Count `<=` 1000 |
| `Couchbase - Node Down` | This alert is triggered when a node in the Couchbase cluster is down. | Count `>` 0 | Count `<=` 0 |
| `Couchbase - Node Not Respond` | This alert is triggered when a node in the Couchbase cluster fails to respond too many times. | Count `>=` 10 | Count `<` 10 |
| `Couchbase - Too Many Error Queries on Buckets` | This alert is triggered when there are too many error queries on a bucket in a Couchbase cluster. | Count `>=` 1000 | Count `<` 1000 |
| `Couchbase - Too Many Login Failures` | This alert is triggered when there are too many login failures to a node in a Couchbase cluster. | Count `>=` 1000 | Count `<` 1000 |
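
As a rough illustration of how the Alert Condition and Recover Condition columns pair up, the Python sketch below parses a condition cell and steps a monitor between a normal and an alerting state. The helper names (`parse_condition`, `holds`, `next_state`) are hypothetical and not part of any Sumo Logic API, and the actual monitor evaluation may differ.

```python
import operator
import re

# Comparison operators that appear in the Alert/Recover Condition columns.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def parse_condition(text):
    """Parse a table cell such as 'Count `>=` 10' into (operator, threshold)."""
    match = re.fullmatch(r"Count\s+`(>=|<=|>|<)`\s+(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized condition: {text!r}")
    return match.group(1), int(match.group(2))


def holds(text, count):
    """Return True if `count` satisfies the condition written in `text`."""
    op, threshold = parse_condition(text)
    return OPS[op](count, threshold)


def next_state(alerting, count, alert_cond, recover_cond):
    """Hypothetical state step: enter alerting when the alert condition
    holds, leave it when the recover condition holds, otherwise stay put."""
    if not alerting and holds(alert_cond, count):
        return True
    if alerting and holds(recover_cond, count):
        return False
    return alerting


# Example with the `Couchbase - Node Not Respond` row.
state = next_state(False, 12, "Count `>=` 10", "Count `<` 10")  # enters alerting
state = next_state(state, 3, "Count `>=` 10", "Count `<` 10")   # recovers
```
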
@@ -248,3 +248,19 @@ Use this dashboard to:
- Examine slow query trends to determine if there are periodic performance bottlenecks in your database clusters.

<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/MariaDB-OpenTelemetry/MariaDB-Slow-Queries.png' alt="Slow Queries" />

## Create monitors for MariaDB app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### MariaDB alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `MariaDB - Critical Errors` | This alert is triggered when there are critical database errors. | Count `>` 10 | Count `<=` 10 |
| `MariaDB - Excessive Slow Query Detected` | This alert is triggered when the average time to execute a query exceeds 15 seconds over a 5-minute interval. | Count `>=` 1 | Count `<` 1 |
| `MariaDB - Failed Login Attempts` | This alert is triggered when there are excessive failed login attempts in a short period. | Count `>=` 1 | Count `<` 1 |
| `MariaDB - Instance down` | This alert is triggered when the MariaDB instance is down. | Count `>=` 1 | Count `<` 1 |
| `MariaDB - Replication Failure` | This alert is triggered when there are replication failures. | Count `>=` 1 | Count `<` 1 |
@@ -559,3 +559,26 @@ See information derived from the syslog audit trail, including successful and fa
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Oracle-OpenTelemetry/Oracle-Performance-Details.png' alt="Monitor Performance by DB Script" />
The **Oracle - Performance Details** dashboard provides insight into the counts of rollbacks, commits, transactions, processes, and sessions.
It also helps you monitor physical and logical reads and allocated PGA memory. This dashboard is based on the [metrics collected by the Oracle DB OpenTelemetry receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/oracledbreceiver/documentation.md).

## Create monitors for Oracle app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### Oracle alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `Oracle - Admin Restricted Command Execution` | This alert is triggered when the Listener cannot resolve a command. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Archival Log Creation` | This alert is triggered when an archive log creation error occurs. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Block Corruption` | This alert is triggered when corrupt data blocks are detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Database Crash` | This alert is triggered when the database crashes. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Deadlock` | This alert is triggered when deadlocks are detected. | Count `>` 5 | Count `<=` 5 |
| `Oracle - Fatal NI Connect Error` | This alert is triggered when a "Fatal NI connect error" is detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Internal Errors` | This alert is triggered when internal errors are detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Login Fail` | This alert is triggered when a user login failure is detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Possible Inappropriate Activity` | This alert is triggered when possible inappropriate activity is detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - TNS Error` | This alert is triggered when TNS operation errors are detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Unable To Extend Tablespace` | This alert is triggered when tablespace extension failures are detected. | Count `>` 0 | Count `<=` 0 |
| `Oracle - Unauthorized Command Execution` | This alert is triggered when a user is not authorized to execute a requested listener command in an Oracle instance. | Count `>` 0 | Count `<=` 0 |
@@ -184,3 +184,21 @@ Use this dashboard to:
- Monitor any errors and warnings.

<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/SQLServer-Linux-OpenTelemetry/SQL-Server-Operations.png' alt="Operations" />

## Create monitors for SQL Server Linux app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### SQL Server Linux alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `SQL Server - AppDomain` | This alert is triggered when AppDomain-related issues are detected in your SQL Server instance. | Count `>=` 1 | Count `<` 1 |
| `SQL Server - Backup Fail` | This alert is triggered when the SQL Server backup fails. | Count `>=` 1 | Count `<` 1 |
| `SQL Server - Deadlock` | This alert is triggered when deadlocks are detected in a SQL Server instance. | Count `>` 5 | Count `<=` 5 |
| `SQL Server - Instance Down` | This alert is triggered when the SQL Server instance is down for 5 minutes. | Count `>` 0 | Count `<=` 0 |
| `SQL Server - Insufficient Space` | This alert is triggered when the SQL Server instance cannot allocate a new page for the database due to insufficient disk space in the filegroup. | Count `>` 0 | Count `<=` 0 |
| `SQL Server - Login Fail` | This alert is triggered when a user is unable to log in to SQL Server. | Count `>=` 1 | Count `<` 1 |
| `SQL Server - Mirroring Error` | This alert is triggered when an error occurs in SQL Server mirroring. | Count `>=` 1 | Count `<` 1 |
@@ -318,3 +318,23 @@ The **IIS - Web Service** dashboard provides a high-level view of the Web Servic

<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/IIS-OpenTelemetry/IIS-Web-Service.png' alt="IIS-Web-Service" />

## Create monitors for IIS app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### IIS alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `IIS - Access from Highly Malicious Sources` | This alert is triggered when an IIS server is accessed from highly malicious IP addresses. | Count `>` 0 | Count `<=` 0 |
| `IIS - ASP.NET Application Errors` | This alert is triggered when an error is detected in the ASP.NET applications running on an IIS server. | Count `>` 0 | Count `<=` 0 |
| `IIS - Blocked Async IO Requests` | This alert is triggered when blocked async I/O requests are detected on an IIS server. | Count `>` 0 | Count `<=` 0 |
| `IIS - Error Events` | This alert is triggered when an error is detected in the IIS logs. | Count `>` 0 | Count `<=` 0 |
| `IIS - High ASP.NET Current Requests` | This alert is triggered when the current ASP.NET request count exceeds the given value (Default 500). | Count `>` 500 | Count `<=` 500 |
| `IIS - High Client (HTTP 4xx) Error Rate (Copy)` | This alert is triggered when more than 5% of HTTP requests result in a 4xx response code. | Count `>` 0 | Count `<=` 0 |
| `IIS - High Current Connections` | This alert is triggered when the current connections exceed the given value (Default 1000), indicating potential capacity issues. | Count `>` 1000 | Count `<=` 1000 |
| `IIS - High Server (HTTP 5xx) Error Rate` | This alert is triggered when more than 5% of HTTP requests result in a 5xx response code. | Count `>` 0 | Count `<=` 0 |
| `IIS - No Worker Processes` | This alert is triggered when the worker process count drops to zero, indicating potential application pool issues. | Count `<` 1 | Count `>=` 1 |
| `IIS - Slow Response Time` | This alert is triggered when the response time for a given IIS server exceeds one second. | Count `>` 0 | Count `<=` 0 |
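
In each row above, the recover condition is the logical complement of the alert condition at the same threshold, including the inverted `IIS - No Worker Processes` row. The Python sketch below is one way to sanity-check that property. It assumes the rows have been transcribed by hand into tuples, and `is_complementary` is a hypothetical helper rather than anything provided by Sumo Logic.

```python
# Each operator mapped to the operator that negates it.
COMPLEMENT = {">": "<=", ">=": "<", "<": ">=", "<=": ">"}


def is_complementary(alert, recover):
    """Return True if `recover` is the exact negation of `alert`.

    Both arguments are (operator, threshold) tuples, e.g. (">", 500).
    """
    alert_op, alert_threshold = alert
    recover_op, recover_threshold = recover
    return alert_threshold == recover_threshold and COMPLEMENT[alert_op] == recover_op


# A few rows transcribed from the IIS alerts table (assumed input format).
rows = [
    ("IIS - High ASP.NET Current Requests", (">", 500), ("<=", 500)),
    ("IIS - High Current Connections", (">", 1000), ("<=", 1000)),
    ("IIS - No Worker Processes", ("<", 1), (">=", 1)),
]

# Collect the names of any rows whose conditions do not negate each other.
mismatched = [name for name, alert, recover in rows
              if not is_complementary(alert, recover)]
```

An empty `mismatched` list means every transcribed row alerts and recovers on opposite sides of the same threshold.
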
@@ -194,3 +194,18 @@ The **The Squid Proxy - HTTP Response Analysis** dashboard provides insights int
The **Squid Proxy - Quality of Service** dashboard provides insights into latency, the response time of requests according to HTTP action, and the response time according to location.

<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Squid-Proxy-OpenTelemetry/Squid-Proxy-Quality-of-Service.png' alt="Quality of Service" />

## Create monitors for Squid Proxy app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### Squid Proxy alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `Squid Proxy - High Client (HTTP 4xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. | Count `>` 0 | Count `<=` 0 |
| `Squid Proxy - High Denied Request` | This alert is triggered when there are too many denied HTTP requests (>5%). | Count `>` 0 | Count `<=` 0 |
| `Squid Proxy - High Response Time` | This alert is triggered when requests are taking too long to process. | Count `>` 20 | Count `<=` 20 |
| `Squid Proxy - High Server (HTTP 5xx) Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 5xx. | Count `>` 0 | Count `<=` 0 |
@@ -184,3 +184,17 @@ The **Varnish - Visitor Traffic Insight** dashboard provides detailed informatio
The **Varnish - Web Server Operations** dashboard provides a high-level view combined with detailed information on the top ten bots, geographic locations and data for clients with high error rates, server errors over time, and non-200 response status codes. Dashboard panels also show information on server error logs, error log levels, error responses by server, and the top URIs responsible for 404 responses.

<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Varnish-OpenTelemetry/Varnish-Web-Server-Operations.png' alt="Web Server Operations" />

## Create monitors for Varnish app

import CreateMonitors from '../../../reuse/apps/create-monitors.md';

<CreateMonitors/>

### Varnish alerts

| Name | Description | Alert Condition | Recover Condition |
|:--|:--|:--|:--|
| `Varnish - Access from Highly Malicious Sources` | This alert is triggered when Varnish is accessed from highly malicious IP addresses. | Count `>` 0 | Count `<=` 0 |
| `Varnish - High 4XX Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 4xx. | Count `>` 5 | Count `<=` 5 |
| `Varnish - High 5XX Error Rate` | This alert is triggered when there are too many HTTP requests (>5%) with a response status of 5xx. | Count `>` 5 | Count `<=` 5 |