Commit 6d37ad2

SUMO-251499: Adding monitor's information to OTEL Apps Set1 (#4852)
* SUMO-251499: Adding monitor's information to OTEL Apps Set1
* updating mongodb instance down recovery condition
* Update rabbitmq-opentelemetry.md
* Addressing feedbacks
* Update cassandra-opentelemetry.md
* Update memcached-opentelemetry.md
* Update mongodb-opentelemetry.md
* Update redis-opentelemetry.md
* Update haproxy-opentelemetry.md

---------

Co-authored-by: Jagadisha V <[email protected]>
1 parent 36f13bb commit 6d37ad2

6 files changed

+183
-65
lines changed


docs/integrations/containers-orchestration/opentelemetry/rabbitmq-opentelemetry.md

Lines changed: 21 additions & 0 deletions
@@ -15,6 +15,10 @@ RabbitMQ logs are sent to Sumo Logic through the OpenTelemetry [filelog receiver
1515

1616
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/RabbitMq-OpenTelemetry/RabbitMQ-Schematics.png' alt="Schematics" />
1717

18+
:::info
19+
This app includes [built-in monitors](#rabbitmq-alerts). For details on creating custom monitors, refer to the [Create monitors for RabbitMQ app](#create-monitors-for-rabbitmq-app).
20+
:::
21+
1822
## Fields creation in Sumo Logic for RabbitMQ
1923

2024
Following are the [Fields](/docs/manage/fields/) which will be created as part of RabbitMQ App install if not already present.
@@ -230,3 +234,20 @@ The **RabbitMQ - Logs** dashboard gives you an at-a-glance view of error message
230234
The **RabbitMQ - Metrics** dashboard gives you an at-a-glance view of your RabbitMQ deployment across brokers, queue, exchange, consumer, and messages.
231235

232236
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/RabbitMq-OpenTelemetry/RabbitMQ-Metrics.png' alt="RabbitMQ Metrics dashboards" />
237+
238+
## Create monitors for RabbitMQ app
239+
240+
import CreateMonitors from '../../../reuse/apps/create-monitors.md';
241+
242+
<CreateMonitors/>
243+
244+
### RabbitMQ alerts
245+
246+
| Name | Description | Alert Condition | Recover Condition |
247+
|:--|:--|:--|:--|
248+
| `RabbitMQ - High Consumer Count` | This alert is triggered when the number of consumers in a queue exceeds the given value (default 10000). | Count `>=` 10000 | Count `<` 10000 |
249+
| `RabbitMQ - High Message Queue Size` | This alert is triggered when the number of messages in a queue exceeds a given threshold (Default 10000), indicating potential consumer issues or message processing bottlenecks. | Count `>=` 10000 | Count `<` 10000 |
250+
| `RabbitMQ - High Messages Count` | This alert is triggered when the number of messages in a queue exceeds the given value (default 10000). | Count `>=` 10000 | Count `<` 10000 |
251+
| `RabbitMQ - High Unacknowledged Messages` | This alert is triggered when there are too many unacknowledged messages (Default 5000), suggesting consumer processing issues. | Count `>=` 5000 | Count `<` 5000 |
252+
| `RabbitMQ - Node Down` | This alert is triggered when a node in the RabbitMQ cluster is down. | Count `>=` 1 | Count `<` 1 |
253+
| `RabbitMQ - Zero Consumers Alert` | This alert is triggered when a queue has no consumers, indicating potential service issues. | Count `<=` 0 | Count `>` 0 |
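
Each row in the table pairs an alert condition with its logical complement as the recover condition. The following is a minimal Python sketch of that pairing, not Sumo Logic's implementation; the `evaluate` helper and the `OPS` mapping are hypothetical names introduced only for illustration.

```python
import operator

# Comparison operators that appear in the "Alert Condition" column.
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


def evaluate(count, op, threshold):
    """Return "alert" if `count <op> threshold` holds, otherwise "recovered"."""
    return "alert" if OPS[op](count, threshold) else "recovered"


# High Consumer Count: alert at Count >= 10000, recover at Count < 10000.
high_consumers = evaluate(10000, ">=", 10000)
# Zero Consumers Alert: alert at Count <= 0, recover at Count > 0.
zero_consumers = evaluate(3, "<=", 0)
```

Because the recover condition is the exact negation of the alert condition, a single comparison is enough to decide both states for a given sample.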

docs/integrations/databases/opentelemetry/cassandra-opentelemetry.md

Lines changed: 41 additions & 27 deletions
@@ -2,7 +2,7 @@
22
id: cassandra-opentelemetry
33
title: Cassandra - OpenTelemetry Collector
44
sidebar_label: Cassandra - OTel Collector
5-
description: Learn about the Sumo Logic OpenTelemetry App for Cassandra.
5+
description: Learn about the Sumo Logic OpenTelemetry app for Cassandra.
66
---
77

88
import useBaseUrl from '@docusaurus/useBaseUrl';
@@ -15,10 +15,14 @@ The [Cassandra](https://cassandra.apache.org/_/cassandra-basics.html) app is a l
1515

1616
Cassandra logs are sent to Sumo Logic through OpenTelemetry [filelog receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelogreceiver) and cassandra metrics are sent to Sumo Logic using [JMX opentelemetry receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/jmxreceiver) with the `target_system` set as [`cassandra`](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/jmx-metrics/docs/target-systems/cassandra.md).
1717

18-
The app supports Logs from the open-source version of Cassandra. The App is tested on the 4.0.0 version of Cassandra.
18+
The app supports logs from the open-source version of Cassandra. The app is tested on the 4.0.0 version of Cassandra.
1919

2020
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Schematics.png' alt="Schematics" />
2121

22+
:::info
23+
This app includes [built-in monitors](#cassandra-alerts). For details on creating custom monitors, refer to the [Create monitors for Cassandra app](#create-monitors-for-cassandra-app).
24+
:::
25+
2226
## Fields creation in Sumo Logic for Cassandra
2327

2428
Following are the [Fields](/docs/manage/fields/) which will be created as part of Cassandra App install if not already present:
@@ -36,17 +40,17 @@ Following are the [Fields](/docs/manage/fields/) which will be created as part o
3640
JMX receiver collects Cassandra metrics from Cassandra server as part of the OpenTelemetry Collector (OTC).
3741

3842
1. Follow the instructions in [JMX - OpenTelemetry's prerequisites section](/docs/integrations/app-development/opentelemetry/jmx-opentelemetry/#prerequisites) to download the [JMX Metric Gatherer](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/jmx-metrics/README.md). This gatherer is used by the [JMX Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/jmxreceiver#details).
39-
4043
2. Set the JMX port as part of `JAVA_OPTS` for Tomcat startup. Usually, it is set in the `/etc/systemd/system/cassandra.service` or `C:\Program Files\apache-tomcat\bin\tomcat.bat` file.
4144

4245
```json
4346
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=11099 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.password.file=${CASSANDRA_CONF_DIR}/jmx.password -Dcom.sun.management.jmxremote.access.file=${CASSANDRA_CONF_DIR}/jmx.access"
4447
```
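
As a quick sanity check on the `JAVA_OPTS` line above, the JMX system properties can be extracted and inspected before restarting the service. The sketch below is a hypothetical helper written for illustration (`parse_system_properties` is not part of Cassandra or the OpenTelemetry Collector), and it assumes that no property value contains spaces.

```python
def parse_system_properties(java_opts):
    """Collect -Dkey=value flags; flags without '=' map to an empty string."""
    props = {}
    # Naive whitespace split: assumes no property value contains spaces.
    for token in java_opts.split():
        if token.startswith("-D"):
            key, _, value = token[2:].partition("=")
            props[key] = value
    return props


# Same flags as in the snippet above, without the shell quoting.
JAVA_OPTS = (
    "-Dcom.sun.management.jmxremote "
    "-Dcom.sun.management.jmxremote.port=11099 "
    "-Dcom.sun.management.jmxremote.authenticate=true "
    "-Dcom.sun.management.jmxremote.ssl=false "
    "-Dcom.sun.management.jmxremote.password.file=${CASSANDRA_CONF_DIR}/jmx.password "
    "-Dcom.sun.management.jmxremote.access.file=${CASSANDRA_CONF_DIR}/jmx.access"
)

props = parse_system_properties(JAVA_OPTS)
jmx_port = int(props["com.sun.management.jmxremote.port"])
auth_enabled = props["com.sun.management.jmxremote.authenticate"] == "true"
```

The port extracted here is the one the JMX receiver would need to reach, so a mismatch between this value and the collector configuration is worth catching early.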
4548

4649
#### For log collection
47-
Cassandra has three main logs: system.log, debug.log, and gc.log which hold general logging messages, debugging logging messages, and java garbage collection logs respectively.
4850

49-
These logs by default live in `${CASSANDRA_HOME}/logs`, but most Linux distributions relocate logs to `/var/log/cassandra`. Operators can tune this location as well as what levels are logged using the provided logback.xml file. For more details on Cassandra logs, see[ this](https://cassandra.apache.org/doc/latest/troubleshooting/reading_logs.html) link.
51+
Cassandra has three main logs: `system.log`, `debug.log`, and `gc.log`, which hold general logging messages, debugging logging messages, and Java garbage collection logs, respectively.
52+
53+
These logs by default live in `${CASSANDRA_HOME}/logs`, but most Linux distributions relocate logs to `/var/log/cassandra`. Operators can tune this location as well as what levels are logged using the provided logback.xml file. For more details on Cassandra logs, see [this](https://cassandra.apache.org/doc/latest/troubleshooting/reading_logs.html).
5054

5155
import LogsCollectionPrereqisites from '../../../reuse/apps/logs-collection-prereqisites.md';
5256

@@ -78,7 +82,7 @@ You can add any custom fields which you want to be tagged with the data ingested
7882

7983
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-YAML.png' style={{border:'1px solid gray'}} alt="YAML" />
8084

81-
### Step 3: Send logs to Sumo
85+
### Step 3: Send logs to Sumo Logic
8286

8387
import LogsIntro from '../../../reuse/apps/opentelemetry/send-logs-intro.md';
8488

@@ -133,7 +137,7 @@ import LogsOutro from '../../../reuse/apps/opentelemetry/send-logs-outro.md';
133137

134138
<LogsOutro/>
135139

136-
## Sample log messages
140+
## Sample log message
137141

138142
```sql
139143
INFO [ScheduledTasks:1] 2023-01-08 09:18:47,347 StatusLogger.java:101 - system.schema_aggregates
@@ -176,7 +180,7 @@ import LogsOutro from '../../../reuse/apps/opentelemetry/send-logs-outro.md';
176180
}
177181
```
178182

179-
## Sample log queries 
183+
## Sample log query 
180184

181185
Following is a query from the Cassandra app's **Cassandra - Overview** dashboard Nodes Up panel:
182186

@@ -191,6 +195,7 @@ Following is a query from the Cassandra app's **Cassandra - Overview** dashboard
191195
```
192196

193197
## Sample metrics query
198+
194199
Following is the query from Cassandra App's overview Dashboard's Number of Requests Panel:
195200

196201
```sql
@@ -205,20 +210,15 @@ The **Cassandra - Overview** dashboard provides an at-a-glance view of Cassandra
205210

206211
Use this dashboard to:
207212

208-
- Identify number of nodes which are up and down
209-
- Gain insights into Memory - Init, used, Max and committed
210-
- Gain insights into the error and warning logs by thread and Node activity
213+
- Identify number of nodes which are up and down.
214+
- Gain insights into Memory - Init, used, Max, and committed.
215+
- Gain insights into the error and warning logs by thread and Node activity.
211216

212217
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Overview.png' alt="Collector" />
213218

214219
### Cache Stats
215220

216-
The **Cassandra - Cache Stats** dashboard provides insight into the database cache status, schedule, and items.
217-
218-
Use this dashboard to:
219-
220-
- Monitor Cache performance.
221-
- Identify Cache usage statistics.
221+
The **Cassandra - Cache Stats** dashboard provides insight into the database cache status, schedule, and items. Use this dashboard to monitor cache performance and identify cache usage statistics.
222222

223223
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Cache-Stats.png' alt="Cache Stats" />
224224

@@ -246,36 +246,50 @@ Use this dashboard to:
246246

247247
### Memtable
248248

249-
The **Cassandra - Memtable** dashboard provides insights into memtable statistics.
250-
251-
Use this dashboard to:
252-
253-
- Review flush activity and memtable status.
249+
The **Cassandra - Memtable** dashboard provides insights into memtable statistics. Use this dashboard to review flush activity and memtable status.
254250

255251
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Memtable.png' alt="Memtable" />
256252

257253
### Resource Usage
258254

259-
The **Cassandra - Resource Usage** dashboard provides details of resource utilization across Cassandra clusters.
260-
261-
Use this dashboard to:
262-
263-
- Identify resource utilization. This can help you to determine whether resources are over-allocated or under-allocated.
255+
The **Cassandra - Resource Usage** dashboard provides details of resource utilization across Cassandra clusters. Use this dashboard to identify resource utilization. This can help you to determine whether resources are over-allocated or under-allocated.
264256

265257
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Resource-Usage-Logs.png' alt="Resource Usage" />
266258

267259
### Compaction
268260

269261
The **Cassandra - Compactions** dashboard provides insight into the completed and pending compaction tasks.
262+
270263
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Compaction.png' alt="Compaction" />
271264

272265
### Requests
273266

274267
The **Cassandra - Requests** dashboard provides insight into the number of requests served, the number of error requests, and their distribution by status and operation. You can also use this dashboard to monitor the read and write latency of the cluster instance.
268+
275269
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Requests.png' alt="Requests" />
276270

277271
### Storage
278272

279273
The **Cassandra - Storage** dashboard provides insight into the current value of total hints of your Cassandra cluster along with storage managed by the cluster.
280274

281275
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Cassandra-OpenTelemetry/Cassandra-Storage.png' alt="Storage" />
276+
277+
## Create monitors for Cassandra app
278+
279+
import CreateMonitors from '../../../reuse/apps/create-monitors.md';
280+
281+
<CreateMonitors/>
282+
283+
### Cassandra alerts
284+
285+
| Name | Description | Alert Condition | Recover Condition |
286+
|:--|:--|:--|:--|
287+
| `Cassandra - Compaction Task Pending` | This alert is triggered when there are more than 15 pending Compaction tasks. | Count > = 15 | Count < 15 |
288+
| `Cassandra - High Hints Backlog` | This alert is triggered when the number of in-progress hints exceeds the given value for 5 minutes. | Count > = 5000 | Count < 5000 |
289+
| `Cassandra - High Memory Usage` | This alert is triggered when memory used exceeds 85% of committed memory for more than 10 minutes. | Count > = 1 | Count < 1 |
290+
| `Cassandra - Node Down Alert` | This alert is triggered when a Cassandra node status changes to DOWN for more than 5 minutes. | Count > = 1 | Count < 1 |
291+
| `Cassandra - Operation Error Rate High` | This alert is triggered when the error rate of operations exceeds given value (Default 5%) for 5 minutes. | Count > 5 | Count < = 5 |
292+
| `Cassandra - Range Query Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of range query latency exceeds the given value (Default 2 seconds) for 5 minutes. | Count > = 2000000 | Count < 2000000 |
293+
| `Cassandra - Read Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of read latency exceeds given value (Default 500ms) for 5 minutes. | Count > = 500000 | Count < 500000 |
294+
| `Cassandra - Storage Growth Rate Abnormal` | This alert is triggered when the storage growth rate exceeds given value (Default 25MB/minute) for 5 minutes. | Count > = 26214400 | Count < 26214400 |
295+
| `Cassandra - Write Latency High (99th Percentile)` | This alert is triggered when the 99th percentile of write latency exceeds given value (Default 200ms) for 5 minutes. | Count > = 200000 | Count < 200000 |
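
The raw numbers in the latency and storage rows are easier to read once converted back to the defaults named in the descriptions. The Python sketch below shows that arithmetic; the units (microseconds for latency, bytes for storage growth) are an assumption inferred from the values, since the table does not state them, and the helper names are hypothetical.

```python
# Unit factors; the units themselves are an assumption inferred from the table values.
MICROS_PER_SECOND = 1_000_000
MICROS_PER_MILLISECOND = 1_000
BYTES_PER_MB = 1024 * 1024  # binary megabyte, which matches 26214400


def seconds_to_micros(seconds):
    return int(seconds * MICROS_PER_SECOND)


def millis_to_micros(millis):
    return int(millis * MICROS_PER_MILLISECOND)


def megabytes_to_bytes(megabytes):
    return int(megabytes * BYTES_PER_MB)


# Defaults from the alert descriptions, expressed in the table's raw units.
thresholds = {
    "range_query_latency_p99": seconds_to_micros(2),
    "read_latency_p99": millis_to_micros(500),
    "write_latency_p99": millis_to_micros(200),
    "storage_growth_per_minute": megabytes_to_bytes(25),
}
```

If a custom monitor uses a different default, the same conversions give the value to enter in the alert and recover conditions.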

docs/integrations/databases/opentelemetry/memcached-opentelemetry.md

Lines changed: 27 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -19,17 +19,20 @@ Memcached logs are sent to Sumo Logic through the OpenTelemetry [filelog receive
1919

2020
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Memcached-OpenTelemetry/Memcached-Schematics.png' alt="Schematics" />
2121

22+
:::info
23+
This app includes [built-in monitors](#memcached-alerts). For details on creating custom monitors, refer to the [Create monitors for Memcached app](#create-monitors-for-memcached-app).
24+
:::
25+
2226
## Fields creation in Sumo Logic for Memcached
2327

2428
Following are the [Fields](/docs/manage/fields/) which will be created as part of Memcached App install if not already present.
2529

2630
- **`sumo.datasource`**. Has a fixed value of **memcached**.
27-
- **`db.system`**. Has a fixed value of **memcached**
28-
- **`deployment.environment`**. User configured. This is the deployment environment where the Memcache cluster resides. For example: dev, prod or qa.
31+
- **`db.system`**. Has a fixed value of **memcached**.
32+
- **`deployment.environment`**. User configured. This is the deployment environment where the Memcache cluster resides. For example: dev, prod, or qa.
2933
- **`db.cluster.name`**. User configured. Enter a name to identify this Memcached cluster. This cluster name will be shown in the Sumo Logic dashboards.
3034
- **`db.node.name`**. This has value of the FQDN of the machine where OpenTelemetry collector is collecting logs and metrics from.
3135

32-
3336
## Prerequisites
3437

3538
1. Configure logging in Memcached: By default, the installation of Memcached will not write any request logs to disk. To add a log file for Memcached, you can use the following syntax:
@@ -221,13 +224,12 @@ Following is the query from Errors panel of Memcached app's overview Dashboard:
221224
| sum(ERROR) as ERROR by _timeslice
222225
```
223226
## Sample metrics queries
224-
**Total Get**
225227

226-
```
228+
```sql title="Total Get"
227229
sumo.datasource=memcached deployment.environment=* db.cluster.name=* db.node.name=* metric=memcached.commands command=get | sum
228230
```
229231

230-
## Viewing Memcached Dashboards
232+
## Viewing the Memcached dashboards
231233

232234
### Overview
233235

@@ -237,7 +239,7 @@ The **Memcached - Overview** dashboard provides an at-a-glance view of the Memca
237239

238240
### Operations
239241

240-
The **Memcached - Operations** Dashboard provides detailed analysis on connections, thread requested, network bytes, table size.
242+
The **Memcached - Operations** Dashboard provides detailed analysis on connections, thread requested, network bytes, and table size.
241243

242244
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Memcached-OpenTelemetry/Memcached-Operations.png' alt="Memcached dashboards" />
243245

@@ -247,7 +249,6 @@ The **Memcached - Command Stats** dashboard provides detailed insights into the
247249

248250
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Memcached-OpenTelemetry/Memcached-Command-Stats.png' alt="Memcached dashboards" />
249251

250-
251252
### Cache Information
252253

253254
The **Memcached - Cache Information** dashboard provides insight into cache states, cache hit, and miss rate over time.
@@ -258,4 +259,21 @@ The **Memcached - Cache Information** dashboard provides insight into cache stat
258259

259260
The **Memcached - Logs** dashboard helps you quickly analyze your Memcached error logs, commands executed, and objects stored.
260261

261-
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Memcached-OpenTelemetry/Memcached-Logs.png' alt="Memcached dashboards" />
262+
<img src='https://sumologic-app-data-v2.s3.amazonaws.com/dashboards/Memcached-OpenTelemetry/Memcached-Logs.png' alt="Memcached dashboards" />
263+
264+
265+
## Create monitors for Memcached app
266+
267+
import CreateMonitors from '../../../reuse/apps/create-monitors.md';
268+
269+
<CreateMonitors/>
270+
271+
### Memcached alerts
272+
273+
| Name | Description | Alert Condition | Recover Condition |
274+
|:--|:--|:--|:--|
275+
| `Memcached - Cache Hit Ratio` | This alert is triggered when the cache hit ratio falls to 50% or below. The hit rate is one of the most important indicators of Memcached performance. A high hit rate means faster responses to your users. If the hit rate is falling, you need quick visibility into why. | Count < = 50% | Count > 50% |
276+
| `Memcached - Commands Error` | This alert is triggered when Memcached has error commands. | Count > 0 | Count < = 0 |
277+
| `Memcached - Current Connections` | This alert is triggered when current connections to Memcached are zero. | Count < = 0 | Count > 0 |
278+
| `Memcached - High Memory Usage` | This alert is triggered when Memcached memory usage exceeds the given threshold (in GB). | Count > 5 | Count < = 5 |
279+
| `Memcached - High Number of Connections` | This alert is triggered when the number of current connections to Memcached exceeds the given threshold. | Count > = 1000 | Count < 1000 |
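
For the **Cache Hit Ratio** row, the ratio is conventionally hits divided by total lookups. The Python sketch below illustrates that calculation and the 50% comparison; it is not the monitor's actual query, which this page does not show, and the function names and the handling of zero lookups are assumptions made for illustration.

```python
def hit_ratio_percent(hits, misses):
    """Return the cache hit ratio as a percentage of all lookups."""
    total = hits + misses
    if total == 0:
        # Assumption: with no lookups at all, report 0% rather than dividing by zero.
        return 0.0
    return 100.0 * hits / total


def cache_hit_alert(hits, misses, threshold_percent=50.0):
    """Mirror the table's alert condition: ratio <= 50% alerts, ratio > 50% recovers."""
    return hit_ratio_percent(hits, misses) <= threshold_percent


healthy = cache_hit_alert(hits=75, misses=25)
degraded = cache_hit_alert(hits=40, misses=60)
```

Note that treating an idle cache as 0% would raise the alert under this sketch; a real monitor may handle missing data differently.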
