Description
- Everything works fine at first, but after some time (anywhere from 10 hours to 10 days, depending on `collection_interval`), the JVM/Cassandra metrics (for example, `cassandra.client.request.range_slice.latency.99p`) stop arriving. All of the collector's internal metrics (for example, `otelcol_process_uptime`) keep working.
- Manually starting a second collector while the first one is in this "error" state works: we see JVM/Cassandra metrics from the second collector but not from the first, even though both run in the same Docker container.
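Because the time to failure reportedly scales with `collection_interval`, it can help to reason in collection cycles rather than wall-clock time. The sketch below is a minimal illustration, assuming the receiver collects exactly once per interval. The helper name `collections_before_stall` is hypothetical. Pairing the 10-hour figure with the `3s` interval from the config is also an assumption, because the report does not say which interval produced which duration.

```python
def collections_before_stall(elapsed_seconds: float, interval_seconds: float) -> int:
    """Number of complete collection cycles that fit in elapsed_seconds.

    Hypothetical helper: assumes the JMX receiver collects exactly once per
    interval, with no drift or skipped cycles.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return int(elapsed_seconds // interval_seconds)


HOUR = 3600
DAY = 24 * HOUR

# Assumption: the 10-hour stall was observed with collection_interval: 3s.
cycles = collections_before_stall(10 * HOUR, 3)
print(cycles)  # 12000

# If the stall really happens after a fixed number of cycles, a 10-day stall
# would correspond to an interval of about this many seconds:
print(10 * DAY / cycles)  # 72.0
```

If the observed durations line up with a roughly constant cycle count like this, that would point to something that accumulates per collection rather than per unit of time. This is a hypothesis to test, not a finding from the report.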
Steps to reproduce
Deploy the collector with the configuration below and wait.
Expectation
JVM/Cassandra metrics keep flowing.
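One way to check this expectation, and to timestamp the moment it breaks, is to watch the collector's own telemetry, which keeps working after the JMX metrics stop. The following is a minimal Python sketch. It assumes the internal metrics endpoint at `localhost:8888` from the config below. It also assumes the collector reports an `otelcol_receiver_accepted_metric_points` counter labeled per receiver; the exact name, including any `_total` suffix, depends on the collector version. `accepted_points`, `fetch`, and `watch` are hypothetical helpers and are not part of the collector.

```python
import re
import time
import urllib.request

# Assumption: internal telemetry is served in Prometheus text format at the
# address set in service.telemetry.metrics.address.
METRICS_URL = "http://localhost:8888/metrics"
# Assumption: counter name; some collector versions add a "_total" suffix.
METRIC_NAMES = (
    "otelcol_receiver_accepted_metric_points",
    "otelcol_receiver_accepted_metric_points_total",
)

# Matches sample lines of the form: name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)"
)


def accepted_points(exposition: str, receiver: str = "jmx") -> float:
    """Sum the accepted-metric-points counter for one receiver."""
    total = 0.0
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") not in METRIC_NAMES:
            continue  # comments, other metrics, or unlabeled samples
        if f'receiver="{receiver}"' in match.group("labels"):
            total += float(match.group("value"))
    return total


def fetch(url: str = METRICS_URL) -> str:
    """Download the collector's internal metrics page."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def watch(interval_seconds: float = 60.0) -> None:
    """Print a warning whenever the JMX counter stops increasing.

    interval_seconds should be well above collection_interval plus the batch
    timeout. A collector restart resets the counter (current < previous),
    which is not reported as a stall.
    """
    previous = accepted_points(fetch())
    while True:
        time.sleep(interval_seconds)
        current = accepted_points(fetch())
        if current == previous:
            print(f"{time.ctime()}: JMX receiver stalled at {current} points")
        previous = current
```

Calling `watch()` next to the collector logs when the counter stops increasing. Because the `prometheus/internal` pipeline already scrapes the same endpoint, a similar check could instead be written as an alert in the backend that receives `myexporter`'s data.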
What applicable config did you use?
---
receivers:
  jmx:
    jar_path: "/refinery/opentelemetry-jmx-metrics.jar"
    endpoint: localhost:7199
    target_system: cassandra,jvm
    collection_interval: 3s
    log_level: debug
  prometheus/internal:
    config:
      scrape_configs:
        - job_name: 'refinery-internal-metrics'
          scrape_interval: 10s
          static_configs:
            - targets: [ 'localhost:8888' ]
          metric_relabel_configs:
            - source_labels: [ __name__ ]
              regex: '.*grpc_io.*'
              action: drop
exporters:
  myexporter:
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    myexporter:
      host: "myexporter.net"
      port: "9443"
      enable_mtls: true
      root_path: /etc/identity
      repo_dir_path: /etc/identity/client
      service_name: client
      gzip: true
processors:
  netmetadata:
    metrics:
      scopes:
        service: refinery_tested
        subservice: "cassandra"
      tags:
        version: "1"
        k8s_pod_name: "test-cass-alrt-eap-c02-0"
        k8s_namespace: "dva-system"
        k8s_cluster: "collection-monitoring"
        device: "ip-10-11-11-11.us-west-2.compute.internal"
        substrate: "aws"
        account: "00000"
        region: "unknown"
        zone: "us-west-2b"
        falcon_instance: "dev1-uswest2"
        functional_domain: "monitoring"
        functional_domain_instance: "monitoring"
        environment: "dev1"
        environment_type: "dev"
        cell: "c02"
        service_name: "test-cass-alrt-eap"
        service_group: "test-shared"
        service_instance: "test-cass-alrt-eap-c02"
  memory_limiter/with-settings:
    check_interval: 1s
    limit_mib: 2000
    spike_limit_mib: 400
    limit_percentage: 0
    spike_limit_percentage: 0
  batch:
    timeout: 5s
    send_batch_size: 8192
    send_batch_max_size: 0
service:
  extensions: []
  telemetry:
    logs:
      development: false
      level: debug
    metrics:
      level: detailed
      address: localhost:8888
  pipelines:
    metrics:
      receivers: ["jmx"]
      processors: [memory_limiter/with-settings, batch, netmetadata]
      exporters: [myexporter]
    metrics/internal:
      receivers: ["prometheus/internal"]
      processors: [memory_limiter/with-settings, batch, netmetadata]
      exporters: [myexporter]
Relevant Environment Information
NAME="CentOS Linux" VERSION="7 (Core)" ID="centos"
Additional context