grafana/mimir #3052 (Closed)

Description
The Prometheus analysis process always crashes after a long run, failing on a seemingly random query. There are a couple of ways to handle this situation more gracefully:
- improve resiliency by applying a retry/timeout per query
- continue to the next query when an error is thrown
Running the following command:
$ cortextool analyse prometheus ...
fails partway through the analysis with an error such as:
...
DEBU[0218] additional repository_duration_seconds_bucket 900
DEBU[0218] additional repository_duration_seconds_count 75
DEBU[0218] additional repository_duration_seconds_sum 75
cortextool: error: error querying count by (job) (request_duration_seconds_bucket): server_error: server error: 503, try --help
It reports a 503 error, but querying Prometheus directly returns a 200 response:
$ curl <ADDR>/api/v1/query?query=count%20by%20(job)%20(consul_k8s_p_beholder_p2_1venus_worker_64_runtime_sys_bytes)
# 200 OK
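To reproduce the exact expression from the error message (rather than a different metric), an instant-query URL like the one passed to curl can be built with Go's net/url. This minimal sketch only constructs the URL; the helper name instantQueryURL and the address are placeholders. url.Values encodes spaces as + and parentheses as %28/%29, which differs cosmetically from the hand-encoded curl URL, but both should decode to the same expression under standard form decoding.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// instantQueryURL (hypothetical helper) builds <addr>/api/v1/query?query=...
// with the PromQL expression form-encoded.
func instantQueryURL(addr, promql string) string {
	v := url.Values{}
	v.Set("query", promql)
	return strings.TrimRight(addr, "/") + "/api/v1/query?" + v.Encode()
}

func main() {
	// Placeholder address; substitute the real <ADDR>.
	fmt.Println(instantQueryURL("http://prometheus:9090", "count by (job) (request_duration_seconds_bucket)"))
}
```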
Similar to the $ cortextool analyse grafana ... command, we could keep querying Prometheus after a failure and collect the errors in a custom field such as query_errors, just as the grafana analysis already does with its parse_errors field. For example:
$ cortextool analyse grafana --address <ADDR> --key <KEY>
unmarshal board: json: cannot unmarshal object into Go struct field Current.templating.list.current.text of type []string for MJvznCp7z Prometheus / Remote Write