Commit e6ff97f

Consistent prometheus metric names and documentation (#8728)
* clean prometheus metrics
  - add new histogram metrics with consistent names
  - deprecate summary metrics with inconsistent names
* update prometheus metrics tests
* remove ingress_upstream_header_seconds metric
  It hasn't been released so it is safe. Use header_duration_seconds metric.
* add documentation on prometheus metrics
1 parent 7cb91ef commit e6ff97f

4 files changed: +318 -99 lines changed

docs/user-guide/cli-arguments.md

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,7 @@ They are set in the container spec of the `ingress-nginx-controller` Deployment
 | `--ingress-class-by-name` | Define if Ingress Controller should watch for Ingress Class by Name together with Controller Class. (default false) |
 | `--internal-logger-address` | Define the internal logger address to use when chroot images is used. (default 127.0.0.1:11514) |
 | `--kubeconfig` | Path to a kubeconfig file containing authorization and API server information. |
+| `--length-buckets` | Set of buckets which will be used for prometheus histogram metrics such as RequestLength, ResponseLength. (default `[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]`) |
 | `--log_backtrace_at` | when logging hits line file:N, emit a stack trace (default :0) |
 | `--log_dir` | If non-empty, write log files in this directory |
 | `--log_file` | If non-empty, use this log file |
@@ -52,13 +53,15 @@ They are set in the container spec of the `ingress-nginx-controller` Deployment
 | `--skip_headers` | If true, avoid header prefixes in the log messages |
 | `--skip_log_headers` | If true, avoid headers when opening log files |
 | `--ssl-passthrough-proxy-port` | Port to use internally for SSL Passthrough. (default 442) |
+| `--size-buckets` | Set of buckets which will be used for prometheus histogram metrics such as BytesSent. (default `[10, 100, 1000, 10000, 100000, 1e+06, 1e+07]`) |
 | `--status-port` | Port to use for the lua HTTP endpoint configuration. (default 10246) |
 | `--status-update-interval` | Time interval in seconds in which the status should check if an update is required. Default is 60 seconds (default 60) |
 | `--stderrthreshold` | logs at or above this threshold go to stderr (default 2) |
 | `--stream-port` | Port to use for the lua TCP/UDP endpoint configuration. (default 10247) |
 | `--sync-period` | Period at which the controller forces the repopulation of its local object stores. Disabled by default. |
 | `--sync-rate-limit` | Define the sync frequency upper limit (default 0.3) |
 | `--tcp-services-configmap` | Name of the ConfigMap containing the definition of the TCP services to expose. The key in the map indicates the external port to be used. The value is a reference to a Service in the form "namespace/name:port", where "port" can either be a port number or name. TCP ports 80 and 443 are reserved by the controller for servicing HTTP traffic. |
+| `--time-buckets` | Set of buckets which will be used for prometheus histogram metrics such as RequestTime, ResponseTime. (default `[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]`) |
 | `--udp-services-configmap` | Name of the ConfigMap containing the definition of the UDP services to expose. The key in the map indicates the external port to be used. The value is a reference to a Service in the form "namespace/name:port", where "port" can either be a port name or number. |
 | `--update-status` | Update the load-balancer status of Ingress objects this controller satisfies. Requires setting the publish-service parameter to a valid Service reference. (default true) |
 | `--update-status-on-shutdown` | Update the load-balancer status of Ingress objects when the controller shuts down. Requires the update-status parameter. (default true) |

docs/user-guide/monitoring.md

Lines changed: 172 additions & 25 deletions
@@ -1,9 +1,11 @@
-# Prometheus and Grafana installation
-Two different methods to install and configure Prometheus and Grafana are described in this doc.
-- Prometheus and Grafana installation using Pod Annotations. This installs Prometheus and Grafana in the same namespace as NGINX Ingress
-- Prometheus and Grafana installation using Service Monitors. This installs Prometheus and Grafana in two different namespaces. This is the preferred method, and the Helm chart supports this by default.
+# Monitoring
+
+Two different methods to install and configure Prometheus and Grafana are described in this doc.
+* Prometheus and Grafana installation using Pod Annotations. This installs Prometheus and Grafana in the same namespace as NGINX Ingress
+* Prometheus and Grafana installation using Service Monitors. This installs Prometheus and Grafana in two different namespaces. This is the preferred method, and the Helm chart supports this by default.
+
+## Prometheus and Grafana installation using Pod Annotations

-## PROMETHEUS AND GRAFANA INSTALLATION USING POD ANNOTATIONS
 This tutorial will show you how to install [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/) for scraping the metrics of the NGINX Ingress controller.

 !!! important
@@ -168,7 +170,7 @@ According to the above example, this URL will be http://10.192.0.3:31086
 - By default request metrics are labeled with the hostname. When you have a wildcard domain ingress, then there will be no metrics for that ingress (to prevent the metrics from exploding in cardinality). To get metrics in this case you need to run the ingress controller with `--metrics-per-host=false` (you will lose labeling by hostname, but still have labeling by ingress).

 ### Grafana dashboard using ingress resource
-- If you want to expose the dashboard for grafana using an ingress resource, then you can :
+- If you want to expose the dashboard for grafana using an ingress resource, then you can :
 - change the service type of the prometheus-server service and the grafana service to "ClusterIP" like this :
 ```
 kubectl -n ingress-nginx edit svc grafana
@@ -179,18 +181,18 @@ According to the above example, this URL will be http://10.192.0.3:31086
 - create an ingress resource with backend as "grafana" and port as "3000"
 - Similarly, you can edit the service "prometheus-server" and add an ingress resource.

-## PROMETHEUS AND GRAFANA INSTALLATION USING SERVICE MONITORS
-This document assumes you're using helm and using the kube-prometheus-stack package to install Prometheus and Grafana.
+## Prometheus and Grafana installation using Service Monitors
+This document assumes you're using helm and using the kube-prometheus-stack package to install Prometheus and Grafana.

 ### Verify NGINX Ingress controller is installed

 - The NGINX Ingress controller should already be deployed according to the deployment instructions [here](../deploy/index.md).

-- To check if Ingress controller is deployed,
+- To check if Ingress controller is deployed,
 ```
-kubectl get pods -n ingress-nginx
+kubectl get pods -n ingress-nginx
 ```
-- The result should look something like:
+- The result should look something like:
 ```
 NAME READY STATUS RESTARTS AGE
 ingress-nginx-controller-7c489dc7b7-ccrf6 1/1 Running 0 19h
@@ -205,8 +207,8 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 ```
 ```
 NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
-ingress-nginx ingress-nginx 10 2022-01-20 18:08:55.267373 -0800 PST deployed ingress-nginx-4.0.16 1.1.1
-prometheus prometheus 1 2022-01-20 16:07:25.086828 -0800 PST deployed kube-prometheus-stack-30.1.0 0.53.1
+ingress-nginx ingress-nginx 10 2022-01-20 18:08:55.267373 -0800 PST deployed ingress-nginx-4.0.16 1.1.1
+prometheus prometheus 1 2022-01-20 16:07:25.086828 -0800 PST deployed kube-prometheus-stack-30.1.0 0.53.1
 ```
 - Notice that prometheus is installed in a different namespace than ingress-nginx

@@ -218,9 +220,9 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 ```
 controller.metrics.enabled=true
 controller.metrics.serviceMonitor.enabled=true
-controller.metrics.serviceMonitor.additionalLabels.release="prometheus"
+controller.metrics.serviceMonitor.additionalLabels.release="prometheus"
 ```
-- The easiest way of doing this is to helm upgrade
+- The easiest way of doing this is to helm upgrade
 ```
 helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
 --namespace ingress-nginx \
@@ -248,7 +250,7 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 - Since Prometheus is running in a different namespace and not in the ingress-nginx namespace, it would not be able to discover ServiceMonitors in other namespaces when installed. Reconfigure your kube-prometheus-stack Helm installation to set `serviceMonitorSelectorNilUsesHelmValues` flag to false. By default, Prometheus only discovers PodMonitors within its own namespace. This should be disabled by setting `podMonitorSelectorNilUsesHelmValues` to false
 - The configurations required are:
 ```
-prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
+prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
 prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
 ```
 - The easiest way of doing this is to use `helm upgrade ...`
@@ -271,12 +273,12 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 ```

 ### Connect and view Prometheus dashboard
-- Port forward to Prometheus service. Find out the name of the prometheus service by using the following command:
+- Port forward to Prometheus service. Find out the name of the prometheus service by using the following command:
 ```
 kubectl get svc -n prometheus
 ```

-The result of this command would look like:
+The result of this command would look like:
 ```
 NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
 alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7h46m
@@ -292,22 +294,22 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 ```
 kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n prometheus 9090:9090
 ```
-When you run the above command, you should see something like:
+When you run the above command, you should see something like:
 ```
 Forwarding from 127.0.0.1:9090 -> 9090
 Forwarding from [::1]:9090 -> 9090
 ```
 - Open your browser and visit the following URL http://localhost:{port-forwarded-port} according to the above example it would be, http://localhost:9090
-
+
 ![Prometheus Dashboard](../images/prometheus-dashboard1.png)

-### Connect and view Grafana dashboard
-- Port forward to Grafana service. Find out the name of the Grafana service by using the following command:
+### Connect and view Grafana dashboard
+- Port forward to Grafana service. Find out the name of the Grafana service by using the following command:
 ```
 kubectl get svc -n prometheus
 ```

-The result of this command would look like:
+The result of this command would look like:
 ```
 NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
 alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 7h46m
@@ -323,7 +325,7 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 ```
 kubectl port-forward svc/prometheus-grafana 3000:80 -n prometheus
 ```
-When you run the above command, you should see something like:
+When you run the above command, you should see something like:
 ```
 Forwarding from 127.0.0.1:3000 -> 3000
 Forwarding from [::1]:3000 -> 3000
@@ -345,4 +347,149 @@ This document assumes you're using helm and using the kube-prometheus-stack pack
 - Click "Import"

 ![Grafana Dashboard](../images/grafana-dashboard1.png)
-
+
+
+## Exposed metrics
+
+Prometheus metrics are exposed on port 10254.
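
A minimal sketch of reading these metrics from Python, not part of the commit itself. It assumes port 10254 has been made reachable on localhost (for example through a port-forward) and that the exposition text is served at the `/metrics` path. The host, the path, and the helper names `fetch_metrics` and `filter_samples` are illustrative assumptions.

```python
from urllib.request import urlopen

METRICS_PORT = 10254  # port named in the documentation above


def fetch_metrics(host="localhost", port=METRICS_PORT, path="/metrics", timeout=5.0):
    """Return the raw exposition text. Host and path are assumptions."""
    with urlopen(f"http://{host}:{port}{path}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def filter_samples(text, prefix="nginx_ingress_controller_"):
    """Keep sample lines whose metric name starts with `prefix`.

    Comment lines (# HELP, # TYPE) start with '#', so they never match.
    """
    return [line for line in text.splitlines() if line.startswith(prefix)]
```

With a reachable endpoint, `filter_samples(fetch_metrics())` would list only the controller's own samples and drop both the comment lines and any unrelated metrics.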
+
+### Request metrics
+
+* `nginx_ingress_controller_request_duration_seconds` Histogram
+
+The request processing time in seconds, with millisecond resolution (affected by client speed)
+
+nginx var: `request_time`
+
+* `nginx_ingress_controller_response_duration_seconds` Histogram
+
+The time spent on receiving the response from the upstream server (affected by client speed)
+
+nginx var: `upstream_response_time`
+
+* `nginx_ingress_controller_header_duration_seconds` Histogram
+
+The time spent on receiving first header from the upstream server
+
+nginx var: `upstream_header_time`
+
+* `nginx_ingress_controller_connect_duration_seconds` Histogram
+
+The time spent on establishing a connection with the upstream server
+
+nginx var: `upstream_connect_time`
+
+* `nginx_ingress_controller_response_size` Histogram
+
+The response length (including response headers and response body)
+
+nginx var: `bytes_sent`
+
+* `nginx_ingress_controller_request_size` Histogram
+
+The request length (including request line, header, and request body)
+
+nginx var: `request_length`
+
+* `nginx_ingress_controller_requests` Counter
+
+The total number of client requests
+
+* `nginx_ingress_controller_bytes_sent` Histogram
+
+The number of bytes sent to a client. **Deprecated**, use `nginx_ingress_controller_response_size`
+
+nginx var: `bytes_sent`
+
+* `nginx_ingress_controller_ingress_upstream_latency_seconds` Summary
+
+Upstream service latency per Ingress. **Deprecated**, use `nginx_ingress_controller_connect_duration_seconds`
+
+nginx var: `upstream_connect_time`
+
+```
+# HELP nginx_ingress_controller_bytes_sent The number of bytes sent to a client. DEPRECATED! Use nginx_ingress_controller_response_size
+# TYPE nginx_ingress_controller_bytes_sent histogram
+# HELP nginx_ingress_controller_connect_duration_seconds The time spent on establishing a connection with the upstream server
+# TYPE nginx_ingress_controller_connect_duration_seconds histogram
+# HELP nginx_ingress_controller_header_duration_seconds The time spent on receiving first header from the upstream server
+# TYPE nginx_ingress_controller_header_duration_seconds histogram
+# HELP nginx_ingress_controller_ingress_upstream_latency_seconds Upstream service latency per Ingress DEPRECATED! Use nginx_ingress_controller_connect_duration_seconds
+# TYPE nginx_ingress_controller_ingress_upstream_latency_seconds summary
+# HELP nginx_ingress_controller_request_duration_seconds The request processing time in milliseconds
+# TYPE nginx_ingress_controller_request_duration_seconds histogram
+# HELP nginx_ingress_controller_request_size The request length (including request line, header, and request body)
+# TYPE nginx_ingress_controller_request_size histogram
+# HELP nginx_ingress_controller_requests The total number of client requests.
+# TYPE nginx_ingress_controller_requests counter
+# HELP nginx_ingress_controller_response_duration_seconds The time spent on receiving the response from the upstream server
+# TYPE nginx_ingress_controller_response_duration_seconds histogram
+# HELP nginx_ingress_controller_response_size The response length (including request line, header, and request body)
+# TYPE nginx_ingress_controller_response_size histogram
+```
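
A listing in this `# HELP` / `# TYPE` form can be sanity-checked mechanically. The sketch below is illustrative only: it maps each metric name to its declared type and flags any type outside the set the Prometheus text format defines (`counter`, `gauge`, `histogram`, `summary`, `untyped`). The function names are hypothetical helpers, not part of ingress-nginx.

```python
# Metric types defined by the Prometheus text exposition format.
VALID_TYPES = {"counter", "gauge", "histogram", "summary", "untyped"}


def parse_metric_types(text):
    """Map metric name -> declared type from '# TYPE <name> <type>' lines."""
    types = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def invalid_types(types):
    """Return the entries whose declared type is not a known metric type."""
    return {name: kind for name, kind in types.items() if kind not in VALID_TYPES}
```

Running `invalid_types(parse_metric_types(listing))` on a pasted listing returns an empty dict when every `# TYPE` line declares a recognised type.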
+
+
+### Nginx process metrics
+```
+# HELP nginx_ingress_controller_nginx_process_connections current number of client connections with state {active, reading, writing, waiting}
+# TYPE nginx_ingress_controller_nginx_process_connections gauge
+# HELP nginx_ingress_controller_nginx_process_connections_total total number of connections with state {accepted, handled}
+# TYPE nginx_ingress_controller_nginx_process_connections_total counter
+# HELP nginx_ingress_controller_nginx_process_cpu_seconds_total Cpu usage in seconds
+# TYPE nginx_ingress_controller_nginx_process_cpu_seconds_total counter
+# HELP nginx_ingress_controller_nginx_process_num_procs number of processes
+# TYPE nginx_ingress_controller_nginx_process_num_procs gauge
+# HELP nginx_ingress_controller_nginx_process_oldest_start_time_seconds start time in seconds since 1970/01/01
+# TYPE nginx_ingress_controller_nginx_process_oldest_start_time_seconds gauge
+# HELP nginx_ingress_controller_nginx_process_read_bytes_total number of bytes read
+# TYPE nginx_ingress_controller_nginx_process_read_bytes_total counter
+# HELP nginx_ingress_controller_nginx_process_requests_total total number of client requests
+# TYPE nginx_ingress_controller_nginx_process_requests_total counter
+# HELP nginx_ingress_controller_nginx_process_resident_memory_bytes number of bytes of memory in use
+# TYPE nginx_ingress_controller_nginx_process_resident_memory_bytes gauge
+# HELP nginx_ingress_controller_nginx_process_virtual_memory_bytes number of bytes of memory in use
+# TYPE nginx_ingress_controller_nginx_process_virtual_memory_bytes gauge
+# HELP nginx_ingress_controller_nginx_process_write_bytes_total number of bytes written
+# TYPE nginx_ingress_controller_nginx_process_write_bytes_total counter
+```
+
+### Controller metrics
+```
+# HELP nginx_ingress_controller_build_info A metric with a constant '1' labeled with information about the build.
+# TYPE nginx_ingress_controller_build_info gauge
+# HELP nginx_ingress_controller_check_success Cumulative number of Ingress controller syntax check operations
+# TYPE nginx_ingress_controller_check_success counter
+# HELP nginx_ingress_controller_config_hash Running configuration hash actually running
+# TYPE nginx_ingress_controller_config_hash gauge
+# HELP nginx_ingress_controller_config_last_reload_successful Whether the last configuration reload attempt was successful
+# TYPE nginx_ingress_controller_config_last_reload_successful gauge
+# HELP nginx_ingress_controller_config_last_reload_successful_timestamp_seconds Timestamp of the last successful configuration reload.
+# TYPE nginx_ingress_controller_config_last_reload_successful_timestamp_seconds gauge
+# HELP nginx_ingress_controller_ssl_certificate_info Hold all labels associated to a certificate
+# TYPE nginx_ingress_controller_ssl_certificate_info gauge
+# HELP nginx_ingress_controller_success Cumulative number of Ingress controller reload operations
+# TYPE nginx_ingress_controller_success counter
+```
+
+### Admission metrics
+```
+# HELP nginx_ingress_controller_admission_config_size The size of the tested configuration
+# TYPE nginx_ingress_controller_admission_config_size gauge
+# HELP nginx_ingress_controller_admission_render_duration The processing duration of ingresses rendering by the admission controller (float seconds)
+# TYPE nginx_ingress_controller_admission_render_duration gauge
+# HELP nginx_ingress_controller_admission_render_ingresses The length of ingresses rendered by the admission controller
+# TYPE nginx_ingress_controller_admission_render_ingresses gauge
+# HELP nginx_ingress_controller_admission_roundtrip_duration The complete duration of the admission controller at the time to process a new event (float seconds)
+# TYPE nginx_ingress_controller_admission_roundtrip_duration gauge
+# HELP nginx_ingress_controller_admission_tested_duration The processing duration of the admission controller tests (float seconds)
+# TYPE nginx_ingress_controller_admission_tested_duration gauge
+# HELP nginx_ingress_controller_admission_tested_ingresses The length of ingresses processed by the admission controller
+# TYPE nginx_ingress_controller_admission_tested_ingresses gauge
+```
+
+### Histogram buckets
+
+You can configure buckets for histogram metrics using these command line options (here are their default values):
+* `--time-buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]`
+* `--length-buckets=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]`
+* `--size-buckets=[10, 100, 1000, 10000, 100000, 1e+06, 1e+07]`
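
To illustrate what these bucket boundaries mean, here is a minimal sketch of how a Prometheus-style histogram counts observations: each bucket is cumulative and holds every observation less than or equal to its upper bound, with an implicit `+Inf` bucket at the end. The function name and the sample observations are hypothetical. The default boundaries are copied from the `--time-buckets` value above.

```python
from bisect import bisect_left

# Default --time-buckets values listed above (upper bounds, in seconds).
DEFAULT_TIME_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]


def cumulative_bucket_counts(observations, buckets=DEFAULT_TIME_BUCKETS):
    """Return {upper_bound_label: cumulative count}, ending with '+Inf'."""
    bounds = sorted(buckets)
    per_bucket = [0] * (len(bounds) + 1)  # the last slot stands for +Inf
    for value in observations:
        # Index of the smallest upper bound that is >= value.
        per_bucket[bisect_left(bounds, value)] += 1
    labels = [str(bound) for bound in bounds] + ["+Inf"]
    counts = {}
    running = 0
    for label, n in zip(labels, per_bucket):
        running += n  # buckets are cumulative
        counts[label] = running
    return counts
```

For example, a 0.2 second request is counted in the `0.25` bucket and in every larger one, while a 12 second request only appears in `+Inf`. Boundaries that are too coarse for the observed range hide that kind of detail, which is the usual reason to override the defaults.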
