Closed
Labels
question
Description
Current Behavior
We deploy APISIX in a Kubernetes cluster and have a problem with Prometheus metrics.
We noticed that the lua_shared_dict prometheus-metrics fills up. Once it is full, the apisix_nginx_metric_errors_total counter starts to grow and all metrics stop displaying correctly.
We tried increasing the prometheus-metrics parameter to 40m in the ConfigMap (config.yaml), but after 2 months this lua_shared_dict was full on all pods and the errors started to occur again.
nginx_config: # config for rendering the template to generate nginx.conf
  error_log: "/dev/stderr"
  error_log_level: "warn" # warn,error
  worker_processes: "auto"
  enable_cpu_affinity: true
  worker_rlimit_nofile: 20480 # the number of files a worker process can open, should be larger than worker_connections
  event:
    worker_connections: 10620
  http:
    enable_access_log: true
    access_log: "/dev/stdout"
    access_log_format: '$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\"'
    access_log_format_escape: default
    keepalive_timeout: "60s"
    client_header_timeout: 60s # timeout for reading the client request header, then a 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s # timeout for reading the client request body, then a 408 (Request Time-out) error is returned to the client
    send_timeout: 10s # timeout for transmitting a response to the client, then the connection is closed
    underscores_in_headers: "on" # default enables the use of underscores in client request header fields
    real_ip_header: "X-Real-IP" # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_from: # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 127.0.0.1
      - 'unix:'
    lua_shared_dict:
      prometheus-metrics: 40m
Current APISIX state
- Deployment via Helm chart: https://github.com/apache/apisix-helm-chart
- Helm Chart version: 2.10.0
- K8s pods: 3
- Pod CPU limits: 15 (usage 4%)
- Pod Memory limits: 60Gb (usage 35 GiB)
- Total requests per second: 2500 - 3000
- Active connections: 2000+
- Upstreams: 100+
- Routes: 120+
- Consumers: 60+
- Plugins: basic-auth and kafka-logger on all routes
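Because the dictionary fills gradually rather than at startup, it may help to see which metric families hold the most time series. As far as I understand, every distinct label combination is stored as its own entry in the shared dict, so the routes, upstreams and consumers listed above can multiply quickly. The following is only a minimal Python sketch that counts series per metric name in Prometheus text-format output. The function name count_series and the sample text (including its label names and values) are made up for illustration and are not taken from a real scrape of this deployment.

```python
from collections import Counter


def count_series(exposition_text: str) -> Counter:
    """Count sample lines (time series) per metric name in Prometheus text format."""
    counts = Counter()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        brace = line.find("{")
        space = line.find(" ")
        # The metric name ends at '{' when labels are present,
        # otherwise at the first space before the value.
        if brace != -1 and (space == -1 or brace < space):
            name = line[:brace]
        elif space != -1:
            name = line[:space]
        else:
            name = line
        counts[name] += 1
    return counts


# Hypothetical, simplified sample; real output has more labels and families.
sample = """\
# TYPE apisix_http_status counter
apisix_http_status{code="200",route="r1",consumer="alice"} 10
apisix_http_status{code="200",route="r1",consumer="bob"} 4
apisix_http_status{code="404",route="r2",consumer="alice"} 1
# TYPE apisix_nginx_metric_errors_total counter
apisix_nginx_metric_errors_total 0
"""

if __name__ == "__main__":
    # Print metric families ordered by number of series, largest first.
    for name, n in count_series(sample).most_common():
        print(name, n)
```

To use it on real data, save the output of the metrics endpoint on one pod to a file and pass the file contents to count_series. Comparing two snapshots taken a few days apart would show which families keep growing.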
Expected Behavior
No response
Error Logs
No response
Steps to Reproduce
- Run APISIX with the default lua_shared_dict: prometheus-metrics size
- After 2-3 weeks prometheus-metrics fills up, apisix_nginx_metric_errors_total starts to grow, and all metrics stop displaying correctly
- Change lua_shared_dict: prometheus-metrics to 40m
- After 2-3 months the lua_shared_dict fills up again and we get the same problem with displaying metrics
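The timings in the steps above can be turned into a rough fill rate. This is a back-of-the-envelope Python sketch, not a measurement: it assumes the default size is 10m (please check your config-default.yaml, since it may differ between versions), that a month is 30 days, that the dictionary fills linearly, and it ignores any allocator overhead. The helper names rate_range and ranges_overlap are hypothetical.

```python
def rate_range(size_mib: float, min_days: float, max_days: float) -> tuple:
    """Return (slowest, fastest) fill rate in MiB/day for a dict of size_mib
    that takes between min_days and max_days to fill."""
    # The longest time to fill gives the slowest rate, and vice versa.
    return size_mib / max_days, size_mib / min_days


def ranges_overlap(a: tuple, b: tuple) -> bool:
    """True if two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Assumed default of 10m, full after 2-3 weeks (14-21 days).
default_rate = rate_range(10, 14, 21)
# 40m, full after 2-3 months (taken as 60-90 days).
enlarged_rate = rate_range(40, 60, 90)

if __name__ == "__main__":
    print("default: %.2f to %.2f MiB/day" % default_rate)
    print("40m:     %.2f to %.2f MiB/day" % enlarged_rate)
    print("consistent with one steady rate:",
          ranges_overlap(default_rate, enlarged_rate))
```

Under these assumptions both runs give roughly 0.4 to 0.7 MiB per day, which would suggest the number of stored series grows steadily and that a larger dictionary only postpones the overflow.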
Environment
- APISIX version (run apisix version): 3.10.0
- Operating system (run uname -a): Linux apisix-69cfdc5fbf-m7k27 5.14.0-362.13.1.el9_3.x86_64 SMP PREEMPT_DYNAMIC Fri Nov 24 01:57:57 EST 2023 x86_64 GNU/Linux
- OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.25.3.2
- etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5.0
- APISIX Dashboard version, if relevant: 3.0.0
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run luarocks --version):

