@@ -9837,6 +9837,86 @@ layers:
       derivative: NON_NEGATIVE_DERIVATIVE
       how_to_use: This metric measures the length of time, in seconds, that the CockroachDB process has been running. Monitor this metric to detect events such as node restarts, which may require investigation or intervention.
       essential: true
+  - name: NETWORKING
+    metrics:
+    - name: sys.host.net.send.tcp.fast_retrans_segs
+      exported_name: sys_host_net_send_tcp_fast_retrans_segs
+      description: |-
+        Segments retransmitted due to the fast retransmission mechanism in TCP.
+        Fast retransmissions occur when the sender learns that intermediate segments have been lost.
+      y_axis_label: Segments
+      type: COUNTER
+      unit: COUNT
+      aggregation: AVG
+      derivative: NON_NEGATIVE_DERIVATIVE
+    - name: sys.host.net.send.tcp.loss_probes
+      exported_name: sys_host_net_send_tcp_loss_probes
+      description: |2-
+
+        Number of TCP tail loss probes sent. Loss probes are an optimization to detect
+        loss of the last packet earlier than the retransmission timer, and can indicate
+        network issues. Tail loss probes are aggressive, so the base rate is often nonzero
+        even in healthy networks.
+      y_axis_label: Probes
+      type: COUNTER
+      unit: COUNT
+      aggregation: AVG
+      derivative: NON_NEGATIVE_DERIVATIVE
+    - name: sys.host.net.send.tcp.retrans_segs
+      exported_name: sys_host_net_send_tcp_retrans_segs
+      description: |2
+
+        The number of TCP segments retransmitted across all network interfaces.
+        This can indicate packet loss occurring in the network. However, it can
+        also be caused by recipient nodes not consuming packets in a timely manner,
+        or the local node overflowing its outgoing buffers, for example due to overload.
+
+        Retransmissions also occur in the absence of problems, as modern TCP stacks
+        err on the side of aggressively retransmitting segments.
+
+        The Linux tool 'ss -i' can show the Linux kernel's smoothed view of round-trip
+        latency and variance on a per-connection basis. Additionally, 'netstat -s'
+        shows all TCP counters maintained by the kernel.
+      y_axis_label: Segments
+      type: COUNTER
+      unit: COUNT
+      aggregation: AVG
+      derivative: NON_NEGATIVE_DERIVATIVE
+      how_to_use: |2
+
+        Phase changes, especially when occurring on groups of nodes, can indicate packet
+        loss in the network or a slow consumer of packets. On slow consumers, the
+        'sys.host.net.rcvd.drop' metric may be elevated; on overloaded senders, it
+        is worth checking the 'sys.host.net.send.drop' metric.
+        Additionally, the 'sys.host.net.send.tcp.*' metrics may provide more insight into the
+        specific type of retransmission.
+      essential: true
+    - name: sys.host.net.send.tcp.slow_start_retrans
+      exported_name: sys_host_net_send_tcp_slow_start_retrans
+      description: |2
+
+        Number of TCP retransmissions in slow start. This can indicate that the network
+        is unable to support the initial fast ramp-up in window size, and can be a sign
+        of packet loss or congestion.
+      y_axis_label: Segments
+      type: COUNTER
+      unit: COUNT
+      aggregation: AVG
+      derivative: NON_NEGATIVE_DERIVATIVE
+    - name: sys.host.net.send.tcp_timeouts
+      exported_name: sys_host_net_send_tcp_timeouts
+      description: |2
+
+        Number of TCP retransmission timeouts. These typically imply that a packet has
+        not been acknowledged within at least 200ms. Modern TCP stacks use
+        optimizations such as fast retransmissions and loss probes to avoid hitting
+        retransmission timeouts. Anecdotally, they still occasionally present themselves
+        even in supposedly healthy cloud environments.
+      y_axis_label: Timeouts
+      type: COUNTER
+      unit: COUNT
+      aggregation: AVG
+      derivative: NON_NEGATIVE_DERIVATIVE
   - name: UNSET
     metrics:
     - name: build.timestamp
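
Every metric added in this hunk is a COUNTER with `derivative: NON_NEGATIVE_DERIVATIVE`, so it is meant to be read as a rate of change rather than a raw total. The Python sketch below illustrates that idea only; it is not CockroachDB's implementation. The function name `non_negative_rate`, the `(timestamp, value)` sample layout, the sample numbers, and the choice to clamp negative deltas to zero are all assumptions made for this example.

```python
from typing import List, Tuple


def non_negative_rate(samples: List[Tuple[float, float]]) -> List[float]:
    """Per-second rate between consecutive (timestamp_seconds, counter_value) samples.

    Hypothetical helper: a negative delta (for example, a counter that
    restarted from zero) is clamped to 0.0 instead of producing a negative rate.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append(max(0.0, (v1 - v0) / dt))
    return rates


# Made-up samples of a retransmission counter taken 10 seconds apart.
# The counter drops before the last sample, as it would after a reset.
samples = [(0.0, 100.0), (10.0, 120.0), (20.0, 220.0), (30.0, 5.0)]
rates = non_negative_rate(samples)
```

In this made-up series, the jump from the first interval to the second is the kind of "phase change" the `how_to_use` text for `sys.host.net.send.tcp.retrans_segs` suggests watching for, while the final interval shows a reset being flattened to zero.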