73 changes: 67 additions & 6 deletions src/current/_data/v24.3/metrics/metrics-list.csv
@@ -762,6 +762,8 @@ have a good estimate for this information for all of its followers, and since
followers are expected to be behind (when they are not required as part of a
quorum) *and* the aggregate thus scales like the count of such followers, it is
difficult to meaningfully interpret this metric.",Log Entries,GAUGE,COUNT,AVG,NONE
STORAGE,raftlog.size.max,Approximate size of the largest Raft log on the store.,Bytes,GAUGE,BYTES,AVG,NONE
STORAGE,raftlog.size.total,Approximate size of all Raft logs on the store.,Bytes,GAUGE,BYTES,AVG,NONE
STORAGE,raftlog.truncated,Number of Raft log entries truncated,Log Entries,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
STORAGE,range.adds,Number of range additions,Range Ops,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
STORAGE,range.merges,Number of range merges,Range Ops,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -1252,7 +1254,7 @@ APPLICATION,changefeed.frontier_updates,Number of change frontier updates across
APPLICATION,changefeed.internal_retry_message_count,Number of messages for which an attempt to retry them within an aggregator node was made,Messages,GAUGE,COUNT,AVG,NONE
APPLICATION,changefeed.kafka_throttling_hist_nanos,Time spent in throttling due to exceeding kafka quota,Nanoseconds,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,changefeed.lagging_ranges,The number of ranges considered to be lagging behind,Ranges,GAUGE,COUNT,AVG,NONE
APPLICATION,changefeed.max_behind_nanos,(Deprecated in favor of checkpoint_progress) The most any changefeed's persisted checkpoint is behind the present,Nanoseconds,GAUGE,NANOSECONDS,AVG,NONE
APPLICATION,changefeed.max_behind_nanos,The most any changefeed's persisted checkpoint is behind the present,Nanoseconds,GAUGE,NANOSECONDS,AVG,NONE
APPLICATION,changefeed.message_size_hist,Message size histogram,Bytes,HISTOGRAM,BYTES,AVG,NONE
APPLICATION,changefeed.messages.messages_pushback_nanos,Total time spent throttled for messages quota,Nanoseconds,COUNTER,NANOSECONDS,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,changefeed.network.bytes_in,The number of bytes received from the network by changefeeds,Bytes,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -1355,6 +1357,7 @@ APPLICATION,distsender.rangefeed.retry.replica_removed,Number of ranges that enc
APPLICATION,distsender.rangefeed.retry.send,Number of ranges that encountered retryable send error,Ranges,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,distsender.rangefeed.retry.slow_consumer,Number of ranges that encountered retryable SLOW_CONSUMER error,Ranges,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,distsender.rangefeed.retry.store_not_found,Number of ranges that encountered retryable store not found error,Ranges,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,distsender.rangefeed.retry.unknown,Number of ranges that encountered retryable unknown error,Ranges,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,distsender.rangefeed.total_ranges,"Number of ranges executing rangefeed

This counts the number of ranges with an active rangefeed.
@@ -2210,6 +2213,7 @@ APPLICATION,jobs.row_level_ttl.fail_or_cancel_completed,Number of row_level_ttl
APPLICATION,jobs.row_level_ttl.fail_or_cancel_failed,Number of row_level_ttl jobs which failed with a non-retriable error on their failure or cancelation process,jobs,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,jobs.row_level_ttl.fail_or_cancel_retry_error,Number of row_level_ttl jobs which failed with a retriable error on their failure or cancelation process,jobs,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,jobs.row_level_ttl.num_active_spans,Number of active spans the TTL job is deleting from.,num_active_spans,GAUGE,COUNT,AVG,NONE
APPLICATION,jobs.row_level_ttl.num_delete_batch_retries,Number of times the row level TTL job had to reduce the delete batch size and retry.,num_retries,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,jobs.row_level_ttl.protected_age_sec,The age of the oldest PTS record protected by row_level_ttl jobs,seconds,GAUGE,SECONDS,AVG,NONE
APPLICATION,jobs.row_level_ttl.protected_record_count,Number of protected timestamp records held by row_level_ttl jobs,records,GAUGE,COUNT,AVG,NONE
APPLICATION,jobs.row_level_ttl.resume_completed,Number of row_level_ttl jobs which successfully resumed to completion,jobs,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -2288,16 +2292,16 @@ APPLICATION,kv.protectedts.reconciliation.records_processed,number of records pr
APPLICATION,kv.protectedts.reconciliation.records_removed,number of records removed during reconciliation runs on this node,Count,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.batch_hist_nanos,Time spent flushing a batch,Nanoseconds,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,logical_replication.catchup_ranges,Source side ranges undergoing catch up scans (inaccurate with multiple LDR jobs),Ranges,GAUGE,COUNT,AVG,NONE
APPLICATION,logical_replication.catchup_ranges_by_label,Source side ranges undergoing catch up scans,Ranges,GAUGE,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.catchup_ranges_by_label,Source side ranges undergoing catch up scans,Ranges,GAUGE,COUNT,AVG,NONE
APPLICATION,logical_replication.checkpoint_events_ingested,Checkpoint events ingested by all replication jobs,Events,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.commit_latency,"Event commit latency: a difference between event MVCC timestamp and the time it was flushed into disk. If we batch events, then the difference between the oldest event in the batch and flush is recorded",Nanoseconds,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,logical_replication.events_dlqed,Row update events sent to DLQ,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_dlqed_age,Row update events sent to DLQ due to reaching the maximum time allowed in the retry queue,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_dlqed_by_label,Row update events sent to DLQ by label,Failures,GAUGE,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_dlqed_by_label,Row update events sent to DLQ by label,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_dlqed_errtype,Row update events sent to DLQ due to an error not considered retryable,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_dlqed_space,Row update events sent to DLQ due to capacity of the retry queue,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_ingested,Events ingested by all replication jobs,Events,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_ingested_by_label,Events ingested by all replication jobs by label,Events,GAUGE,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_ingested_by_label,Events ingested by all replication jobs by label,Events,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_initial_failure,Failed attempts to apply an incoming row update,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_initial_success,Successful applications of an incoming row update,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.events_retry_failure,Failed re-attempts to apply a row update,Failures,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -2306,12 +2310,12 @@ APPLICATION,logical_replication.kv.update_too_old,Total number of updates that w
APPLICATION,logical_replication.kv.value_refreshes,Total number of batches that refreshed the previous value,Events,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.logical_bytes,Logical bytes (sum of keys + values) received by all replication jobs,Bytes,COUNTER,BYTES,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.replan_count,Total number of dist sql replanning events,Events,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.replicated_time_by_label,Replicated time of the logical replication stream by label,Seconds,GAUGE,SECONDS,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.replicated_time_by_label,Replicated time of the logical replication stream by label,Seconds,GAUGE,SECONDS,AVG,NONE
APPLICATION,logical_replication.replicated_time_seconds,The replicated time of the logical replication stream in seconds since the unix epoch.,Seconds,GAUGE,SECONDS,AVG,NONE
APPLICATION,logical_replication.retry_queue_bytes,Size of the retry queue in bytes.,Bytes,GAUGE,BYTES,AVG,NONE
APPLICATION,logical_replication.retry_queue_events,Number of events in the retry queue.,Events,GAUGE,COUNT,AVG,NONE
APPLICATION,logical_replication.scanning_ranges,Source side ranges undergoing an initial scan (inaccurate with multiple LDR jobs),Ranges,GAUGE,COUNT,AVG,NONE
APPLICATION,logical_replication.scanning_ranges_by_label,Source side ranges undergoing an initial scan,Ranges,GAUGE,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,logical_replication.scanning_ranges_by_label,Source side ranges undergoing an initial scan,Ranges,GAUGE,COUNT,AVG,NONE
APPLICATION,obs.tablemetadata.update_job.duration,Time spent running the update table metadata job.,Duration,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,obs.tablemetadata.update_job.errors,The total number of errors that have been emitted from the update table metadata job.,Errors,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,obs.tablemetadata.update_job.runs,The total number of runs of the update table metadata job.,Executions,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -2333,6 +2337,10 @@ Note that this is not a good signal for KV health. The remote side of the
RPCs tracked here may experience contention, so an end user can easily
cause values for this metric to be emitted by leaving a transaction open
for a long time and contending with it using a second transaction.",Requests,GAUGE,COUNT,AVG,NONE
APPLICATION,round-trip-default-class-latency,"Distribution of round-trip latencies with other nodes.

Similar to round-trip-latency, but only for default class connections.
",Round-trip time,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,round-trip-latency,"Distribution of round-trip latencies with other nodes.

This only reflects successful heartbeats and measures gRPC overhead as well as
@@ -2343,6 +2351,18 @@ metrics such as packet loss, retransmits, etc, to conclusively diagnose network
issues. Heartbeats are not very frequent (~seconds), so they may not capture
rare or short-lived degradations.
",Round-trip time,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,round-trip-raft-class-latency,"Distribution of round-trip latencies with other nodes.

Similar to round-trip-latency, but only for raft class connections.
",Round-trip time,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,round-trip-rangefeed-class-latency,"Distribution of round-trip latencies with other nodes.

Similar to round-trip-latency, but only for rangefeed class connections.
",Round-trip time,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,round-trip-system-class-latency,"Distribution of round-trip latencies with other nodes.

Similar to round-trip-latency, but only for system class connections.
",Round-trip time,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,rpc.client.bytes.egress,Counter of TCP bytes sent via gRPC on connections we initiated.,Bytes,COUNTER,BYTES,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,rpc.client.bytes.ingress,Counter of TCP bytes received via gRPC on connections we initiated.,Bytes,COUNTER,BYTES,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,rpc.connection.avg_round_trip_latency,"Sum of exponentially weighted moving average of round-trip latencies, as measured through a gRPC RPC.
@@ -2575,6 +2595,7 @@ APPLICATION,sql.savepoint.rollback.started.count.internal,Number of `ROLLBACK TO
APPLICATION,sql.savepoint.started.count,Number of SQL SAVEPOINT statements started,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.savepoint.started.count.internal,Number of SQL SAVEPOINT statements started (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.schema.invalid_objects,Gauge of detected invalid objects within the system.descriptor table (measured by querying crdb_internal.invalid_objects),Objects,GAUGE,COUNT,AVG,NONE
APPLICATION,sql.schema_changer.object_count,Counter of the number of objects in the cluster,Objects,GAUGE,COUNT,AVG,NONE
APPLICATION,sql.schema_changer.permanent_errors,Counter of the number of permanent errors experienced by the schema changer,Errors,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.schema_changer.retry_errors,Counter of the number of retriable errors experienced by the schema changer,Errors,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.schema_changer.running,Gauge of currently running schema changes,Schema changes,GAUGE,COUNT,AVG,NONE
@@ -2585,8 +2606,12 @@ APPLICATION,sql.select.started.count,Number of SQL SELECT statements started,SQL
APPLICATION,sql.select.started.count.internal,Number of SQL SELECT statements started (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.service.latency,Latency of SQL request execution,Latency,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,sql.service.latency.internal,Latency of SQL request execution (internal queries),SQL Internal Statements,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,sql.statement_timeout.count,Count of statements that failed because they exceeded the statement timeout,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.statement_timeout.count.internal,Count of statements that failed because they exceeded the statement timeout (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.statements.active,Number of currently active user SQL statements,Active Statements,GAUGE,COUNT,AVG,NONE
APPLICATION,sql.statements.active.internal,Number of currently active user SQL statements (internal queries),SQL Internal Statements,GAUGE,COUNT,AVG,NONE
APPLICATION,sql.statements.auto_retry.count,Number of SQL statement automatic retries,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.statements.auto_retry.count.internal,Number of SQL statement automatic retries (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.stats.activity.update.latency,The latency of updates made by the SQL activity updater job. Includes failed update attempts,Nanoseconds,HISTOGRAM,NANOSECONDS,AVG,NONE
APPLICATION,sql.stats.activity.updates.failed,Number of update attempts made by the SQL activity updater job that failed with errors,failed updates,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.stats.activity.updates.successful,Number of successful updates made by the SQL activity updater job,successful updates,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -2606,8 +2631,12 @@ APPLICATION,sql.temp_object_cleaner.active_cleaners,number of cleaner tasks curr
APPLICATION,sql.temp_object_cleaner.schemas_deletion_error,number of errored schema deletions by the temp object cleaner on this node,Count,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.temp_object_cleaner.schemas_deletion_success,number of successful schema deletions by the temp object cleaner on this node,Count,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.temp_object_cleaner.schemas_to_delete,number of schemas to be deleted by the temp object cleaner on this node,Count,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.transaction_timeout.count,Count of statements that failed because they exceeded the transaction timeout,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.transaction_timeout.count.internal,Count of statements that failed because they exceeded the transaction timeout (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.abort.count,Number of SQL transaction abort errors,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.abort.count.internal,Number of SQL transaction abort errors (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.auto_retry.count,Number of SQL transaction automatic retries,SQL Transactions,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.auto_retry.count.internal,Number of SQL transaction automatic retries (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.begin.count,Number of SQL transaction BEGIN statements successfully executed,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.begin.count.internal,Number of SQL transaction BEGIN statements successfully executed (internal queries),SQL Internal Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
APPLICATION,sql.txn.begin.started.count,Number of SQL transaction BEGIN statements started,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
@@ -2739,6 +2768,38 @@ SERVER,sys.host.net.send.bytes,Bytes sent on all network interfaces since this p
SERVER,sys.host.net.send.drop,Sending packets that got dropped on all network interfaces since this process started (as reported by the OS),Packets,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.err,Error on sending packets on all network interfaces since this process started (as reported by the OS),Packets,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.packets,Packets sent on all network interfaces since this process started (as reported by the OS),Packets,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.tcp.fast_retrans_segs,"Segments retransmitted due to the fast retransmission mechanism in TCP.
Fast retransmissions occur when the sender learns that intermediate segments have been lost.",Segments,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.tcp.loss_probes,"
Number of TCP tail loss probes sent. Loss probes are an optimization to detect
loss of the last packet earlier than the retransmission timer, and can indicate
network issues. Tail loss probes are aggressive, so the base rate is often nonzero
even in healthy networks.",Probes,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.tcp.retrans_segs,"
The number of TCP segments retransmitted across all network interfaces.
This can indicate packet loss occurring in the network. However, it can
also be caused by recipient nodes not consuming packets in a timely manner,
or the local node overflowing its outgoing buffers, for example due to overload.

Retransmissions also occur in the absence of problems, as modern TCP stacks
err on the side of aggressively retransmitting segments.

The linux tool 'ss -i' can show the Linux kernel's smoothed view of round-trip
latency and variance on a per-connection basis. Additionally, 'netstat -s'
shows all TCP counters maintained by the kernel.
",Segments,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.tcp.slow_start_retrans,"
Number of TCP retransmissions in slow start. This can indicate that the network
is unable to support the initial fast ramp-up in window size, and can be a sign
of packet loss or congestion.
",Segments,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.host.net.send.tcp_timeouts,"
Number of TCP retransmission timeouts. These typically imply that a packet has
not been acknowledged within at least 200ms. Modern TCP stacks use
optimizations such as fast retransmissions and loss probes to avoid hitting
retransmission timeouts. Anecdotally, they still occasionally present themselves
even in supposedly healthy cloud environments.
",Timeouts,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE
SERVER,sys.rss,Current process RSS,RSS,GAUGE,BYTES,AVG,NONE
SERVER,sys.runnable.goroutines.per.cpu,"Average number of goroutines that are waiting to run, normalized by number of cores",goroutines,GAUGE,COUNT,AVG,NONE
SERVER,sys.totalmem,Total memory (both free and used),Memory,GAUGE,BYTES,AVG,NONE
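
Several rows in this diff pair a metric type with a derivative setting: the `*_by_label` gauges move from `NON_NEGATIVE_DERIVATIVE` to `NONE`, and the `*_by_label` counters move from `GAUGE` to `COUNTER`. A minimal sketch of a check for that pairing follows. It assumes the column order seen in the rows above (layer, metric, description, axis label, type, unit, aggregation, derivative) and treats "gauges use `NONE`, counters use `NON_NEGATIVE_DERIVATIVE`" as a convention inferred from this excerpt, not a documented rule. The function name `find_mismatches` is hypothetical.

```python
import csv
import io

# Assumed column positions, inferred from the rows shown in this diff:
# layer, metric, description, axis label, type, unit, aggregation, derivative.
METRIC_COL, TYPE_COL, DERIVATIVE_COL = 1, 4, 7

# Assumed convention, inferred from the corrected rows in this diff.
EXPECTED_DERIVATIVE = {
    "GAUGE": "NONE",
    "COUNTER": "NON_NEGATIVE_DERIVATIVE",
}


def find_mismatches(csv_text):
    """Return (metric, type, derivative) for rows that break the assumed convention."""
    mismatches = []
    # csv.reader keeps quoted descriptions that span several lines in one field.
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= DERIVATIVE_COL:
            continue  # skip blank or short rows
        expected = EXPECTED_DERIVATIVE.get(row[TYPE_COL])
        if expected is not None and row[DERIVATIVE_COL] != expected:
            mismatches.append((row[METRIC_COL], row[TYPE_COL], row[DERIVATIVE_COL]))
    return mismatches


# Sample rows copied from this diff: the old and new catchup_ranges_by_label rows,
# one multi-line histogram row, and one counter row.
sample = (
    "APPLICATION,logical_replication.catchup_ranges_by_label,Source side ranges undergoing catch up scans,Ranges,GAUGE,COUNT,AVG,NON_NEGATIVE_DERIVATIVE\n"
    "APPLICATION,logical_replication.catchup_ranges_by_label,Source side ranges undergoing catch up scans,Ranges,GAUGE,COUNT,AVG,NONE\n"
    'APPLICATION,round-trip-raft-class-latency,"Distribution of round-trip latencies with other nodes.\n'
    "\n"
    "Similar to round-trip-latency, but only for raft class connections.\n"
    '",Round-trip time,HISTOGRAM,NANOSECONDS,AVG,NONE\n'
    "APPLICATION,sql.statement_timeout.count,Count of statements that failed because they exceeded the statement timeout,SQL Statements,COUNTER,COUNT,AVG,NON_NEGATIVE_DERIVATIVE\n"
)

for metric, metric_type, derivative in find_mismatches(sample):
    print(f"{metric}: {metric_type} with derivative {derivative}")
```

Only the old `catchup_ranges_by_label` row is reported here. Histogram rows are left out of the check because this excerpt gives no basis for asserting a rule about them.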