Proposal: stability enhancement during overload conditions #20396

@silentred

Description

Problem Scenario

When an etcd server receives a large number of write requests and its apply rate cannot keep up with them (for example, when the Kubernetes API server writes a large number of Event objects, creates Pods in bulk, or an unexpected mass Pod restart occurs), the server returns ErrTooManyRequests according to the logic below. This is intended as a protective mechanism, but the current strategy can lead to catastrophic consequences.

func (s *EtcdServer) processInternalRaftRequestOnce(ctx context.Context, r pb.InternalRaftRequest) (*apply2.Result, error) {
	ai := s.getAppliedIndex()
	ci := s.getCommittedIndex()
	// Reject any request once the commit index runs too far ahead of the apply index.
	if ci > ai+maxGapBetweenApplyAndCommitIndex {
		return nil, errors.ErrTooManyRequests
	}
	// ...
}

The Kubernetes API server binds the storage of Event objects to Leases, so that associated Events are automatically cleaned up once the Lease expires. Lease expiration is handled by the etcd leader, which polls the Leases and issues LeaseRevoke requests. If the etcd cluster is in the aforementioned protective state at that time, the LeaseRevoke requests may never get a chance to execute (because Txn requests vastly outnumber LeaseRevoke requests), leading to a surge in the number of keys, which further degrades the apply rate and ultimately renders etcd completely unavailable.

The core issue is that the current protection logic does not differentiate between user requests and internal system requests, so it rejects them indiscriminately. When internal system requests cannot execute, the system state deteriorates further, ultimately causing a crash. LeaseRevoke is just one such internal system request; Compact requests have the same problem.

Proposal

  1. As discussed in errors.ErrTooManyRequests and maxGapBetweenApplyAndCommitIndex #18175 , making maxGapBetweenApplyAndCommitIndex configurable would be more flexible.
  2. Within the protection logic, reserve some queue space specifically for critical requests, so that essential system requests (LeaseRevoke, Compact) still have a chance to execute while the system is under pressure, preventing a crash. Demo code below.
func (s *EtcdServer) processInternalRaftRequestOnce(ctx context.Context, r pb.InternalRaftRequest) (*apply2.Result, error) {
	ai := s.getAppliedIndex()
	ci := s.getCommittedIndex()
	if isTooLargeGap(ai, ci, &r) {
		return nil, errors.ErrTooManyRequests
	}
	// ...
}

func isTooLargeGap(ai, ci uint64, r *pb.InternalRaftRequest) bool {
	isCriticalReq := r != nil && (r.Compact != nil || r.LeaseRevoke != nil)
	// Normal requests are rejected once the gap exceeds the configured limit.
	if !isCriticalReq {
		return ci > ai+maxGapBetweenApplyAndCommitIndex
	}
	// System-critical requests get a separate reserve of 500 extra queue slots.
	return ci > ai+maxGapBetweenApplyAndCommitIndex+500
}
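To make the proposed behavior concrete, here is a self-contained sketch of the check with hypothetical stand-in types (the real request type is pb.InternalRaftRequest from api/etcdserverpb, and the field names here are only mirrored for illustration). The base limit of 5000 matches etcd's current maxGapBetweenApplyAndCommitIndex constant; the 500-slot reserve is the demo value from the proposal, not a tuned number.

```go
package main

import "fmt"

const maxGapBetweenApplyAndCommitIndex = 5000 // etcd's current default

// Minimal stand-ins for the request fields the check inspects.
type compactionRequest struct{}
type leaseRevokeRequest struct{}

type internalRaftRequest struct {
	Compact     *compactionRequest
	LeaseRevoke *leaseRevokeRequest
}

// isTooLargeGap mirrors the proposed check: normal requests are rejected
// at the base limit, while critical requests get 500 extra slots.
func isTooLargeGap(ai, ci uint64, r *internalRaftRequest) bool {
	isCriticalReq := r != nil && (r.Compact != nil || r.LeaseRevoke != nil)
	if !isCriticalReq {
		return ci > ai+maxGapBetweenApplyAndCommitIndex
	}
	return ci > ai+maxGapBetweenApplyAndCommitIndex+500
}

func main() {
	put := &internalRaftRequest{}
	revoke := &internalRaftRequest{LeaseRevoke: &leaseRevokeRequest{}}

	// Gap of 5200: a normal write is rejected, a LeaseRevoke still passes.
	fmt.Println(isTooLargeGap(100, 5300, put))    // true
	fmt.Println(isTooLargeGap(100, 5300, revoke)) // false

	// Gap of 5600 exceeds even the critical-request reserve.
	fmt.Println(isTooLargeGap(100, 5700, revoke)) // true
}
```

With this split, ordinary writes back off at the base limit while the leader's LeaseRevoke and Compact requests keep flowing until the extra reserve is also exhausted, breaking the starvation loop described above.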
