Enable maintenance mode

  1. Check that all brokers are healthy:

    rpk cluster health
    Example output:
    CLUSTER HEALTH OVERVIEW
    =======================
    Healthy:                     true (1)
    Controller ID:               0
    All nodes:                   [0 1 2] (2)
    Nodes down:                  [] (3)
    Leaderless partitions:       [] (3)
    Under-replicated partitions: [] (3)
    1. The cluster is either healthy (true) or unhealthy (false).

    2. The node IDs of all brokers in the cluster.

    3. If the cluster is unhealthy, these fields will contain data.
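The health check above can be scripted as a gate before continuing. The following is a minimal sketch, assuming the report contains a `Healthy:` line as in the example output; `is_cluster_healthy` is a hypothetical helper name, not part of rpk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `rpk cluster health` output on stdin and
# succeeds only if it contains a line of the form "Healthy: true".
is_cluster_healthy() {
  grep -Eq '^[[:space:]]*Healthy:[[:space:]]+true([[:space:]]|$)'
}

# Usage (assumes rpk is installed and configured for the cluster):
#   rpk cluster health | is_cluster_healthy || { echo "cluster is unhealthy" >&2; exit 1; }
```

Matching on the text of the report keeps the sketch independent of any particular rpk exit-code behavior, which is not described here.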

  2. Optional: Use the Admin API (default port: 9644) to check for potential risks of restarting a specific broker:

    curl -X GET "http://<broker-address>:<admin-api-port>/v1/broker/pre_restart_probe" | jq .
    Example output:
    // Returns tuples of partitions (in the format {namespace}/{topic_name}/{partition_id}) affected by the broker restart.
    
    {
      "risks": {
        "rf1_offline": [
          "kafka/topic_a/0"
        ],
        "full_acks_produce_unavailable": [],
        "unavailable": [],
        "acks1_data_loss": []
      }
    }

    In this example, the restart probe indicates that partition kafka/topic_a/0, which has a replication factor of 1, is at risk of going offline if the broker is restarted.

    See the Admin API reference for more details on the restart probe endpoint.
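To make the probe result easier to scan, the non-empty risk categories can be filtered out with jq. This is a sketch only, assuming the response is plain JSON with a top-level `risks` object shaped like the example; `list_restart_risks` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the pre_restart_probe JSON on stdin and prints
# one line per risk category that lists at least one partition.
list_restart_risks() {
  jq -r '.risks
         | to_entries[]
         | select(.value | length > 0)
         | "\(.key): \(.value | join(", "))"'
}

# Usage (placeholders as in the curl command above):
#   curl -s "http://<broker-address>:<admin-api-port>/v1/broker/pre_restart_probe" | list_restart_risks
```

With the example response, the only line printed would be for `rf1_offline`. No output means that no risk category lists any partition.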

  3. Place the broker into maintenance mode:

    rpk cluster maintenance enable <node-id> --wait

    The --wait option tells the command to wait until the given broker, 0 in this example, finishes draining all partitions it originally served. The command exits after the drain completes.

    Expected output:
    Successfully enabled maintenance mode for node 0
    Waiting for node to drain...

  4. Verify that the broker is in maintenance mode:

    rpk cluster maintenance status
    Expected output:
    NODE-ID  DRAINING  FINISHED  ERRORS  PARTITIONS  ELIGIBLE  TRANSFERRING  FAILED
    0        true      true      false   3           0         2             0
    1        false     false     false   0           0         0             0
    2        false     false     false   0           0         0             0

    The FINISHED column should read true for the broker that you put into maintenance mode.

  5. Validate the health of the cluster again:

    rpk cluster health --watch --exit-when-healthy

    The combination of the --watch and --exit-when-healthy flags tells rpk to monitor cluster health and exit only when the cluster is back in a healthy state.

    Note
    To take the broker out of maintenance mode, run:
    rpk cluster maintenance disable <node-id>
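The check of the maintenance status table in the verification step can also be automated. The sketch below assumes the table layout shown in the expected output, with a header row that contains a FINISHED column and the node ID in the first column; `drain_finished` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drain_finished NODE_ID
# Reads `rpk cluster maintenance status` output on stdin and succeeds only if
# the row for NODE_ID has "true" in the FINISHED column.
drain_finished() {
  awk -v id="$1" '
    # Locate the FINISHED column from the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FINISHED") col = i; next }
    # Record whether the requested node reports a finished drain.
    $1 == id { found = 1; ok = ($col == "true") }
    END { exit !(found && ok) }
  '
}

# Usage (assumes rpk is installed and node 0 was put into maintenance mode):
#   rpk cluster maintenance status | drain_finished 0 && echo "node 0 has finished draining"
```

Looking up the column by its header name, rather than by a fixed position, keeps the sketch working if the column order differs from the example.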