
NULL Dereference in LeaveGroup When Coordinator Is Unavailable #5347

@MAlostaz


Summary

When a consumer is destroyed (rd_kafka_destroy()) while the group coordinator is unavailable, rd_kafka_cgrp_leave() calls rd_kafka_cgrp_handle_LeaveGroup() with rkb = rkcg->rkcg_coord, which is NULL. The error path in that function then dereferences the NULL broker pointer:

`rd_kafka_dbg(rkb->rkb_rk, CGRP, "LEAVEGROUP", ...);  // CRASH: rkb is NULL`
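The faulting pattern can be reduced to a small, self-contained model. This is a sketch under stated assumptions, not librdkafka code: model_kafka_t, model_broker_t, model_handle_LeaveGroup() and MODEL_ERR_WAIT_COORD are hypothetical stand-ins for rd_kafka_t, rd_kafka_broker_t, rd_kafka_cgrp_handle_LeaveGroup() and RD_KAFKA_RESP_ERR__WAIT_COORD. It illustrates one possible way to avoid the crash: the handler already receives rk as its first argument (the call site passes rkcg->rkcg_rk), so the error path can log through rk and touch rkb only after a NULL check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for rd_kafka_t and
 * rd_kafka_broker_t, with only the fields this sketch needs. */
typedef struct model_kafka {
        const char *name;
} model_kafka_t;

typedef struct model_broker {
        model_kafka_t *rkb_rk; /* back-pointer to the owning client */
        const char *rkb_name;
} model_broker_t;

/* Placeholder value standing in for RD_KAFKA_RESP_ERR__WAIT_COORD. */
#define MODEL_ERR_WAIT_COORD 1

/* Error-path logging in the style of rd_kafka_cgrp_handle_LeaveGroup().
 * The reported crash comes from logging via rkb->rkb_rk while rkb is NULL.
 * This version logs via rk and only reads rkb after a NULL check.
 * Returns the client handle that was used for logging. */
static model_kafka_t *model_handle_LeaveGroup(model_kafka_t *rk,
                                              model_broker_t *rkb,
                                              int err) {
        assert(rk != NULL); /* assumed: the caller always passes the client */
        if (err)
                fprintf(stderr,
                        "%s: LEAVEGROUP: leave failed: err %d (broker %s)\n",
                        rk->name, err, rkb ? rkb->rkb_name : "none");
        return rk;
}
```

With rk named "consumer-1", calling model_handle_LeaveGroup(&rk, NULL, MODEL_ERR_WAIT_COORD) writes "consumer-1: LEAVEGROUP: leave failed: err 1 (broker none)" to stderr instead of faulting.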

This crash only occurs if the coordinator is unavailable at the moment the consumer leaves the group during shutdown. Typical triggers:

  • Rolling broker upgrades where the coordinator broker restarts
  • Coordinator failover during consumer shutdown
  • Network partition isolating the coordinator

Identification

  • We observed intermittent SIGSEGV crashes in production during consumer shutdown
  • We captured the core dump and analyzed it with gdb
    #0  rd_kafka_cgrp_handle_LeaveGroup (rk=0x..., rkb=0x0, err=RD_KAFKA_RESP_ERR__WAIT_COORD, ...)
        at rdkafka_cgrp.c:984
    #1  rd_kafka_cgrp_leave (rkcg=0x...) at rdkafka_cgrp.c:1158
    #2  rd_kafka_cgrp_terminate (rkcg=0x...) at rdkafka_cgrp.c:...
    #3  rd_kafka_destroy_internal (rk=0x...) at rdkafka.c:...
  • The above backtrace shows rkb=0x0 (NULL) in rd_kafka_cgrp_handle_LeaveGroup()

  • We traced the call site in rd_kafka_cgrp_leave() (line 1158):

    } else
        rd_kafka_cgrp_handle_LeaveGroup(rkcg->rkcg_rk, rkcg->rkcg_coord,  // <-- rkcg_coord is NULL here
                                         RD_KAFKA_RESP_ERR__WAIT_COORD,
                                         NULL, NULL, rkcg);
  • This else branch is taken when no coordinator is available (rkcg->rkcg_coord == NULL)

  • The handler then logs through rkb->rkb_rk at line 984, dereferencing the NULL pointer
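The control flow traced above can be modelled in a few lines, which also suggests the shape of a regression test: run the leave path once with a coordinator and once without, and check that the handler completes in the second case. This is a hypothetical sketch, not librdkafka code: model_cgrp_t, model_cgrp_leave() and model_handle_LeaveGroup() only loosely mirror rd_kafka_cgrp_leave() and rd_kafka_cgrp_handle_LeaveGroup(), the branch condition follows the report's description (rkcg_coord == NULL), and sending the real LeaveGroup request is replaced by a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rd_kafka_broker_t. */
typedef struct model_broker {
        int id;
} model_broker_t;

/* Hypothetical stand-in for the consumer group handle. */
typedef struct model_cgrp {
        model_broker_t *rkcg_coord; /* NULL while no coordinator is known */
        int requests_sent;          /* LeaveGroup requests "sent" (modelled) */
        int last_err;               /* error seen by the handler, 0 if none */
        int last_broker_id;         /* broker id seen by the handler, -1 if none */
} model_cgrp_t;

/* Placeholder value standing in for RD_KAFKA_RESP_ERR__WAIT_COORD. */
#define MODEL_ERR_WAIT_COORD 1

/* Handler stand-in: rkb may legitimately be NULL, so check before use. */
static void model_handle_LeaveGroup(model_cgrp_t *rkcg, model_broker_t *rkb,
                                    int err) {
        rkcg->last_err = err;
        rkcg->last_broker_id = rkb ? rkb->id : -1;
}

/* Leave path, branching on coordinator availability as the report describes. */
static void model_cgrp_leave(model_cgrp_t *rkcg) {
        assert(rkcg != NULL);
        if (rkcg->rkcg_coord != NULL) {
                /* Real code would send a LeaveGroup request to the
                 * coordinator; modelled here as a counter. */
                rkcg->requests_sent++;
        } else {
                /* The reported branch: rkcg_coord is NULL here, so the
                 * handler receives rkb == NULL. */
                model_handle_LeaveGroup(rkcg, rkcg->rkcg_coord,
                                        MODEL_ERR_WAIT_COORD);
        }
}
```

In the coordinator-less case the handler records last_broker_id == -1 rather than dereferencing rkb, which is the behaviour a fix in the real handler would need to preserve.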
