Add support for MSK clusters with kraft metadata nodes #19

Merged: joshm91 merged 4 commits into statsbomb:main from errm:kraft-clusters on Feb 7, 2025
errm commented Jan 30, 2025

Since Kafka version 3.7.0, MSK has supported using KRaft metadata nodes
instead of ZooKeeper: https://aws.amazon.com/about-aws/whats-new/2024/05/amazon-msk-kraft-mode-apache-kafka-clusters/

When prometheus-msk-discovery runs against a cluster in KRaft mode,
it panics like this:

```
2025/01/30 12:03:36 http: panic serving 10.3.57.169:45616: runtime error: invalid memory address or nil pointer dereference
goroutine 18 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1868 +0xb9
panic({0x7675e0?, 0xa8f9c0?})
	/usr/local/go/src/runtime/panic.go:920 +0x270
main.getBrokers({0x86e250?, 0xc00013a140}, {0xc00002bf20, 0x5b})
	/src/main.go:131 +0x24d
main.buildClusterDetails({0x86e250?, 0xc00013a140?}, {0x0, 0xc000268730, 0xc000232b40, 0xc000232a80, 0xc000232a70, 0xc000226930, 0xc000226948, 0xc000232b20, ...})
	/src/main.go:140 +0x56
main.GetStaticConfigs({0x86e250, 0xc00013a140}, {0xc00025ba18, 0x1, 0xc0000ef988?})
	/src/main.go:199 +0x26b
main.httpSD.func1({0x86feb0, 0xc00015a000}, 0xc0000efb18?)
	/src/main.go:248 +0xbb
net/http.HandlerFunc.ServeHTTP(0x440480?, {0x86feb0?, 0xc00015a000?}, 0x6457fa?)
	/usr/local/go/src/net/http/server.go:2136 +0x29
net/http.(*ServeMux).ServeHTTP(0xace040?, {0x86feb0, 0xc00015a000}, 0xc000154000)
	/usr/local/go/src/net/http/server.go:2514 +0x142
net/http.serverHandler.ServeHTTP({0xc000150090?}, {0x86feb0?, 0xc00015a000?}, 0x6?)
	/usr/local/go/src/net/http/server.go:2938 +0x8e
net/http.(*conn).serve(0xc00011a1b0, {0x8704c0, 0xc0000a3f80})
	/usr/local/go/src/net/http/server.go:2009 +0x5f4
created by net/http.(*Server).Serve in goroutine 1
	/usr/local/go/src/net/http/server.go:3086 +0x5cb
```

The panic occurs because the ListNodes API returns records like this for controller nodes:

```
{
    "AddedToClusterTime": null,
    "BrokerNodeInfo": null,
    "ControllerNodeInfo": {
        "Endpoints": [
            "c-10002.foo.xxxxxx.c7.kafka.us-east-1.amazonaws.com"
        ]
    },
    "InstanceType": null,
    "NodeARN": null,
    "NodeType": "CONTROLLER",
    "ZookeeperNodeInfo": null
}
```

These records have a nil BrokerNodeInfo, which getBrokers (main.go:131 in the trace above) dereferences.
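
As an illustration of the failure mode, here is a minimal sketch of a nil-safe endpoint lookup. The `NodeInfo`, `BrokerNodeInfo`, and `ControllerNodeInfo` types and the `nodeEndpoints` helper are hypothetical, simplified stand-ins modelled on the JSON record above; they are not the AWS SDK types and not necessarily the code in this PR.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the ListNodes record shape shown
// above. They are assumptions for illustration, not the AWS SDK types.
type BrokerNodeInfo struct {
	Endpoints []string
}

type ControllerNodeInfo struct {
	Endpoints []string
}

type NodeInfo struct {
	NodeType           string
	BrokerNodeInfo     *BrokerNodeInfo
	ControllerNodeInfo *ControllerNodeInfo
}

// nodeEndpoints returns the endpoints of a node, checking each optional
// sub-record for nil before reading from it. A record with neither
// sub-record set yields nil instead of panicking.
func nodeEndpoints(n NodeInfo) []string {
	switch {
	case n.BrokerNodeInfo != nil:
		return n.BrokerNodeInfo.Endpoints
	case n.ControllerNodeInfo != nil:
		return n.ControllerNodeInfo.Endpoints
	default:
		return nil
	}
}

func main() {
	// Mirrors the controller record above: BrokerNodeInfo is nil.
	controller := NodeInfo{
		NodeType: "CONTROLLER",
		ControllerNodeInfo: &ControllerNodeInfo{
			Endpoints: []string{"c-10002.foo.xxxxxx.c7.kafka.us-east-1.amazonaws.com"},
		},
	}
	fmt.Println(nodeEndpoints(controller))
}
```

Reading `n.BrokerNodeInfo.Endpoints` without the nil check on a record like this is the kind of access that produces the nil pointer dereference in the trace.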

This PR fixes this bug, and also adds these controller nodes to
the target endpoints.

When JMX Exporter and Node Exporter are enabled on the cluster, the controller
nodes seem to run only JMX Exporter, so only the JMX Exporter endpoint is
added for them.
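
The target selection described above could look roughly like the following sketch. The `node` type, the `targets` helper, and the port constants are assumptions for illustration (11001 and 11002 are taken to be the MSK open monitoring ports for JMX Exporter and Node Exporter); the actual change in this PR may be structured differently.

```go
package main

import "fmt"

// Assumed MSK open monitoring ports; adjust if your setup differs.
const (
	jmxExporterPort  = 11001
	nodeExporterPort = 11002
)

// node is a hypothetical, flattened view of a ListNodes record.
type node struct {
	nodeType  string // "BROKER" or "CONTROLLER"
	endpoints []string
}

// targets builds host:port scrape targets for a node. Controller nodes
// only get a JMX Exporter target, even when Node Exporter is enabled.
func targets(n node, jmxEnabled, nodeExporterEnabled bool) []string {
	var out []string
	for _, host := range n.endpoints {
		if jmxEnabled {
			out = append(out, fmt.Sprintf("%s:%d", host, jmxExporterPort))
		}
		if nodeExporterEnabled && n.nodeType != "CONTROLLER" {
			out = append(out, fmt.Sprintf("%s:%d", host, nodeExporterPort))
		}
	}
	return out
}

func main() {
	broker := node{nodeType: "BROKER", endpoints: []string{"b-1.example"}}
	controller := node{nodeType: "CONTROLLER", endpoints: []string{"c-10002.example"}}

	fmt.Println(targets(broker, true, true))
	fmt.Println(targets(controller, true, true))
}
```

With both exporters enabled, the broker in this sketch yields two targets and the controller yields one.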

errm commented Feb 5, 2025

Ping @joshm91

joshm91 commented Feb 5, 2025

Thanks for this, @errm. I've set some time aside next week to give this repo a bit of attention so I'll get your change reviewed and merged then.

errm commented Feb 5, 2025

Awesome! Thanks 🙇

joshm91 merged commit 9725e73 into statsbomb:main on Feb 7, 2025
1 check passed

joshm91 commented Feb 7, 2025

Thanks again for this! Merged and pushed to prometheus-msk-discovery:0.7.0 and prometheus-msk-discovery:latest.
