Skip to content

Conversation

@pancak3
Copy link
Contributor

@pancak3 pancak3 commented Aug 25, 2025

resolve #164

@irar2
Copy link
Collaborator

irar2 commented Aug 26, 2025

Looking at https://libzmq.readthedocs.io/en/zeromq4-1/zmq_tcp.html, it looks like the address to start should be tcp://*:* (please make it a constant)

@pancak3
Copy link
Contributor Author

pancak3 commented Aug 26, 2025

Looking at https://libzmq.readthedocs.io/en/zeromq4-1/zmq_tcp.html, it looks like the address to start should be tcp://*:* (please make it a constant)

How about tcp://127.0.0.1:*? Listening on all interface sounds too open for test purpose
Using wildcard as port will panic zmq.NewSocket (https://libzmq.readthedocs.io/en/zeromq4-1/zmq_socket.html#toc3)
Which fails case run add/remove requests in parallel, use partial cache at
https://github.com/pancak3/llm-d-inference-sim/blob/fab7019849fdcdfba989d4f66b3ecd230b53e9d6/pkg/kv-cache/kv_cache_test.go#L421

Among the combinations of host [*, 127.0.0.1, localhost] with port [*, 0], only tcp://127.0.0.1:0 works.
Can we use tcp://127.0.0.1:0?

@irar2
Copy link
Collaborator

irar2 commented Aug 27, 2025

The tests fail in https://github.com/pancak3/llm-d-inference-sim/blob/fab7019849fdcdfba989d4f66b3ecd230b53e9d6/pkg/kv-cache/kv_cache_test.go#L421
because there is no sub there, the solution for this is to prevent publishing events by not updating ZMQEndpoint in the configuration, and in this case there is no need to create a publisher (in newBlockCache) and then in KVEventSender.Run we should continue if there is no publisher.

@pancak3
Copy link
Contributor Author

pancak3 commented Aug 28, 2025

@irar2 Not sure whether the change matches the above comment. I think I am not very familiar with the business logic of the KVEvent

DataParallelRank: &dpRank,
}

if s.publisher == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should be in Run() in two cases:
case eventData, ok := <-s.eventChan:
after if !ok{} please add in line 90
if s.publisher == nil { continue }
and
case <-timer.C:
please add the same check first thing in this case.

I don't think we need a log message here, there can be too many.

@irar2
Copy link
Collaborator

irar2 commented Sep 1, 2025

@pancak3 Thanks!

/lgtm
/approve

@github-actions github-actions bot added the lgtm label Sep 1, 2025
@irar2 irar2 merged commit 33e7210 into llm-d:main Sep 1, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ZMQ sub *:port failed randomly in CI

2 participants