Question about NCCL algorithms and profiling #1969
Hello, hopefully this isn't a stupid question, but I'm having trouble with something. I am trying to profile some NCCL primitive functions to get a better idea of what's happening during the algorithms. I am also wrapping the entire algorithm like so:

I am profiling one thread per warp. When looking at the results I see this:

thread_id,primitive,start,end

I don't understand this. Is this actually the same thread entering the algorithm code? I don't see where this happens in the code: the kernel is launched once, and the channel and block are always the same, so it's not, for example, the first thread of the second block. So my question is: is this the same thread, and if not, what can I use to distinguish between the threads? And what is actually causing this, and where can I find it?
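On the "what can I use to distinguish between them" part: within one launch, blockIdx.x together with threadIdx.x identifies a thread uniquely, as long as the fields that store them are wide enough. The sketch below is host-side C++ so it runs without a GPU. It assumes the profiled thread is lane 0 of each 32-thread warp and adds a block_id column in front of the thread_id,primitive,start,end header. ProfileRecord, warpOf, laneOf, isSampled, toCsv and sampledThreads are hypothetical names for illustration, not NCCL APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumption: 32 threads per warp, which holds on current NVIDIA GPUs.
// Device code can read the built-in warpSize instead of hard-coding it.
constexpr uint32_t kWarpSize = 32;
// CUDA allows at most 1024 threads per block.
constexpr uint32_t kMaxThreadsPerBlock = 1024;

// Hypothetical profiling record. thread_id is 32 bits wide, so it can hold
// any threadIdx.x value without wrapping around.
struct ProfileRecord {
    uint32_t block_id;      // blockIdx.x of the recording thread
    uint32_t thread_id;     // threadIdx.x of the recording thread
    std::string primitive;  // label of the profiled primitive
    uint64_t start;         // timestamp at entry (e.g. clock64() on the device)
    uint64_t end;           // timestamp at exit
};

uint32_t warpOf(uint32_t tid) { return tid / kWarpSize; }
uint32_t laneOf(uint32_t tid) { return tid % kWarpSize; }

// "One thread per warp": assume only lane 0 of each warp writes a record.
bool isSampled(uint32_t tid) { return laneOf(tid) == 0; }

// Serialise as block_id,thread_id,primitive,start,end.
std::string toCsv(const ProfileRecord& r) {
    std::ostringstream os;
    os << r.block_id << ',' << r.thread_id << ',' << r.primitive << ','
       << r.start << ',' << r.end;
    return os.str();
}

// Thread indices that write a record in a block of block_dim threads.
std::vector<uint32_t> sampledThreads(uint32_t block_dim) {
    assert(block_dim <= kMaxThreadsPerBlock);
    std::vector<uint32_t> out;
    for (uint32_t tid = 0; tid < block_dim; ++tid)
        if (isSampled(tid)) out.push_back(tid);
    return out;
}
```

For an illustrative 512-thread block, sampledThreads(512) yields 0, 32, ..., 480: 16 distinct thread_id values, one per warp, so no two records from the same block should share a thread_id.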
Replies: 1 comment
I realised I was storing the tid value in a uint8, which wraps around for thread indices of 256 and above. That caused my confusion. It works as intended now.
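The following is a minimal sketch of why the uint8 field made different threads look like the same one. It is again host-side C++, under the same lane-0-per-warp assumption and with an illustrative 512-thread block, not necessarily what NCCL launches. Converting an unsigned index to uint8_t keeps only its low 8 bits, so thread 256 is logged as 0, thread 288 as 32, and so on. storeTidNarrow, aliasCounts and holdsAnyThreadIndex are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>

// What storing threadIdx.x in a uint8_t does: the conversion keeps only the
// low 8 bits (value modulo 256), so 256 -> 0, 288 -> 32, 480 -> 224.
uint8_t storeTidNarrow(uint32_t tid) { return static_cast<uint8_t>(tid); }

// With one sampled thread per 32-thread warp (assumed lane 0), count how many
// distinct sampled threads end up logged under each 8-bit thread_id.
std::map<uint8_t, int> aliasCounts(uint32_t block_dim) {
    assert(block_dim <= 1024);  // CUDA's per-block thread limit
    std::map<uint8_t, int> counts;
    for (uint32_t tid = 0; tid < block_dim; tid += 32)
        ++counts[storeTidNarrow(tid)];
    return counts;
}

// A field type is wide enough only if it can represent the largest possible
// thread index within a block (1023).
template <typename T>
constexpr bool holdsAnyThreadIndex() {
    return std::numeric_limits<T>::max() >= 1023;
}
static_assert(!holdsAnyThreadIndex<uint8_t>(), "uint8_t wraps at 256");
static_assert(holdsAnyThreadIndex<uint16_t>(), "uint16_t is wide enough");
```

With 512 threads, aliasCounts(512) has 8 keys (0, 32, ..., 224), each with a count of 2: every logged thread_id stands for two different threads. Placing a holdsAnyThreadIndex check in a static_assert next to a record definition makes a too-narrow field fail at compile time instead of during a profiling run.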