-
Notifications
You must be signed in to change notification settings - Fork 56
Add the greedy ambiguity solver to throughput examples #1053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add the greedy ambiguity solver to throughput examples #1053
Conversation
022693e to
126a871
Compare
126a871 to
d61322d
Compare
4c9698f to
adcb664
Compare
|
Maybe adding a new algorithm to the chain makes comparison currupted? I knew that |
62adba1 to
88347ea
Compare
|
@stephenswat Can you generate the plot again as the issue on covfie hash has been resolved? |
|
Running now! Mind you that due to the large number of kernels the ambiguity resolution needs to run, the benchmark takes about 3 hours per commit. |
This comment was marked as outdated.
This comment was marked as outdated.
|
Hi @beomki-yeo I tried running the multi-threaded CUDA throughput example with this PR and it is giving me segmentation faults. Have you seen this? |
|
I will check |
|
It's running normally for me. Could you share your command? gcc version: 31.1 |
b4437b2 to
154f3c7
Compare
|
|
The above benchmark may not be correct due to the bugs reported in #1083
|
5e61612 to
4d31318
Compare
4d31318 to
0817b70
Compare
|
|
This is the output of single thread chain: This PR Main As the ambiguity solver still increases the even processing time 15% but I think this number can be tolerated considering the importance of ambiguity resolver. There is still room to improve in the ambiguity solver as well |
|
@stephenswat If you don't mind, could you also generate the plots for throughput and physical performance? |
|
Running the compute plots for you now. The physics plots are based on the CUDA seeding example, so I cannot make those from this commit unfortunately; the ambiguity resolution would need to be added to the seeding example. |
|
By the way, on the RTX A5000 we have at CERN the ambiguity resolution tests fail: Is this something you have seen? |
Performance summaryHere is a summary of the performance effects of this PR: GraphicalTabular
Important All metrics in this report are given as reciprocal throughput, not as wallclock runtime. Warning At least one kernel incurred a significant performance regression. Note This is an automated message produced upon the explicit request of a human being. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This type of failure in |
|
Just as an update, I found that the tests on RTX A6000 are OK with cuda 12.4 and 12.6. |
|
@stephenswat Would you be able to check if your failure is relevant with #1159 ? |





This PR depends on #1057
The greedy ambiguity solver is added to the CUDA full chain