I ran the example throughput.cu on a machine with 4 GPUs, and it failed on three of the four devices:
Command: 'cudaMemsetAsync(m_l2_buffer, 0, static_cast<std::size_t>(m_l2_size), stream)'
Run: [5/8] throughput_bench [Device=0]
Fail: Unexpected error: /data/github/build/cache/nvbench/b2fc/nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument
Command: 'cudaMemsetAsync(m_l2_buffer, 0, static_cast<std::size_t>(m_l2_size), stream)'
Run: [6/8] throughput_bench [Device=1]
Fail: Unexpected error: /data/github/build/cache/nvbench/b2fc/nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument
Command: 'cudaMemsetAsync(m_l2_buffer, 0, static_cast<std::size_t>(m_l2_size), stream)'
Run: [7/8] throughput_bench [Device=2]
Fail: Unexpected error: /data/github/build/cache/nvbench/b2fc/nvbench/detail/l2flush.cuh:55: Cuda API call returned error: cudaErrorInvalidValue: invalid argument
Command: 'cudaMemsetAsync(m_l2_buffer, 0, static_cast<std::size_t>(m_l2_size), stream)'
Run: [8/8] throughput_bench [Device=3]
Pass: Cold: 0.007061ms GPU, 0.016156ms CPU, 0.50s total GPU, 6.81s total wall, 70816x
Pass: Batch: 0.002299ms GPU, 0.50s total GPU, 0.50s to
I noticed that examples/stream.cu sets the CUDA stream explicitly with set_cuda_stream:
state.set_cuda_stream(nvbench::make_cuda_stream_view(default_stream));
So I added the same call to throughput.cu, and with that change it works fine on all four devices.
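For reference, here is a minimal sketch of what the modified benchmark could look like. It is not the exact contents of throughput.cu: the kernel `copy_kernel_sketch`, the buffer size, and the launch configuration are placeholders I made up for illustration. Only the `set_cuda_stream` / `make_cuda_stream_view` call is taken from examples/stream.cu, where `default_stream` is assumed to be the CUDA default stream (`0`). I have not confirmed why this avoids the `cudaMemsetAsync` error in l2flush.cuh; it is only the workaround that happened to work here.

```cuda
#include <nvbench/nvbench.cuh>

#include <thrust/device_vector.h>

#include <cstddef>

// Placeholder kernel (hypothetical, not from the nvbench examples):
// a grid-stride copy of n values from `in` to `out`.
__global__ void copy_kernel_sketch(const nvbench::int32_t *in,
                                   nvbench::int32_t *out,
                                   std::size_t n)
{
  const std::size_t start  = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
  const std::size_t stride = static_cast<std::size_t>(blockDim.x) * gridDim.x;
  for (std::size_t i = start; i < n; i += stride)
  {
    out[i] = in[i];
  }
}

void throughput_bench(nvbench::state &state)
{
  // Assumed problem size: 64 MiB of int32 values.
  const std::size_t num_values = 64 * 1024 * 1024 / sizeof(nvbench::int32_t);
  thrust::device_vector<nvbench::int32_t> input(num_values);
  thrust::device_vector<nvbench::int32_t> output(num_values);

  // Workaround borrowed from examples/stream.cu: run on the CUDA default
  // stream. The stream view is non-owning.
  cudaStream_t default_stream = 0;
  state.set_cuda_stream(nvbench::make_cuda_stream_view(default_stream));

  // Throughput information reported alongside the timings.
  state.add_element_count(num_values, "NumElements");
  state.add_global_memory_reads<nvbench::int32_t>(num_values, "DataSize");
  state.add_global_memory_writes<nvbench::int32_t>(num_values);

  const nvbench::int32_t *in_ptr  = thrust::raw_pointer_cast(input.data());
  nvbench::int32_t       *out_ptr = thrust::raw_pointer_cast(output.data());

  // Launch on the stream nvbench hands back, which is now the default stream.
  state.exec([in_ptr, out_ptr, num_values](nvbench::launch &launch) {
    copy_kernel_sketch<<<256, 256, 0, launch.get_stream()>>>(in_ptr, out_ptr, num_values);
  });
}
NVBENCH_BENCH(throughput_bench);

// No main() here: this assumes the target links against nvbench::main,
// as the bundled examples do.
```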
# Log
Run: [1/4] throughput_bench [Device=0]
Pass: Cold: 0.663276ms GPU, 0.672594ms CPU, 0.51s total GPU, 0.54s total wall, 768x
Pass: Batch: 0.659212ms GPU, 0.53s total GPU, 0.53s total wall, 800x
Run: [2/4] throughput_bench [Device=1]
Pass: Cold: 0.665058ms GPU, 0.674441ms CPU, 0.50s total GPU, 0.53s total wall, 752x
Pass: Batch: 0.660540ms GPU, 0.54s total GPU, 0.54s total wall, 815x
Run: [3/4] throughput_bench [Device=2]
Pass: Cold: 0.664827ms GPU, 0.674139ms CPU, 0.51s total GPU, 0.55s total wall, 768x
Pass: Batch: 0.660413ms GPU, 0.53s total GPU, 0.53s total wall, 809x
Run: [4/4] throughput_bench [Device=3]
Pass: Cold: 0.665416ms GPU, 0.674786ms CPU, 0.50s total GPU, 0.53s total wall, 752x
Pass: Batch: 0.660745ms GPU, 0.53s total GPU, 0.53s total wall, 807x