Commit 6958ef1
tianleiwu authored and gedoensmax committed
Skip node output dump for MemcpyToHost (microsoft#25651)
Fix node output dump for MemcpyToHost. The statistics are not correct because the data might not have been copied to the CPU yet:

```
MemcpyToHost node: Memcpy_token_232
Input 0 Name: /model/layers.6/moe/router/Add/output_0_CUDAExecutionProvider Shape: {1,1,32}
OrtMemoryInfo:[name:Cuda OrtMemType:0 OrtAllocatorType:1 Device:[DeviceType:1 MemoryType:0 VendorId:4318 DeviceId:0 Alignment:0]]
Min=-2.5136719,Max=1.6914062
-----------
Output 0 Name: /model/layers.6/moe/router/Add/output_0 Shape: {1,1,32}
Min=-4888,Max=6672,NaN=2
```

With this fix, the output dump (and its statistics) is skipped for such nodes, and the log reads:

```
-----------
Output 0 Name: /model/layers.6/moe/router/Add/output_0 Shape: {1,1,32}
is same as input
```
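To illustrate why statistics read before synchronization can be wrong, here is a minimal sketch that stands in for the asynchronous copy with a deferred `std::async` task. Nothing here is ONNX Runtime or CUDA code: `EnqueueCopyToHost`, `MinMax`, and `ReadBeforeAndAfterSync` are hypothetical names, and the stale host values are made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Min/max over a buffer, analogous to the statistics printed by the dump.
std::pair<float, float> MinMax(const std::vector<float>& data) {
  assert(!data.empty());
  auto range = std::minmax_element(data.begin(), data.end());
  return {*range.first, *range.second};
}

// Hypothetical stand-in for an asynchronous device-to-host copy.
// With std::launch::deferred the copy only runs when the future is waited on,
// which models "the copy has been enqueued but not synchronized yet".
// Both vectors must outlive the returned future.
std::future<void> EnqueueCopyToHost(const std::vector<float>& device,
                                    std::vector<float>& host) {
  return std::async(std::launch::deferred, [&device, &host] { host = device; });
}

// Returns {statistics read before sync, statistics read after sync}.
std::pair<std::pair<float, float>, std::pair<float, float>> ReadBeforeAndAfterSync() {
  const std::vector<float> device = {-2.5f, 0.25f, 1.5f};
  // Leftover contents standing in for whatever the host buffer held before the copy.
  std::vector<float> host = {-4888.0f, 0.0f, 6672.0f};

  std::future<void> copy = EnqueueCopyToHost(device, host);
  const auto before = MinMax(host);  // reads stale data: the copy has not run yet
  copy.get();                        // "synchronize": the copy completes here
  const auto after = MinMax(host);   // now matches the device tensor
  return {before, after};
}
```

In this sketch the "before" statistics describe the stale host buffer rather than the tensor, which mirrors the mismatch between the input and output statistics in the log above.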
1 parent 913573f commit 6958ef1

1 file changed

+7
-0
lines changed

onnxruntime/core/framework/debug_node_inputs_outputs_utils.cc

Lines changed: 7 additions & 0 deletions
@@ -667,6 +667,13 @@ void DumpNodeOutputs(
           const bool is_shape_set = (dump_options.dump_flags & NodeDumpOptions::DumpFlags::Shape) != 0;
           PrintIf(is_shape_set, MakeString(" Shape: ", shape, "\n"));
 
+          // For MemcpyToHost, the memory copy has not been synchronized so the data is not ready to read yet.
+          // Here we skip it since it is just a copy of input tensor (or output of previous node) which has been dumped.
+          if (node.OpType() == "MemcpyToHost") {
+            std::cout << " is same as input.\n";
+            continue;
+          }
+
           if ((dump_options.dump_flags & NodeDumpOptions::DumpFlags::OutputData) != 0 || check_half_overflow) {
             tensor_metadata.name = output_defs[i]->Name();
             tensor_metadata.step = dump_context.iteration;
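
The control flow added by this hunk can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the ONNX Runtime implementation: `FakeNode`, `FakeOutput`, `DumpOutputsSketch`, and `DumpToString` are hypothetical stand-ins, and the output goes to a caller-supplied stream instead of `std::cout` so the result is easy to inspect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a node and its output tensors.
struct FakeOutput {
  std::string name;
  std::string shape;
  std::vector<float> data;
};

struct FakeNode {
  std::string op_type;
  std::vector<FakeOutput> outputs;
};

// Prints name and shape for every output, but skips the data statistics for
// MemcpyToHost nodes, whose host buffer may not be ready to read yet.
void DumpOutputsSketch(const FakeNode& node, std::ostream& os) {
  for (std::size_t i = 0; i < node.outputs.size(); ++i) {
    const FakeOutput& out = node.outputs[i];
    os << "Output " << i << " Name: " << out.name << "\n";
    os << " Shape: " << out.shape << "\n";

    if (node.op_type == "MemcpyToHost") {
      os << " is same as input.\n";
      continue;  // do not touch the (possibly unsynchronized) data
    }

    assert(!out.data.empty());
    auto range = std::minmax_element(out.data.begin(), out.data.end());
    os << " Min=" << *range.first << ",Max=" << *range.second << "\n";
  }
}

// Convenience wrapper that collects the dump into a string.
std::string DumpToString(const FakeNode& node) {
  std::ostringstream os;
  DumpOutputsSketch(node, os);
  return os.str();
}
```

Placing the check after the name and shape are printed keeps the log entry for the output while avoiding any read of the tensor contents, which is the same ordering the diff uses.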