-
Notifications
You must be signed in to change notification settings - Fork 49
Description
Summary
Disclaimer: I'm not very familiar with this library, or with neural network processing in general. I'm working with a model developed by a sister company (created using PyTorch, converted to OpenVINO IR from ONNX) and we are getting a persistent crash when running on Windows with a large number of concurrent streams. With some effort I narrowed down the cause and identified a fix that worked for us, but there are probably much more elegant ways to fix this, hence this is an issue report instead of a pull request.
The cause of the crash seems to be this instruction:
00000179D623120F C4 E3 6D 4A 18 00 vblendvps ymm3,ymm2,ymmword ptr [rax],ymm0
which is part of a JIT routine called by
template <cpu_isa_t isa, data_type_t acc_type, data_type_t dst_type>
void jit_pp_kernel_t<isa, acc_type, dst_type>::operator()(dst_data_t *dst,
const acc_data_t *acc, const char *bias, const float *scales,
size_t start, size_t end, size_t runtime_oc,
const float *dst_zero_points) const
in jit_gemm_inner_product_utils.cpp. At the time of the crash RAX is pointing less than 32 bytes from the end of an allocated memory page, and the addresses following this page are invalid. The YMM instructions operate on 32 bytes (256 bits) at a time, so this causes an access violation.
For example, in one of my crash dumps RAX has the value 00000179EC8A6FE4, and the memory there looks like this:
0x00000179EC8A6FE4 06 6f 8f 3d fd fd fd fd .o.=ýýýý
0x00000179EC8A6FEC dd dd dd dd dd dd dd dd ÝÝÝÝÝÝÝÝ
0x00000179EC8A6FF4 dd dd dd dd dd dd dd dd ÝÝÝÝÝÝÝÝ
0x00000179EC8A6FFC dd dd dd dd ?? ?? ?? ?? ÝÝÝÝ....
0x00000179EC8A7004 ?? ?? ?? ?? ?? ?? ?? ?? ........
...
I seem to have fixed this by adding 32 bytes of padding to all node/edge memory allocations, using the following patch:
diff --git a/src/common/memory_desc_wrapper.hpp b/src/common/memory_desc_wrapper.hpp
index 4017db3dc..158120ad5 100644
--- a/src/common/memory_desc_wrapper.hpp
+++ b/src/common/memory_desc_wrapper.hpp
@@ -162,7 +162,7 @@ struct memory_desc_wrapper : public c_compatible {
max_size = utils::array_product(bd.inner_blks, bd.inner_nblks);
}
- return max_size * data_type_size() + additional_buffer_size();
+ return max_size * data_type_size() + additional_buffer_size() + 32;
}
}
This is enough to allow us to move forward, but I'm sure someone with a better understanding of the code can do better.
Version
Git hash is e0381c3. This is the version referenced by OpenVINO 2021.4.
Environment
CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
OS version: Windows 10 Enterprise LTSC (10.0.17763)
Compiler version: Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30133 for x64
CMake version: 3.16.2
Steps to reproduce
I wish I could give you something better to go on here, but we were only able to reproduce this in a system that was live streaming from 64 network cameras and running them all through the model in real time. I was not able to reproduce using a test harness that read the data from disk instead of over a network.
Observed behavior
The program crashes intermittently (see summary). This can happen after a couple of minutes or a couple of hours.
Expected behavior
The program does not crash.