Concurrent ArmNN CpuAcc Delegate Inference #5

@jishminor

Description

When running two models that each request the ArmNN delegate, there is a race condition that produces the following call stack:

0# 0x0000AAAAD17609D4 in tritonserver
 1# __kernel_rt_sigreturn in linux-vdso.so.1
 2# armnn::NeonConvolution2dWorkload::Execute() const in /opt/tritonserver/backends/armnn_tflite/libarmnn.so.29
 3# armnn::LoadedNetwork::Execute(std::unique_ptr<arm::pipe::TimelineUtilityMethods, std::default_delete<arm::pipe::TimelineUtilityMethods> >&, arm::pipe::ProfilingGuid) in /opt/tritonserver/backends/armnn_tflite/libarmnn.so.29
 4# armnn::LoadedNetwork::EnqueueWorkload(std::vector<std::pair<int, armnn::ConstTensor>, std::allocator<std::pair<int, armnn::ConstTensor> > > const&, std::vector<std::pair<int, armnn::Tensor>, std::allocator<std::pair<int, armnn::Tensor> > > const&, std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<unsigned int, std::allocator<unsigned int> >) in /opt/tritonserver/backends/armnn_tflite/libarmnn.so.29
 5# armnn::RuntimeImpl::EnqueueWorkload(int, std::vector<std::pair<int, armnn::ConstTensor>, std::allocator<std::pair<int, armnn::ConstTensor> > > const&, std::vector<std::pair<int, armnn::Tensor>, std::allocator<std::pair<int, armnn::Tensor> > > const&, std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<unsigned int, std::allocator<unsigned int> >) in /opt/tritonserver/backends/armnn_tflite/libarmnn.so.29
 6# armnn::IRuntime::EnqueueWorkload(int, std::vector<std::pair<int, armnn::ConstTensor>, std::allocator<std::pair<int, armnn::ConstTensor> > > const&, std::vector<std::pair<int, armnn::Tensor>, std::allocator<std::pair<int, armnn::Tensor> > > const&, std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<unsigned int, std::allocator<unsigned int> >) in /opt/tritonserver/backends/armnn_tflite/libarmnn.so.29
 7# armnnDelegate::ArmnnSubgraph::Invoke(TfLiteContext*, TfLiteNode*) in /opt/tritonserver/backends/armnn_tflite/libarmnnDelegate.so.26
 8# 0x0000FFFF188CC990 in /opt/tritonserver/backends/armnn_tflite/libtriton_armnn_tflite.so
 9# 0x0000FFFF188A8954 in /opt/tritonserver/backends/armnn_tflite/libtriton_armnn_tflite.so
10# 0x0000FFFF187D457C in /opt/tritonserver/backends/armnn_tflite/libtriton_armnn_tflite.so
11# 0x0000FFFF187DCF04 in /opt/tritonserver/backends/armnn_tflite/libtriton_armnn_tflite.so
12# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/armnn_tflite/libtriton_armnn_tflite.so
13# 0x0000FFFF8BF836A0 in /opt/tritonserver/bin/../lib/libtritonserver.so
14# 0x0000FFFF8BF845C4 in /opt/tritonserver/bin/../lib/libtritonserver.so
15# 0x0000FFFF8BFF4F50 in /opt/tritonserver/bin/../lib/libtritonserver.so
16# 0x0000FFFF8BF7D908 in /opt/tritonserver/bin/../lib/libtritonserver.so
17# 0x0000FFFF8BDFBFAC in /lib/aarch64-linux-gnu/libstdc++.so.6
18# 0x0000FFFF8C3C3624 in /lib/aarch64-linux-gnu/libpthread.so.0
19# 0x0000FFFF8BBBE49C in /lib/aarch64-linux-gnu/libc.so.6

Segmentation fault (core dumped)

It seems that the ArmNN delegate is created only once when two ArmNN-accelerated models are loaded, so both model instances end up sharing it.

The above was generated using the model config:

max_batch_size: 1
input {
  name: "input_1"
  data_type: TYPE_UINT8
  format: FORMAT_NHWC
  dims: 300
  dims: 300
  dims: 3
}
output {
  name: "Identity"
  data_type: TYPE_UINT8
  dims: 8
}
instance_group {
  count: 1
  kind: KIND_CPU
}
optimization {
  execution_accelerators {
    cpu_execution_accelerator {
      name: "armnn"
      parameters {
        key: "fast_math_enabled"
        value: "on"
      }
      parameters {
        key: "num_threads"
        value: "4"
      }
      parameters {
        key: "reduce_fp32_to_bf16"
        value: "off"
      }
      parameters {
        key: "reduce_fp32_to_fp16"
        value: "off"
      }
    }
  }
}
parameters {
  key: "tflite_num_threads"
  value {
    string_value: "1"
  }
}
backend: "armnn_tflite"

for efficientnet_quant and the following:

max_batch_size: 1
input {
  name: "input"
  data_type: TYPE_FP32
  format: FORMAT_NHWC
  dims: 299
  dims: 299
  dims: 3
}
output {
  name: "InceptionV3/Predictions/Reshape_1"
  data_type: TYPE_FP32
  dims: 1001
}
instance_group {
  count: 1
  kind: KIND_CPU
}
optimization {
  execution_accelerators {
    cpu_execution_accelerator {
      name: "armnn"
      parameters {
        key: "fast_math_enabled"
        value: "on"
      }
      parameters {
        key: "num_threads"
        value: "2"
      }
      parameters {
        key: "reduce_fp32_to_bf16"
        value: "off"
      }
      parameters {
        key: "reduce_fp32_to_fp16"
        value: "off"
      }
    }
  }
}
parameters {
  key: "tflite_num_threads"
  value {
    string_value: "2"
  }
}
backend: "armnn_tflite"

for inceptionv3.
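For reference, the crash can be reproduced by placing both configs in a Triton model repository and driving the two models concurrently. The layout and commands below are an assumed sketch (model file names, repository path, and the use of perf_analyzer are illustrative, not taken from the report):

```shell
# Hypothetical repro layout; file names and paths are assumptions.
mkdir -p models/efficientnet_quant/1 models/inceptionv3/1
# Place the two config.pbtxt files shown above at:
#   models/efficientnet_quant/config.pbtxt
#   models/inceptionv3/config.pbtxt
# and the TFLite model files at:
#   models/efficientnet_quant/1/model.tflite
#   models/inceptionv3/1/model.tflite
#
# Then start the server and send concurrent requests to both models:
#   tritonserver --model-repository=$(pwd)/models
#   perf_analyzer -m efficientnet_quant --concurrency-range 1 &
#   perf_analyzer -m inceptionv3 --concurrency-range 1
```

The segfault appears only when both models are exercised at the same time, consistent with a race in the shared delegate.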

Labels: bug