Replies: 2 comments
-
By "after inference", do you mean after a Run() call while you still have the Ort::Session? Can you show a simplified code example of what you're doing?
-
Here is example code, with the current CPU memory usage noted in comments:

#include <onnxruntime_cxx_api.h>
#include "onnxruntime_run_options_config_keys.h"
#include <memory>
#include <string>
#include <vector>

int main() {
    std::unique_ptr<Ort::Env> envPtr = std::make_unique<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "OnnxTestModel");
    std::wstring model_path = L"path/to/model";

    Ort::SessionOptions sessionOptions;
    {
        OrtCUDAProviderOptions cudaOptions;
        const double gpuMemoryLimitGb = 5.0;
        const size_t gpuMemoryLimitBytes = static_cast<size_t>(gpuMemoryLimitGb * 1024 * 1024 * 1024);
        cudaOptions.device_id = 0;
        cudaOptions.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::OrtCudnnConvAlgoSearchHeuristic;
        cudaOptions.gpu_mem_limit = gpuMemoryLimitBytes;
        sessionOptions.AppendExecutionProvider_CUDA(cudaOptions);
    }
    sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

    // 283 MB
    std::unique_ptr<Ort::Session> sessionPtr = std::make_unique<Ort::Session>(
        *envPtr, model_path.c_str(), sessionOptions);
    // 1.3 GB

    const Ort::TypeInfo inputTypeInfo = sessionPtr->GetInputTypeInfo(0);
    const auto inputTensorTypeShape = inputTypeInfo.GetTensorTypeAndShapeInfo();
    const ONNXTensorElementDataType inputElementType = inputTensorTypeShape.GetElementType();

    const std::vector<int64_t> inputShape = { 1, 128, 128, 128, 1 };
    Ort::AllocatorWithDefaultOptions allocator{};
    size_t input_tensor_size = 1 * 128 * 128 * 128 * 1;
    std::vector<float> input_tensor_values(input_tensor_size, 0.0f);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    Ort::Value inputTensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, inputShape.data(), inputShape.size());

    Ort::RunOptions runOptions;
    runOptions.AddConfigEntry(kOrtRunOptionsConfigEnableMemoryArenaShrinkage, "cpu:0;gpu:0");

    std::vector<const char*> _inputTensorCNames{ "input_1" };
    std::vector<const char*> _outputTensorCNames{ "activation" };
    std::vector<Ort::Value> outputTensors = sessionPtr->Run(
        runOptions,
        _inputTensorCNames.data(),
        &inputTensor,
        _inputTensorCNames.size(),
        _outputTensorCNames.data(),
        _outputTensorCNames.size());
    // 1.8 GB

    outputTensors.clear();
    inputTensor = Ort::Value(nullptr);
    sessionPtr.reset();
    // 1.77 GB
    envPtr.reset();
    // 1.16 GB
    return 0;
}

I would have expected the memory usage to go down more after inference, and again after the session is destroyed.
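Since the shrinkage option acts on the memory arena, a related experiment is to change how the CPU arena itself is set up. Below is a minimal sketch, assuming ONNX Runtime's standard C++ API (the ArenaCfg values are illustrative, not tuned), of two variants: disabling the CPU arena entirely, or registering a shared environment-level arena that the session then uses.

#include <onnxruntime_cxx_api.h>
#include "onnxruntime_session_options_config_keys.h"

// Variant 1: no CPU arena at all, so CPU buffers are freed when released
// instead of being cached in an arena.
void configureWithoutCpuArena(Ort::SessionOptions& sessionOptions) {
    sessionOptions.DisableCpuMemArena();
}

// Variant 2: a shared, environment-level CPU arena with an explicit configuration.
// arena_extend_strategy = 1 (kSameAsRequested) avoids growing the arena in
// power-of-two chunks; the remaining values request the defaults.
void configureSharedCpuArena(Ort::Env& env, Ort::SessionOptions& sessionOptions) {
    Ort::MemoryInfo cpuMemInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::ArenaCfg arenaCfg(/*max_mem*/ 0,
                           /*arena_extend_strategy*/ 1,
                           /*initial_chunk_size_bytes*/ -1,
                           /*max_dead_bytes_per_chunk*/ -1);
    env.CreateAndRegisterAllocator(cpuMemInfo, arenaCfg);
    // Let the session use the environment's shared allocators instead of its own arena.
    sessionOptions.AddConfigEntry(kOrtSessionOptionsConfigUseEnvAllocators, "1");
}

Neither variant is verified in this thread to release the memory observed above; they are simply the configuration points that the arena shrinkage option interacts with.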
-
Hello,
we observe that ONNX Runtime keeps a lot of CPU memory allocated after inference. Is it possible to free this memory somehow?
We tried enabling CPU memory arena shrinkage, but it did not have any noticeable effect.
Kind regards,
Johannes
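For reference, a minimal sketch, assuming ONNX Runtime's C++ API, of how the arena shrinkage mentioned in the question is typically requested per Run() call (the device identifier is an example):

#include <onnxruntime_cxx_api.h>
#include "onnxruntime_run_options_config_keys.h"

// Request that the CPU (device 0) arena be shrunk at the end of the Run() call
// these options are passed to. The value lists the devices whose arenas to shrink.
Ort::RunOptions makeShrinkingRunOptions() {
    Ort::RunOptions runOptions;
    runOptions.AddConfigEntry(kOrtRunOptionsConfigEnableMemoryArenaShrinkage, "cpu:0");
    return runOptions;
}

The reply above shows the same option being set in the context of a full program.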