Replies: 2 comments
-
By "after inference", do you mean after a Run() call while you still have the Ort::Session? Can you show a simplified code example of what you're doing?
-
Here is example code, with the current CPU memory usage noted in comments:

#include <onnxruntime_cxx_api.h>
#include "onnxruntime_run_options_config_keys.h"
#include <memory>
#include <string>
#include <vector>

int main() {
    std::unique_ptr<Ort::Env> envPtr = std::make_unique<Ort::Env>(ORT_LOGGING_LEVEL_WARNING, "OnnxTestModel");
    std::wstring model_path = L"path/to/model";

    Ort::SessionOptions sessionOptions;
    {
        OrtCUDAProviderOptions cudaOptions;
        const double gpuMemoryLimitGb = 5.0;
        const size_t gpuMemoryLimitBytes = static_cast<size_t>(gpuMemoryLimitGb * 1024 * 1024 * 1024);
        cudaOptions.device_id = 0;
        cudaOptions.cudnn_conv_algo_search = OrtCudnnConvAlgoSearch::OrtCudnnConvAlgoSearchHeuristic;
        cudaOptions.gpu_mem_limit = gpuMemoryLimitBytes;
        sessionOptions.AppendExecutionProvider_CUDA(cudaOptions);
    }
    sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

    // 283 MB
    std::unique_ptr<Ort::Session> sessionPtr = std::make_unique<Ort::Session>(
        *envPtr, model_path.c_str(), sessionOptions);
    // 1.3 GB

    const Ort::TypeInfo inputTypeInfo = sessionPtr->GetInputTypeInfo(0);
    const auto inputTensorTypeShape = inputTypeInfo.GetTensorTypeAndShapeInfo();
    const ONNXTensorElementDataType inputElementType = inputTensorTypeShape.GetElementType();

    const std::vector<int64_t> inputShape = { 1, 128, 128, 128, 1 };
    Ort::AllocatorWithDefaultOptions allocator{};
    size_t input_tensor_size = 1 * 128 * 128 * 128 * 1;
    std::vector<float> input_tensor_values(input_tensor_size, 0.0f);
    Ort::MemoryInfo memory_info = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    Ort::Value inputTensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, inputShape.data(), inputShape.size());

    Ort::RunOptions runOptions;
    runOptions.AddConfigEntry(kOrtRunOptionsConfigEnableMemoryArenaShrinkage, "cpu:0;gpu:0");

    std::vector<const char*> _inputTensorCNames{ "input_1" };
    std::vector<const char*> _outputTensorCNames{ "activation" };
    std::vector<Ort::Value> outputTensors = sessionPtr->Run(
        runOptions,
        _inputTensorCNames.data(),
        &inputTensor,
        _inputTensorCNames.size(),
        _outputTensorCNames.data(),
        _outputTensorCNames.size());
    // 1.8 GB

    outputTensors.clear();
    inputTensor = Ort::Value(nullptr);
    sessionPtr.reset();
    // 1.77 GB
    envPtr.reset();
    // 1.16 GB
    return 0;
}

I would have expected the memory usage to go down more after inference, and again after the session is destroyed.
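Since the shrinkage option acts on the memory arena, a related experiment is to change how the CPU arena itself is set up. Below is a minimal sketch, assuming ONNX Runtime's standard C++ API (the ArenaCfg values are illustrative, not tuned), of two variants: disabling the CPU arena entirely, or registering a shared environment-level arena that the session then uses.

#include <onnxruntime_cxx_api.h>
#include "onnxruntime_session_options_config_keys.h"

// Variant 1: no CPU arena at all, so CPU buffers are freed when released
// instead of being cached in an arena.
void configureWithoutCpuArena(Ort::SessionOptions& sessionOptions) {
    sessionOptions.DisableCpuMemArena();
}

// Variant 2: a shared, environment-level CPU arena with an explicit configuration.
// arena_extend_strategy = 1 (kSameAsRequested) avoids growing the arena in
// power-of-two chunks; the remaining values request the defaults.
void configureSharedCpuArena(Ort::Env& env, Ort::SessionOptions& sessionOptions) {
    Ort::MemoryInfo cpuMemInfo = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::ArenaCfg arenaCfg(/*max_mem*/ 0,
                           /*arena_extend_strategy*/ 1,
                           /*initial_chunk_size_bytes*/ -1,
                           /*max_dead_bytes_per_chunk*/ -1);
    env.CreateAndRegisterAllocator(cpuMemInfo, arenaCfg);
    // Let the session use the environment's shared allocators instead of its own arena.
    sessionOptions.AddConfigEntry(kOrtSessionOptionsConfigUseEnvAllocators, "1");
}

Neither variant is verified in this thread to release the memory observed above; they are simply the configuration points that the arena shrinkage option interacts with.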
-
Hello,
we observe that ONNX Runtime keeps a lot of CPU memory allocated after inference. Is it possible to free this memory somehow?
We tried enabling CPU memory arena shrinkage, but it did not have any noticeable effect.
Kind regards,
Johannes
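For reference, a minimal sketch, assuming ONNX Runtime's C++ API, of how the arena shrinkage mentioned in the question is typically requested per Run() call (the device identifier is an example):

#include <onnxruntime_cxx_api.h>
#include "onnxruntime_run_options_config_keys.h"

// Request that the CPU (device 0) arena be shrunk at the end of the Run() call
// these options are passed to. The value lists the devices whose arenas to shrink.
Ort::RunOptions makeShrinkingRunOptions() {
    Ort::RunOptions runOptions;
    runOptions.AddConfigEntry(kOrtRunOptionsConfigEnableMemoryArenaShrinkage, "cpu:0");
    return runOptions;
}

The reply above shows the same option being set in the context of a full program.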