@vinser52
Contributor

@vinser52 vinser52 commented Nov 17, 2024

Description

Enable Jemalloc pool test with BA_GLOBAL_PROVIDER

Ref: #903

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly

@vinser52
Contributor Author

@bratpiorka @lplewa I created this PR as a reproducer for the issue with jemalloc (see #903). It works only starting from a certain version of jemalloc, so it looks like our CI needs to be fixed to use the "proper" version of jemalloc.

@vinser52 vinser52 force-pushed the svinogra_jemalloc_ba_provider branch 2 times, most recently from 01ce947 to a262693 Compare November 27, 2024 11:01
@ldorau
Contributor

ldorau commented Dec 2, 2024

@vinser52 Please test this PR on top of #941

Contributor

@ldorau ldorau left a comment

> @vinser52 Please test this PR on top of #941

@vinser52 I've checked - this PR works on top of #941

@vinser52
Contributor Author

vinser52 commented Dec 3, 2024

> > @vinser52 Please test this PR on top of #941
>
> @vinser52 I've checked - this PR works on top of #941

Thank you, @ldorau. Sorry, I was busy yesterday and planned to check it today.

@ldorau
Contributor

ldorau commented Dec 5, 2024

@vinser52 #941 has been merged, so you can rebase this PR now

@vinser52 vinser52 force-pushed the svinogra_jemalloc_ba_provider branch from a262693 to 1b99bc7 Compare December 5, 2024 10:36
@vinser52
Contributor Author

vinser52 commented Dec 5, 2024

> @vinser52 #941 has been merged, so you can rebase this PR now

Done. Works locally on my machine. Waiting for the CI.

@bratpiorka
Contributor

I guess this would require adding a wrapper for hwloc_topology_init() with __attribute__((disable_sanitizer_instrumentation)) or similar.

@vinser52 vinser52 force-pushed the svinogra_jemalloc_ba_provider branch from 1b99bc7 to 48736b6 Compare December 6, 2024 16:43
@vinser52 vinser52 force-pushed the svinogra_jemalloc_ba_provider branch from 48736b6 to 0169514 Compare January 14, 2025 13:19
@ldorau
Contributor

ldorau commented Jan 24, 2025

@vinser52 I've tried to fix that, but compiler attributes disabling AddressSanitizer do not work (build still fails):

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 70ac087..8835c40 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -452,6 +452,8 @@ endif()
 # Sanitizer flags
 if(UMF_USE_ASAN)
     add_sanitizer_flag(address)
+    set(UMF_COMMON_COMPILE_DEFINITIONS ${UMF_COMMON_COMPILE_DEFINITIONS}
+                                       UMF_USE_ASAN=1)
 endif()
 if(UMF_USE_UBSAN)
     add_sanitizer_flag(undefined)
diff --git a/src/provider/provider_os_memory.c b/src/provider/provider_os_memory.c
index 1fe4679..c26d3ed 100644
--- a/src/provider/provider_os_memory.c
+++ b/src/provider/provider_os_memory.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2022-2024 Intel Corporation
+ * Copyright (C) 2022-2025 Intel Corporation
  *
  * Under the Apache License v2.0 with LLVM Exceptions. See LICENSE.TXT.
  * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
@@ -17,6 +17,20 @@
 #include <umf/memory_provider_ops.h>
 #include <umf/providers/provider_os_memory.h>

+#ifdef UMF_USE_ASAN
+#ifdef _WIN32
+#define ATTR_NO_ASAN __declspec(noinline) __declspec(no_sanitize_address)
+#else /* !_WIN32 */
+#ifdef __clang__
+#define ATTR_NO_ASAN __attribute__((no_sanitize("address"), noinline))
+#else /* !__clang__ */
+#define ATTR_NO_ASAN __attribute__((no_sanitize_address, noinline))
+#endif /* !__clang__ */
+#endif /* !_WIN32 */
+#else  /* !UMF_USE_ASAN */
+#define ATTR_NO_ASAN
+#endif /* !UMF_USE_ASAN */
+
 // OS Memory Provider requires HWLOC
 #if defined(UMF_NO_HWLOC)

@@ -568,6 +582,10 @@ static umf_result_t translate_params(umf_os_memory_provider_params_t *in_params,
     return UMF_RESULT_SUCCESS;
 }

+static int ATTR_NO_ASAN hwloc_topology_init_no_asan(hwloc_topology_t *topo) {
+    return hwloc_topology_init(topo);
+}
+
 static umf_result_t os_initialize(void *params, void **provider) {
     umf_result_t ret;

@@ -594,7 +612,7 @@ static umf_result_t os_initialize(void *params, void **provider) {

     memset(os_provider, 0, sizeof(*os_provider));

-    int r = hwloc_topology_init(&os_provider->topo);
+    int r = hwloc_topology_init_no_asan(&os_provider->topo);
     if (r) {
         LOG_ERR("HWLOC topology init failed");
         ret = UMF_RESULT_ERROR_OUT_OF_HOST_MEMORY;
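
For anyone who wants to try the attribute approach outside of UMF, here is a minimal standalone sketch of the same wrapper pattern. The function `external_init()` is a hypothetical stand-in for a library call such as hwloc_topology_init(), and the UMF_USE_ASAN gate from the diff is left out for brevity. The thread does not establish why the build still failed. One thing to keep in mind is that these attributes only stop the compiler from instrumenting the body of the annotated function; the callee and the ASan runtime's interceptors are not affected.

```c
/* Same compiler switch as in the diff above, without the UMF_USE_ASAN gate. */
#ifdef _WIN32
#define ATTR_NO_ASAN __declspec(noinline) __declspec(no_sanitize_address)
#else /* !_WIN32 */
#ifdef __clang__
#define ATTR_NO_ASAN __attribute__((no_sanitize("address"), noinline))
#else /* !__clang__ */
#define ATTR_NO_ASAN __attribute__((no_sanitize_address, noinline))
#endif /* !__clang__ */
#endif /* !_WIN32 */

/* Hypothetical stand-in for an external call such as hwloc_topology_init().
 * It is NOT annotated, so it is still instrumented when ASan is enabled. */
static int external_init(int *handle) {
    *handle = 42;
    return 0;
}

/* Only the body of this wrapper is excluded from ASan instrumentation. */
static ATTR_NO_ASAN int external_init_no_asan(int *handle) {
    return external_init(handle);
}
```

Calling `external_init_no_asan(&h)` behaves exactly like calling `external_init(&h)`; the only difference is how the wrapper itself is compiled.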

@ldorau
Contributor

ldorau commented Jan 24, 2025

BTW: on Ubuntu 24.04 it works well

@ldorau
Contributor

ldorau commented Feb 21, 2025

@vinser52 @bratpiorka Should this PR still be open?

@vinser52
Contributor Author

> @vinser52 @bratpiorka Should this PR be still opened?

I just don't have enough time to resolve the CI issues. If someone can take over, I would be happy. In general, though, I think it makes sense to have such tests in CI.

@vinser52 vinser52 force-pushed the svinogra_jemalloc_ba_provider branch 2 times, most recently from afed5ba to f54d8f5 Compare February 26, 2025 20:29
@vinser52 vinser52 force-pushed the svinogra_jemalloc_ba_provider branch from f54d8f5 to d4efb0e Compare March 4, 2025 15:16
@ldorau
Contributor

ldorau commented Mar 13, 2025

@vinser52 @bratpiorka This test will not work because of an issue in jemalloc: jemalloc does not free the allocated memory extents on which arena_extent_split() failed.

Jemalloc splits allocated memory extents: it calls arena_extent_split(), which calls umfMemoryProviderAllocationSplit(). BA_GLOBAL_PROVIDER does not support the split() op, so arena_extent_split() always fails. Jemalloc then allocates a new, smaller memory extent, but it never frees the first extent on which split() failed. As a result, the assert in the tracker is always triggered, because the tracker is not empty:

$ ./test/test_jemalloc_pool --gtest_filter="*.allocFree/1*"
[PID:4827   TID:4827   INFO  UMF] utils_log_init: Logger enabled (UMF version: 0.11.0-dev4.git1.g8c111e8e, level: DEBUG, flush: DEBUG, pid: yes, timestamp: no)
[PID:4827   TID:4827   DEBUG UMF] umf_ba_create_global: UMF base allocator created
[PID:4827   TID:4827   DEBUG UMF] umfMemoryTrackerCreate: tracker created, handle=0x7f1ae2ab0068, alloc_segments_map=0x7f1ae2ab0070
[PID:4827   TID:4827   DEBUG UMF] umfInit: UMF tracker created
[PID:4827   TID:4827   DEBUG UMF] umfInit: UMF IPC cache initialized
[PID:4827   TID:4827   DEBUG UMF] umfInit: UMF library initialized
[PID:4827   TID:4827   INFO  UMF] utils_log_init: Logger enabled (UMF version: 0.11.0-dev4.git1.g8c111e8e, level: DEBUG, flush: DEBUG, pid: yes, timestamp: no)
Running main() from /home/ldorau/work/unified-memory-framework/build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *allocFree/1*
[==========] Running 6 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 6 tests from jemallocPoolTest/umfPoolTest
[ RUN      ] jemallocPoolTest/umfPoolTest.allocFree/1
[PID:4827   TID:4827   DEBUG UMF] umfTrackingMemoryProviderCreate: upstream=0x7f1ae2ab0168, tracker=0x7f1ae2ab0068, pool=0x7f1ae2ab0268, ipcCache=0x7f1ae2a9e008, hIpcMappedCache=0x7f1ae2ac1068
[PID:4827   TID:4827   INFO  UMF] umfPoolCreateInternal: Memory pool created: 0x7f1ae2ab0268
[PID:4827   TID:4827   DEBUG UMF] umfMemoryTrackerAddAtLevel: memory region is added, tracker=0x7f1ae2ab0068, level=0, pool=0x7f1ae2ab0268, ptr=0x7f1ae1bff000, size=2097152
[PID:4827   TID:4827   ERROR UMF] trackingAllocationSplit: upstream provider failed to split the region
[PID:4827   TID:4827   ERROR UMF] trackingAllocationSplit: failed to split memory region: ptr=0x7f1ae1bff000, totalSize=2097152, firstSize=4096
[PID:4827   TID:4827   DEBUG UMF] umfMemoryTrackerAddAtLevel: memory region is added, tracker=0x7f1ae2ab0068, level=0, pool=0x7f1ae2ab0268, ptr=0x7f1ae2a88000, size=4096
[PID:4827   TID:4827   DEBUG UMF] umfMemoryTrackerRemove: memory region removed: tracker=0x7f1ae2ab0068, level=0, pool=0x7f1ae2ab0268, ptr=0x7f1ae2a88000, size=4096
[PID:4827   TID:4827   ERROR UMF] check_if_tracker_is_empty: [0] ptr = 0x7f1ae1bff000, pool = 0x7f1ae2ab0268, size = 2097152
[PID:4827   TID:4827   ERROR UMF] check_if_tracker_is_empty: tracking provider of pool 0x7f1ae2ab0268 is not empty! (1 items left)
test_jemalloc_pool: /home/ldorau/work/unified-memory-framework/src/provider/provider_tracking.c:827: check_if_tracker_is_empty: Assertion `n_items == 0 && "tracking provider is not empty!"' failed.
Aborted
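
The sequence in the log above can be replayed with a toy model. This is only a sketch under stated assumptions: every name in it (`toy_tracker_t`, `toy_split_unsupported()`, `toy_replay_alloc_free()`) is hypothetical and is not a UMF or jemalloc API, and the allocator logic is reduced to the behaviour described in this comment, not taken from the jemalloc sources.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are UMF or jemalloc APIs. */
#define MAX_REGIONS 8

typedef struct {
    size_t size[MAX_REGIONS]; /* 0 means the slot is unused */
} toy_tracker_t;

/* Record a region; returns its slot index, or -1 if the tracker is full. */
static int toy_tracker_add(toy_tracker_t *t, size_t size) {
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (t->size[i] == 0) {
            t->size[i] = size;
            return i;
        }
    }
    return -1;
}

static void toy_tracker_remove(toy_tracker_t *t, int slot) {
    t->size[slot] = 0;
}

static int toy_tracker_count(const toy_tracker_t *t) {
    int n = 0;
    for (int i = 0; i < MAX_REGIONS; i++) {
        n += (t->size[i] != 0);
    }
    return n;
}

/* A provider without split support: every split request fails. */
static int toy_split_unsupported(int slot, size_t first_size) {
    (void)slot;
    (void)first_size;
    return -1;
}

/* Replays the logged sequence; returns the number of regions left tracked. */
static int toy_replay_alloc_free(void) {
    toy_tracker_t tracker = {{0}};

    int big = toy_tracker_add(&tracker, 2097152); /* 2 MiB extent */
    assert(big >= 0);

    int small = -1;
    if (toy_split_unsupported(big, 4096) != 0) {
        /* Split failed: a new 4 KiB extent is requested instead,
         * and the 2 MiB extent is never handed back. */
        small = toy_tracker_add(&tracker, 4096);
    }
    if (small >= 0) {
        toy_tracker_remove(&tracker, small); /* the test frees its allocation */
    }
    return toy_tracker_count(&tracker);
}
```

In this model `toy_replay_alloc_free()` returns 1: the 2 MiB region stays in the tracker, which matches the "1 items left" message in the log.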

@bratpiorka
Contributor

> @vinser52 @bratpiorka This test will not work because of an issue in jemalloc - jemalloc does not free the allocated memory extents on which the arena_extent_split() failed:
>
> Jemalloc splits allocated memory extents - it calls arena_extent_split() that calls umfMemoryProviderAllocationSplit() but BA_GLOBAL_PROVIDER does not support split() op and arena_extent_split() always fails. Then jemalloc allocates a new smaller memory extent, but it never frees the first extent on which the split() failed and - as a result - the assert in the tracker is always triggered, because the tracker is not empty

Thanks for looking at this. I think there would be no problem with adding split and merge support to BA, but the question is whether we want to do this because of these tests.

@ldorau
Contributor

ldorau commented Mar 14, 2025

> I think that there would be no problem with adding support for split and merge for BA but the question is if we want to do this because of these tests

@vinser52 @bratpiorka
In my opinion, adding support for split() and merge() to BA_GLOBAL_PROVIDER would require too much effort right now, and it is not worth doing just to enable a single test.

I think this PR can be closed for now.

Contributor

@ldorau ldorau left a comment

This test will not work because of an issue in jemalloc - see: #902 (comment)

@vinser52
Contributor Author

> > I think that there would be no problem with adding support for split and merge for BA but the question is if we want to do this because of these tests
>
> @vinser52 @bratpiorka In my opinion adding support for split() and merge() to BA_GLOBAL_PROVIDER would require too much effort now and it is not worth doing that just for enabling only one test.
>
> In my opinion this PR can be closed for now.

I think we should not waste time on adding split/merge support to the BA allocator.

Did I understand correctly that there is a bug in jemalloc? It should handle the scenario where split/merge fails, but in fact there is a memory leak, right? Should we create an issue in their GitHub repository?

@ldorau
Contributor

ldorau commented Mar 14, 2025

> > > I think that there would be no problem with adding support for split and merge for BA but the question is if we want to do this because of these tests
> >
> > @vinser52 @bratpiorka In my opinion adding support for split() and merge() to BA_GLOBAL_PROVIDER would require too much effort now and it is not worth doing that just for enabling only one test.
> > In my opinion this PR can be closed for now.
>
> I think we should not waste time on adding split/merge support to BA allocator.
>
> did I get correctly that there is a bug in jemalloc? It should support scenario when split/merge fails but in fact, there is a memory leak, right? Should we create an issue in their GitHub?

Right. I have submitted an issue to jemalloc: jemalloc/jemalloc#2815

@vinser52 Can we close this PR?

@vinser52 vinser52 closed this Mar 14, 2025
@ldorau
Contributor

ldorau commented Mar 14, 2025

@vinser52 I have submitted #1193 instead of this PR
