Skip to content

Commit c0f0f5a

Browse files
SWDEV-492623: Hip Host Function to Device Symbols Mapping (#18)
* Adding changes to register and read symbols from the hip fat binary * adding json output for host_functions * added error handling * adding json tool support * Adding tests * formatting changes * Adding documentation * refactoring as per amd-staging * Adding intializers and changing macros * Fix page-migration background thread on fork (#31) * Fix page-migration background thread on fork After falling off main in the forked child, all the children try to join on on the parent's monitoring thread. This results in a deadlock. Parent is waiting for the child to exit, but the child is trying to join the parent's thread which is signaled from the parent's static destructors. Even with just one parent and child, due to copy-on-write semantics, a child signalling the background thread to join will still block (thread's updated state is not visible in the child). This fix creates background treads on fork per-child with a pthread_atfork handler, ensuring that each child has its own monitoring thread. * Formatting fixes * Detach page-migration background thread and update test timeout * Attach files with ctest * Update corr-id assert * Tweak on-fork, simplify background thread * Revert thread detach * Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9) * Adding Trace Period feature to rocprofv3 * Adding feature documentation * Update source/bin/rocprofv3.py Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Fixing format * Moving to Collection Period and changing the input params * Format Fixes * Fixing rebasing issues * Removing atomic include from the tool * Adding more options for units, optimizing the code * Fixing rocprofv3.py * Fixing time conv & adding time controlled app * Fixing format * Changing to shared memory testing methodology * use of shmem use * Fix include headers for transpose-time-controlled.cpp * Format upload-image-to-github.py * Removing shmem and using only env var to dump timestamps from the tool * Tool Fixes + Test Config * Adding Tests * Fixing Review comments * Update trace period implementation * Update trace period tests * check between start and stop timestamps * Merge Fix * Update validate.py * Improve safety of rocprofiler_stop_context after finalization * Pass context id to collection_period_cntrl by value * Adding 20 us error margin * Ensure log level for collection-period test is not more than warning --------- Co-authored-by: Ammar ELWazir <aelwazir@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - move error code check macros to implementation - fix macros which check error code - use constexpr values instead of #define * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - debugging for error that cannot be locally reproduced * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - improve error handling and logging * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - tweak to non-fatal logging messages * Update lib/rocprofiler-sdk/code_object/hip/code_object.* - cleanup of logging messages * Update host kernel symbol register data fields * Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp --------- Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com> Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com> Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com> Co-authored-by: Ammar ELWazir <aelwazir@amd.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> [ROCm/rocprofiler-sdk commit: 78d8f4b]
1 parent 4a9ce65 commit c0f0f5a

File tree

25 files changed

+899
-17
lines changed

25 files changed

+899
-17
lines changed

source/docs/rocprofv3-schema.json

Lines changed: 91 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -725,6 +725,97 @@
725725
]
726726
}
727727
},
728+
"host_functions": {
729+
"type": "array",
730+
"description": "Host function records.",
731+
"items": {
732+
"type": "object",
733+
"properties": {
734+
"size": {
735+
"type": "integer",
736+
"description": "Size of the host function record."
737+
},
738+
"host_function_id": {
739+
"type": "integer",
740+
"description": "ID of the HIP host function."
741+
},
742+
"kernel_id": {
743+
"type": "integer",
744+
"description": "ID of the corressponding kernel."
745+
},
746+
"code_object_id": {
747+
"type": "integer",
748+
"description": "ID of the corressponding code object."
749+
},
750+
"host_function_address": {
751+
"type": "integer",
752+
"description": "Address of the host function registered."
753+
},
754+
"modules_address": {
755+
"type": "integer",
756+
"description": "Address of the hip fat binary registered"
757+
},
758+
"device_function": {
759+
"type": "string",
760+
"description": "Device function name registered"
761+
},
762+
"thread_limit": {
763+
"type": "integer",
764+
"description": "Size of the thread limit."
765+
},
766+
"thread_id_address": {
767+
"type": "integer",
768+
"description": "Address of thread Id."
769+
},
770+
"block_id_address": {
771+
"type": "string",
772+
"description": "Address of Block Id."
773+
},
774+
"block_dim_address": {
775+
"type": "string",
776+
"description": "Address of Block Dimension"
777+
},
778+
"grid_dim_address": {
779+
"type": "string",
780+
"description": "Address of the Grid Dimension."
781+
},
782+
"workgroup_size_address": {
783+
"type": "string",
784+
"description": "Address of the work group size."
785+
},
786+
"formatted_host_function_name": {
787+
"type": "string",
788+
"description": "Formatted name of the device function."
789+
},
790+
"demangled_host_function_name": {
791+
"type": "string",
792+
"description": "Demangled name of device function."
793+
},
794+
"truncated_host_function_name": {
795+
"type": "string",
796+
"description": "Truncated name of device function."
797+
}
798+
},
799+
"required": [
800+
"size",
801+
"host_function_id",
802+
"kernel_id",
803+
"code_object_id",
804+
"host_function_address",
805+
"modules_address",
806+
"device_function",
807+
"thread_limit",
808+
"thread_id_address",
809+
"block_id_address",
810+
"block_dim_address",
811+
"grid_dim_address",
812+
"workgroup_size_address",
813+
"formatted_host_function_name",
814+
"demangled_host_function_name",
815+
"truncated_host_function_name"
816+
]
817+
}
818+
},
728819
"callback_records": {
729820
"type": "object",
730821
"description": "Callback record details.",

source/include/rocprofiler-sdk/callback_tracing.h

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -181,6 +181,27 @@ typedef struct
181181
uint32_t accum_vgpr_count; ///< Accum vector general purpose register count
182182

183183
} rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t;
184+
// rename struct
185+
186+
typedef struct
187+
{
188+
uint64_t size; ///< size of this struct
189+
uint64_t host_function_id; ///< unique host function identifier value
190+
uint64_t kernel_id; ///< unique symbol identifier value
191+
uint64_t code_object_id; ///< parent unique code object identifier
192+
rocprofiler_address_t host_function; ///< kernel host function pointer
193+
rocprofiler_address_t modules; ///< reference address where modules will be loaded
194+
const char* device_function;
195+
uint32_t thread_limit; ///< thread limit
196+
rocprofiler_dim3_t thread_ids; ///< thread ids address
197+
rocprofiler_dim3_t block_ids; ///< block ids address
198+
rocprofiler_dim3_t block_dims; ///< block dimensions address
199+
rocprofiler_dim3_t grid_dims; ///< grid dimensions address
200+
uint64_t workgroup_size; ///< workgroup size address
201+
202+
/// @var device_function
203+
/// @brief device function name used to map the metadata during kernel launch
204+
} rocprofiler_callback_tracing_code_object_host_kernel_symbol_register_data_t;
184205

185206
/**
186207
* @brief ROCProfiler Kernel Dispatch Callback Tracer Record.

source/include/rocprofiler-sdk/cxx/serialization.hpp

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -139,6 +139,13 @@ save(ArchiveT& ar, rocprofiler_dim3_t data)
139139
ROCP_SDK_SAVE_DATA_FIELD(z);
140140
}
141141

142+
template <typename ArchiveT>
143+
void
144+
save(ArchiveT& ar, rocprofiler_address_t data)
145+
{
146+
ROCP_SDK_SAVE_DATA_FIELD(value);
147+
}
148+
142149
template <typename ArchiveT>
143150
void
144151
save(ArchiveT& ar, rocprofiler_callback_tracing_code_object_load_data_t data)
@@ -181,6 +188,25 @@ save(ArchiveT& ar, rocprofiler_callback_tracing_code_object_kernel_symbol_regist
181188
ROCP_SDK_SAVE_DATA_FIELD(accum_vgpr_count);
182189
}
183190

191+
template <typename ArchiveT>
192+
void
193+
save(ArchiveT& ar, rocprofiler_callback_tracing_code_object_host_kernel_symbol_register_data_t data)
194+
{
195+
ROCP_SDK_SAVE_DATA_FIELD(size);
196+
ROCP_SDK_SAVE_DATA_FIELD(host_function_id);
197+
ROCP_SDK_SAVE_DATA_FIELD(kernel_id);
198+
ROCP_SDK_SAVE_DATA_FIELD(code_object_id);
199+
ROCP_SDK_SAVE_DATA_FIELD(host_function);
200+
ROCP_SDK_SAVE_DATA_FIELD(modules);
201+
ROCP_SDK_SAVE_DATA_CSTR(device_function);
202+
ROCP_SDK_SAVE_DATA_FIELD(thread_limit);
203+
ROCP_SDK_SAVE_DATA_FIELD(thread_ids);
204+
ROCP_SDK_SAVE_DATA_FIELD(block_ids);
205+
ROCP_SDK_SAVE_DATA_FIELD(block_dims);
206+
ROCP_SDK_SAVE_DATA_FIELD(grid_dims);
207+
ROCP_SDK_SAVE_DATA_FIELD(workgroup_size);
208+
}
209+
184210
template <typename ArchiveT>
185211
void
186212
save(ArchiveT& ar, rocprofiler_hsa_api_retval_t data)

source/include/rocprofiler-sdk/cxx/utility.hpp

Lines changed: 2 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -34,21 +34,12 @@ namespace utility
3434
{
3535
template <typename Tp>
3636
auto
37-
_as_hex(Tp val, size_t width = 0)
37+
as_hex(Tp val, size_t width = 0)
3838
{
3939
auto ss = std::stringstream{};
40-
ss << "0x" << std::hex << std::setw(width) << std::setfill('0') << val;
40+
ss << "0x" << std::hex << std::setfill('0') << std::setw(width) << val;
4141
return ss.str();
4242
}
43-
44-
#define ROCPROFILER_CXX_DEFINE_AS_HEX(TYPE) \
45-
inline auto as_hex(TYPE val, size_t width = 0) \
46-
{ \
47-
return ::rocprofiler::sdk::utility::_as_hex(val, width); \
48-
}
49-
50-
ROCPROFILER_CXX_DEFINE_AS_HEX(uint64_t)
51-
#undef ROCPROFILER_CXX_DEFINE_AS_HEX
5243
} // namespace utility
5344
} // namespace sdk
5445
} // namespace rocprofiler

source/include/rocprofiler-sdk/fwd.h

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -216,7 +216,8 @@ typedef enum // NOLINT(performance-enum-size)
216216
{
217217
ROCPROFILER_CODE_OBJECT_NONE = 0, ///< Unknown code object operation
218218
ROCPROFILER_CODE_OBJECT_LOAD, ///< Code object containing kernel symbols
219-
ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER, ///< Kernel symbols
219+
ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER, ///< Kernel symbols - Device
220+
ROCPROFILER_CODE_OBJECT_HOST_KERNEL_SYMBOL_REGISTER, ///< Kernel symbols - Host
220221
ROCPROFILER_CODE_OBJECT_LAST,
221222
} rocprofiler_code_object_operation_t;
222223

source/lib/output/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ set(TOOL_OUTPUT_HEADERS
1919
generateStats.hpp
2020
generator.hpp
2121
kernel_symbol_info.hpp
22+
host_symbol_info.hpp
2223
metadata.hpp
2324
output_config.hpp
2425
output_key.hpp

source/lib/output/generateJSON.cpp

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,10 +164,12 @@ write_json(json_output& json_ar,
164164

165165
{
166166
auto kern_sym_data = tool_metadata.get_kernel_symbols();
167+
auto host_sym_data = tool_metadata.get_host_symbols();
167168
auto code_obj_data = tool_metadata.get_code_objects();
168169

169170
json_ar(cereal::make_nvp("code_objects", code_obj_data));
170171
json_ar(cereal::make_nvp("kernel_symbols", kern_sym_data));
172+
json_ar(cereal::make_nvp("host_functions", host_sym_data));
171173
}
172174
}
173175

Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
// MIT License
2+
//
3+
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
4+
//
5+
// Permission is hereby granted, free of charge, to any person obtaining a copy
6+
// of this software and associated documentation files (the "Software"), to deal
7+
// in the Software without restriction, including without limitation the rights
8+
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
// copies of the Software, and to permit persons to whom the Software is
10+
// furnished to do so, subject to the following conditions:
11+
//
12+
// The above copyright notice and this permission notice shall be included in all
13+
// copies or substantial portions of the Software.
14+
//
15+
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
// SOFTWARE.
22+
23+
#pragma once
24+
25+
#include "lib/common/demangle.hpp"
26+
#include "lib/common/logging.hpp"
27+
28+
#include <rocprofiler-sdk/callback_tracing.h>
29+
#include <rocprofiler-sdk/fwd.h>
30+
#include <rocprofiler-sdk/cxx/hash.hpp>
31+
#include <rocprofiler-sdk/cxx/name_info.hpp>
32+
#include <rocprofiler-sdk/cxx/operators.hpp>
33+
#include <rocprofiler-sdk/cxx/serialization.hpp>
34+
35+
#include <cstdint>
36+
#include <string>
37+
#include <unordered_map>
38+
#include <vector>
39+
40+
namespace rocprofiler
41+
{
42+
namespace tool
43+
{
44+
using rocprofiler_host_kernel_symbol_data_t =
45+
rocprofiler_callback_tracing_code_object_host_kernel_symbol_register_data_t;
46+
47+
struct host_function_info : rocprofiler_host_kernel_symbol_data_t
48+
{
49+
using base_type = rocprofiler_host_kernel_symbol_data_t;
50+
51+
template <typename FuncT>
52+
host_function_info(const base_type& _base, FuncT&& _formatter)
53+
: base_type{_base}
54+
, formatted_host_function_name{_formatter(CHECK_NOTNULL(_base.device_function))}
55+
, demangled_host_function_name{common::cxx_demangle(CHECK_NOTNULL(_base.device_function))}
56+
, truncated_host_function_name{common::truncate_name(demangled_host_function_name)}
57+
{}
58+
59+
host_function_info();
60+
~host_function_info() = default;
61+
host_function_info(const host_function_info&) = default;
62+
host_function_info(host_function_info&&) noexcept = default;
63+
host_function_info& operator=(const host_function_info&) = default;
64+
host_function_info& operator=(host_function_info&&) noexcept = default;
65+
66+
std::string formatted_host_function_name = {};
67+
std::string demangled_host_function_name = {};
68+
std::string truncated_host_function_name = {};
69+
};
70+
71+
using host_function_data_vec_t = std::vector<host_function_info>;
72+
using host_function_info_map_t = std::unordered_map<uint64_t, host_function_info>;
73+
} // namespace tool
74+
} // namespace rocprofiler
75+
76+
namespace cereal
77+
{
78+
#define SAVE_DATA_FIELD(FIELD) ar(make_nvp(#FIELD, data.FIELD))
79+
80+
template <typename ArchiveT>
81+
void
82+
save(ArchiveT& ar, const ::rocprofiler::tool::host_function_info& data)
83+
{
84+
cereal::save(
85+
ar, static_cast<const ::rocprofiler::tool::rocprofiler_host_kernel_symbol_data_t&>(data));
86+
SAVE_DATA_FIELD(formatted_host_function_name);
87+
SAVE_DATA_FIELD(demangled_host_function_name);
88+
SAVE_DATA_FIELD(truncated_host_function_name);
89+
}
90+
91+
#undef SAVE_DATA_FIELD
92+
} // namespace cereal

0 commit comments

Comments
 (0)