
Commit 6a85c1d

zma2 and mschilling0 authored
Summarizing device timing regardless of kernel shapes by default (#37)
* Initial version of unitrace
* Initial commits of unitrace
* Unhide Symbols Required By XPTI
* Initial commits of unitrace
* Summarizing device timing regardless of kernel shapes by default
* Summarizing device timing with out kernel shapes by default

---------

Co-authored-by: Schilling, Matthew <[email protected]>
1 parent 93f66e7 commit 6a85c1d

8 files changed, +46 -67 lines changed


tools/unitrace/README.md

Lines changed: 10 additions & 2 deletions
@@ -131,18 +131,26 @@ To trace/profile device and kernel activities, one can use one or more of the fo
 
 The **--device-timing [-d]** option outputs a timing summary of kernels and commands executed on the device:
 
-![Device Timing!](/tools/unitrace/doc/images/device-timing.png)
+
+![Device Timing With No Shape!](/tools/unitrace/doc/images/device-timing-with-no-shape.png)
 
 In addition, it also outputs kernel information that helps to identify kernel performance issues that relate to occupancy caused by shared local memory usage and register spilling.
 
-![Kernel Info!](/tools/unitrace/doc/images/kernel-info.png)
+![Kernel Info With No Shape!](/tools/unitrace/doc/images/kernel-info-with-no-shape.png)
 
 Here, the **"SLM Per Work Group"** shows the amount of shared local memory needed for each work group in bytes. This size can potentially affect occupancy.
 
 The **"Private Memory Per Thread"** is the private memory allocated for each thread in bytes. A non-zero value indicates that one or more thread private variables are not in registers.
 
 The **"Spill Memory Per Thread"** is the memory used for register spilled for each thread in bytes. A non-zero value indicates that one or more thread private variables are allocated in registers but are later spilled to memory.
 
+By default, the kernel timing is summarized regardless of shapes. In case the kernel has different shapes, using **-v** along with **-d** is strongly recommended:
+
+![Device Timing!](/tools/unitrace/doc/images/device-timing.png)
+
+![Kernel Info!](/tools/unitrace/doc/images/kernel-info.png)
+
+
 The **--kernel-submission [-s]** option outputs a time summary of kernels spent in queuing, submission and execution:
 ![Kernel Submissions!](/tools/unitrace/doc/images/kernel-submissions.png)
 
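The README links **"SLM Per Work Group"** to occupancy. As a rough illustration, shared local memory alone limits how many work groups can be resident on one compute unit. The limit is the SLM capacity divided by the per-work-group SLM size, rounded down. Other resources, such as hardware thread slots, registers, and barriers, may impose a lower limit. The sketch below is minimal. `MaxWorkGroupsBySlm` and its inputs are hypothetical, and the real capacity must come from the target device's documentation.

```cpp
#include <algorithm>
#include <cstdint>

// Upper bound on concurrently resident work groups per compute unit imposed by
// shared local memory (SLM) alone. All sizes are in bytes. The inputs are
// hypothetical; look up the real SLM capacity and work-group limit for the device.
uint32_t MaxWorkGroupsBySlm(uint64_t slm_capacity_bytes,
                            uint64_t slm_per_work_group_bytes,
                            uint32_t hw_max_work_groups) {
  if (slm_per_work_group_bytes == 0) {
    return hw_max_work_groups;  // kernel uses no SLM, so SLM imposes no limit
  }
  const uint64_t by_slm = slm_capacity_bytes / slm_per_work_group_bytes;  // floor
  return static_cast<uint32_t>(std::min<uint64_t>(by_slm, hw_max_work_groups));
}
```

As an example, assume 64 KB (65536 bytes) of SLM per compute unit. A kernel that needs 16 KB per work group can then keep at most 4 work groups resident, even if the hardware could otherwise schedule 16.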
Two binary image files changed (35 KB and 22.4 KB); previews not shown.
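The new README paragraph says two things. Device timing is summarized regardless of kernel shape by default, and **-v** should be combined with **-d** when a kernel runs with different shapes. The sketch below illustrates that description, not unitrace's implementation. It assumes each launch is reduced to a (name, shape, time) record. `KernelRecord`, `KernelSummary` and `SummarizeDeviceTiming` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-launch record; not a unitrace type.
struct KernelRecord {
  std::string name;   // kernel name without shape information
  std::string shape;  // e.g. "{64; 1; 1} {256; 1; 1}" (group count, group size)
  uint64_t time_ns;   // device time of this launch
};

// Hypothetical summary row: number of launches and accumulated device time.
struct KernelSummary {
  uint64_t calls = 0;
  uint64_t total_ns = 0;
};

// verbose == false: all launches of a kernel share one row (summary regardless of shape).
// verbose == true: one row per (name, shape) pair, similar in spirit to -v with -d.
std::map<std::string, KernelSummary> SummarizeDeviceTiming(
    const std::vector<KernelRecord>& records, bool verbose) {
  std::map<std::string, KernelSummary> rows;
  for (const KernelRecord& r : records) {
    const std::string key = verbose ? r.name + "[" + r.shape + "]" : r.name;
    KernelSummary& row = rows[key];  // creates a zeroed row on first use
    ++row.calls;
    row.total_ns += r.time_ns;
  }
  return rows;
}
```

Consider three launches of a kernel `Add` at two different group counts. The default mode yields a single `Add` row with three calls. Verbose mode yields two rows, one per shape, so a slow shape is no longer hidden inside the merged total.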

tools/unitrace/src/chromelogger.h

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ static uint32_t mpi_rank = std::atoi(rank.c_str());
 static std::string process_start_time = std::to_string(UniTimer::GetEpochTimeInUs(UniTimer::GetHostTimestamp()));
 static std::string pmi_hostname = GetHostName();
 
-std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size);
+std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size, bool detailed);
 ze_pci_ext_properties_t *GetZeDevicePciPropertiesAndId(ze_device_handle_t device, int32_t *parent_device_id, int32_t *device_id, int32_t *subdevice_id);
 
 static Logger* logger_ = nullptr;

tools/unitrace/src/levelzero/ze_collector.h

Lines changed: 26 additions & 55 deletions
@@ -942,41 +942,43 @@ typedef void (*OnZeKernelFinishCallback)(uint64_t kid, uint64_t tid, uint64_t st
 
 ze_result_t (*zexKernelGetBaseAddress)(ze_kernel_handle_t hKernel, uint64_t *baseAddress) = nullptr;
 
-inline std::string GetZeKernelCommandName(uint64_t id, const ze_group_count_t& group_count, size_t size) {
+inline std::string GetZeKernelCommandName(uint64_t id, const ze_group_count_t& group_count, size_t size, bool detailed = true) {
 std::stringstream s;
 kernel_command_properties_mutex_.lock_shared();
 auto it = kernel_command_properties_->find(id);
 if (it != kernel_command_properties_->end()) {
 s << utils::Demangle(it->second.name_.c_str());
-if (it->second.type_ == KERNEL_COMMAND_TYPE_COMPUTE) {
-if (it->second.simd_width_ > 0) {
-s << "[SIMD";
-if (it->second.simd_width_ == 1) {
-s << "_ANY";
-} else {
-s << it->second.simd_width_;
+if (detailed) {
+if (it->second.type_ == KERNEL_COMMAND_TYPE_COMPUTE) {
+if (it->second.simd_width_ > 0) {
+s << "[SIMD";
+if (it->second.simd_width_ == 1) {
+s << "_ANY";
+} else {
+s << it->second.simd_width_;
+}
 }
+s << " {" <<
+group_count.groupCountX << "; " <<
+group_count.groupCountY << "; " <<
+group_count.groupCountZ << "} {" <<
+it->second.group_size_.x << "; " <<
+it->second.group_size_.y << "; " <<
+it->second.group_size_.z << "}]";
+}
+else if ((it->second.type_ == KERNEL_COMMAND_TYPE_MEMORY) && (size > 0)) {
+s << "[" << size << "]";
 }
-s << " {" <<
-group_count.groupCountX << "; " <<
-group_count.groupCountY << "; " <<
-group_count.groupCountZ << "} {" <<
-it->second.group_size_.x << "; " <<
-it->second.group_size_.y << "; " <<
-it->second.group_size_.z << "}]";
-}
-else if ((it->second.type_ == KERNEL_COMMAND_TYPE_MEMORY) && (size > 0)) {
-s << "[" << size << "]";
 }
 }
 
 kernel_command_properties_mutex_.unlock_shared();
 return s.str();
 }
 
-inline std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size) {
+inline std::string GetZeKernelCommandName(uint64_t id, ze_group_count_t& group_count, size_t size, bool detailed = true) {
 const ze_group_count_t& gcount = group_count;
-return GetZeKernelCommandName(id, gcount, size);
+return GetZeKernelCommandName(id, gcount, size, detailed);
 }
 
 inline ze_pci_ext_properties_t *GetZeDevicePciPropertiesAndId(ze_device_handle_t device, int32_t *parent_device_id, int32_t *device_id, int32_t *subdevice_id){
@@ -1115,10 +1117,10 @@ class ZeCollector {
 total_time += it.second.execute_time_;
 std::string kname;
 if (it.first.tile_ >= 0) {
-kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
+kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
 }
 else {
-kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
+kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
 }
 if (kname.size() > max_name_size) {
 max_name_size = kname.size();
@@ -1204,10 +1206,10 @@ class ZeCollector {
 total_submit_time += it.second.submit_time_;
 std::string kname;
 if (it.first.tile_ >= 0) {
-kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
+kname = "Tile #" + std::to_string(it.first.tile_) + ": " + GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
 }
 else {
-kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_);
+kname = GetZeKernelCommandName(it.first.kernel_command_id_, it.first.group_count_, it.first.mem_size_, options_.verbose);
 }
 if (kname.size() > max_name_size) {
 max_name_size = kname.size();
@@ -1615,37 +1617,6 @@ class ZeCollector {
 
 sub_desc.driver_ = driver;
 sub_desc.context_ = context;
-#if 0
-if (options_.metric_query) {
-zet_metric_group_handle_t group = nullptr;
-uint32_t num_groups = 0;
-status = zetMetricGroupGet(sub_devices[j], &num_groups, nullptr);
-PTI_ASSERT(status == ZE_RESULT_SUCCESS);
-if (num_groups > 0) {
-std::vector<zet_metric_group_handle_t> groups(num_groups, nullptr);
-status = zetMetricGroupGet(sub_devices[j], &num_groups, groups.data());
-PTI_ASSERT(status == ZE_RESULT_SUCCESS);
-
-for (uint32_t k = 0; k < num_groups; ++k) {
-zet_metric_group_properties_t group_props{};
-group_props.stype = ZET_STRUCTURE_TYPE_METRIC_GROUP_PROPERTIES;
-status = zetMetricGroupGetProperties(groups[k], &group_props);
-PTI_ASSERT(status == ZE_RESULT_SUCCESS);
-
-
-if ((strcmp(group_props.name, utils::GetEnv("UNITRACE_MetricGroup").c_str()) == 0) && (group_props.samplingType & ZET_METRIC_GROUP_SAMPLING_TYPE_FLAG_EVENT_BASED)) {
-group = groups[k];
-break;
-}
-}
-}
-status = zetContextActivateMetricGroups(context, sub_devices[j], 1, &group);
-PTI_ASSERT(status == ZE_RESULT_SUCCESS);
-metric_activations_.insert({context, sub_devices[j]});
-
-sub_desc.metric_group_ = group;
-}
-#endif /* 0 */
 
 sub_desc.metric_group_ = nullptr;
 
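The core of the ze_collector.h change is the new `detailed` flag in `GetZeKernelCommandName`. The SIMD width, group count and group size are appended only when `detailed` is true. The two summary tables pass `options_.verbose` for that flag. Callers that omit the argument get the default `detailed = true` and keep the full names.

The sketch below is self-contained and follows the same formatting rules. It uses hypothetical stand-in types (`KernelCommandProps`, `GroupCount`, `GroupSize`) instead of the real Level Zero structures and the locked `kernel_command_properties_` map.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the Level Zero / unitrace types used in the diff.
enum KernelCommandType { KERNEL_COMMAND_TYPE_COMPUTE, KERNEL_COMMAND_TYPE_MEMORY };
struct GroupCount { uint32_t x, y, z; };  // plays the role of ze_group_count_t
struct GroupSize { uint32_t x, y, z; };
struct KernelCommandProps {
  std::string name;     // already demangled
  KernelCommandType type;
  uint32_t simd_width;  // 0 = unknown, 1 = any width
  GroupSize group_size;
};

// Mirrors the formatting rules of GetZeKernelCommandName after this commit.
std::string FormatKernelCommandName(const KernelCommandProps& p, const GroupCount& gc,
                                    std::size_t size, bool detailed = true) {
  std::stringstream s;
  s << p.name;
  if (!detailed) {
    return s.str();  // summary without shapes: name only
  }
  if (p.type == KERNEL_COMMAND_TYPE_COMPUTE) {
    if (p.simd_width > 0) {
      s << "[SIMD";
      if (p.simd_width == 1) {
        s << "_ANY";
      } else {
        s << p.simd_width;
      }
    }
    s << " {" << gc.x << "; " << gc.y << "; " << gc.z << "} {"
      << p.group_size.x << "; " << p.group_size.y << "; " << p.group_size.z << "}]";
  } else if (p.type == KERNEL_COMMAND_TYPE_MEMORY && size > 0) {
    s << "[" << size << "]";  // memory commands show their size instead of a shape
  }
  return s.str();
}
```

The sketch follows the diff in one quirk. The opening `[` is written only when the SIMD width is known (`simd_width_ > 0`), while the closing `}]` is always written for compute kernels in detailed mode.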
tools/unitrace/src/tracer.cc

Lines changed: 2 additions & 2 deletions
@@ -155,7 +155,7 @@ static TraceOptions ReadArgs() {
 }
 
 std::string get_version() {
-return std::string(VERSION) + " ("+ std::string(COMMIT_HASH) + ")";
+return std::string(UNITRACE_VERSION) + " ("+ std::string(COMMIT_HASH) + ")";
 }
 
 void __attribute__((constructor)) Init(void) {
@@ -168,7 +168,7 @@ void __attribute__((constructor)) Init(void) {
 if (unitrace_version.size() > 0) {
 auto libunitrace_version = get_version();
 if (unitrace_version.compare(libunitrace_version) != 0) {
-std::cerr << "[ERROR] Versions of Unitrace and libUnitrace_tool.so do not match." << std::endl;
+std::cerr << "[ERROR] Versions of unitrace and libunitrace_tool.so do not match." << std::endl;
 exit(-1);
 }
 }
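The rename from `VERSION` to `UNITRACE_VERSION` touches both halves of a version handshake. In unitrace.cc, `unitrace` stores `"<version> (<commit hash>)"` in the `UNITRACE_VERSION` environment variable. The constructor in `libunitrace_tool.so` then compares a non-empty value, presumably read from that variable, with its own `get_version()`.

The sketch below shows that handshake using POSIX `setenv`/`getenv`. `PublishVersion`, `VersionsMatch` and the `COMMIT_HASH` value are hypothetical.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // POSIX setenv
#include <string>

// Hypothetical stand-ins for the build-time macros; the real values come from
// version.h and the build system.
#define UNITRACE_VERSION "2.0.1"
#define COMMIT_HASH "0000000"

// Same format as get_version() in the diff: "<version> (<commit hash>)".
std::string get_version() {
  return std::string(UNITRACE_VERSION) + " (" + std::string(COMMIT_HASH) + ")";
}

// Launcher side: publish the launcher's version to the traced process.
// The string literal "UNITRACE_VERSION" is not macro-expanded.
void PublishVersion() {
  setenv("UNITRACE_VERSION", get_version().c_str(), /*overwrite=*/1);
}

// Library side: an unset or empty variable skips the check, like the
// size() > 0 guard in Init(); otherwise the two strings must be identical.
bool VersionsMatch() {
  const char* published = std::getenv("UNITRACE_VERSION");
  if (published == nullptr || published[0] == '\0') {
    return true;
  }
  return get_version() == published;
}
```

The environment variable and the macro share the name `UNITRACE_VERSION` without conflict. The preprocessor never expands macros inside string literals.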

tools/unitrace/src/unitrace.cc

Lines changed: 6 additions & 6 deletions
@@ -119,8 +119,8 @@ void Usage(char * progname) {
 std::endl;
 std::cout <<
 "--verbose [-v] " <<
-"Enable verbose mode to show more kernel information. For OpenCL backend only." << std::endl <<
-" Verbose is always enabled for Level Zero backend" <<
+"Enable verbose mode to show kernel shapes" << std::endl <<
+" Kernel shapes are always enabled in timelines for Level Zero backend" <<
 std::endl;
 std::cout <<
 "--demangle " <<
@@ -384,7 +384,7 @@ int ParseArgs(int argc, char* argv[]) {
 show_metric_list = true;
 ++app_index;
 } else if (strcmp(argv[i], "--version") == 0) {
-std::cout << VERSION << " (" << COMMIT_HASH << ")" << std::endl;
+std::cout << UNITRACE_VERSION << " (" << COMMIT_HASH << ")" << std::endl;
 return 0;
 } else {
 break;
@@ -555,7 +555,7 @@ int main(int argc, char *argv[]) {
 #endif
 
 // Set unitrace version
-auto unitrace_version = std::string(VERSION) + " (" + std::string(COMMIT_HASH) + ")";
+auto unitrace_version = std::string(UNITRACE_VERSION) + " (" + std::string(COMMIT_HASH) + ")";
 utils::SetEnv("UNITRACE_VERSION", unitrace_version.c_str());
 
 SetProfilingEnvironment();
@@ -589,11 +589,11 @@ int main(int argc, char *argv[]) {
 if (utils::GetEnv("UNITRACE_ChromeMpiLogging") == "1") {
 preload = preload + ":" + mpi_interceptor_path;
 // For tracing MPI calls from oneCCL, we need to set CCL_MPI_LIBRARY_PATH
-// with Unitrace's MPI intercepter path, because oneCCL directly picks up
+// with unitrace's MPI intercepter path, because oneCCL directly picks up
 // MPI functions with dlopen/dlsym, not through the dynamic linker. Thus,
 // LD_PRELOAD would not work.
 // TODO: We have to consider a case where CCL_MPI_LIBRARY_PATH is already
-// set. Unitrace will need to call the MPIs in the specified libs
+// set. In this case, unitrace needs to call MPIs in the specified libs
 // before/after ITT annotation.
 utils::SetEnv("CCL_MPI_LIBRARY_PATH", mpi_interceptor_path.c_str());
 }
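After this commit, the usage text no longer describes `--verbose [-v]` as OpenCL-only. It now says the flag shows kernel shapes, and the ZeCollector summaries receive it as `options_.verbose`. The sketch below parses `-d`/`--device-timing` and `-v`/`--verbose` into a small options struct. `Options` and `ParseToolArgs` are illustrative names, and the sketch does not reproduce unitrace's `ParseArgs`.

```cpp
#include <cstring>

// Hypothetical subset of unitrace's options; the real ParseArgs handles many more.
struct Options {
  bool device_timing = false;  // --device-timing / -d
  bool verbose = false;        // --verbose / -v: show kernel shapes in summaries
};

// Parses leading tool options and returns the index of the first argument that
// is not one of them (assumed here to start the application command line).
int ParseToolArgs(int argc, char* argv[], Options& opts) {
  int i = 1;
  for (; i < argc; ++i) {
    if (std::strcmp(argv[i], "--device-timing") == 0 || std::strcmp(argv[i], "-d") == 0) {
      opts.device_timing = true;
    } else if (std::strcmp(argv[i], "--verbose") == 0 || std::strcmp(argv[i], "-v") == 0) {
      opts.verbose = true;
    } else {
      break;
    }
  }
  return i;
}
```

Under these assumptions, `unitrace -d ./app` would summarize each kernel in one row. `unitrace -d -v ./app` would set `verbose`, which the collector would forward as `detailed` so that each shape gets its own row.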

tools/unitrace/src/version.h

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 #ifndef PTI_TOOLS_UNITRACE_VERSION_H_
 #define PTI_TOOLS_UNITRACE_VERSION_H_
 
-#define VERSION "2.0.0"
+#define UNITRACE_VERSION "2.0.1"
 
 std::string get_version();
 
0 commit comments
