Description
I'm trying to trace an OpenMP application on Mare Nostrum 5 - GPP, using the Extrae 4.3.3 environment module. In the traces that use more than one OpenMP thread, the first thread shows computation bursts with impossibly high IPC values (1000-10000). As a result, the Instruction scalability, Frequency scalability and IPC scalability metrics I get are incorrect, as follows:
Overview of the Speedup, IPC and Frequency:
-------------------------------------------------
                          |    1[1]  |    2[2]
-------------------------------------------------
Elapsed time (sec)        |   81.48  |   47.03
Efficiency                |    1.00  |    0.87
Speedup                   |    1.00  |    1.73
Average IPC               |    1.96  |   15.64
Average frequency (GHz)   |    2.85  |  308.13
-------------------------------------------------
======== Output File: Other Metrics ========
Speedup, IPC, Frequency, I/O and Flushing written to /home/stefan/Downloads/basicanalysis-0.3.9/other_metrics.csv
Overview of the Efficiency metrics:
===========================================================
Trace mode                         |    OpenMP |     OpenMP
Processes [Trace Order]            |      1[1] |       2[2]
===========================================================
Global efficiency                  |    99.70% |     86.37%
-- Parallel efficiency             |    99.70% |     78.46%
   -- Load balance                 |   100.00% |     89.41%
   -- Communication efficiency     |    99.70% |     87.76%
      -- Serialization efficiency  | Non-Avail |  Non-Avail
      -- Transfer efficiency       | Non-Avail |  Non-Avail
-- Computation scalability         |   100.00% |    110.08%
   -- IPC scalability              |   100.00% |    799.07%
   -- Instruction scalability      |   100.00% |      0.13%
   -- Frequency scalability        |   100.00% |  10802.30%
===========================================================
The parallel efficiency numbers and the overall Computation scalability seem to be correct, but the Frequency and IPC values are clearly implausible.
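As a sanity check on the reported numbers, here is a minimal Python sketch. It assumes the usual multiplicative hierarchy suggested by the table layout (Parallel efficiency = Load balance x Communication efficiency, Computation scalability = IPC x Instruction x Frequency scalability, Global efficiency = Parallel efficiency x Computation scalability). All variable names are mine; the values are copied from the two tables above for the 2-thread run.

```python
# Values for the 2-thread run, copied from the tables above (as fractions).
load_balance = 0.8941
communication_eff = 0.8776
parallel_eff_reported = 0.7846
computation_scal_reported = 1.1008
global_eff_reported = 0.8637
ipc_scal_reported = 7.9907
freq_scal_reported = 108.0230

# Assumption: the metrics combine multiplicatively, as the hierarchy suggests.
parallel_eff = load_balance * communication_eff
global_eff = parallel_eff * computation_scal_reported

# Instruction scalability implied by the other three reported values.
implied_ins_scal = computation_scal_reported / (ipc_scal_reported * freq_scal_reported)

# Scalability ratios recomputed from the first table (2 threads vs. 1 thread).
ipc_ratio = 15.64 / 1.96
freq_ratio = 308.13 / 2.85

print(f"parallel efficiency   : {parallel_eff:.4f} (reported {parallel_eff_reported})")
print(f"global efficiency     : {global_eff:.4f} (reported {global_eff_reported})")
print(f"implied instr. scal.  : {implied_ins_scal * 100:.2f}%")
print(f"IPC ratio             : {ipc_ratio:.2f}x")
print(f"frequency ratio       : {freq_ratio:.2f}x")
```

Under that assumption the table is internally consistent: the three inflated or deflated sub-metrics cancel out in the product, which would explain why Computation scalability still looks reasonable. It also suggests that the raw instruction and cycle counts are what is off, rather than the timings.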
This doesn't seem to occur when tracing a separate use case of the same application that exercises a different call tree.
I've also traced a few test cases, and they all seem to produce correct results. I think the issue might be related to how OpenMP is used in the code, but I'm not sure what to look for. Note that the basic analysis script was run on the whole trace, with no cutting or filtering. I'm hoping this behaviour has been observed before and that a fix exists.
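To narrow down which bursts are affected, something like the following Python sketch could be used once per-burst counter values have been exported from the trace. Everything here is hypothetical: the `Burst` record, the thresholds and the sample numbers are illustrative only, and the code does not parse the Paraver format itself.

```python
from typing import NamedTuple


class Burst(NamedTuple):
    """One computation burst (hypothetical record, not an Extrae/Paraver type)."""
    thread: int
    duration_ns: int
    instructions: int  # e.g. PAPI_TOT_INS accumulated over the burst
    cycles: int        # e.g. PAPI_TOT_CYC accumulated over the burst


def ipc(burst: Burst) -> float:
    """Instructions per cycle; infinite if no cycles were recorded."""
    return burst.instructions / burst.cycles if burst.cycles > 0 else float("inf")


def frequency_ghz(burst: Burst) -> float:
    """Cycles per nanosecond, which is numerically the frequency in GHz."""
    return burst.cycles / burst.duration_ns if burst.duration_ns > 0 else float("inf")


def flag_implausible(bursts, max_ipc=8.0, max_ghz=6.0):
    """Return bursts whose IPC or frequency exceeds the given thresholds.

    The default thresholds are assumptions, chosen only to be far above
    what a real core would sustain.
    """
    return [b for b in bursts if ipc(b) > max_ipc or frequency_ghz(b) > max_ghz]


# Made-up sample data: thread 0 mimics the anomaly, thread 1 looks normal.
sample = [
    Burst(thread=0, duration_ns=1_000_000, instructions=3_000_000_000_000, cycles=300_000_000),
    Burst(thread=1, duration_ns=1_000_000, instructions=5_586_000, cycles=2_850_000),
]

for b in flag_implausible(sample):
    print(f"thread {b.thread}: IPC={ipc(b):.1f}, freq={frequency_ghz(b):.1f} GHz")
```

Listing the flagged bursts by thread and timestamp might show whether the anomaly is tied to a particular parallel region or only to the first burst after a counter-set change.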
My extrae.xml configuration is as follows:
<?xml version='1.0'?>
<trace enabled="yes"
 home="@sub_PREFIXDIR@"
 initial-mode="detail"
 type="paraver"
>
  <openmp enabled="yes" ompt="no">
    <locks enabled="no" />
    <taskloop enabled="no" />
    <counters enabled="yes" />
  </openmp>
  <pthread enabled="no">
    <locks enabled="no" />
    <counters enabled="yes" />
  </pthread>
  <counters enabled="yes">
    <cpu enabled="yes" starting-set-distribution="1">
      <set enabled="yes" domain="all" changeat-time="0">
        PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_L2_DCM
      </set>
      <set enabled="yes" domain="all" changeat-time="0">
        PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_VEC_SP,PAPI_SR_INS,PAPI_LD_INS,PAPI_FP_INS
        <sampling enabled="no" period="1000000000">PAPI_TOT_CYC</sampling>
      </set>
    </cpu>
    <network enabled="no" />
    <resource-usage enabled="no" />
    <memory-usage enabled="no" />
  </counters>
  <callers enabled="no">
    <mpi enabled="no">1-3</mpi>
    <sampling enabled="yes">1-5</sampling>
    <dynamic-memory enabled="no">1-3</dynamic-memory>
    <input-output enabled="yes">1-3</input-output>
    <syscall enabled="yes">1-3</syscall>
  </callers>
  <storage enabled="no">
    <trace-prefix enabled="yes">TRACE</trace-prefix>
    <size enabled="no">5</size>
    <temporal-directory enabled="yes">/scratch</temporal-directory>
    <final-directory enabled="yes">/gpfs/scratch/bsc41/bsc41273</final-directory>
  </storage>
  <buffer enabled="yes">
    <size enabled="yes">5000000</size>
    <circular enabled="no" />
  </buffer>
  <trace-control enabled="yes">
    <file enabled="no" frequency="5M">/gpfs/scratch/bsc41/bsc41273/control</file>
    <global-ops enabled="no"></global-ops>
  </trace-control>
  <others enabled="yes">
    <minimum-time enabled="no">10M</minimum-time>
    <finalize-on-signal enabled="yes"
      SIGUSR1="no" SIGUSR2="no" SIGINT="yes"
      SIGQUIT="yes" SIGTERM="yes" SIGXCPU="yes"
      SIGFPE="yes" SIGSEGV="yes" SIGABRT="yes"
    />
    <flush-sampling-buffer-at-instrumentation-point enabled="yes" />
  </others>
  <sampling enabled="no" type="virtual" period="50m" variability="10m" />
  <dynamic-memory enabled="no" />
  <input-output enabled="yes" internals="no"/>
  <syscall enabled="no" />
  <merge enabled="yes"
    synchronization="default"
    tree-fan-out="16"
    max-memory="512"
    joint-states="yes"
    keep-mpits="yes"
    translate-addresses="yes"
    sort-addresses="yes"
    translate-data-addresses="yes"
    overwrite="yes"
  />
</trace>
I've attached both full traces here:
traces.tar.gz
Thank you in advance for your help.