Implausible performance counter data in traces #138

@stefandesouza

I'm tracing an OpenMP application on MareNostrum 5 GPP using the Extrae 4.3.3 environment module. In traces that use more than one OpenMP thread, the first thread shows computation bursts with impossibly high IPC values (1000-10000). As a result, the instruction scalability, frequency scalability and IPC scalability metrics I get are incorrect, as shown below:

Overview of the Speedup, IPC and Frequency:
-------------------------------------------------
                        |       1[1] |       2[2]
-------------------------------------------------
Elapsed time (sec)      |      81.48 |      47.03
Efficiency              |       1.00 |       0.87
Speedup                 |       1.00 |       1.73
Average IPC             |       1.96 |      15.64
Average frequency (GHz) |       2.85 |     308.13
-------------------------------------------------


======== Output File: Other Metrics ========
Speedup, IPC, Frequency, I/O and Flushing written to /home/stefan/Downloads/basicanalysis-0.3.9/other_metrics.csv

Overview of the Efficiency metrics:
===========================================================
                       Trace mode |     OpenMP |     OpenMP
          Processes [Trace Order] |       1[1] |       2[2]
===========================================================
Global efficiency                 |     99.70% |     86.37%
-- Parallel efficiency            |     99.70% |     78.46%
   -- Load balance                |    100.00% |     89.41%
   -- Communication efficiency    |     99.70% |     87.76%
      -- Serialization efficiency |  Non-Avail |  Non-Avail
      -- Transfer efficiency      |  Non-Avail |  Non-Avail
-- Computation scalability        |    100.00% |    110.08%
   -- IPC scalability             |    100.00% |    799.07%
   -- Instruction scalability     |    100.00% |      0.13%
   -- Frequency scalability       |    100.00% |  10802.30%
===========================================================

The parallel efficiency numbers and the overall computation scalability seem to be correct, but the frequency and IPC values are clearly implausible.
This doesn't occur when tracing a separate use case of the same application that follows a different call tree.
I've also traced a few test cases, and they all seem to produce correct results. I think the issue might have to do with the OpenMP implementation in the code, but I'm not sure what to look out for. Note that the basic analysis script was run on the whole trace, with no cutting or filtering. I'm hoping this behaviour has been observed before and that a fix exists.
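
As a quick consistency check on the tables above, here is a minimal sketch that recomputes the ratios from the reported values. It assumes the usual multiplicative model in which computation scalability is the product of IPC, instruction and frequency scalability, and that instruction scalability is the reference instruction count divided by the instruction count of the run; the variable names are only for illustration. Under those assumptions the three sub-metrics are mutually consistent, which would suggest that the raw PAPI_TOT_INS and PAPI_TOT_CYC counts are both inflated in the 2-thread trace, rather than the script combining them incorrectly.

```python
# Values copied from the basicanalysis output (1-thread run is the reference).
avg_ipc = {1: 1.96, 2: 15.64}
avg_freq_ghz = {1: 2.85, 2: 308.13}

# Scalability factors reported for the 2-thread run, as fractions.
reported_ipc_scal = 7.9907     # 799.07%
reported_freq_scal = 108.0230  # 10802.30%
reported_comp_scal = 1.1008    # 110.08%

# Ratios recomputed from the (rounded) averages in the first table.
ipc_ratio = avg_ipc[2] / avg_ipc[1]
freq_ratio = avg_freq_ghz[2] / avg_freq_ghz[1]

# Assumed model: comp_scal = ipc_scal * ins_scal * freq_scal,
# so the instruction scalability implied by the other three is:
implied_ins_scal = reported_comp_scal / (reported_ipc_scal * reported_freq_scal)

# Assumed definition: ins_scal = instructions(reference) / instructions(run),
# so its inverse is the apparent growth in total instructions.
instruction_growth = 1.0 / implied_ins_scal

print(f"IPC ratio from averages:        {ipc_ratio:.2f}x")
print(f"Frequency ratio from averages:  {freq_ratio:.2f}x")
print(f"Implied instruction scalability: {implied_ins_scal * 100:.2f}%")
print(f"Apparent instruction growth:     {instruction_growth:.0f}x")
```

The implied instruction scalability rounds to the reported 0.13%, so the 2-thread trace appears to contain several hundred times more counted instructions than the 1-thread trace for the same workload.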

My extrae.xml configuration is as follows:

<?xml version='1.0'?>

<trace enabled="yes"
 home="@sub_PREFIXDIR@"
 initial-mode="detail"
 type="paraver"
>

  <openmp enabled="yes" ompt="no">
    <locks enabled="no" />
    <taskloop enabled="no" />
    <counters enabled="yes" />
  </openmp>

  <pthread enabled="no">
    <locks enabled="no" />
    <counters enabled="yes" />
  </pthread>

  <counters enabled="yes">
    <cpu enabled="yes" starting-set-distribution="1">
      <set enabled="yes" domain="all" changeat-time="0">
        PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_L2_DCM
      </set>
      <set enabled="yes" domain="all" changeat-time="0">
        PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_VEC_SP,PAPI_SR_INS,PAPI_LD_INS,PAPI_FP_INS
        <sampling enabled="no" period="1000000000">PAPI_TOT_CYC</sampling>
      </set>
    </cpu>

    <network enabled="no" />

    <resource-usage enabled="no" />

    <memory-usage enabled="no" />
  </counters>

  <callers enabled="no">
    <mpi enabled="no">1-3</mpi>
    <sampling enabled="yes">1-5</sampling>
    <dynamic-memory enabled="no">1-3</dynamic-memory>
    <input-output enabled="yes">1-3</input-output>
    <syscall enabled="yes">1-3</syscall>
  </callers>
  <storage enabled="no">
    <trace-prefix enabled="yes">TRACE</trace-prefix>
    <size enabled="no">5</size>
    <temporal-directory enabled="yes">/scratch</temporal-directory>
    <final-directory enabled="yes">/gpfs/scratch/bsc41/bsc41273</final-directory>
  </storage>

  <buffer enabled="yes">
    <size enabled="yes">5000000</size>
    <circular enabled="no" />
  </buffer>

  <trace-control enabled="yes">
    <file enabled="no" frequency="5M">/gpfs/scratch/bsc41/bsc41273/control</file>
    <global-ops enabled="no"></global-ops>
  </trace-control>

  <others enabled="yes">
    <minimum-time enabled="no">10M</minimum-time>
    <finalize-on-signal enabled="yes" 
      SIGUSR1="no" SIGUSR2="no" SIGINT="yes"
      SIGQUIT="yes" SIGTERM="yes" SIGXCPU="yes"
      SIGFPE="yes" SIGSEGV="yes" SIGABRT="yes"
    />
    <flush-sampling-buffer-at-instrumentation-point enabled="yes" />
  </others>

  <sampling enabled="no" type="virtual" period="50m" variability="10m" />

  <dynamic-memory enabled="no" />

  <input-output enabled="yes" internals="no"/>

  <syscall enabled="no" />
  <merge enabled="yes" 
    synchronization="default"
    tree-fan-out="16"
    max-memory="512"
    joint-states="yes"
    keep-mpits="yes"
    translate-addresses="yes"
    sort-addresses="yes"
    translate-data-addresses="yes"
    overwrite="yes"
  />

</trace>

I've attached both full traces here:
traces.tar.gz
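
To locate the affected bursts in the attached traces, a rough sketch like the following could scan a `.prv` file for event records whose instruction and cycle counters give an implausible IPC. It assumes the Paraver event record layout `2:cpu:appl:task:thread:time:type:value[:type:value...]` and that PAPI_TOT_INS and PAPI_TOT_CYC are emitted as event types 42000050 and 42000059; both codes are assumptions that should be checked against the `.pcf` file of the trace. The function name and the threshold are hypothetical.

```python
import sys

TOT_INS = 42000050  # assumed event type for PAPI_TOT_INS (verify in the .pcf)
TOT_CYC = 42000059  # assumed event type for PAPI_TOT_CYC (verify in the .pcf)


def suspicious_ipc(lines, max_ipc=10.0):
    """Yield (thread, time, ipc) for event records whose IPC exceeds max_ipc."""
    for line in lines:
        fields = line.strip().split(":")
        # Event records start with "2" and carry at least one type:value pair.
        if len(fields) < 8 or fields[0] != "2":
            continue
        try:
            thread = int(fields[4])
            time = int(fields[5])
            pairs = fields[6:]
            values = {int(t): int(v) for t, v in zip(pairs[0::2], pairs[1::2])}
        except ValueError:
            continue  # skip malformed records
        ins = values.get(TOT_INS)
        cyc = values.get(TOT_CYC)
        if ins is None or not cyc:
            continue  # counters missing, or zero cycles
        ipc = ins / cyc
        if ipc > max_ipc:
            yield thread, time, ipc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_ipc.py <trace.prv>
    with open(sys.argv[1]) as trace:
        for thread, time, ipc in suspicious_ipc(trace):
            print(f"thread {thread} at t={time}: IPC {ipc:.1f}")
```

Listing the thread and timestamp of each hit should show whether the bad values are confined to the first thread and whether they coincide with particular points in the run, such as the start or end of parallel regions.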

Thank you in advance for your help.
