Add initial MPI_T-like profiling support in Mercury by srini009 · Pull Request #350 · mercury-hpc/mercury

srini009 · 2020-03-04T23:35:52Z

Add an MPI_T-like profiling interface in Mercury.
Profiling interface is active by default.
Currently, only one PVAR is exported: "hg_pvar_hg_forward_count", which counts the number of times HG_Forward is invoked on this instance.

Refer to: https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node372.htm#Node372 for the corresponding MPI 3.1 PVAR interface specification that this implementation is largely based off of.

Notable differences with MPI (implementation, not interface spec):

PVAR sessions in Mercury are associated with the hg_class object to avoid the use of global variables.
Similarly, PVARs themselves are not global variables and aren't declared globally and used as such. Instead, each module interested in exporting PVARs must register these PVARs and refer to them by the use of handles (that internally fetch the address of the PVAR) within the local function where these PVARs are updated.
We use atomics for updating counter PVARs, where MPI does not.
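
To make the handle-based design above concrete, here is a minimal sketch of a PVAR table that lives inside a class object rather than in globals. Every name in it (`demo_pvar_table`, `demo_pvar_register`, `demo_pvar_handle_get`, `demo_forward`) is hypothetical and chosen for illustration only; it is not the code in this PR, and it uses C11 atomics rather than Mercury's own atomic wrappers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PVARS 8

/* Hypothetical per-class PVAR table (an assumption, not Mercury's layout). */
struct demo_pvar_table {
    const char *names[DEMO_MAX_PVARS];
    atomic_uint values[DEMO_MAX_PVARS];
    int num_pvars;
};

/* A handle is simply the resolved address of a counter's storage. */
typedef atomic_uint *demo_pvar_handle_t;

static void
demo_pvar_table_init(struct demo_pvar_table *t)
{
    assert(t != NULL);
    t->num_pvars = 0;
}

/* Register a counter PVAR; returns its index, or -1 if the table is full. */
static int
demo_pvar_register(struct demo_pvar_table *t, const char *name)
{
    int idx;

    assert(t != NULL && name != NULL);
    if (t->num_pvars >= DEMO_MAX_PVARS)
        return -1;
    idx = t->num_pvars++;
    t->names[idx] = name;
    atomic_init(&t->values[idx], 0u);
    return idx;
}

/* Resolve a PVAR name to a handle; returns NULL if it was never registered. */
static demo_pvar_handle_t
demo_pvar_handle_get(struct demo_pvar_table *t, const char *name)
{
    int i;

    for (i = 0; i < t->num_pvars; i++)
        if (strcmp(t->names[i], name) == 0)
            return &t->values[i];
    return NULL;
}

/* Stand-in for an instrumented call: resolve the handle once, then
 * update the counter atomically through the cached handle. */
static void
demo_forward(struct demo_pvar_table *t, demo_pvar_handle_t *cached)
{
    if (*cached == NULL)
        *cached = demo_pvar_handle_get(t, "hg_pvar_hg_forward_count");
    if (*cached != NULL)
        atomic_fetch_add(*cached, 1u);
}
```

The caller owns both the table and the cached handle, so two classes in the same process keep independent counters.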

src/mercury_prof_interface.h

soumagne · 2020-03-11T05:09:41Z

src/mercury_prof_interface.h

+ */
+HG_PUBLIC hg_return_t 
+HG_Prof_init();
+


That should probably take an hg_class as a parameter. We usually try to avoid using globals.

See my previous comment, you should return an hg_prof_class_t * here

src/mercury_prof_interface.h

soumagne · 2020-03-11T05:24:28Z

src/mercury.c

+/* PVAR profiling support */
+#include "mercury_prof_pvar_impl.h"
+HG_PROF_PVAR_UINT_COUNTER_DECL_EXTERN(hg_pvar_hg_forward_count);
+


As a general comment I think it would be good to see if/how we can avoid declaring global variables.

carns · 2020-03-12T02:21:16Z

I would also suggest that the comments for this interface reference a URL (I'm not sure which one offhand) for the MPI_T interface with a brief comment along the lines of "The following interface is patterned off of the MPI_T interface, except that X, Y, Z", where X, Y, and Z are notable philosophical differences (like being oriented around HG classes rather than globals) if there are any.

That way we can lean on the MPI documentation to some degree and also help rationalize the API conventions, since there is some value to reusing terminology for this stuff.

src/mercury_prof_pvar_impl.h

srini009 · 2020-03-24T00:36:57Z

Hey guys @carns, @soumagne: Thank you for your useful comments. I generally agree with all the points that have been put forward. I shall address most if not all comments in the next 2 days. I hope that works!

srini009 · 2020-04-01T23:29:35Z

@soumagne, @carns: I have addressed all your comments. Kindly check if this is okay. I apologize for the delay, it took longer than I expected.

carns · 2020-04-06T21:20:06Z

Thanks @srini009 !

I just checked two things that I had commented on:

a) did you find a URL or some other reference to put in the comments for the general model the API is using? It might be there and I'm missing it.

b) there has got to be a better way to implement COUNTER_INC() than looping over an incr() right? Maybe do a get() and then a cas() instead (with a retry loop if needed)? Jerome may have another idea. I'm thinking of the case where the val could be relatively large, like if you are accumulating bytes transmitted or something. The current code might be slow, and also isn't technically atomic across the macro.

srini009 · 2020-04-06T21:50:04Z

Hi @carns,
As for (a), I have addressed the comment by modifying the commit description (see top). Oops, I think you probably meant the code! :) I shall add the same comments in the interface header file as well.

As for (b), indeed, I did give this a thought. As far as atomicity is concerned, hopefully, I'm not mistaken, but I think we don't need to enforce atomicity at the macro level since individual updates are atomic and addition is commutative. I don't think we shall be encountering race conditions here. But you're right, it is incredibly inefficient for large updates related to bytes transferred, etc. Perhaps I can take @soumagne's help here.

I am excited about this becoming a part of mercury. After that, it is a relatively quick step to introduce a bunch of new PVARs that are insightful.

Regards,

carns · 2020-04-07T13:53:04Z

Ah! Yes, that link and explanation in the PR comment is perfect, just drop it in the code. My thinking is that in the future someone might look at the code and want to change things without realizing that it's intentionally matching some broader conventions.

…sed off of

carns · 2020-04-10T12:30:03Z

@soumagne what's the best way to do the performance counter macro? Do a get(), calculate new value, then cas() (in a loop to retry if the cas fails)?

soumagne

Thanks @srini009 for all the changes. I added some more comments and suggestions for fixes.

soumagne · 2020-04-10T16:00:21Z

src/mercury_prof_interface.c

+    int hg_prof_is_initialized;     /* Is profiling initialized */
+    int num_pvars;          /* No of PVARs currently supported */
+    hg_prof_pvar_session_t session; /* Is profiling initialized on the session */
+};


Maybe I am missing something but you should have a struct hg_prof_class here, not hg_private_class.

soumagne · 2020-04-10T16:01:05Z

src/mercury_prof_interface.c

+hg_prof_set_is_initialized(struct hg_private_class * hg_private_class)
+{
+  hg_private_class->hg_prof_is_initialized = 1;
+}


Just a detail but instead of using 1 or 0, there are HG_TRUE and HG_FALSE macros

soumagne · 2020-04-10T16:02:27Z

src/mercury_prof_interface.c

+static int 
+hg_prof_get_session_is_initialized(struct hg_prof_pvar_session * session) {
+   return session->is_initialized;
+}


Same here, you could return an hg_bool_t instead.

soumagne · 2020-04-10T16:06:19Z

src/mercury_prof_interface.c

+  hg_prof_set_is_initialized(hg_private_class);
+  hg_private_class->num_pvars = NUM_PVARS;
+  hg_private_class->session = NULL;
+


I don't think you are allowed to do that here :) HG_Prof_init() should return a new hg_prof_class_t * so I would expect you to malloc a new struct hg_prof_class in that routine, fill it and return it.
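
A rough sketch of the shape being suggested here: the init routine allocates, fills, and returns a new profiling class, and the finalize routine takes that same object back. The names `demo_prof_class`, `demo_prof_init`, and `demo_prof_finalize` are hypothetical, the fields only loosely mirror the diff above, and the plain `int` return codes stand in for `hg_return_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_NUM_PVARS 1

/* Hypothetical profiling class, owned by the caller instead of being
 * stored inside another private structure. */
struct demo_prof_class {
    bool is_initialized; /* Is profiling initialized */
    int num_pvars;       /* Number of PVARs currently supported */
    void *session;       /* Session attached to this class, if any */
};

/* Allocate and fill a new profiling class; returns NULL on allocation failure. */
static struct demo_prof_class *
demo_prof_init(void)
{
    struct demo_prof_class *pc = malloc(sizeof(*pc));

    if (pc == NULL)
        return NULL;
    pc->is_initialized = true;
    pc->num_pvars = DEMO_NUM_PVARS;
    pc->session = NULL;
    return pc;
}

/* Release a profiling class; returns 0 on success, -1 on invalid argument. */
static int
demo_prof_finalize(struct demo_prof_class *pc)
{
    if (pc == NULL || !pc->is_initialized)
        return -1;
    /* In this sketch, sessions must be torn down before finalize. */
    assert(pc->session == NULL);
    pc->is_initialized = false;
    free(pc);
    return 0;
}
```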

soumagne · 2020-04-10T16:06:52Z

src/mercury_prof_interface.c

+/*---------------------------------------------------------------------------*/
+hg_return_t 
+HG_Prof_finalize(hg_class_t *hg_class) {
+


Here instead it should take an hg_prof_class_t *

soumagne · 2020-04-10T16:09:13Z

src/mercury_prof_interface.c

+
+/*---------------------------------------------------------------------------*/
+hg_return_t 
+HG_Prof_pvar_get_info(hg_class_t *hg_class, int pvar_index, char *name, int *name_len, hg_prof_class_t *var_class, hg_prof_datatype_t *datatype, char *desc, int *desc_len, hg_prof_bind_t *bind, int *continuous) {


So that means hg_prof_class_t becomes hg_prof_var_class_t ?

soumagne · 2020-04-10T16:10:46Z

src/mercury_prof_interface.c

+
+  struct hg_prof_pvar_session s = *session;
+  unsigned int key = pvar_index;
+  hg_prof_pvar_data_t * val;


Please keep declarations at beginning of blocks

soumagne · 2020-04-10T16:14:01Z

src/mercury_prof_interface.c

+HG_Prof_pvar_handle_free(hg_prof_pvar_session_t session, int pvar_index, hg_prof_pvar_handle_t *handle) {
+
+  if(!hg_prof_get_session_is_initialized(session))
+    return HG_NA_ERROR;


That seems like the wrong error code :) HG_NA_ERROR means it's an NA layer error, so maybe just HG_INVALID_ARG?

soumagne · 2020-04-10T16:15:11Z

src/mercury_prof_interface.h

+ */
+HG_PUBLIC hg_return_t 
+HG_Prof_init();
+


See my previous comment, you should return an hg_prof_class_t * here

soumagne · 2020-04-10T16:21:21Z

src/mercury_prof_types.h

+   HG_PVAR_CLASS_SIZE, /* PVAR that represents the size of a given resource at any given point in time */
+   HG_PVAR_CLASS_HIGHWATERMARK, /* PVAR that represents a high watermark value */
+   HG_PVAR_CLASS_LOWWATERMARK /* PVAR that represents a low watermark value */
+} hg_prof_class_t;


Yes, you should rename that one to hg_prof_var_class_t, given the changes that I suggested earlier.

soumagne · 2020-04-10T16:31:59Z

@carns @srini009 yes for the performance counter instead of:

#define HG_PROF_PVAR_COUNTER_INC(name, val) \
    addr_##name = (addr_##name == NULL ? hg_prof_get_pvar_addr_from_name(#name): addr_##name); \
    for(int i=0; i < val; i++) \
        hg_atomic_incr32(addr_##name);
you should do something like:

#define HG_PROF_PVAR_COUNTER_INC(name, val) do { \
    hg_util_int32_t tmp; \
                                                                               \
    addr_##name = (addr_##name == NULL ? hg_prof_get_pvar_addr_from_name(#name): addr_##name); \
    do { \
        tmp = hg_atomic_get32(addr_##name); \
    } while (!hg_atomic_cas32(addr_##name, tmp, (tmp + (val)))); \
} while (0)
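
For readers without Mercury's atomic wrappers at hand, the same get-then-CAS retry pattern can be sketched in standard C11. The function names `demo_counter_add_cas` and `demo_counter_add` are hypothetical. The second function shows that C11 also offers a single-call `atomic_fetch_add`; whether Mercury's wrappers expose an equivalent is not stated in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Add val to the counter with a load followed by a compare-and-swap,
 * retrying if another thread changed the counter in between. */
static void
demo_counter_add_cas(_Atomic int32_t *counter, int32_t val)
{
    int32_t old = atomic_load(counter);

    /* On failure, atomic_compare_exchange_weak writes the current value
     * of *counter into old, so each retry recomputes old + val. */
    while (!atomic_compare_exchange_weak(counter, &old, old + val))
        ;
}

/* Same effect in one call, using the C11 fetch-and-add primitive. */
static void
demo_counter_add(_Atomic int32_t *counter, int32_t val)
{
    atomic_fetch_add(counter, val);
}
```

Either form applies the whole update in one atomic step, so a large `val` (bytes transferred, for instance) costs the same as an increment of 1, and a concurrent reader never observes a partially applied update.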

…i009/mercury into mercury_profiling_interface

Add initial MPI_T-like profiling support in Mercury

6d26e00

soumagne reviewed Mar 5, 2020


Add in Mercury-style comments

16c6512