Add initial MPI_T-like profiling support in Mercury#350
Add initial MPI_T-like profiling support in Mercury#350srini009 wants to merge 40 commits intomercury-hpc:masterfrom
Conversation
| */ | ||
| HG_PUBLIC hg_return_t | ||
| HG_Prof_init(); | ||
|
|
There was a problem hiding this comment.
That should probably take an hg_class as a parameter. We usually try to avoid using globals.
There was a problem hiding this comment.
See my previous comment, you should return an hg_prof_class_t * here
| /* PVAR profiling support */ | ||
| #include "mercury_prof_pvar_impl.h" | ||
| HG_PROF_PVAR_UINT_COUNTER_DECL_EXTERN(hg_pvar_hg_forward_count); | ||
|
|
There was a problem hiding this comment.
As a general comment I think it would be good to see if/how we can avoid declaring global variables.
|
I would also suggest that the comments for this interface reference a URL (I'm not sure which one off hand) for the MPI_T interface with a brief comment along the lines of "The following interface is patterned off of the MPI_T interface, except that X, Y, Z", where X, Y, an Z are notable philosophical differences (like being oriented around HG classes rather than globals) if there are any. That way we can lean on the MPI documentation to some degree and also help rationalize the API conventions, since there is some value to reusing terminology for this stuff. |
Thanks @srini009 ! I just checked two things that I had commented on: a) did you find a URL or some other reference to put in the comments for the general model the API is using? It might be there and I'm missing it. b) there has got to be a better way to implement COUNTER_INC() than looping over an incr() right? Maybe do a get() and then a cas() instead (with a retry loop if needed)? Jerome may have another idea. I'm thinking of the case where the val could be relatively large, like if you are accumulating bytes transmitted or something. The current code might be slow, and also isn't technically atomic across the macro. |
|
Hi @carns, As for (b), indeed, I did give this a thought. As far as atomicity is concerned, hopefully, I'm not mistaken, but I think we don't need to enforce atomicity at the macro level since individual updates are atomic and addition is commutative. I don't think we shall be encountering race conditions here. But you're right, it is incredibly inefficient for large updates related to bytes transferred, etc. Perhaps I can take @soumagne's help here. I am excited about this becoming a part of mercury. After that, it is a relatively quick step to introduce a bunch of new PVARs that are insightful. Regards, |
Ah! Yes, that link and explanation in the PR comment is perfect, just drop it in the code. My thinking is that in the future someone might look at the code and want to change things without realizing that it's intentionally matching some broader conventions. |
|
@soumagne what's the best what to do the performance counter macro? Do a get(), calculate new value, then cas() (in a loop to retry if the cas fails)? |
| int hg_prof_is_initialized; /* Is profiling initialized */ | ||
| int num_pvars; /* No of PVARs currently supported */ | ||
| hg_prof_pvar_session_t session; /* Is profiling initialized on the session */ | ||
| }; |
There was a problem hiding this comment.
Maybe I am missing something but you should have a struct hg_prof_class here, not hg_private_class.
| hg_prof_set_is_initialized(struct hg_private_class * hg_private_class) | ||
| { | ||
| hg_private_class->hg_prof_is_initialized = 1; | ||
| } |
There was a problem hiding this comment.
Just a detail but instead of using 1 or 0, there are HG_TRUE and HG_FALSE macros
| static int | ||
| hg_prof_get_session_is_initialized(struct hg_prof_pvar_session * session) { | ||
| return session->is_initialized; | ||
| } |
There was a problem hiding this comment.
Same here, you could return an hg_bool_t instead.
| hg_prof_set_is_initialized(hg_private_class); | ||
| hg_private_class->num_pvars = NUM_PVARS; | ||
| hg_private_class->session = NULL; | ||
|
|
There was a problem hiding this comment.
I don't think you are allowed to do that here :) HG_Prof_init() should return a new hg_prof_class_t * so I would expect you to malloc a new struct hg_prof_class in that routine, fill it and return it.
| /*---------------------------------------------------------------------------*/ | ||
| hg_return_t | ||
| HG_Prof_finalize(hg_class_t *hg_class) { | ||
|
|
There was a problem hiding this comment.
Here instead it should take an hg_prof_class_t *
|
|
||
| /*---------------------------------------------------------------------------*/ | ||
| hg_return_t | ||
| HG_Prof_pvar_get_info(hg_class_t *hg_class, int pvar_index, char *name, int *name_len, hg_prof_class_t *var_class, hg_prof_datatype_t *datatype, char *desc, int *desc_len, hg_prof_bind_t *bind, int *continuous) { |
There was a problem hiding this comment.
So that means hg_prof_class_t becomes hg_prof_var_class_t ?
|
|
||
| struct hg_prof_pvar_session s = *session; | ||
| unsigned int key = pvar_index; | ||
| hg_prof_pvar_data_t * val; |
There was a problem hiding this comment.
Please keep declarations at beginning of blocks
| HG_Prof_pvar_handle_free(hg_prof_pvar_session_t session, int pvar_index, hg_prof_pvar_handle_t *handle) { | ||
|
|
||
| if(!hg_prof_get_session_is_initialized(session)) | ||
| return HG_NA_ERROR; |
There was a problem hiding this comment.
that seems like the wrong error code :) HG_NA_ERROR means it's an NA layer error so maybe just HG_INVALID_ARG?
| */ | ||
| HG_PUBLIC hg_return_t | ||
| HG_Prof_init(); | ||
|
|
There was a problem hiding this comment.
See my previous comment, you should return an hg_prof_class_t * here
| HG_PVAR_CLASS_SIZE, /* PVAR that represents the size of a given resource at any given point in time */ | ||
| HG_PVAR_CLASS_HIGHWATERMARK, /* PVAR that represents a high watermark value */ | ||
| HG_PVAR_CLASS_LOWWATERMARK /* PVAR that represents a low watermark value */ | ||
| } hg_prof_class_t; |
There was a problem hiding this comment.
yes you should rename that one to hg_prof_var_class_t, given the changes that I suggested earlier.
|
@carns @srini009 yes for the performance counter instead of: #define HG_PROF_PVAR_COUNTER_INC(name, val)
addr_##name = (addr_##name == NULL ? hg_prof_get_pvar_addr_from_name(#name): addr_##name);
for(int i=0; i < val; i++)
hg_atomic_incr32(addr_##name);you should do something like: #define HG_PROF_PVAR_COUNTER_INC(name, val) do {
hg_util_int32_t tmp;
addr_##name = (addr_##name == NULL ? hg_prof_get_pvar_addr_from_name(#name): addr_##name);
do {
tmp = hg_atomic_get32(addr_##name);
} while (!hg_atomic_cas32(addr_##name, tmp, (tmp + val)));
} while (0) |
b1fe49b to
c174d8d
Compare
…i009/mercury into mercury_profiling_interface
…i009/mercury into mercury_profiling_interface
…i009/mercury into mercury_profiling_interface
…i009/mercury into mercury_profiling_interface
937d6e8 to
d98a378
Compare
Add an MPI_T-like profiling interface in Mercury.
Profiling interface is active by default.
Currently, only one PVAR is exported called "hg_pvar_hg_forward_count" that counts the number of times the HG_Forward call is invoked on this instance.
Refer to: https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node372.htm#Node372 for the corresponding MPI 3.1 PVAR interface specification that this implementation is largely based off of.
Notable differences with MPI (implementation, not interface spec):