Skip to content

Conversation

@xinzhao3
Copy link
Owner

No description provided.

size_t len;
} ucx_iovec_t;

OBJ_CLASS_INSTANCE(thread_local_info_t, opal_list_item_t, NULL, NULL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Has to be in the common/ucx *.c file
  • let's create constructor/destructor


OBJ_CLASS_INSTANCE(thread_local_info_t, opal_list_item_t, NULL, NULL);

pthread_key_t my_thread_key = {0};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This has to be declared in common/ucx/ *.c file
  • Use opal_common_ucx_ prefix
  • extern declaration has to be in common/ucx

if (ret != OMPI_SUCCESS) {
return ret;
}
pthread_mutex_unlock(&mca_osc_ucx_component.worker_mutex);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Move this into the common code.
  • Make this function inline and provide the parameter to select either EP or worker flush (to allow compiler static optimization
inline in func(int flag){
{
....
if( flag ){
   do1;
} else {
   do2;
}
....
}

....
func(0); // compiler will optimiza and remove if branch

static void ompi_osc_ucx_unregister_progress(void);

opal_list_t active_workers = {{0}}, idle_workers = {{0}};
pthread_mutex_t active_workers_mutex = PTHREAD_MUTEX_INITIALIZER;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has to go to the common code

int flavor, int *model);
static void ompi_osc_ucx_unregister_progress(void);

opal_list_t active_workers = {{0}}, idle_workers = {{0}};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one also goes to the common code

UCP_PARAM_FIELD_MT_WORKERS_SHARED |
UCP_PARAM_FIELD_ESTIMATED_NUM_EPS |
UCP_PARAM_FIELD_REQUEST_INIT |
UCP_PARAM_FIELD_REQUEST_SIZE;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • have a hadler that describes UCX worker pool
    • UCX context
    • list of active/idle workers
    • all required thread-level information.

handler = opal_common_ucx_workerpool_init(allgather_ptr, allgatherv_ptr);

opal_common_ucx_workerpool_init

  • Initializes UCX: context creation
  • create main worker
  • exchange EP's
  • records main thread

ompi_datatype_type_size(origin_dt, &origin_len);
origin_len *= origin_count;

pthread_mutex_lock(mutex_ptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code block goes into the common code.

curr_len = MIN(origin_ucx_iov[origin_ucx_iov_idx].len,
target_ucx_iov[target_ucx_iov_idx].len);

pthread_mutex_lock(mutex_ptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This block goes to common
  • Make sure to always unlock (including the error path)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has to be an inline function with argument put/get that will allow static compile optimization

} else if (!is_origin_contig) {
size_t prev_len = 0;
while (origin_ucx_iov_idx < origin_ucx_iov_count) {
pthread_mutex_lock(mutex_ptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This block goes to common
  • See above comment about locks

} else {
size_t prev_len = 0;
while (target_ucx_iov_idx < target_ucx_iov_count) {
pthread_mutex_lock(mutex_ptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This block also goes to the common
  • see above about locks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants