Topology: Distributed

This example uses RPCs to let a coordinator instance gather the topology of all other active instances. It is built on the :ref:`rpcengine` frontend.

This example showcases how the abstract HiCR Core API can be used to discover compute and memory devices in the system. The code is structured as follows:

  • source/ contains the variants of this example for the different backends:

    • mpi.cpp implements this example on the MPI backend (see :ref:`mpi backend`). It uses hwloc to discover local resources and then communicates them via MPI.

Local Topology Discovery

Each instance first discovers its own local topology using hwloc:

// Creating HWloc topology object
hwloc_topology_t topology;

// Allocating and initializing the hwloc topology context
hwloc_topology_init(&topology);

// Creating the hwloc-based (CPU) topology manager
HiCR::backend::hwloc::TopologyManager tm(&topology);

// Gathering topology from the topology manager
const auto t = tm.queryTopology();
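To make the discovered structure concrete, here is a minimal, dependency-free sketch of the kind of data a topology query yields: a list of devices (here, NUMA domains), each with a number of processing units and a memory size. The names `NumaDomain`, `LocalTopology` and `discoverFallbackTopology` are hypothetical and not part of HiCR. Because standard C++ exposes no NUMA information, the sketch assumes a single domain, takes the processing-unit count from `std::thread::hardware_concurrency()`, and leaves the memory size to the caller.

```cpp
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical mirror of one device reported by a topology query
struct NumaDomain
{
  std::string type = "NUMA Domain";
  unsigned int processingUnits = 0; // number of compute resources (PUs)
  std::uint64_t memoryBytes = 0;    // size of the domain's RAM memory space
};

// Hypothetical mirror of an instance's local topology: a list of devices
struct LocalTopology
{
  std::vector<NumaDomain> devices;

  unsigned int totalProcessingUnits() const
  {
    unsigned int total = 0;
    for (const auto &d : devices) total += d.processingUnits;
    return total;
  }
};

// Portable fallback: assumes one domain holding every hardware thread.
// hwloc, as used by the example, reports one entry per real NUMA node instead.
LocalTopology discoverFallbackTopology(std::uint64_t memoryBytes)
{
  // hardware_concurrency() may return 0 when the value is not computable
  unsigned int pus = std::thread::hardware_concurrency();
  if (pus == 0) pus = 1;

  LocalTopology t;
  t.devices.push_back(NumaDomain{"NUMA Domain", pus, memoryBytes});
  return t;
}
```

On a machine like the one in the sample output below, hwloc would instead return two NUMA domains with 44 processing units each.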

Topology RPC Registration

Each instance instantiates the RPCEngine and registers an RPC that serializes its local topology and sends it to the caller:

// Creating RPC engine instance
HiCR::frontend::RPCEngine rpcEngine(...);

// Initialize RPC engine
rpcEngine.initialize();

// Creating execution unit to run as RPC
auto executionUnit = std::make_shared<HiCR::backend::pthreads::ExecutionUnit>([&](void *closure) { sendTopology(rpcEngine); });

// Registering the execution unit as the RPC target named TOPOLOGY_RPC_NAME
rpcEngine.addRPCTarget(TOPOLOGY_RPC_NAME, executionUnit);
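Conceptually, `addRPCTarget` binds a name to a piece of work that a remote caller can later trigger by that name alone. The sketch below models only that bookkeeping, inside a single process, with a hypothetical `RpcTargetRegistry` built on `std::unordered_map`. It does not reproduce how HiCR's `RPCEngine` stores execution units or moves data between instances, and the name `"getTopology"` used in the usage note is illustrative (the example itself uses the `TOPOLOGY_RPC_NAME` constant).

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, single-process model of RPC target registration
class RpcTargetRegistry
{
  public:
  using Handler = std::function<void()>;

  // Bind a handler to a name; registering the same name twice is an error
  void addTarget(const std::string &name, Handler handler)
  {
    if (!handler) throw std::invalid_argument("empty handler for RPC '" + name + "'");
    const bool inserted = _targets.emplace(name, std::move(handler)).second;
    if (!inserted) throw std::logic_error("RPC '" + name + "' is already registered");
  }

  // Run the handler registered under 'name'; returns false if there is none
  bool invoke(const std::string &name) const
  {
    const auto it = _targets.find(name);
    if (it == _targets.end()) return false;
    it->second();
    return true;
  }

  private:
  std::unordered_map<std::string, Handler> _targets;
};
```

Usage: `registry.addTarget("getTopology", [&] { /* serialize and send */ });` on each worker, after which an incoming request for `"getTopology"` is served by `registry.invoke("getTopology")`.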

Listening for RPCs

All instances except the root listen for incoming RPC requests:

// Listening for RPC requests
rpcEngine.listen();

RPC Invocation

The root instance requests the topology from every other instance, merges the results, and then displays them:

// Getting instance manager from the rpc engine
auto im = rpcEngine.getInstanceManager();

// Querying instance list
auto &instances = im->getInstances();

// Getting the pointer to our own (coordinator) instance
auto coordinator = im->getCurrentInstance();

// Invoke RPC
for (const auto &instance : instances)
  if (instance->getId() != coordinator->getId()) rpcEngine.requestRPC(instance->getId(), TOPOLOGY_RPC_NAME);

// Getting return values from the RPCs containing each of the worker's topology
for (const auto &instance : instances)
  if (instance->getId() != coordinator->getId())
  {
    // Getting return value as a memory slot
    auto returnValue = rpcEngine.getReturnValue();

    // Receiving raw serialized topology information from the worker
    std::string serializedTopology = (char *)returnValue->getPointer();

    // Parsing serialized raw topology into a json object
    auto topologyJson = nlohmann::json::parse(serializedTopology);

    // Freeing return value
    rpcEngine.getMemoryManager()->freeLocalMemorySlot(returnValue);

    // HiCR topology object that will hold this worker's topology
    HiCR::Topology topology;

    // Merge topologies
    topology.merge(HiCR::backend::hwloc::TopologyManager::deserializeTopology(topologyJson));
  }
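The receive path above boils down to three steps: treat the returned buffer as a string, parse it back into topology objects, and merge those into the coordinator's view. Below is a dependency-free sketch of that round trip. It deliberately swaps the JSON encoding (nlohmann::json) used by the example for a hypothetical one-line-per-device text format, `type;processingUnits;memoryBytes`, and the `serialize`, `deserialize` and `merge` functions are illustrative rather than HiCR's. The sketch assumes that merging simply appends the other topology's devices.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical device record; type names must not contain ';' or newlines
struct Device
{
  std::string type;
  unsigned int processingUnits = 0;
  std::uint64_t memoryBytes = 0;
};

struct Topology
{
  std::vector<Device> devices;

  // Assumed merge semantics: append every device of 'other'
  void merge(const Topology &other)
  {
    devices.insert(devices.end(), other.devices.begin(), other.devices.end());
  }
};

// Worker side: one "type;processingUnits;memoryBytes" line per device
std::string serialize(const Topology &t)
{
  std::ostringstream out;
  for (const auto &d : t.devices)
    out << d.type << ';' << d.processingUnits << ';' << d.memoryBytes << '\n';
  return out.str();
}

// Coordinator side: parse the text produced by serialize()
Topology deserialize(const std::string &text)
{
  Topology t;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line))
  {
    if (line.empty()) continue;
    const auto first = line.find(';');
    const auto second = (first == std::string::npos) ? first : line.find(';', first + 1);
    if (second == std::string::npos) throw std::runtime_error("malformed device line: " + line);

    Device d;
    d.type = line.substr(0, first);
    d.processingUnits = static_cast<unsigned int>(std::stoul(line.substr(first + 1, second - first - 1)));
    d.memoryBytes = std::stoull(line.substr(second + 1));
    t.devices.push_back(d);
  }
  return t;
}
```

A plain delimiter format breaks as soon as a device name contains the delimiter; a structured encoding such as the JSON used by the example avoids that escaping problem.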

The result should look like the following:

* Worker 1 Topology:
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 93.071026 Gb
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 94.437321 Gb
* Worker 2 Topology:
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 93.071026 Gb
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 94.437321 Gb
* Worker 3 Topology:
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 93.071026 Gb
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 94.437321 Gb
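A formatter that reproduces the layout above could look like the following sketch. The `Device`/`Topology` structs and `formatTopology` are hypothetical, and the memory figure assumes sizes are reported in bytes and divided by 1024³ (so the printed "Gb" would strictly be GiB); the code excerpts above do not show how the example computes it.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical device record carrying what the output prints
struct Device
{
  std::string type;                  // e.g. "NUMA Domain"
  unsigned int processingUnits = 0;  // number of compute resources
  std::string memoryType;            // e.g. "RAM"
  std::uint64_t memoryBytes = 0;     // memory space size in bytes
};

struct Topology
{
  std::vector<Device> devices;
};

// Assumption: 1 "Gb" in the output means 1024^3 bytes
double toGb(std::uint64_t bytes) { return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0); }

// Format one worker's topology in the layout shown above
std::string formatTopology(int workerId, const Topology &t)
{
  std::ostringstream out;
  out << std::fixed << std::setprecision(6); // six decimals, like printf's %f
  out << "* Worker " << workerId << " Topology:\n";
  for (const auto &d : t.devices)
  {
    out << "  + '" << d.type << "'\n";
    out << "    Compute Resources: " << d.processingUnits << " Processing Unit(s)\n";
    out << "    Memory Space:     '" << d.memoryType << "', " << toGb(d.memoryBytes) << " Gb\n";
  }
  return out.str();
}
```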