Commit 07c8d57

DAOS-17127 cart: Fix random rank calculation (#16014) (#16111)
Fix the code of the DAOS lib calculating the target rank to send RPC. Update the DAOS user documentation to properly handle system extension and rank exclusion. Signed-off-by: Cedric Koch-Hofer <[email protected]>
1 parent 34cf7ea commit 07c8d57

7 files changed: 172 additions and 48 deletions

docs/admin/administration.md

Lines changed: 33 additions & 1 deletion
@@ -899,6 +899,31 @@ and then restarted to rejoin the system. A failed engine might also be excluded
 from the pools it hosted, please check the pool operation section on how to
 reintegrate an excluded engine.

+After one or more DAOS engines have been excluded, the DAOS agent cache needs to
+be refreshed. For detailed information, please refer to the [System Deployment
+documentation][1]. Before refreshing the DAOS Agent cache, check that the
+exclusion information has been propagated to the Management Service leader.
+This can be done using the `dump-attachinfo` sub-command of the `daos_agent`
+executable:
+
+```bash
+daos_agent -o /tmp/daos_agent-tmp.yml dump-attachinfo
+```
+
+This usage of the `daos_agent` command needs a minimal DAOS agent configuration
+file `/tmp/daos_agent-tmp.yml` such as:
+
+```yaml
+name: daos_server
+access_points:
+- server-1
+port: 10001
+transport_config:
+  allow_insecure: true
+log_file: /tmp/daos_agent-tmp.log
+```
+
+
 ### Shutdown

 When up and running, the entire system can be shutdown.
@@ -1041,7 +1066,7 @@ dmg as follows:
 $ dmg storage format -l ${new_storage_node}
 ```

-new_storage_node should be replaced with the hostname or the IP address of the
+`new_storage_node` should be replaced with the hostname or the IP address of the
 new storage node (comma separated list or range of hosts for multiple nodes)
 to be added.

@@ -1054,6 +1079,11 @@ the system (this can be checked with `dmg system query -v`).
 said, existing pools won't be automatically extended to use the new servers.
 Please see the pool operation section for how to extend the pool membership.

+After extending the system, the cache of the `daos_agent` service of the client
+nodes needs to be refreshed. For detailed information, please refer to the
+[System Deployment documentation][1].
+
+
 ## Software Upgrade

 The DAOS v2.0 wire protocol and persistent layout is not compatible with
@@ -1102,3 +1132,5 @@ Examples:
 * daos_server 2.4.0 is only compatible with daos_engine 2.4.0
 * daos_agent 2.6.0 is compatible with daos_server 2.4.0 (2.5 is a development version)
 * dmg 2.4.1 is compatible with daos_server 2.4.0
+
+[1]: <deployment.md#refresh-agent-cache> (Refresh DAOS Agent Cache)

docs/admin/deployment.md

Lines changed: 51 additions & 30 deletions
@@ -1360,9 +1360,6 @@ engines:
 <end>
 ```

-There are a few optional providers that are not built by default. For detailed
-information, please refer to the [DAOS build documentation][6].
-
 !!! note
     DAOS Control Servers will need to be restarted on all hosts after updates to the server
     configuration file.
@@ -1372,9 +1369,6 @@ information, please refer to the [DAOS build documentation][6].

 This will be the host which bootstraps the DAOS management service (MS).

->The support of the optional providers is not guarantee and can be removed
->without further notification.
-
 ### Network Configuration

 #### Network Scan
@@ -1731,42 +1725,56 @@ Once the service file is installed and `systemctl daemon-reload` has been run to
 reload the configuration, the `daos_agent` can be started through systemd
 as shown above.

-#### Disable Agent Cache (Optional)
+#### Refresh Agent Cache

-In certain circumstances (e.g. for DAOS development or system evaluation), it
-may be desirable to disable the DAOS Agent's caching mechanism in order to avoid
-stale system information being retained across reformats of a system. The DAOS
-Agent normally caches a map of rank-to-fabric URI lookups as well as client network
-configuration data in order to reduce the number of management RPCs required to
-start an application. When this information becomes stale, the Agent must be
-restarted in order to repopulate the cache with new information.
-Alternatively, the caching mechanism may be disabled, with the tradeoff that
-each application launch will invoke management RPCs in order to obtain system
-connection information.
+In certain circumstances (e.g. [system extension][6] with new servers), it may
+be necessary to refresh the DAOS Agent cache. The DAOS Agent normally caches a map
+of rank-to-fabric URI lookups as well as client network configuration data in
+order to reduce the number of management RPCs required to start an application.

-To disable the DAOS Agent caching mechanism, set the following environment
-variable before starting the `daos_agent` process:
+After a new server has been added to the system, or an existing server has been
+permanently removed from the system, the administrator should ensure that the
+Agent is not serving stale system information to new clients. There are three
+options to achieve this goal:

-`DAOS_AGENT_DISABLE_CACHE=true`
+1. Send the `SIGUSR2` signal to the `daos_agent` process to force a refresh on
+   demand. This can be done by running the following command:

-If running from systemd, add the following to the `daos_agent` service file in
-the `[Service]` section before reloading systemd and restarting the
-`daos_agent` service:
+   ```bash
+   pkill -x -SIGUSR2 daos_agent
+   ```

-`Environment=DAOS_AGENT_DISABLE_CACHE=true`
+2. Add a cache expiration value, defined in minutes, to the Agent configuration
+   file `daos_agent.yml` in order to cause a cache refresh when the data is
+   older than the defined value.

+   ```yaml
+   cache_expiration: 30
+   ```

-[^1]: https://github.com/intel/ipmctl
+3. Disable the caching mechanism completely, with the tradeoff that each
+   application launch will invoke management RPCs in order to obtain system
+   connection information. To disable the DAOS Agent caching mechanism, set the
+   following environment variable before starting the `daos_agent` process:

-[^2]: https://github.com/daos-stack/daos/tree/master/utils/config
+   ```bash
+   DAOS_AGENT_DISABLE_CACHE=true
+   ```

-[^3]: [https://www.open-mpi.org/faq/?category=running\#mpirun-hostfile](https://www.open-mpi.org/faq/?category=running#mpirun-hostfile)
+   If running from systemd, add the following line to the `daos_agent` service
+   file in the `[Service]` section before reloading systemd and restarting the
+   `daos_agent` service:

-[^4]: https://github.com/daos-stack/daos/tree/master/src/control/README.md
+   ```ini
+   Environment=DAOS_AGENT_DISABLE_CACHE=true
+   ```

-[^5]: https://github.com/pmem/ndctl/issues/130
+   It is also possible to disable the `daos_agent` cache by adding the
+   following entry to the `daos_agent.yml` configuration file:

-[6]: <../dev/development.md#building-optional-components> (Building DAOS for Development)
+   ```yaml
+   disable_caching: true
+   ```

 ## Multi-user DFuse setup

@@ -1781,3 +1789,16 @@ fuse and must be enabled by root before any user can use it. To allow this then
 uncomment a line in `/etc/fuse.conf` to enable the `user_allow_other` setting. The daos-client rpm
 does not do this automatically. An administrator must set this option on all nodes on which they
 want to provide a persistent multi-user dfuse service.
+
+
+[^1]: https://github.com/intel/ipmctl
+
+[^2]: https://github.com/daos-stack/daos/tree/master/utils/config
+
+[^3]: [https://www.open-mpi.org/faq/?category=running\#mpirun-hostfile](https://www.open-mpi.org/faq/?category=running#mpirun-hostfile)
+
+[^4]: https://github.com/daos-stack/daos/tree/master/src/control/README.md
+
+[^5]: https://github.com/pmem/ndctl/issues/130
+
+[6]: <administration.md#system-extension> (Extension of the DAOS system)
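
The three refresh options described in the documentation change above (on-demand refresh, time-based expiration, caching disabled) can be pictured with a small C sketch of the decision a cache has to make before serving data. This is only an illustration: the internals of `daos_agent` are not part of this commit, and the `info_cache` structure, its field names, and the treatment of an expiration of 0 as "no time-based expiry" are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical cache record; the field names are illustrative only. */
struct info_cache {
	time_t fetched_at;     /* when the data was last fetched */
	int    expiry_minutes; /* assumed: 0 means no time-based expiry */
	bool   disabled;       /* caching switched off entirely */
	bool   force_refresh;  /* set when a refresh is requested on demand */
};

/* Return true when the cached data must be fetched again before use. */
bool
cache_needs_refresh(const struct info_cache *c, time_t now)
{
	/* Options 1 and 3: an on-demand request or a disabled cache always refetch. */
	if (c->disabled || c->force_refresh)
		return true;
	/* Option 2: refetch once the data is older than the configured age. */
	if (c->expiry_minutes > 0 &&
	    difftime(now, c->fetched_at) > c->expiry_minutes * 60.0)
		return true;
	return false;
}

/* Record a successful fetch and clear any pending on-demand request. */
void
cache_mark_fetched(struct info_cache *c, time_t now)
{
	c->fetched_at    = now;
	c->force_refresh = false;
}
```

In this sketch a signal handler would only need to set `force_refresh`; the next lookup then refetches and calls `cache_mark_fetched()`.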

src/client/api/event.c

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
/**
22
* (C) Copyright 2016-2024 Intel Corporation.
3+
* (C) Copyright 2025 Hewlett Packard Enterprise Development LP
4+
* (C) Copyright 2025 Google LLC
35
*
46
* SPDX-License-Identifier: BSD-2-Clause-Patent
57
*/
@@ -117,7 +119,7 @@ daos_eq_lib_reset_after_fork(void)
 	eq_ref = 0;
 	ev_thpriv_is_init = false;
 	crt_info = daos_crt_init_opt_get(false, 1);
-	rc = dc_mgmt_net_cfg(NULL, crt_info);
+	rc = dc_mgmt_net_cfg_init(NULL, crt_info);
 	if (rc == 0)
 		rc = daos_eq_lib_init(crt_info);
 	D_FREE(crt_info->cio_provider);

src/client/api/init.c

Lines changed: 5 additions & 2 deletions
@@ -212,7 +212,7 @@ daos_init(void)
 	 * get CaRT configuration (see mgmtModule.handleGetAttachInfo for the
 	 * handling of NULL system names)
 	 */
-	rc = dc_mgmt_net_cfg(NULL, crt_info);
+	rc = dc_mgmt_net_cfg_init(NULL, crt_info);
 	if (rc != 0)
 		D_GOTO(out_attach, rc);

@@ -223,7 +223,7 @@ daos_init(void)
 	D_FREE(crt_info->cio_domain);
 	if (rc != 0) {
 		D_ERROR("failed to initialize eq_lib: "DF_RC"\n", DP_RC(rc));
-		D_GOTO(out_attach, rc);
+		D_GOTO(out_net_cfg, rc);
 	}

 	/**
@@ -290,6 +290,8 @@ daos_init(void)
 	pl_fini();
 out_eq:
 	daos_eq_lib_fini();
+out_net_cfg:
+	dc_mgmt_net_cfg_fini();
 out_attach:
 	dc_mgmt_drop_attach_info();
 out_job:
@@ -350,6 +352,7 @@ daos_fini(void)
 	if (rc != 0)
 		D_ERROR("failed to disconnect some resources may leak, " DF_RC "\n", DP_RC(rc));

+	dc_mgmt_net_cfg_fini();
 	dc_tm_fini();
 	dc_mgmt_drop_attach_info();
 	dc_agent_fini();
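
The `init.c` hunks above pair the new `dc_mgmt_net_cfg_init()` with a `dc_mgmt_net_cfg_fini()` call and add an `out_net_cfg` label so that a later failure also undoes the network configuration step. The sketch below shows that unwind pattern in isolation. It does not use DAOS code: the three flags and the `fail_at` parameter are hypothetical stand-ins for real subsystems and real failures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state flags standing in for real subsystems. */
bool attach_up, net_cfg_up, eq_up;

/* fail_at selects the failing step: 0 = none, 1 = attach, 2 = net cfg, 3 = eq. */
int
lib_init(int fail_at)
{
	int rc = 0;

	if (fail_at == 1)
		return -1;
	attach_up = true;

	if (fail_at == 2) {
		rc = -2;
		goto out_attach;
	}
	net_cfg_up = true;

	if (fail_at == 3) {
		rc = -3;
		goto out_net_cfg; /* undo the net cfg step too, not only attach */
	}
	eq_up = true;
	return 0;

	/* Labels run in reverse order of initialization and fall through. */
out_net_cfg:
	net_cfg_up = false;
out_attach:
	attach_up = false;
	return rc;
}
```

Jumping to `out_attach` from the third step, as the pre-change code did, would leave `net_cfg_up` set in this sketch, which is the kind of leak the new label avoids.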

src/client/api/rpc.c

Lines changed: 25 additions & 4 deletions
@@ -1,5 +1,6 @@
 /**
  * (C) Copyright 2016-2024 Intel Corporation.
+ * (C) Copyright 2025 Hewlett Packard Enterprise Development LP
  *
  * SPDX-License-Identifier: BSD-2-Clause-Patent
  */
@@ -96,7 +97,7 @@ daos_rpc_send_wait(crt_rpc_t *rpc)
 }

 struct rpc_proto {
-	int nr_ranks;
+	int rank_idx;
 	crt_endpoint_t ep;
 	int version;
 	int rc;
@@ -114,7 +115,17 @@ query_cb(struct crt_proto_query_cb_info *cb_info)
 	int rc;

 	if (daos_rpc_retryable_rc(cb_info->pq_rc)) {
-		rproto->ep.ep_rank = (rproto->ep.ep_rank + 1) % rproto->nr_ranks;
+		int nr_ranks;
+		d_rank_t rank;
+
+		/** select next rank to issue the retry proto query rpc to */
+		nr_ranks = dc_mgmt_net_get_num_srv_ranks();
+		D_ASSERT(nr_ranks > 0);
+		rproto->rank_idx = (rproto->rank_idx + 1) % nr_ranks;
+		rank = dc_mgmt_net_get_srv_rank(rproto->rank_idx);
+		D_ASSERT(rank != CRT_NO_RANK);
+		rproto->ep.ep_rank = rank;
+
 		rproto->timeout += 3;
 		rc = crt_proto_query_with_ctx(&rproto->ep, rproto->base_opc, rproto->ver_array,
 					      rproto->array_size, rproto->timeout, query_cb, rproto,
@@ -139,6 +150,8 @@ daos_rpc_proto_query(crt_opcode_t base_opc, uint32_t *ver_array, int count, int
 	crt_context_t ctx = daos_get_crt_ctx();
 	int rc;
 	int i;
+	int nr_ranks;
+	d_rank_t rank;

 	rc = dc_mgmt_sys_attach(NULL, &sys);
 	if (rc != 0) {
@@ -151,8 +164,16 @@ daos_rpc_proto_query(crt_opcode_t base_opc, uint32_t *ver_array, int count, int
 		D_GOTO(out_detach, rc = -DER_NOMEM);

 	/** select a random rank to issue the proto query rpc to */
-	rproto->nr_ranks = dc_mgmt_net_get_num_srv_ranks();
-	rproto->ep.ep_rank = d_rand() % rproto->nr_ranks;
+	nr_ranks = dc_mgmt_net_get_num_srv_ranks();
+	if (nr_ranks == 0) {
+		D_ERROR("failed to select an attached rank: no attached ranks\n");
+		D_GOTO(out_free, rc = -DER_NONEXIST);
+	}
+	rproto->rank_idx = d_rand() % nr_ranks;
+	rank = dc_mgmt_net_get_srv_rank(rproto->rank_idx);
+	D_ASSERT(rank != CRT_NO_RANK);
+	rproto->ep.ep_rank = rank;
+
 	rproto->ep.ep_tag = 0;
 	rproto->ver_array = ver_array;
 	rproto->array_size = count;
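
The `rpc.c` change above stops using `d_rand() % nr_ranks` directly as the target rank and instead treats it as an index that is mapped to a rank through `dc_mgmt_net_get_srv_rank()`. A minimal sketch of why this matters, assuming the attached ranks are no longer contiguous after an exclusion: the rank table, the `rank_t` type, `NO_RANK`, and all function names below are hypothetical stand-ins, not the DAOS definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for illustration only; not the DAOS types or constants. */
typedef uint32_t rank_t;
#define NO_RANK ((rank_t)-1)

/* Hypothetical attached-rank table: ranks 1 and 3 are assumed excluded. */
static const rank_t srv_ranks[]  = {0, 2, 4, 5};
static const int    nr_srv_ranks = sizeof(srv_ranks) / sizeof(srv_ranks[0]);

/* Map a table index to a rank, or NO_RANK if the index is out of range. */
rank_t
get_srv_rank(int idx)
{
	if (idx < 0 || idx >= nr_srv_ranks)
		return NO_RANK;
	return srv_ranks[idx];
}

/* Pick a random starting index; returns -1 when no rank is attached. */
int
pick_start_idx(void)
{
	if (nr_srv_ranks == 0)
		return -1;
	return rand() % nr_srv_ranks;
}

/* Advance to the next index, wrapping around, as a retry path would. */
int
next_idx(int idx)
{
	return (idx + 1) % nr_srv_ranks;
}
```

With this table, using the index itself as the rank could select the excluded ranks 1 and 3 and would never reach ranks 4 and 5, whereas the lookup always yields an attached rank.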

src/include/daos/mgmt.h

Lines changed: 7 additions & 1 deletion
@@ -1,5 +1,6 @@
 /**
  * (C) Copyright 2016-2024 Intel Corporation.
+ * (C) Copyright 2025 Hewlett Packard Enterprise Development LP
  *
  * SPDX-License-Identifier: BSD-2-Clause-Patent
  */
@@ -59,7 +60,10 @@ ssize_t dc_mgmt_sys_encode(struct dc_mgmt_sys *sys, void *buf, size_t cap);
 ssize_t dc_mgmt_sys_decode(void *buf, size_t len, struct dc_mgmt_sys **sysp);

 int
-dc_mgmt_net_cfg(const char *name, crt_init_options_t *crt_info);
+dc_mgmt_net_cfg_init(const char *name, crt_init_options_t *crt_info);
+void
+dc_mgmt_net_cfg_fini(void);
+
 int dc_mgmt_net_cfg_check(const char *name);
 int dc_mgmt_get_pool_svc_ranks(struct dc_mgmt_sys *sys, const uuid_t puuid,
 			       d_rank_list_t **svcranksp);
@@ -71,6 +75,8 @@ int dc_mgmt_notify_pool_connect(struct dc_pool *pool);
 int dc_mgmt_notify_pool_disconnect(struct dc_pool *pool);
 int dc_mgmt_notify_exit(void);
 int dc_mgmt_net_get_num_srv_ranks(void);
+d_rank_t
+dc_mgmt_net_get_srv_rank(int idx);
 int dc_mgmt_get_sys_info(const char *sys, struct daos_sys_info **info);
 void dc_mgmt_put_sys_info(struct daos_sys_info *info);
 int dc_get_attach_info(const char *name, bool all_ranks, struct dc_mgmt_sys_info *info,
