I wanted a fast, lightweight job completion plugin for slurm that has good support for client-side filtering of job completion criteria. Redis fits this need very nicely because it is memory-based and very fast indeed. This jobcomp_redis plugin can produce permanent redis keys or, using redis key expiry, it can produce keys which live only for a duration that you configure. The jobcomp_redis plugin can be a good complement to accounting storage plugins, e.g. mysql/mariadb. For example, you could configure the jobcomp_redis plugin so that keys live only for a week, thus implementing a super-fast, memory-based cache of a rolling week's worth of jobs. If you save the keys permanently (the default), you can configure redis persistence to suit your needs, write scripts to manage your redis job data, etc.
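As a rough illustration of the rolling-week cache described above, the key lifetime is just the retention period expressed in seconds. The sketch below converts days to a TTL and prints the corresponding compile-time options (`JCR_TTL` and `--with-jcr-ttl`, which appear in the build options of this README). The `days_to_ttl` helper is a name made up for this example.

```bash
# Convert a retention period in days to a TTL in seconds.
# days_to_ttl is a hypothetical helper, not part of slurm-redis.
days_to_ttl() {
    echo $(( $1 * 24 * 60 * 60 ))
}

ttl=$(days_to_ttl 7)

# Print the matching option for each build method.
echo "cmake -DJCR_TTL=${ttl} ..."
echo "./configure --with-jcr-ttl=${ttl} ..."
```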
@@ -20,22 +22,57 @@ In terms of design, the jobcomp_redis slurm plugin works with a partner plugin t
When job data is requested from slurm, jobcomp_redis sends the job criteria to redis and then issues the command `SLURMJC.MATCH` to ask redis to perform the job matching. In this way, we avoid pulling job candidates across the wire just to test whether they match, which can waste network bandwidth and slow us down. If matches are found, the slurm-side partner will issue `SLURMJC.FETCH` to receive the job data from redis.
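To make the match-then-fetch flow concrete, here is a purely local simulation in shell. It does not talk to redis and does not reproduce the real arguments of `SLURMJC.MATCH` or `SLURMJC.FETCH`. The `match_jobs` and `fetch_job` functions and the sample records are invented for illustration.

```bash
# Local simulation only (no redis): job records as "id|user|state".
JOBS='101|alice|COMPLETED
102|bob|FAILED
103|alice|FAILED'

# Stand-in for the match step: emit only the ids of jobs owned by user $1.
match_jobs() {
    printf '%s\n' "$JOBS" | awk -F'|' -v user="$1" '$2 == user { print $1 }'
}

# Stand-in for the fetch step: emit the full record for job id $1.
fetch_job() {
    printf '%s\n' "$JOBS" | awk -F'|' -v id="$1" '$1 == id { print }'
}

# Only records whose ids matched are ever fetched.
for id in $(match_jobs alice); do
    fetch_job "$id"
done
```

The point of the split is that the filtering step returns only identifiers, so full records travel only for jobs that are actually wanted.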

Let me know if you find this plugin useful. More plugins to follow ...
+___

### Requirements

-To configure, build and run this package from source you will need the following:
-- a working slurm installation with slurm libraries (including `libslurm.so` symlink) in your library search path
-- a *_configured_* slurm source tree (`slurm/slurm.h` exists) that *_matches_* your slurm installation
-- `gcc` and `pkg-config` which you already have if you've built slurm successfully
-- [cmake](https://cmake.org/), a relatively recent version (3.4.0+)
+Two methods are provided for building this package from source:
+1. A patch set that applies patches directly over the slurm source tree and
+2. The slurm-redis native build system which uses [cmake](https://cmake.org/)
+
+The patch set method is perhaps easier, as it does not require that you have the cmake build system installed. The plugins were developed using the cmake build system, however, and that method allows you to configure and build these plugins independently from a working slurm installation. All compile-time options are available in either case.
+
+These are the additional software requirements to run slurm-redis:
+
- [redis](https://redis.io/), including its `redismodule.h` development header
- [hiredis](https://github.com/redis/hiredis), the C client for redis, headers and library
- `libuuid`, its header (`uuid/uuid.h`) and library `libuuid.so` (available in util-linux)
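
A quick way to see whether the hiredis and libuuid development files are visible to your toolchain is to ask `pkg-config`. This sketch assumes your distribution ships pkg-config files under the module names `hiredis` and `uuid`. It does not cover `redismodule.h`, since its location depends on how redis was installed. `check_pkg` is a hypothetical helper.

```bash
# Report whether pkg-config can find a module (module names are assumptions).
check_pkg() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$1"; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

for mod in hiredis uuid; do
    check_pkg "$mod"
done
```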
+___
+
+#### Build slurm-redis using provided slurm patch set
+
+The patch sets are named according to the version of slurm-redis and the version of slurm to which they apply, for example:

+`slurm-redis-0.1.0-slurm-19.05.patch.bz2` is the patch for slurm-redis version 0.1.0 that applies over the slurm 19.05 source tree. Here's an example of using the patch set:
+# Re-run autoreconf (or autogen.sh on 18.08) to apply build system changes
+./autoreconf
+# Configure slurm as you normally would, noting these additional options:
+./configure --help # See section "Advanced configuration"
+...
+  --with-jcr-cache-size=N set jobcomp/redis cache size [128]
+  --with-jcr-cache-ttl=N  set jobcomp/redis cache ttl [120]
+  --with-jcr-fetch-count=N
+                          set jobcomp/redis fetch count [500]
+  --with-jcr-fetch-limit=N
+                          set jobcomp/redis fetch limit [1000]
+  --with-jcr-query-ttl=N  set jobcomp/redis query ttl [60]
+  --with-jcr-ttl=N        set jobcomp/redis ttl: -1=permanent [-1]
+  --with-jcr-tmf=N        set jobcomp/redis date/time format: 0=unix epoch,
+                          1=iso8601 [1]
+...
```
+___

-#### Basic configuration
+#### Build slurm-redis using native build system

```bash
# CMAKE_INSTALL_PREFIX must be set to the system's library installation prefix,
@@ -59,7 +96,7 @@ $ make
$ sudo make install
```

-After installing the plugins, restart `slurmctld` if it was running with a previous `jobcomp_redis.so` loaded. You do not have to restart redis, however, in order to load a newer version of the `slurm_jobcomp.so` plugin, in fact, keys can be lost if you restart redis in between its persistence cycles. Instead, simply open a redis cli and manually unload the current module, then load the new module (or write a script to do this):
+After installation, restart `slurmctld` if it was running with a previous `jobcomp_redis.so` loaded. You do not have to restart redis, however, in order to load a newer version of the `slurm_jobcomp.so` plugin; in fact, keys can be lost if you restart redis between its persistence cycles. Instead, simply open a redis cli and manually unload the current module, then load the new module (or write a script to do this):
+$ cmake -DJCR_TMF=N ... # [0 = unix epoch times, 1 = ISO 8601 format] or
+$ ./configure --with-jcr-tmf=N ...
+# The default is 1 (ISO8601).
+
+# This setting will cause the slurm jobcomp_redis plugin to send all date/time elements
+# either as ISO8601 strings, GMT with timezone "Z" (Zero/Zulu), or as Unix Epoch times.

-This setting will cause the slurm jobcomp_redis plugin to send all date/time elements
-as ISO8601 strings, GMT with timezone "Z" (Zero/Zulu), thus all date/times will be
-human-readable and normalized to that timezone.
+$ cmake -DJCR_TTL=N ... # [-1 or a positive integer (seconds)] or
+$ ./configure --with-jcr-ttl=N ...
+# The default is -1: keys are permanent.

-# cmake -DUSE_ISO8601=0
+# This setting will set the time-to-live in seconds of your job completion data. If you
+# use the value 86400, for example, your job keys will disappear after 1 day.

-This setting will cause all date/times to be stored as integers (redis strings) --
-the number of seconds since the unix epoch.
+$ cmake -DJCR_QUERY_TTL=N ... # or
+$ ./configure --with-jcr-query-ttl=N ...
+# The default is 60 seconds.

-# cmake -DTTL=<N> (default -1 = permanent key)
+# This setting should not need to be changed. When clients such as sacct request job
+# data, the jobcomp_redis plugin sends the job criteria to redis as a set of transient
+# keys and then issues SLURMJC.MATCH. The latency between the time that the criteria
+# arrive in redis and the command SLURMJC.FETCH completes in redis is where this setting
+# matters.

-This setting will set the time-to-live in seconds of your job completion data. If you
-use -DTTL=86400, your job keys will disappear after 1 day. The default is -1: keys
-are permanent. Use this setting if you want a fast but temporary cache of recent job
-data.
+$ cmake -DJCR_FETCH_LIMIT=N ... # or
+$ ./configure --with-jcr-fetch-limit=N ...
+# The default is 1000 job records.

-# cmake -DQUERY_TTL=<N> (default 60)
+# The maximum number of jobs that redis will allow to be sent to the client in one
+# iteration of SLURMJC.FETCH.

-This setting should not need to be changed. When clients such as sacct request job data,
-the jobcomp_redis plugin sends the job criteria to redis as a set of transient keys and
-then issues SLURMJC.MATCH. The brief latency between the time that the criteria arrive
-in redis and the command SLURMJC.MATCH starts in redis is where this setting matters.
+$ cmake -DJCR_FETCH_COUNT=N ... # or
+$ ./configure --with-jcr-fetch-count=N ...
+# The default is 500 job records.

-# cmake -DFETCH_LIMIT=<N> (default 1000)
+# The maximum number of job records that the client would like to receive in one
+# iteration of SLURMJC.FETCH.

-The maximum number of jobs redis will allow to be sent to the client in one iteration
-of SLURMJC.FETCH.
+$ cmake -DJCR_CACHE_SIZE=N ... # or
+$ ./configure --with-jcr-cache-size=N
+# The default is 128 entries (there are separate uid and gid caches).

-# cmake -DFETCH_COUNT=<N> (default 500)
+# As job records complete, the jobcomp/redis plugin maintains small caches that map uid
+# and gid to name, e.g. 0 -> root, to take pressure off distributed LDAP and similar systems.

-The maximum number of jobs the client would like to receive in one iteration of
-SLURMJC.FETCH.
+$ cmake -DJCR_CACHE_TTL=N ... # or
+$ ./configure --with-jcr-cache-ttl=N
+# The default is 120 seconds.
+
+# The time-to-live of the uid and gid cache entries. If a cache entry is missing or has
+# expired, slurm apis are called to fetch the names.
```
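
The interaction between the fetch count and the fetch limit can be sketched with a little arithmetic. This assumes (it is not stated above) that each `SLURMJC.FETCH` iteration returns at most the smaller of the two values. `fetch_iterations` is a hypothetical helper, not part of slurm-redis.

```bash
# Estimate how many SLURMJC.FETCH iterations are needed for a given number of matches.
# Assumption: the per-iteration batch is min(fetch count, fetch limit).
# Usage: fetch_iterations MATCHES FETCH_COUNT FETCH_LIMIT
fetch_iterations() {
    matches=$1
    batch=$2
    if [ "$3" -lt "$batch" ]; then
        batch=$3
    fi
    # Integer ceiling of matches / batch.
    echo $(( (matches + batch - 1) / batch ))
}

# 1200 matching jobs with the default count (500) and limit (1000).
fetch_iterations 1200 500 1000
```

Under this assumption, raising the client-side count above the server-side limit has no effect, because the limit caps each batch.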
+___

### Slurm Configuration
@@ -123,6 +179,8 @@ JobCompType=jobcomp/redis
#JobCompUser=<unused, redis has no notion of user>
```

+___
+
### Redis Configuration
```bash
@@ -157,6 +215,8 @@ number of ways to turn that off:
There may be some other system settings, e.g. overcommit_memory, that you need to adjust using `/etc/sysctl.conf`. Refer to the redis log file for more details.
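
For the overcommit case, redis itself suggests adding `vm.overcommit_memory = 1` to `/etc/sysctl.conf` in its startup warning. The sketch below only inspects a sysctl.conf-style file and reports what it finds; it changes nothing. `overcommit_setting` is a hypothetical helper, and the file path can be overridden by the first argument.

```bash
# Print the value of vm.overcommit_memory set in a sysctl.conf-style file ($1),
# or nothing if the file does not set it. The last assignment wins.
overcommit_setting() {
    sed -n 's/^[[:space:]]*vm\.overcommit_memory[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

conf=${1:-/etc/sysctl.conf}
if [ -r "$conf" ] && [ "$(overcommit_setting "$conf")" = "1" ]; then
    echo "vm.overcommit_memory = 1 is set in $conf"
else
    echo "consider adding 'vm.overcommit_memory = 1' to $conf (then run: sysctl -p)"
fi
```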