
Commit 281cf7e

job-ingest: handle fluid_init() failure
Problem: if fluid_init() fails, job-ingest could issue duplicate/invalid job IDs.

When fluid_init() fails, job-ingest logs an error but continues on. Since the "fluid_generator" is just a 64-bit integer, the fluids generated after that might be the same as those issued on rank 0, and may start from an incorrect clock offset.

fluid_init() is expected to fail on rank > 16383. It could also fail in the unlikely event that clock_gettime() fails.

When fluid_init() fails, abort the module.
1 parent 3f139df

1 file changed: 2 additions, 3 deletions

src/modules/job-ingest/job-ingest.c

@@ -854,6 +854,7 @@ int mod_main (flux_t *h, int argc, char **argv)
         if (fluid_init (&ctx.gen, 0, fluid_get_timestamp (max_jobid) + 1) < 0) {
             flux_log (h, LOG_ERR, "fluid_init failed");
             errno = EINVAL;
+            goto done;
         }
     }
     else {
@@ -873,12 +874,10 @@ int mod_main (flux_t *h, int argc, char **argv)
             goto done;
         }
         flux_future_destroy (f);
-        /* fluid_init() will fail on rank > 16K.
-         * Just skip loading the job module on those ranks.
-         */
         if (fluid_init (&ctx.gen, rank, timestamp) < 0) {
             flux_log (h, LOG_ERR, "fluid_init failed");
             errno = EINVAL;
+            goto done;
         }
     }
     flux_log (h, LOG_DEBUG, "fluid ts=%jums", (uint64_t)ctx.gen.timestamp);
