-
Notifications
You must be signed in to change notification settings - Fork 55
job-list: support job list constraints #5126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
job-list: support job list constraints #5126
Conversation
e9919eb to
03ac597
Compare
It doesn't look like my simplistic approach will work, which can be illustrated by this simple example:
In my
This should return true for the pending and inactive job states, as its impossible for a running job (regardless of userid) to pass this constraint match. I'm going to give it the good 'ol college try on another technique, which I think will basically involve a third potential state of "maybe" a job in a certain state could pass, but shouldn't spend forever on this. Perhaps an interim solution could be to support job constraints on all the non-job states stuff only (and given discussion in #4914 maybe |
03ac597 to
39fe136
Compare
|
Re-pushed, pushing a first round implementation of constraints based filtering on In follow up work, this will be enhanced:
It initially shocked me that this worked, only to realize, this basically told
A side note:
|
b3a96ca to
d254c88
Compare
d254c88 to
4d4f3e8
Compare
|
re-pushed with a small refactor, a "oh, that's still programmed that way because of ... before." Edit: oops and one more anal retentive tweak |
d5b2de0 to
c1bb5f0
Compare
grondo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a first pass through.. I'd like to do some further testing on another pass.
One general note: I would have perhaps kept the since parameter separate in the protocol. The reason for that is it was meant as a way to stop processing the job lists before you get all the way through, and it isn't clear that the constraint based solution allows that.
Also, eventually I would think since would be implemented in the protocol as a time comparison, i.e. something like t_inactive >= timestamp. Eh, this all seems fine actually, I just thought I'd bring it up for further discussion.
src/common/libjob/state.c
Outdated
| { FLUX_JOB_STATE_RUN, "RUN", "run", "R", "r" }, | ||
| { FLUX_JOB_STATE_CLEANUP, "CLEANUP", "cleanup", "C", "c" }, | ||
| { FLUX_JOB_STATE_INACTIVE, "INACTIVE", "inactive", "I", "i" }, | ||
| { FLUX_JOB_STATE_PENDING, "PENDING", "pending", "PE", "pe" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably PD is a much more common abbreviation for PENDING, or maybe there was a motive for choosing only the first two letters that isn't mentioned here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no particular reason, just picked it somewhat at random :-) PD is good.
src/common/libjob/state.c
Outdated
| { FLUX_JOB_STATE_INACTIVE, "INACTIVE", "inactive", "I", "i" }, | ||
| { FLUX_JOB_STATE_PENDING, "PENDING", "pending", "PE", "pe" }, | ||
| { FLUX_JOB_STATE_RUNNING, "RUNNING", "running", "RU", "ru" }, | ||
| { FLUX_JOB_STATE_ACTIVE, "ACTIVE", "active", "AC", "ac" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not A for active to match I for inactive? (Perhaps the idea is that virtual states are always two letters, but this isn't mentioned in the commit message or comments so it may cause confusion why ACTIVE is abbreviated differently than INACTIVE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A sounds good.
| idsync.c \ | ||
| stats.h \ | ||
| stats.c | ||
| stats.c \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message: "add an intially job matching library" -> "add an initial job matching library" ?
src/common/libjob/list.c
Outdated
| "results", 0, | ||
| "attrs", o))) { | ||
| "attrs", o, | ||
| "constraints", c))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there's a mix of constraint vs constraints throughout this file. What is the difference? It wasn't obvious reading the diff anyway...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to just be an inconsistency I picked somewhat at random. I'll correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whew, good catch, at first I was like "how do the tests even work?" only to realize "oh yeah, we deprecated these functions", so I guess its not really tested at all.
Yeah, perhaps I should fold in the timestamp support into this PR. I had originally planned for it to be separate, but given what I've worked on with it so far, it's not as big a deal as I thought it would be. |
|
Acually, might be nice to keep it separate, release-notes wise. It makes sense that you'll eventually add support for timestamps, in which case Edit: I was thinking one use case for |
|
ya know, maybe |
Yeah, that's what I was trying to suggest above :-) |
c1bb5f0 to
51a68db
Compare
|
Re-pushed with the following changes:
|
grondo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I keep going through to review this and end up getting interrupted. Since it is working, my inclination is just get this in and we can deal with any unforeseen issues as we encounter them (though it doesn't appear there will be any)
Before that, I have one issue with the timestamp constraints (noted inline). I'm not sure how I feel about this one so I'm on the fence, but though maybe some discussion would be beneficial.
src/modules/job-list/match.c
Outdated
| type, | ||
| MATCH_LESS_THAN, | ||
| errp); | ||
| else /* if no operation specified >= is default */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize there isn't much point in supporting "=" for timestamps, but it feels weird that >= is the default here.
Maybe just support '=' and have that be the default, knowing that in most real use cases a bare timestamp value or = would never actually be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I picked >= since I figured it was the "best" of all the choices.
Defaulting to = does seem like an interesting idea and would make things consistent to everything else in the constraints. However, I guess my concern would be if someone did a constraint like {"t_inactive": [ "NUMBER" ]} (no oeprator), basically this constraint would never return anything and a user could be confused?? I dunno if that's better or worse.
I almost feel that just requiring ">", "<", ">=", or "<=" would be better. What do you think of just making it a requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way. I like making it a requirement better than having a default of >=. I don't personally think users are going to be constructing these constraints by hand, so I don't think allowing {"t_inactive": [ "SOMETHING" ]} would be confusing per se (but it would be difficult to test).
ff3de31 to
06269c9
Compare
|
rebased and re-pushed an update in which all of the timestamps now require a comparison operator. A minor fallout of this is that all timestamp values must be strings now, i.e. `">=1234.5", b/c we always require the comparison operator. I doubt that's a big deal, (it's no big deal in atleast python). I think it does make the comparison more clear. Tests updated as a result. also we should probably merge flux-framework/rfc#377 first. It's mostly a technicality, since I do something in this PR that the RFC clarifies. |
06269c9 to
bc7ba9d
Compare
garlick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general question: Is there a benefit in converting the RFC 31 JSON constraint object to struct list_constraint as opposed to just leaving it as a JSON object and using jansson methods to manipulate it? A large fraction of the matching library seems to be constructors, destructors, and conversion.
I was mostly following design pattern used in `librlist'. Given the current matching that is supported, the use is perhaps a tad limited given its just a bunch of string compares and what not. However, later on there will be things like hostlist filtering, where we will need to convert possible hostlist ranges and things like that into their appropriate internal data structures. |
Problem: It would be convenient if virtual job states of "pending", "running", and "active" could be parsed and handled by flux_job_statetostr() and flux_job_strtostate(). Solution: Add virtual states to the table of states that can be parsed and handled by flux_job_statetostr() and flux_job_strtostate(). Update unit tests for coverage as well.
Problem: In the future we would like to use RFC31 constraints to filter jobs rather than the current implementation. It would allow users to write far more expressive job filtering queries than can currently be done. Add job matching library to parse and match jobs against constraint objects. Add unit tests.
Problem: In job-list we store jobs on three lists, pending, running, and inactive. If we begin to use RFC31 constraints to filter jobs, it would be inefficient to scan all three lists for every job list query. Solution: Add a job state matching library. It will determine if jobs in the pending, running, or inactive state are possible given the job constraint sent with the job-list query. For example: states=pending AND userid=42 In the above constraint, a job in a pending state can possibly be matched to this constraint, so it is worthwhile to scan the pending job list for jobs with userid=42. However, it is impossible for a job in running or inactive state to match at all, so they should not be scanned. This will allow job-list queries to run more efficiently if entire lists can be skipped if it is impossible for any jobs on those lists to be matched to a constraint. Add unit tests.
Problem: Using RFC31 constraints to match jobs would allow us to support many new filtering and query opportunities in job-list. Solution: Convert job-list queries to use constraints for filtering instead of the earlier solution. This change breaks the old filtering RPC protocol. The "userid", "states", "results", "name", and "queue" RPC fields are no longer supported. Update callers in libjob, flux-job, flux-top, job-archive, python JobList and in the testsuite.
Problem: Some additional job-list constraint tests would be useful. Add more tests in t2260-job-list.t.
grondo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just went through this again and did some extra testing and LGTM!
|
awesome, thanks! I know it was a big PR. |
bc7ba9d to
589a8f2
Compare
|
setting MWP |
Codecov Report
@@ Coverage Diff @@
## master #5126 +/- ##
==========================================
- Coverage 83.74% 83.15% -0.60%
==========================================
Files 460 456 -4
Lines 77017 78246 +1229
==========================================
+ Hits 64501 65066 +565
- Misses 12516 13180 +664
|
This is a super early WIP, where I'm just building up the matching library to use within
job-list, so it isn't used at all for actual queries yet.The main reason I'm posting is I wanted to highlight these functions in
match.[ch]and see if anyone sees a problem with it or if its overly dumb with what I want to do.Basically, given some constraint that is passed in, I'd like to know if it is possible / impossible for any jobs with a specific job state to match it. e.g. the constraint
states:pending userid:42 queue:batchreturns true if the job state is pending and it doesn't matter what the queue/userid are. There could be no jobs for userid 42 injob-listat all and it will return true.This is so we can skip iterating and filtering on possibly long lists of pending/running/inactive jobs. (Note, I also need to add an equivalent one to these for
sincematching to end iterating on the inactive list).To implement this, this special constraint treats everything that isn't a
statesconstraint as true all of the time. So we are only testing just thestatesconstraint and nothing else. Some special handling of conditionals has to be done too. e.g.userid:42and-userid:42both need to return true, because we don't care aboutuserid:42.Here's the core code as an example of what I'm doing in this function (removing some error checking to make this example simpler).
As a complete aside as this might come up as a question, "instead of having a new function state_constraint_create() could you just use the normal constraint from before?", and the answer is yes. That was my round 1 prototype, but I found the code to be too complex / irritating so I shifted to this one.