Update collector query to check for defined Site or ResourceName #192
With this change, how will anyone know if records are being dropped if there’s no site or resource?
@osg-cat Another approach we could take here is to replace an empty site/resource with "Unknown" or a similar indicator. Would that be preferable? Records would still not be attributed to the correct site, but they would at least make it out to GRACC (what GRACC does with them after that point is another question).
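For concreteness, the "Unknown" fallback suggested above could look roughly like the sketch below. This is only an illustration under assumptions: the function name `fill_missing_site`, the `UNKNOWN` constant, and the idea of treating each record as a plain dict are hypothetical and not taken from the probe's code. It also returns the list of fields it had to fill, so that relabeled records can be counted or logged instead of passing through silently.

```python
UNKNOWN = "Unknown"  # hypothetical placeholder value, not from the probe

# Attribute names taken from the filter in this PR.
SITE_KEYS = ("GLIDEIN_Site", "GLIDEIN_ResourceName")


def fill_missing_site(record):
    """Return (new_record, filled_keys) without mutating the input.

    Any site/resource attribute that is absent, None, or an empty string
    is replaced by UNKNOWN; filled_keys lists the attributes replaced.
    """
    out = dict(record)
    filled = []
    for key in SITE_KEYS:
        if not out.get(key):
            out[key] = UNKNOWN
            filled.append(key)
    return out, filled
```

A caller could log or count every record for which the returned list is non-empty, which would give operators a signal that something upstream is misconfigured.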
I don’t know. It’s probably a Derek question. I guess my main point is that this could use some design thinking.
- filter_cond = 'SlotType != "Static"'
+ # Need at least one defined from Site and ResourceName for proper accounting
+ filter_cond = 'SlotType != "Static" && (GLIDEIN_ResourceName =!= UNDEFINED || GLIDEIN_Site =!= UNDEFINED) '
Not related to your code change, but do you know why we're excluding static slots?
I vote that we configure the OSPool CMs to reject any EPs that are missing |
Rejecting the EPs works for me. Though, I guess it's the same question Tim asked... will we know somewhere that they are rejected?
I think we can set up the container images so that they bail and exit non-zero if they fail to advertise to a CM. For factory-submitted glideins, I imagine they will show up in the monitoring somehow so that the operators can fix their reconfig / go down the troubleshooting path. To me, that's all strictly better than finding out when someone happens to look at a report where the damage is already done.
GitHub comments are not the place to make a real design decision for the OSPool. Let’s pause any policy changes here and get a real (and ideally brief!) design doc going.
The probe picked up ~30 records from a (possibly misconfigured) machine at ULAR with no SiteName between June 11th and 14th, which broke later processing. Removing these records allowed processing to resume. This PR adds a check to ensure that either site or resource is present in ingested records.
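To make the effect of the new constraint easier to reason about, here is a small pure-Python sketch of the predicate. It is an approximation under assumptions, not the probe's code: ads are modeled as dicts, ClassAd `UNDEFINED` is modeled as a missing key or `None`, and the helper name `passes_filter` is hypothetical. It assumes the usual ClassAd behavior that `=!=` never evaluates to `UNDEFINED`, and that a constraint evaluating to `UNDEFINED` (for example when `SlotType` itself is missing) does not match.

```python
def is_defined(ad, attr):
    """Approximate `attr =!= UNDEFINED`: true when the attribute is present.

    Note that an empty string still counts as defined under this check.
    """
    return ad.get(attr) is not None


def passes_filter(ad):
    """Approximate the constraint:
    SlotType != "Static" &&
        (GLIDEIN_ResourceName =!= UNDEFINED || GLIDEIN_Site =!= UNDEFINED)
    """
    slot_type = ad.get("SlotType")
    if slot_type is None:
        # `!=` against an undefined attribute yields UNDEFINED, which is
        # treated as "no match" here (assumption stated in the lead-in).
        return False
    not_static = slot_type != "Static"
    has_site_info = (is_defined(ad, "GLIDEIN_ResourceName")
                     or is_defined(ad, "GLIDEIN_Site"))
    return not_static and has_site_info
```

Under this model, a dynamic slot that advertises neither attribute is dropped by the query, while a record whose site is an empty string would still pass, so the check guards against undefined attributes only.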