Update collector query to check for defined Site or ResourceName #192

mwestphall · 2025-07-15T21:55:25Z

The probe picked up ~30 records from a (possibly misconfigured) machine at ULAR with no SiteName between June 11th and 14th, which broke later processing. Removing these records allowed processing to resume. This PR adds a check to make sure that at least one of site or resource is present in ingested records.

osg-cat · 2025-07-15T22:00:19Z

With this change, how will anyone know if records are being dropped if there’s no site or resource?

mwestphall · 2025-07-15T22:02:57Z

@osg-cat Another approach we could take here is to replace an empty site/resource with "Unknown" or some similar indicator. Would that be preferable? Records would still not be attributed to the correct site, but they would at least make it out to GRACC (what GRACC does with them after that point is another question).
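
A minimal sketch of the fallback described above, assuming records are handled as plain Python dicts keyed by ClassAd attribute name. The function name `fill_missing_site` and the `UNKNOWN` constant are hypothetical and are not taken from `osgpilot_meter`:

```python
from typing import Any, Dict

# Hypothetical placeholder; the thread only suggests "Unknown" or a similar indicator.
UNKNOWN = "Unknown"


def fill_missing_site(ad: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with absent or empty site/resource set to a placeholder."""
    fixed = dict(ad)  # copy so the caller's record is left untouched
    for attr in ("GLIDEIN_Site", "GLIDEIN_ResourceName"):
        if not fixed.get(attr):  # covers a missing key, None, and the empty string
            fixed[attr] = UNKNOWN
    return fixed
```

With this approach no record is dropped, so the placeholder value itself becomes the signal that something upstream is misconfigured.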

osg-cat · 2025-07-15T22:05:09Z

I don’t know. It’s probably a Derek question. I guess my main point is that this could use some design thinking.

matyasselmeci · 2025-07-16T14:15:40Z

osg-pilot-container/osgpilot_meter


-    filter_cond = 'SlotType != "Static"'
+    # Need at least one defined from Site and ResourceName for proper accounting
+    filter_cond = 'SlotType != "Static" && (GLIDEIN_ResourceName =!= UNDEFINED || GLIDEIN_Site =!= UNDEFINED) '


Not related to your code change, but do you know why we're excluding static slots?
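
For reference, the filter condition in the diff above can be approximated in plain Python as follows. This is a sketch under stated assumptions: ads are modeled as dicts, a missing key or `None` stands in for ClassAd `UNDEFINED`, and the helper names `passes_filter` and `partition` are hypothetical. Splitting the ads into kept and dropped lists is one possible way to make dropped records visible; it is not something the PR does.

```python
from typing import Any, Dict, Iterable, List, Tuple

Ad = Dict[str, Any]


def passes_filter(ad: Ad) -> bool:
    """Approximate the new filter_cond on a dict-based ad."""
    slot_type = ad.get("SlotType")
    if slot_type is None:
        # In ClassAds, UNDEFINED != "Static" evaluates to UNDEFINED,
        # so the constraint does not match.
        return False
    if str(slot_type).lower() == "static":
        # ClassAd != compares strings case-insensitively.
        return False
    # =!= UNDEFINED is true for any defined value, including an empty string.
    return (ad.get("GLIDEIN_ResourceName") is not None
            or ad.get("GLIDEIN_Site") is not None)


def partition(ads: Iterable[Ad]) -> Tuple[List[Ad], List[Ad]]:
    """Split ads into (kept, dropped) so that dropped ads can be counted or logged."""
    kept: List[Ad] = []
    dropped: List[Ad] = []
    for ad in ads:
        (kept if passes_filter(ad) else dropped).append(ad)
    return kept, dropped
```

Note that an ad whose site is defined but empty still passes this condition, since only `UNDEFINED` is excluded.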

brianhlin · 2025-07-16T14:48:26Z

> With this change, how will anyone know if records are being dropped if there’s no site or resource?

I vote that we configure the OSPool CMs to reject any EPs that are missing GLIDEIN_Site and GLIDEIN_ResourceName. If we're getting this kind of capacity, they're clearly broken, they'll pollute our accounting + downstream reporting, and will require hacky GRACC / reporting fixes every time we get a batch of bad records.

@rynge @djw8605 thoughts?

djw8605 · 2025-07-16T15:07:44Z

Rejecting the EPs works for me. Though, I guess it's the same question from Tim... will we know they are rejected somewhere?

brianhlin · 2025-07-16T16:11:14Z

> Rejecting the EPs works for me. Though, I guess it's the same question from Tim... will we know they are rejected somewhere?

I think we can set up the container images so that they bail and exit non-zero if they fail to advertise to a CM. For factory-submitted glideins, I imagine they will show up in the monitoring somehow so that the operators can fix their reconfig / go down the troubleshooting path.

To me, that's all strictly better than finding out when someone happens to look at a report where the damage is already done.

osg-cat · 2025-07-17T14:43:43Z

GitHub comments are not the place to make a real design decision for the OSPool. Let’s pause any policy changes here and get a real (and ideally brief!) design doc going.

Update collector query to check for defined Site or ResourceName

81aa005

mwestphall requested review from brianhlin and matyasselmeci July 15, 2025 21:55

matyasselmeci reviewed Jul 16, 2025

brianhlin merged commit 81aa005 into opensciencegrid:2.x Oct 14, 2025
1 check passed
