-
Notifications
You must be signed in to change notification settings - Fork 0
no compatible agents alerts only if waiting 30+ minutes #122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4ed67e2 to
34ce660
Compare
cailyoung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some non blocking comments. Nice work.
| readonly IMetricFactory metricFactory; | ||
| readonly IConfiguration configuration; | ||
| readonly HashSet<(string buildTypeId, string buildId, string queuedDateTime)> seenBuildsNoAgents = new(); | ||
| readonly HttpClient httpClient = new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed; sometimes this isn't good, but for right now let's leave it here and not instantiate one per invocation (because I honestly don't remember which way would be better given the request pattern here!)
|
|
||
| // Check if this build has the "no compatible agents" wait reason | ||
| var noAgentsWaitReason = waitReasonsResponse?.QueuedWaitReasons?.Property | ||
| ?.FirstOrDefault(p => p.Name == "There are no idle compatible agents which can run this build"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we would extract this kind of 'magic string' out to a const with a useful name like 'NoCompatibleAgentWaitReason' - not blocker, just FYI
| } | ||
|
|
||
| var currentBuildsNoAgents = buildsNoCompatibleAgents.Select(b => (b.BuildTypeId, b.Id, b.QueuedDate.ToString("yyyy-MM-ddTHH:mm:ssZ"))).ToArray(); | ||
| seenBuildsNoAgents.UnionWith(currentBuildsNoAgents); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we aren't unioning any more? I think the absent builds won't work without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just saw it lower down. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't impact the logic if it happens above or below. It just made sense to me for it to happen at the end of the function.
| class QueuedWaitReasonsResponse | ||
| { | ||
| public QueuedWaitReasons QueuedWaitReasons { get; set; } | ||
| } | ||
|
|
||
| class QueuedWaitReasons | ||
| { | ||
| public List<WaitReasonProperty> Property { get; set; } | ||
| } | ||
|
|
||
| class WaitReasonProperty | ||
| { | ||
| public string Name { get; set; } | ||
| public string Value { get; set; } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These might be able to be record instead, which makes all the getter/setter stuff simpler, but that's a later problem (if ever).
| if (noAgentsWaitReason != null && !string.IsNullOrEmpty(noAgentsWaitReason.Value)) | ||
| { | ||
| // Parse the wait time in milliseconds and convert to minutes | ||
| if (long.TryParse(noAgentsWaitReason.Value, out var milliseconds)) | ||
| { | ||
| var waitTimeMinutes = Math.Round(milliseconds / (60.0 * 1000.0)); | ||
|
|
||
| // Only track builds that have been waiting for more than 30 minutes | ||
| if (waitTimeMinutes > 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we could refactor this as LINQ, but for now keep it if it makes the logic easier to understand.
[sc-130509]
This updates the
TeamCityCompatibleAgentsScraper.csto only alert if the build has been waiting for over 30 minutes.Previously, TeamCityCompatibleAgentsScraper.cs would send info about all builds with no compatible agents and sumologic would alert if any were stuck longer than 30 minutes. However, that strategy did not work because of how sumologic works.
This new strategy is expected to work.
Test
I tested this in Preprod. I set it to alert after 7 minutes of queued build with no compatible agents. After 7 minutes, the scraper correctly found the build that was stuck.
These images include debugging output that has been removed.
Output after removing the stuck build.