Commit e2efc38

Fixes lf_rollover_percentage query (#6493)
After working on #6489, I wanted to apply the same fixes to `lf_rollover_percentage`; see that PR for a description of them.

While doing so I noticed a problem: when a period has no data for either LF or Meta (which happens when no job is running in one of the clusters), the query skips that period, so the graph shows zero for both LF and Meta. The user can't tell whether 100% of the jobs in that period ran in the Meta fleet or in the LF fleet, since both appear as zero. No strategy that joins the table with itself can handle this edge case: each row always matches at least itself, so an outer join has no effect. This is not a problem for the experiment query, where 0 means either no job running in the experiment or no job running at all.

So I created two selects, one with the LF rows and one with the Meta rows, and outer joined them. This yields data for periods where one of the fleets has no jobs, but it does not provide the complementary graph (% at Meta, % at LF). To avoid overcomplicating the query with two merges and then combining both tables, I return only one of the percentages. I chose the LF fleet because of the graph's title, "Percentage of Jobs rolled over Linux Foundation".

![Screenshot 2025-04-02 at 17 17 36](https://github.com/user-attachments/assets/0e62d6bb-864b-42aa-906b-4446d347e5d9)
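The edge case described above can be illustrated outside ClickHouse. The following is a minimal Python sketch, not the query itself: the job counts, the `linux.2xlarge` label, and the function names are hypothetical, and treating a missing side as 0 is an assumption meant to mirror ClickHouse filling unmatched join columns with default values. It compares keeping only the (bucket, label) keys present in both fleets with keeping the keys present in either fleet.

```python
# Hypothetical hourly job counts keyed by (bucket, label_ref, lf_fleet).
# In bucket 2 only the Meta fleet ran jobs; in bucket 3 only the LF fleet did.
success_stats = {
    (1, "linux.2xlarge", True): 6,
    (1, "linux.2xlarge", False): 4,
    (2, "linux.2xlarge", False): 5,
    (3, "linux.2xlarge", True): 8,
}


def split_by_fleet(stats, lf_fleet):
    """Keep the rows of one fleet, keyed by (bucket, label_ref)."""
    return {(b, l): n for (b, l, f), n in stats.items() if f == lf_fleet}


def lf_percentage(stats, outer):
    """Percentage of jobs that ran on the LF fleet, per bucket.

    outer=False keeps only keys present in both fleets (inner join);
    outer=True keeps keys present in either fleet (full outer join),
    counting the missing side as 0 (assumed default-value behaviour).
    """
    lf = split_by_fleet(stats, True)
    meta = split_by_fleet(stats, False)
    keys = (lf.keys() | meta.keys()) if outer else (lf.keys() & meta.keys())

    lf_sum, total = {}, {}
    for bucket, label in keys:
        lf_n = lf.get((bucket, label), 0)
        meta_n = meta.get((bucket, label), 0)
        lf_sum[bucket] = lf_sum.get(bucket, 0) + lf_n
        total[bucket] = total.get(bucket, 0) + lf_n + meta_n
    return {b: 100.0 * lf_sum[b] / total[b] for b in total}


inner_result = lf_percentage(success_stats, outer=False)
outer_result = lf_percentage(success_stats, outer=True)
print(inner_result)  # only bucket 1 survives
print(outer_result)  # buckets 2 and 3 are reported as 0% and 100%
```

With the inner variant, the buckets where only one fleet ran disappear entirely; with the outer variant they are reported as 0% or 100% for LF, which is the distinction the graph was missing.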
1 parent 8541b42 commit e2efc38

File tree

1 file changed

+40
-32
lines changed
  • torchci/clickhouse_queries/lf_rollover_percentage

torchci/clickhouse_queries/lf_rollover_percentage/query.sql

Lines changed: 40 additions & 32 deletions
@@ -4,7 +4,7 @@ WITH
 l AS label,
 extract(j.name, '[^,]*') AS job_name, -- Remove shard number and label from job names
 j.workflow_name,
-toStartOfInterval(j.started_at, INTERVAL 1 HOUR) AS bucket
+toStartOfInterval(j.created_at, INTERVAL 1 HOUR) AS bucket
 FROM
 -- Deliberatly not adding FINAL to this workflow_job.
 -- Risks of not using it:
@@ -32,16 +32,13 @@ WITH
 FROM
 normalized_jobs AS j
 WHERE
-j.label LIKE 'lf%'
+j.label LIKE 'lf.%'
 ),
--- filter jobs down to the ones that ran in both
--- LF and Meta fleets
 comparable_jobs AS (
 SELECT
 j.bucket,
 j.label,
 j.job_name,
--- Remove shard number and label from job names
 j.workflow_name
 FROM
 normalized_jobs AS j
@@ -50,42 +47,53 @@ WITH
 ),
 success_stats AS (
 SELECT
-bucket,
 count(*) AS group_size,
-job_name,
-workflow_name,
-label,
+bucket,
+replaceOne(label, 'lf.', '') AS label_ref,
 if(substring(label, 1, 3) = 'lf.', True, False) AS lf_fleet
 FROM
 comparable_jobs
 GROUP BY
-bucket, job_name, workflow_name, label
+bucket, label_ref, lf_fleet
+),
+lf_success_stats AS (
+SELECT
+*
+FROM
+success_stats
+WHERE
+lf_fleet = True
+),
+meta_success_stats AS (
+SELECT
+*
+FROM
+success_stats
+WHERE
+lf_fleet = False
 ),
 comparison_stats AS (
 SELECT
-lf.bucket,
-SUM(lf.group_size + m.group_size) AS total_jobs,
-SUM(m.group_size) AS compliment_jobs,
-SUM(lf.group_size) AS counted_jobs,
-m.lf_fleet AS c_fleet,
-lf.lf_fleet AS m_fleet,
+-- *
+greatest(lf.bucket, m.bucket) AS bucket,
 CAST(SUM(lf.group_size) AS Float32) / SUM(lf.group_size + m.group_size) * 100 AS percentage,
-IF(lf.lf_fleet, 'Linux Foundation', 'Meta') AS fleet
+-- IF(lf.lf_fleet, 'Linux Foundation', 'Meta') AS fleet
+'Linux Fundation' AS fleet
 FROM
-success_stats AS lf
-INNER JOIN
-success_stats AS m ON lf.bucket = m.bucket
-WHERE
-lf.job_name = m.job_name
-AND lf.workflow_name = m.workflow_name
-AND (
-(lf.lf_fleet = 1 AND m.lf_fleet = 0)
-OR (lf.lf_fleet = 0 AND m.lf_fleet = 1)
-)
-AND lf.group_size > 3
-AND m.group_size > 3
+lf_success_stats AS lf
+FULL OUTER JOIN
+meta_success_stats AS m
+ON
+lf.label_ref = m.label_ref
+AND lf.bucket = m.bucket
 GROUP BY
-lf.bucket, lf.lf_fleet, m.lf_fleet
+bucket
 )
-SELECT * FROM comparison_stats
-ORDER BY bucket DESC, fleet
+SELECT
+bucket,
+fleet,
+avg(percentage) OVER (ORDER BY bucket DESC ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS percentage
+FROM
+comparison_stats
+ORDER BY
+bucket DESC
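
The final select smooths the percentage with a window of the current row plus up to five preceding rows, ordered by `bucket DESC`. As a rough illustration of what that frame computes, here is a small Python sketch; the function name `trailing_average` and the sample values are hypothetical, and it only approximates the SQL window semantics for distinct buckets.

```python
def trailing_average(rows, preceding=5):
    """Average each row with up to `preceding` rows before it.

    rows: list of (bucket, percentage) pairs with distinct buckets.
    Rows are ordered by bucket descending first, so the "preceding"
    rows of a bucket are the more recent buckets.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)
    smoothed = []
    for i, (bucket, _) in enumerate(ordered):
        # Frame: from `preceding` rows back (clamped at the start) to this row.
        window = [p for _, p in ordered[max(0, i - preceding): i + 1]]
        smoothed.append((bucket, sum(window) / len(window)))
    return smoothed


# Hypothetical hourly percentages, smoothed with one preceding row.
sample = [(1, 0.0), (2, 100.0), (3, 50.0)]
smoothed_sample = trailing_average(sample, preceding=1)
print(smoothed_sample)
```

Because the ordering is descending, each hour is averaged with the hours that come after it in time, not before it; the first rows of the result use a shorter window.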
