Commit b225c03

[hud][ch] fix query_execution_metrics page (#6571)
Currently the [query execution metrics](https://hud.pytorch.org/query_execution_metrics) HUD page shows only the last couple of days, regardless of the date range selected.

This happens because in ClickHouse the query log data is spread across multiple query_log tables with a numeric suffix:

<img width="203" alt="image" src="https://github.com/user-attachments/assets/44e03730-310e-4110-9423-1608b879bfe0" />

(apparently ClickHouse creates a new table when the schema changes)

The solution is to create a special "merge" table:

```
CREATE TABLE all_query_logs ENGINE = Merge('system', '^query_log(_\\d+)?$');
```

and query it instead. This table stores no data of its own: it reads from all the matching underlying tables and merges the results. The performance cost should be small, because I also added filters that match the tables' partitioning and sort order, so only relevant data is read.

### Testing

Locally. See [the vercel preview](https://torchci-git-fix-query-execution-metrics-page-fbopensource.vercel.app/query_execution_metrics).
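
To make the table-name pattern concrete, here is a minimal Python sketch of which names `^query_log(_\d+)?$` selects. It only exercises the regular expression, not ClickHouse itself; the helper `tables_covered` and the candidate table names are hypothetical, chosen for illustration.

```python
import re

# Pattern from the Merge engine definition above. In the SQL string literal the
# backslash is doubled ('\\d'); the regex engine itself sees \d.
QUERY_LOG_PATTERN = re.compile(r"^query_log(_\d+)?$")


def tables_covered(table_names):
    """Return the names matching the pattern, preserving input order."""
    return [name for name in table_names if QUERY_LOG_PATTERN.fullmatch(name)]


# Hypothetical table names, for illustration only.
candidates = [
    "query_log",         # unsuffixed table
    "query_log_0",       # numeric suffix
    "query_log_12",      # multi-digit numeric suffix
    "query_log_old",     # non-numeric suffix: not matched
    "query_thread_log",  # different log table: not matched
    "part_log",          # different log table: not matched
]

print(tables_covered(candidates))
# → ['query_log', 'query_log_0', 'query_log_12']
```

The anchors `^` and `$` matter here: without them, a name such as `query_log_old` would also be picked up by a substring match.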
1 parent 931d10b commit b225c03

3 files changed (+18 -4 lines changed)
  • clickhouse_db_schema/all_query_logs
  • torchci/clickhouse_queries
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+CREATE TABLE all_query_logs
+ENGINE = Merge('system', '^query_log(_\\d+)?$');

torchci/clickhouse_queries/query_execution_metrics/query.sql

Lines changed: 8 additions & 2 deletions
@@ -8,9 +8,15 @@ SELECT
 count(*) as num,
 left(query_id, -37) as name
 FROM
-clusterAllReplicas(default, system.query_log)
+clusterAllReplicas(default, default.all_query_logs)
 where
-event_time >= {startTime: DateTime64(3)}
+-- for partitioned tables
+toYYYYMM(event_date) >= toYYYYMM({startTime: DateTime64(3)})
+and toYYYYMM(event_date) <= toYYYYMM({stopTime: DateTime64(3)})
+-- utilize the table ordering
+and event_date >= toDate({startTime: DateTime64(3)})
+and event_date <= toDate({stopTime: DateTime64(3)})
+and event_time >= {startTime: DateTime64(3)}
 and event_time < {stopTime: DateTime64(3)}
 and initial_user = 'hud_user'
 and length(query_id) > 37

torchci/clickhouse_queries/query_execution_metrics_individual/query.sql

Lines changed: 8 additions & 2 deletions
@@ -8,9 +8,15 @@ SELECT
 quantile(0.5)(memory_usage) as memoryBytesP50,
 count(*) as num
 FROM
-clusterAllReplicas(default, system.query_log)
+clusterAllReplicas(default, default.all_query_logs)
 where
-event_time >= {startTime: DateTime64(3)}
+-- for partitioned tables
+toYYYYMM(event_date) >= toYYYYMM({startTime: DateTime64(3)})
+and toYYYYMM(event_date) <= toYYYYMM({stopTime: DateTime64(3)})
+-- utilize the table ordering
+and event_date >= toDate({startTime: DateTime64(3)})
+and event_date <= toDate({stopTime: DateTime64(3)})
+and event_time >= {startTime: DateTime64(3)}
 and event_time < {stopTime: DateTime64(3)}
 and initial_user = 'hud_user'
 and length(query_id) > 37
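
Both queries now stack three layers of time filters: by month (`toYYYYMM(event_date)`), by day (`event_date`), and by exact timestamp (`event_time`). The two coarser layers are implied by the `event_time` range, so they should not change which rows are returned; they are only there to let ClickHouse skip irrelevant partitions and ranges. The following Python sketch checks that equivalence on sample data. It assumes `event_date` is always the date part of `event_time` and that all values share one timezone; the function names and sample timestamps are hypothetical.

```python
from datetime import datetime


def yyyymm(d):
    """Mirror of ClickHouse toYYYYMM: year * 100 + month."""
    return d.year * 100 + d.month


def time_only_filter(event_time, start, stop):
    """The original predicate: start <= event_time < stop."""
    return start <= event_time < stop


def layered_filter(event_time, start, stop):
    """The predicate after the change: month, then day, then timestamp."""
    # Assumption: event_date is the date part of event_time.
    event_date = event_time.date()
    return (
        yyyymm(start) <= yyyymm(event_date) <= yyyymm(stop)
        and start.date() <= event_date <= stop.date()
        and start <= event_time < stop
    )


# Hypothetical selected range and log timestamps, for illustration only.
start = datetime(2025, 4, 28, 0, 0, 0)
stop = datetime(2025, 5, 3, 0, 0, 0)
event_times = [
    datetime(2025, 3, 15, 8, 0, 0),            # earlier month
    datetime(2025, 4, 27, 23, 59, 59),         # just before start
    datetime(2025, 4, 28, 0, 0, 0),            # exactly start (included)
    datetime(2025, 4, 30, 12, 0, 0),           # inside the range
    datetime(2025, 5, 2, 23, 59, 59, 999000),  # just before stop
    datetime(2025, 5, 3, 0, 0, 0),             # exactly stop (excluded)
    datetime(2025, 6, 1, 0, 0, 0),             # later month
]

kept_time_only = [t for t in event_times if time_only_filter(t, start, stop)]
kept_layered = [t for t in event_times if layered_filter(t, start, stop)]

# The coarse month/day predicates never remove a row the timestamp range keeps.
assert kept_time_only == kept_layered
print(len(kept_layered))
# → 3
```

Note that the day-level upper bound is inclusive (`event_date <= toDate(stopTime)`) while the timestamp bound is exclusive; the inclusive day bound is needed so that rows on the final day before `stopTime` are not pruned.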
