upgrades: avoid crdb_internal.system_jobs in upgrade manager
crdb_internal.system_jobs is a virtual table that joins
information from the jobs table and the job_info table. When given a
job status predicate, it does this by running a query such as:
    WITH latestpayload AS (
        SELECT job_id, value
        FROM system.job_info AS payload
        WHERE info_key = 'legacy_payload'
        ORDER BY written DESC
    ),
    latestprogress AS (
        SELECT job_id, value
        FROM system.job_info AS progress
        WHERE info_key = 'legacy_progress'
        ORDER BY written DESC
    )
    SELECT
        distinct(id), status, created,
        payload.value AS payload,
        progress.value AS progress,
        created_by_type, created_by_id,
        claim_session_id, claim_instance_id,
        num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id
    WHERE j.status = 'cancel-requested';
This uses 2 full scans of the job_info table:
```
• distinct
│ distinct on: id, value, value
│
└── • merge join
    │ equality: (job_id) = (id)
    │
    ├── • render
    │   │
    │   └── • filter
    │       │ estimated row count: 2,787
    │       │ filter: info_key = 'legacy_payload'
    │       │
    │       └── • scan
    │             estimated row count: 5,597 (100% of the table; stats collected 27 minutes ago; using stats forecast for 17 minutes ago)
    │             table: job_info@primary
    │             spans: FULL SCAN
    │
    └── • merge join (right outer)
        │ equality: (job_id) = (id)
        │ right cols are key
        │
        ├── • render
        │   │
        │   └── • filter
        │       │ estimated row count: 2,787
        │       │ filter: info_key = 'legacy_progress'
        │       │
        │       └── • scan
        │             estimated row count: 5,597 (100% of the table; stats collected 27 minutes ago; using stats forecast for 17 minutes ago)
        │             table: job_info@primary
        │             spans: FULL SCAN
        │
        └── • index join
            │ table: jobs@primary
            │
            └── • sort
                │ order: +id
                │
                └── • scan
                      missing stats
                      table: jobs@jobs_status_created_idx
                      spans: [/'cancel-requested' - /'cancel-requested']
```
Previously, the upgrade manager was using this virtual table as part
of a larger query:
    SELECT id, status
    FROM (
        SELECT id, status,
               crdb_internal.pb_to_json(
                   'cockroach.sql.jobs.jobspb.Payload', payload, false
               ) AS pl
        FROM crdb_internal.system_jobs
        WHERE status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
    )
    WHERE pl->'migration'->'clusterVersion' = $1::JSONB;
I believe the use of the IN operator causes the virtual index's
populate function to be called once for each value. Perhaps the
optimizer accounts for this in some way so that it does not result in
2 * 6 full scans of the job_info table (two scans for each of the six
statuses), but it is hard to confirm with the explain output.
In at least one recent escalation, we observed this query taking a
substantial amount of time as it continually conflicted with other job
system queries.
Here, we avoid using the virtual table. This allows us to avoid one
full scan of the job_info table since we don't need the progress (only
the payload). It also allows us to use the full `IN` predicate
directly, avoiding any uncertainty.
In a local example, this is substantially faster (26ms versus 384ms):
```
root@localhost:26257/defaultdb> SELECT id, status
-> FROM (
-> SELECT id,
-> status,
-> crdb_internal.pb_to_json(
-> 'cockroach.sql.jobs.jobspb.Payload',
-> payload,
-> false -- emit_defaults
-> ) AS pl
-> FROM crdb_internal.system_jobs
-> WHERE status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
-> )
-> WHERE pl->'migration'->'clusterVersion' = '{"activeVersion": {"internal": 84, "majorVal": 22, "minorVal": 2}}'::JSONB;
id | status
-----+---------
(0 rows)
Time: 384ms total (execution 384ms / network 0ms)
root@localhost:26257/defaultdb> WITH latestpayload AS (SELECT job_id, value FROM system.job_info AS payload WHERE info_key = 'legacy_payload' ORDER BY written DESC)
-> SELECT id, status FROM (
-> SELECT distinct(id), status, crdb_internal.pb_to_json('cockroach.sql.jobs.jobspb.Payload', payload.value, false) AS pl
-> FROM system.jobs AS j
-> INNER JOIN latestpayload AS payload ON j.id = payload.job_id
-> WHERE status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
-> AND job_type = 'MIGRATION'
-> ) WHERE ((pl->'migration')->'clusterVersion') = '{"activeVersion": {"internal": 84, "majorVal": 22, "minorVal": 2}}'::JSONB;
id | status
-----+---------
(0 rows)
Time: 26ms total (execution 26ms / network 0ms)
```
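The shape of the rewritten query can be tried outside CockroachDB with the rough sketch below. It is only an illustration under several assumptions: SQLite stands in for CockroachDB, the `jobs` and `job_info` tables are cut down to a few columns, payloads are stored as JSON text rather than protobuf (so `crdb_internal.pb_to_json` is replaced by `json.loads` on the client side), and the helper names `make_demo_db` and `find_migration_jobs` are hypothetical. It says nothing about CockroachDB's query plans or timings; it only shows that the join needs the payload rows alone and that the full `IN` list is applied directly to the jobs table.

```python
import json
import sqlite3

# Non-terminal statuses, copied from the IN list in the commit message.
ACTIVE_STATUSES = ("running", "pending", "cancel-requested",
                   "pause-requested", "reverting", "paused")


def make_demo_db():
    """Build a tiny in-memory stand-in for system.jobs and system.job_info."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, job_type TEXT);
        CREATE TABLE job_info (job_id INTEGER, info_key TEXT,
                               written INTEGER, value TEXT);
    """)
    target = {"activeVersion": {"internal": 84, "majorVal": 22, "minorVal": 2}}
    other = {"activeVersion": {"internal": 82, "majorVal": 22, "minorVal": 2}}
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", [
        (1, "running", "MIGRATION"),    # should match
        (2, "succeeded", "MIGRATION"),  # terminal status, filtered out
        (3, "paused", "BACKUP"),        # wrong job type, filtered out
        (4, "pending", "MIGRATION"),    # different cluster version
    ])
    conn.executemany("INSERT INTO job_info VALUES (?, ?, ?, ?)", [
        (1, "legacy_payload", 1, json.dumps({"migration": {"clusterVersion": target}})),
        (1, "legacy_progress", 1, "{}"),  # progress rows are never joined
        (2, "legacy_payload", 1, json.dumps({"migration": {"clusterVersion": target}})),
        (3, "legacy_payload", 1, json.dumps({"backup": {}})),
        (4, "legacy_payload", 1, json.dumps({"migration": {"clusterVersion": other}})),
    ])
    return conn, target


def find_migration_jobs(conn, cluster_version):
    """Return (id, status) for active MIGRATION jobs at cluster_version."""
    placeholders = ", ".join("?" for _ in ACTIVE_STATUSES)
    rows = conn.execute(
        f"""
        SELECT DISTINCT j.id, j.status, p.value
        FROM jobs AS j
        INNER JOIN job_info AS p ON j.id = p.job_id
        WHERE p.info_key = 'legacy_payload'
          AND j.status IN ({placeholders})
          AND j.job_type = 'MIGRATION'
        ORDER BY j.id
        """,
        ACTIVE_STATUSES,
    ).fetchall()
    # Client-side stand-in for pl->'migration'->'clusterVersion' = $1::JSONB.
    return [
        (job_id, status)
        for job_id, status, payload in rows
        if json.loads(payload).get("migration", {}).get("clusterVersion") == cluster_version
    ]


if __name__ == "__main__":
    conn, target = make_demo_db()
    print(find_migration_jobs(conn, target))
```

Only job 1 satisfies every predicate in the demo data; the other rows each fail exactly one filter, which makes it easy to check each clause in isolation.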
We should do more work to understand contention within the job system,
but perhaps speeding up this query will help a bit.
Epic: None
Release note: None