-
Notifications
You must be signed in to change notification settings - Fork 57
Open
Description
Summary
I've refactored the /history endpoint to use a PostgreSQL materialized view, refreshed on a recurring timer and guarded by an advisory lock. This approach has reduced history endpoint response time from over 5 seconds to under 130 ms on a dataset of ~7.5 million records (48 hours). The advisory lock ensures only one process or node runs the refresh at a time, which is essential for preforked or multi-node environments.
Timing Results
| Implementation | Request | Status | Response Size | Time |
|---|---|---|---|---|
| Original (dynamic query) | /history | 200 | 0.8 kB | 5.21 s |
| Materialized View (MV) | /history | 200 | 0.8 kB | 127 ms |
Note: An initial or rare /history request may be slower if a refresh hasn't run recently.
Current Updated Implementation
Materialized View SQL
CREATE MATERIALIZED VIEW minion_jobs_history_hourly AS
SELECT EXTRACT(EPOCH FROM ts) AS epoch,
COALESCE(failed_jobs, 0) AS failed_jobs,
COALESCE(finished_jobs, 0) AS finished_jobs
FROM (
SELECT EXTRACT(DAY FROM finished) AS day,
EXTRACT(HOUR FROM finished) AS hour,
COUNT(*) FILTER (WHERE state = 'failed') AS failed_jobs,
COUNT(*) FILTER (WHERE state = 'finished') AS finished_jobs
FROM minion_jobs
WHERE finished > NOW() - INTERVAL '24 hours'
GROUP BY day, hour
) AS j
RIGHT OUTER JOIN (
SELECT *
FROM GENERATE_SERIES(
date_trunc('hour', NOW() - INTERVAL '24 hour'),
date_trunc('hour', NOW()),
'1 hour'
) AS ts
) AS s
ON EXTRACT(HOUR FROM ts) = j.hour AND EXTRACT(DAY FROM ts) = j.day
ORDER BY epoch ASC;
Materialized View Refresh Timer (Perl, Logging Omitted)
sub _history_refresh_timer_id {
my $self = shift;
return $self->{_history_refresh_timer_id} if $self->{_history_refresh_timer_id};
return undef unless Mojo::IOLoop->is_running;
my $interval = 300; # 5 minutes
my $hash = sha1_hex("MINION_HISTORY");
my $lock_key = hex(substr($hash, 0, 8));
my $callback = sub {
my $dbh = $self->pg->db->dbh;
my $got = $dbh->selectrow_array('SELECT pg_try_advisory_lock(?)', undef, $lock_key);
if ($got) {
eval {
$self->pg->db->query('REFRESH MATERIALIZED VIEW minion_jobs_history_hourly');
};
}
};
$self->{_history_refresh_timer_id} = Mojo::IOLoop->recurring($interval => sub {
$callback->();
});
$callback->(); # Run immediately on startup
return $self->{_history_refresh_timer_id};
}
Updated history Method
sub history {
my $self = shift;
$self->_history_refresh_timer_id; # Starts/ensures the refresh loop
my $daily = $self->pg->db->query(
"SELECT epoch, failed_jobs, finished_jobs FROM minion_jobs_history_hourly ORDER BY epoch ASC"
)->hashes->to_array;
return { daily => $daily };
}
How It Works
- When the history endpoint is hit, the periodic timer and advisory lock are initialized if not already present.
- The advisory lock is not released, but has a TTL that expires it after the interval to prevent subsequent runs from drift or parallel processes.
- Every 5 minutes (interval configurable), one process acquires the advisory lock and refreshes the materialized view—preventing redundant, overlapping refreshes across preforked or multi-node deployments.
- All stats are served from the precomputed view, providing consistent sub-150ms UI response even at multimillion-record scale.
- A priming or rarely used endpoint may briefly reflect refresh time, but subsequent calls are instant.
- Any further observations or edge case cautions for advisory lock use with periodic refresh in clustered/HA settings?
- Suggestions for a better place or way to schedule the view update?
PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), 64-bit
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels