Commit df3184f

Add helper to override timeout for concurrent statements
Statements like `CREATE INDEX ... CONCURRENTLY` might take longer than the configured `max_migration_statement_runtime_in_seconds` value. Inside a `VCAP::Migration.with_concurrent_timeout(self)` block, statements use the configured `migration_psql_concurrent_statement_timeout_in_seconds` value instead.

Example usage:

```
Sequel.migration do
  up do
    VCAP::Migration.with_concurrent_timeout(self) do
      # slow migration
    end
  end
end
```
1 parent ff78c15 commit df3184f

11 files changed, 93 additions and 14 deletions

config/cloud_controller.yml

Lines changed: 1 addition & 0 deletions
@@ -329,6 +329,7 @@ max_annotations_per_resource: 200
 max_labels_per_resource: 50
 max_migration_duration_in_minutes: 45
 max_migration_statement_runtime_in_seconds: 30
+migration_psql_concurrent_statement_timeout_in_seconds: 1800
 db:
   log_level: 'debug'
   ssl_verify_hostname: false

db/migrations/20231113105256_add_service_plan_id_index.rb

Lines changed: 6 additions & 2 deletions
@@ -2,10 +2,14 @@
   no_transaction # to use the 'concurrently' option

   up do
-    add_index :service_plan_visibilities, :service_plan_id, name: :spv_service_plan_id_index, concurrently: true if database_type == :postgres
+    VCAP::Migration.with_concurrent_timeout(self) do
+      add_index :service_plan_visibilities, :service_plan_id, name: :spv_service_plan_id_index, concurrently: true if database_type == :postgres
+    end
   end

   down do
-    drop_index :service_plan_visibilities, nil, name: :spv_service_plan_id_index, concurrently: true if database_type == :postgres
+    VCAP::Migration.with_concurrent_timeout(self) do
+      drop_index :service_plan_visibilities, nil, name: :spv_service_plan_id_index, concurrently: true if database_type == :postgres
+    end
   end
 end

db/migrations/20240219113000_add_routes_space_id_index.rb

Lines changed: 6 additions & 2 deletions
@@ -2,10 +2,14 @@
   no_transaction # to use the 'concurrently' option

   up do
-    add_index :routes, :space_id, name: :routes_space_id_index, if_not_exists: true, concurrently: true if database_type == :postgres
+    VCAP::Migration.with_concurrent_timeout(self) do
+      add_index :routes, :space_id, name: :routes_space_id_index, if_not_exists: true, concurrently: true if database_type == :postgres
+    end
   end

   down do
-    drop_index :routes, :space_id, name: :routes_space_id_index, if_exists: true, concurrently: true if database_type == :postgres
+    VCAP::Migration.with_concurrent_timeout(self) do
+      drop_index :routes, :space_id, name: :routes_space_id_index, if_exists: true, concurrently: true if database_type == :postgres
+    end
   end
 end

db/migrations/20240222131500_change_delayed_jobs_reserve_index.rb

Lines changed: 10 additions & 6 deletions
@@ -3,17 +3,21 @@

   up do
     if database_type == :postgres
-      drop_index :delayed_jobs, nil, name: :delayed_jobs_reserve, if_exists: true, concurrently: true
-      add_index :delayed_jobs, %i[queue locked_at locked_by failed_at run_at priority],
-                where: { failed_at: nil }, name: :delayed_jobs_reserve, if_not_exists: true, concurrently: true
+      VCAP::Migration.with_concurrent_timeout(self) do
+        drop_index :delayed_jobs, nil, name: :delayed_jobs_reserve, if_exists: true, concurrently: true
+        add_index :delayed_jobs, %i[queue locked_at locked_by failed_at run_at priority],
+                  where: { failed_at: nil }, name: :delayed_jobs_reserve, if_not_exists: true, concurrently: true
+      end
     end
   end

   down do
     if database_type == :postgres
-      drop_index :delayed_jobs, nil, name: :delayed_jobs_reserve, if_exists: true, concurrently: true
-      add_index :delayed_jobs, %i[queue locked_at locked_by failed_at run_at priority],
-                name: :delayed_jobs_reserve, if_not_exists: true, concurrently: true
+      VCAP::Migration.with_concurrent_timeout(self) do
+        drop_index :delayed_jobs, nil, name: :delayed_jobs_reserve, if_exists: true, concurrently: true
+        add_index :delayed_jobs, %i[queue locked_at locked_by failed_at run_at priority],
+                  name: :delayed_jobs_reserve, if_not_exists: true, concurrently: true
+      end
     end
   end
 end

db/migrations/20240314131908_add_user_guid_to_jobs_table.rb

Lines changed: 8 additions & 2 deletions
@@ -4,9 +4,12 @@

   up do
     if database_type == :postgres
+      db = self
       alter_table :jobs do
         add_column :user_guid, String, size: 255, if_not_exists: true
-        add_index :user_guid, name: :jobs_user_guid_index, if_not_exists: true, concurrently: true
+        VCAP::Migration.with_concurrent_timeout(db) do
+          add_index :user_guid, name: :jobs_user_guid_index, if_not_exists: true, concurrently: true
+        end
       end

     elsif database_type == :mysql
@@ -21,8 +24,11 @@

   down do
     if database_type == :postgres
+      db = self
       alter_table :jobs do
-        drop_index :user_guid, name: :jobs_user_guid_index, if_exists: true, concurrently: true
+        VCAP::Migration.with_concurrent_timeout(db) do
+          drop_index :user_guid, name: :jobs_user_guid_index, if_exists: true, concurrently: true
+        end
         drop_column :user_guid, if_exists: true
       end
     end
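
The `db = self` assignment in this migration is needed because Sequel evaluates the `alter_table` block in the context of a table generator, so `self` inside the block is no longer the database. The following is a minimal plain-Ruby sketch of that rebinding; `Generator` and `Database` are hypothetical stand-ins, not Sequel classes.

```ruby
# Hypothetical stand-in for a table generator: it evaluates the block with
# itself as `self` and only records the requested operations.
class Generator
  attr_reader :operations

  def initialize(&block)
    @operations = []
    instance_exec(&block) if block
  end

  def add_index(column, **opts)
    @operations << [:add_index, column, opts]
  end
end

# Hypothetical stand-in for the database object a migration runs against.
class Database
  def alter_table(&block)
    Generator.new(&block)
  end

  def migrate
    db = self # capture the database before `self` is rebound
    inside = nil
    generator = alter_table do
      inside = self # the generator, not the database
      add_index :user_guid, concurrently: true
    end
    { db: db, inside: inside, generator: generator }
  end
end

result = Database.new.migrate
```

In this sketch the generator only collects operations while the block runs. If Sequel likewise executes the collected operations after the block returns, a timeout set and restored inside the `alter_table` block would no longer be in effect when the index statement actually runs; that is worth verifying against a real Postgres database.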

lib/cloud_controller/config_schemas/base/api_schema.rb

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ class ApiSchema < VCAP::Config

           optional(:max_migration_duration_in_minutes) => Integer,
           optional(:max_migration_statement_runtime_in_seconds) => Integer,
+          optional(:migration_psql_concurrent_statement_timeout_in_seconds) => Integer,
           optional(:migration_psql_worker_memory_kb) => Integer,
           db: {
             optional(:database) => Hash, # db connection hash for sequel

lib/cloud_controller/config_schemas/base/migrate_schema.rb

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ class MigrateSchema < VCAP::Config
           {
             optional(:max_migration_duration_in_minutes) => Integer,
             optional(:max_migration_statement_runtime_in_seconds) => Integer,
+            optional(:migration_psql_concurrent_statement_timeout_in_seconds) => Integer,
             optional(:migration_psql_worker_memory_kb) => Integer,

             db: {

lib/cloud_controller/db.rb

Lines changed: 16 additions & 0 deletions
@@ -148,6 +148,8 @@ def empty_from_sql
 # the migration methods.
 module VCAP
   module Migration
+    PSQL_DEFAULT_STATEMENT_TIMEOUT = 30_000
+
     def self.timestamps(migration, table_key)
       created_at_idx = :"#{table_key}_created_at_index" if table_key
       updated_at_idx = :"#{table_key}_updated_at_index" if table_key
@@ -219,6 +221,20 @@ def self.uuid_function(migration)
       end
     end

+    # Concurrent migrations can take a long time to run, so this helper can be used to override 'max_migration_statement_runtime_in_seconds' for a specific migration.
+    # REF: https://www.postgresql.org/docs/current/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
+    def self.with_concurrent_timeout(db, &block)
+      concurrent_timeout_seconds = VCAP::CloudController::Config.config&.get(:migration_psql_concurrent_statement_timeout_in_seconds) || (PSQL_DEFAULT_STATEMENT_TIMEOUT / 1000)
+
+      if concurrent_timeout_seconds && db.database_type == :postgres
+        original_timeout = db.fetch("select setting from pg_settings where name = 'statement_timeout'").first[:setting]
+        db.run("SET statement_timeout TO #{concurrent_timeout_seconds * 1000}")
+      end
+      block.call
+    ensure
+      db.run("SET statement_timeout TO #{original_timeout}") if original_timeout && db.database_type == :postgres
+    end
+
     def self.logger
       Steno.logger('cc.db.migrations')
     end
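
The helper follows a save, override, restore pattern: it reads the current `statement_timeout`, raises it for the duration of the block, and puts the original value back in an `ensure` clause so that a failing statement cannot leave the longer timeout in place. The sketch below illustrates that pattern without a database. `FakeDb` and `with_temporary_timeout` are hypothetical names invented for this example, and the fake tracks the setting in memory instead of querying `pg_settings`.

```ruby
# Hypothetical stand-in for a Sequel database: it records the SQL it is
# given and tracks the statement timeout in memory.
class FakeDb
  attr_reader :statements, :timeout_ms

  def initialize(timeout_ms)
    @timeout_ms = timeout_ms
    @statements = []
  end

  def database_type
    :postgres
  end

  def run(sql)
    @statements << sql
    match = sql.match(/\ASET statement_timeout TO (\d+)\z/)
    @timeout_ms = match[1].to_i if match
  end
end

# Save the current timeout, raise it for the block, and restore it even
# when the block raises.
def with_temporary_timeout(db, seconds)
  original_ms = db.timeout_ms
  db.run("SET statement_timeout TO #{seconds * 1000}")
  yield
ensure
  db.run("SET statement_timeout TO #{original_ms}") if original_ms
end

db = FakeDb.new(30_000)
seen_inside = nil
with_temporary_timeout(db, 1800) { seen_inside = db.timeout_ms }

begin
  with_temporary_timeout(db, 1800) { raise 'index build failed' }
rescue RuntimeError
  # The ensure clause has already restored the timeout at this point.
end
```

Because `SET statement_timeout` applies to the current session, the pattern relies on the block's statements running on the same connection that received the `SET`.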

lib/cloud_controller/db_migrator.rb

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ def initialize(db, max_migration_duration_in_minutes=nil, max_migration_statemen
     @timeout_in_minutes = default_two_weeks(max_migration_duration_in_minutes)

     @max_statement_runtime_in_milliseconds = if max_migration_statement_runtime_in_seconds.nil? || max_migration_statement_runtime_in_seconds <= 0
-                                               30_000
+                                               VCAP::Migration::PSQL_DEFAULT_STATEMENT_TIMEOUT
                                              else
                                                max_migration_statement_runtime_in_seconds * 1000
                                              end
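
The constant is expressed in milliseconds, while the configuration values are given in seconds, so the conversion only applies to configured values. A small sketch of that resolution logic, using a hypothetical `statement_timeout_ms` helper and a local constant that mirrors `PSQL_DEFAULT_STATEMENT_TIMEOUT`:

```ruby
# Mirrors VCAP::Migration::PSQL_DEFAULT_STATEMENT_TIMEOUT (milliseconds).
DEFAULT_STATEMENT_TIMEOUT_MS = 30_000

# Hypothetical helper: convert a configured value in seconds to
# milliseconds, falling back to the default when the value is missing
# or not positive.
def statement_timeout_ms(seconds)
  return DEFAULT_STATEMENT_TIMEOUT_MS if seconds.nil? || seconds <= 0

  seconds * 1000
end
```

Keeping the unit in the name of every value makes it harder to multiply a millisecond default by 1000 a second time.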

spec/migrations/Readme.md

Lines changed: 4 additions & 1 deletion
@@ -68,7 +68,10 @@ To create resilient and reliable migrations, follow these guidelines:
 ```
 1. If you're writing a uniqueness constraint where some of the values can be null, remember that `null != null`. For instance, the values `[1, 1, null]` and `[1, 1, null]` are considered unique. Uniqueness constraints only work on columns that do not allow `NULL` as a value. If this is the case, change the column to disallow `NULL` and set the default to an empty string instead.
 1. If you need to execute different operations for MySQL and Postgres, you can check the database type as follows: `... if database_type == :postgres` or `... if database_type == :mysql`. If the differences are too big, consider writing separate migrations for each database type.
-1. Be sure that with real world table sizes and load each sql query will finish in reasonable time inside your migration. **There is a hard limit of 30s in place to protect against outages caused by long-running migrations**. If you need to run a long-running migration, consider breaking it up into smaller parts. If you make use of table locking, be sure to run any query with real world table sizes sub 2 seconds to not cause issues due to table locks and waiting queries. In case a single statement exceeds 30s(default), the migration is aborted. An operator can overwrite this behaviour by setting the `max_migration_statement_runtime_in_seconds` config property.
+1. Be sure that with real world table sizes and load each sql query will finish in reasonable time inside your migration. **There is a hard limit of 30s in place to protect against outages caused by long-running migrations**.
+   If you need to run a long-running migration, consider breaking it up into smaller parts. If you make use of table locking, be sure to run any query with real world table sizes sub 2 seconds to not cause issues due to table locks and waiting queries.
+   In case a single statement exceeds 30s(default), the migration is aborted. An operator can overwrite this behaviour by setting the `max_migration_statement_runtime_in_seconds` config property.
+   Since index creation can take a long time, especially when using `CONCURRENTLY`, an operator can set the `migration_psql_concurrent_statement_timeout_in_seconds` config property to a higher value to allow for longer running statements.

 # Sequel Migration Tests