
Commit 3f37025

Quorum queues (#1706)
* Test queue.declare method with quorum type [#154472130]
* Cosmetics [#154472130]
* Start quorum queue
  Includes ra as a rabbit dependency [#154472152]
* Update info and list operations to use quorum queues
  Basic implementation. Might need an update when more functionality is added to the quorum queues. [#154472152]
* Stop quorum queue [#154472158]
* Restart quorum queue [#154472164]
* Introduce UId in ra config to support newer version of ra
  Improved ra stop [#154472158]
* Put data inside VHost specific subdirs [#154472164]
* Include ra in rabbit deps to support stop_app/start_app command [#154472164]
* Stop quorum queues in `rabbit_amqqueue:stop/1` [#154472158]
* Revert creation of fifo ets table inside rabbit
  Now supported by ra [#154472158]
* Filter quorum queues [#154472158]
* Test restart node with quorum queues [#154472164]
* Publish to quorum queues [#154472174]
* Use `ra:restart_node/1` [#154472164]
* Wait for stats to be published when querying quorum queues [#154472174]
* Test publish and queue length after restart [#154472174]
* Consume messages from quorum queues with basic.get [#154472211]
* Autoack messages from quorum queues on basic.get [#154472211]
* Fix no_ack meaning
  no_ack = true is equivalent to autoack [#154472211]
* Use data_dir as provided in the config
  If we modify the data_dir, ra is not able to delete the data when a queue is deleted [#154472158]
* Remove unused code/variables [#154472158]
* Subscribe to a quorum queue
  Supports auto-ack [#154472215]
* Ack messages consumed from quorum queues [#154472221]
* Nack messages consumed from quorum queues [#154804608]
* Use delivery tag as consumer tag for basic.get in quorum queues [#154472221]
* Support for publisher confirms in quorum queues [#154472198]
* Integrate with ra_fifo_client
* Clear queue state on queue.delete [#154472158]
* Fix quorum nack [#154804608]
* Test redelivery after nack [#154804608]
* Nack without requeueing [#154472225]
* Test multiple acks [#154804208]
* Test multiple nacks [#154804314]
* Configure dead letter exchange with queue declare [#155076661]
* Use a per-vhost process to handle dead-lettering
  Needs to hold state for quorum queues [#155401802]
* Implement dead-lettering on nack'ed messages [#154804620]
* Use queue name as a resource on message delivery
  Fixes a previously introduced bug [#154804608]
* Handle ra events on dead letter process [#155401802]
* Pass empty queue states to queue delete
  Queue deletion on vhost deletion calls directly to rabbit_amqqueue. Queue states are not available, but we can provide an empty map as in deletion the states are only needed for cleanup.
* Generate quorum queue stats and events
  Consumer delete events are still pending, as they depend on basic.cancel (not implemented yet), ra terminating or ra detecting channel down [#154472241]
* Ensure quorum mapping entries are available before metric emission [#154472241]
* Configure data_dir, uses new RABBITMQ_QUORUM_BASE env var [#154472152]
* Use untracked enqueues when sending without channel
  Updated several other calls missed during the quorum implementation
* Revert "Configure data_dir, uses new RABBITMQ_QUORUM_BASE env var"
  This reverts commit f226121.
* Configure data_dir, uses new RABBITMQ_QUORUM_DIR based on mnesia dir [#154472152]
* Fix get_quorum_state
* Fix calculation of quorum pids
* Move all quorum queues code to its own module [#154472241]
* Return an error when declaring a quorum queue with an incompatible argument [#154521696]
* Cleanup of quorum queue state after queue delete
  Also fixes some existing problems where the state wasn't properly stored [#155458625]
* Revert Revert "Declare a quorum queue using the queue.declare method"
* Remove duplicated state info [#154472241]
* Start/stop multi-node quorum queue [#154472231] [#154472236]
* Restart nodes in a multi-node quorum cluster [#154472238]
* Test restart and leadership takeover on multiple nodes [#154472238]
* Wait for leader down after deleting a quorum cluster
  It ensures a smooth delete-declare sequence without race conditions. The test included here detected the situation before the fix. [#154472236]
* Populate quorum_mapping from mnesia when not available
  Ensures that leader nodes that don't have direct requests can get the mapping ra name -> queue name
* Cosmetics
* Do not emit core metrics if queue has just been deleted
* Use rabbit_mnesia:is_process_alive
  Fixes bug introduced by cac9583 [#154472231]
* Only try to report stats if quorum process is alive
* Implement cancel consumer callback
  Deletes metrics and sends consumer deleted event
* Remove unnecessary trigger election call
  ra:restart_node has already been called during the recovery
* Apply cancellation callback on node hosting the channel
* Cosmetics
* Read new fifo metrics which store directly total, ready and unack
* Implement basic.cancel for quorum queues
* Store leader in amqqueue record, report all in stats [#154472407]
* Declare quorum queue in mnesia before starting the ra cluster
  Record needs to be stored first to update the leader on ra effects
* Revert
* Purge quorum queues [#154472182]
* Improve use of untracked_enqueue
  Choose the persisted leader id instead of just using the id of the leader at point of creation.
* Store quorum leader in the pid field of amqqueue record
  Same as mirrored queues, no real need for an additional field
* Improve recovery
  When a ra node has never been started on a rabbit node ensure it doesn't fail but instead rebuilds the config and starts the node as a new node. Also fix issue when a quorum queue is declared when one of its rabbit nodes is unavailable. [#157054606]
* Cleanup core metrics after leader change [#157054473]
* Return an error on sync_queue on quorum queues [#154472334]
* Return an error on cancel_sync_queue on quorum queues [#154472337]
* Fix basic_cancel and basic_consume return values
  Ensure the quorum queue state is always returned by these functions.
* Restore arity of amqqueue delete and purge functions.
  This avoids some breaking changes in the cli.
* Fix bug returning consumers.
* remove rogue debug log
* Integrate ingress flow control with quorum queues [#157000583]
* Configure commands soft limit [#157000583]
* Support quorum pids on rabbit_mnesia:is_process_alive
* Publish consumers metric for quorum queues
* Whitelist quorum directory in is_virgin_node
  Allow the quorum directory to exist without affecting the status of the Rabbit node.
* Delete queue_metrics on leader change.
  Also run the become_leader handler in a separate process to avoid blocking. [#157424225]
* Report cluster status in quorum queue infos. New per node status command.
  Related to [#157146500]
* Remove quorum_mapping table
  As we can store the full queue name resource as the cluster id of the ra_fifo_client state we can avoid needing the quorum_mapping table.
* Fix xref issue
* Provide quorum members information in stats [#157146500]
* fix unused variable
* quorum queue multiple declare handling
  Extend rabbit_amqqueue:internal_declare/2 to indicate if the queue record was created or existing. From this we can then provide a code path that should handle concurrent queue declares of the same quorum queue.
* Return an error when declaring exclusive/auto-delete quorum queue [#157472160]
* Restore lost changes from 79c9bd2
* recover another part of commit
* fixup cherry pick
* Ra io/file metrics handler and stats publishing [#157193081]
* Revert "Ra io/file metrics handler and stats publishing"
  This reverts commit 05d15c7.
* Do not issue confirms on node down for quorum queues.
  Only a ra_event should be used to issue positive confirms for a quorum queue.
* Ra stats publishing [#157193081]
* Pick consumer utilisation from ra data [#155402726]
* Handle error when deleting a quorum queue and all nodes are already down
  This is in fact a successful deletion as all raft nodes are already 'stopped' [#158656366]
* Return an error when declaring non-durable quorum queues [#158656454]
* Rename dirty_query to committed_query
* Delete stats on leader node [#158661152]
* Give full list of nodes to fifo client
* Handle timeout in quorum basic_get
* Fix unused variable error
* Handle timeout in basic get [#158656366]
* Force GC after purge [#158789389]
* Increase `ra:delete_cluster` timeout to 120s
* Revert "Force GC after purge"
  This reverts commit 5c98bf2.
* Add quorum member command [#157481599]
* Delete quorum member command [#157481599]
* Implement basic.recover for quorum queues [#157597411]
* Change consumer utilisation to use the new ra_fifo table and api.
* Set max quorum queue size limit
  Defaults to 7, can be configured per queue on queue.declare. Nodes are selected randomly from the list of nodes, but the one that is executing the queue.declare command [#159338081]
* remove potentially unrelated changes to rabbit_networking
* Move ra_fifo to rabbit
  Copied ra_fifo to rabbit and renamed it rabbit_fifo. [#159338031]
* rabbit_fifo tidy up
* rabbit_fifo tidy up
* rabbit_fifo: customer -> consumer rename
* Move ra_fifo tests [#159338031]
* Tweak quorum_queue defaults
* quorum_queue test reliability
* Optimise quorum_queue test suite.
  By only starting a rabbit cluster per group rather than test. [#160612638]
* Renamings in line with ra API changes
* rabbit_fifo fixes
* Update with ra API changes
  Ra has consolidated and simplified its API. These changes update to conform to that.
* Update rabbit_fifo with latest ra changes
* Clean up out of date comment
* Return map of states
* Add test case for basic.get on an empty queue
  Before the previous patch, any subsequent basic.get would crash as the map of states had been replaced by a single state.
* Clarify use of deliver tags on record_sent
* Clean up queues after testcase
* Remove erlang monitor of quorum queues in rabbit_channel
  The eol event can be used instead
* Use macros to make clearer distinctions between quorum/classic queues
  Cosmetic only
* Erase queue stats on 'eol' event
* Update to follow Ra's cluster_id -> cluster_name rename.
* Rename qourum-cluster-size
  To quorum-initial-group-size
* Issue confirms on quorum queue eol
  Also avoid creating quorum queue session state on queue operation methods.
* Only classic queues should be notified on channel down
* Quorum queues do not support global qos
  Exit with protocol error if a basic.consume for a quorum queue is issued on a channel with global qos enabled.
* unused variable name
* Refactoring
  Strictly enforce that channels do not monitor quorum queues.
* Refactor foreach_per_queue in the channel.
  To make it call classic and quorum queues the same way. [#161314899]
* rename function
* Query classic and quorum queues separately during recovery as they should not be marked as stopped during failed vhost recovery.
* Remove force_event_refresh function
  As the only user of this function, the management API no longer requires it.
* fix errors
* Remove created_at from amqqueue record [#161343680]
* rabbit_fifo: support AMQP 1.0 consumer credit
  This change implements an alternative consumer credit mechanism similar to AMQP 1.0 link credit where the credit (prefetch) isn't automatically topped up as deliveries are settled and instead needs to be manually increased using a credit command. This is to be integrated with the AMQP 1.0 plugin. [#161256187]
* Add basic.credit support for quorum queues.
  Added support for AMQP 1.0 transfer flow control. [#161256187]
* Make quorum queue recover idempotent
  So that if a vhost crashes and runs the recover steps it doesn't fail because ra servers are still running. [#161343651]
* Add tests for vhost deletion
  To ensure quorum queues are cleaned up on vhost removal. Also fix xref issue. [#161343673]
* remove unused clause
* always return latest value of queue
* Add rabbitmq-queues scripts. Remove ra config from .bat scripts.
* Return error if trying to get quorum status of a classic queue.
1 parent 8ed6b44 commit 3f37025

35 files changed: 6623 additions and 395 deletions

Makefile

Lines changed: 5 additions & 2 deletions
@@ -96,6 +96,8 @@ define PROJECT_ENV
  %% see rabbitmq-server#143,
  %% rabbitmq-server#949, rabbitmq-server#1098
  {credit_flow_default_credit, {400, 200}},
+ {quorum_commands_soft_limit, 256},
+ {quorum_cluster_size, 5},
  %% see rabbitmq-server#248
  %% and rabbitmq-server#667
  {channel_operation_timeout, 15000},
@@ -127,13 +129,14 @@ define PROJECT_ENV
  %% vhost had to shut down, see server#1158 and server#1280
  {vhost_restart_strategy, continue},
  %% {global, prefetch count}
- {default_consumer_prefetch, {false, 0}}
+ {default_consumer_prefetch, {false, 0}},
+ {channel_queue_cleanup_interval, 60000}
  ]
  endef

  LOCAL_DEPS = sasl mnesia os_mon inets
  BUILD_DEPS = rabbitmq_cli syslog
- DEPS = ranch lager rabbit_common
+ DEPS = ranch syslog lager rabbit_common ra
  TEST_DEPS = rabbitmq_ct_helpers rabbitmq_ct_client_helpers amqp_client meck proper

  dep_syslog = git https://github.com/schlagert/syslog 3.4.5

scripts/rabbitmq-env

Lines changed: 7 additions & 1 deletion
@@ -245,6 +245,7 @@ DEFAULT_NODE_PORT=5672
  [ "x" = "x$RABBITMQ_SERVER_CODE_PATH" ] && RABBITMQ_SERVER_CODE_PATH=${SERVER_CODE_PATH}
  [ "x" = "x$RABBITMQ_MNESIA_DIR" ] && RABBITMQ_MNESIA_DIR=${MNESIA_DIR}
  [ "x" = "x$RABBITMQ_MNESIA_DIR" ] && RABBITMQ_MNESIA_DIR=${RABBITMQ_MNESIA_BASE}/${RABBITMQ_NODENAME}
+ [ "x" = "x$RABBITMQ_QUORUM_DIR" ] && RABBITMQ_QUORUM_DIR=${RABBITMQ_MNESIA_DIR}/quorum
  [ "x" = "x$RABBITMQ_GENERATED_CONFIG_DIR" ] && RABBITMQ_GENERATED_CONFIG_DIR=${GENERATED_CONFIG_DIR}
  [ "x" = "x$RABBITMQ_ADVANCED_CONFIG_FILE" ] && RABBITMQ_ADVANCED_CONFIG_FILE=${ADVANCED_CONFIG_FILE}
  [ "x" = "x$RABBITMQ_SCHEMA_DIR" ] && RABBITMQ_SCHEMA_DIR=${SCHEMA_DIR}
@@ -255,7 +256,8 @@ rmq_normalize_path_var \
  RABBITMQ_CONFIG_FILE \
  RABBITMQ_LOG_BASE \
  RABBITMQ_MNESIA_BASE \
- RABBITMQ_MNESIA_DIR
+ RABBITMQ_MNESIA_DIR \
+ RABBITMQ_QUORUM_DIR

  [ "x" = "x$RABBITMQ_PID_FILE" ] && RABBITMQ_PID_FILE="$PID_FILE"

@@ -349,6 +351,10 @@ if [ "${RABBITMQ_DEV_ENV}" ]; then
  "$RABBITMQ_MNESIA_DIR_source" != 'environment' ]; then
  RABBITMQ_MNESIA_DIR="${mnesia_dir}"
  fi
+ if [ "${mnesia_dir}" -a \
+ "$RABBITMQ_QUORUM_DIR_source" != 'environment' ]; then
+ RABBITMQ_QUORUM_DIR="${mnesia_dir}/quorum"
+ fi
  fi

  if path_contains_existing_directory "${RABBITMQ_PLUGINS_DIR}" ; then
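
The new `RABBITMQ_QUORUM_DIR` default in this file reuses the script's existing `[ "x" = "x$VAR" ] && VAR=...` idiom: the quorum directory is derived from the Mnesia directory only when no value has been set. The following is a minimal standalone sketch of that behaviour, not code from the commit; the function name `resolve_quorum_dir` and the example paths are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of rabbitmq-env: applies the same
# "default only when empty" test that the script uses.
#   $1 = Mnesia directory
#   $2 = explicitly configured quorum directory (may be empty)
resolve_quorum_dir() {
    mnesia_dir=$1
    quorum_dir=$2
    # True only when quorum_dir is empty, so an explicit value is kept.
    if [ "x" = "x$quorum_dir" ]; then
        quorum_dir=${mnesia_dir}/quorum
    fi
    printf '%s\n' "$quorum_dir"
}

# Example paths are illustrative only.
resolve_quorum_dir "/var/lib/rabbitmq/mnesia/rabbit@host" ""
resolve_quorum_dir "/var/lib/rabbitmq/mnesia/rabbit@host" "/srv/quorum"
```

The first call falls back to a `quorum` subdirectory of the Mnesia directory; the second keeps the explicit value unchanged.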

scripts/rabbitmq-queues

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+ #!/bin/sh
+ ## The contents of this file are subject to the Mozilla Public License
+ ## Version 1.1 (the "License"); you may not use this file except in
+ ## compliance with the License. You may obtain a copy of the License
+ ## at http://www.mozilla.org/MPL/
+ ##
+ ## Software distributed under the License is distributed on an "AS IS"
+ ## basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+ ## the License for the specific language governing rights and
+ ## limitations under the License.
+ ##
+ ## The Original Code is RabbitMQ.
+ ##
+ ## The Initial Developer of the Original Code is GoPivotal, Inc.
+ ## Copyright (c) 2007-2017 Pivotal Software, Inc. All rights reserved.
+ ##
+
+ # Exit immediately if a pipeline, which may consist of a single simple command,
+ # a list, or a compound command returns a non-zero status
+ set -e
+
+ # Each variable or function that is created or modified is given the export
+ # attribute and marked for export to the environment of subsequent commands.
+ set -a
+
+ # shellcheck source=/dev/null
+ #
+ # TODO: when shellcheck adds support for relative paths, change to
+ # shellcheck source=./rabbitmq-env
+ . "${0%/*}"/rabbitmq-env
+
+ run_escript rabbitmqctl_escript "${ESCRIPT_DIR:?must be defined}"/rabbitmq-queues "$@"

scripts/rabbitmq-queues.bat

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+ @echo off
+ REM The contents of this file are subject to the Mozilla Public License
+ REM Version 1.1 (the "License"); you may not use this file except in
+ REM compliance with the License. You may obtain a copy of the License
+ REM at http://www.mozilla.org/MPL/
+ REM
+ REM Software distributed under the License is distributed on an "AS IS"
+ REM basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+ REM the License for the specific language governing rights and
+ REM limitations under the License.
+ REM
+ REM The Original Code is RabbitMQ.
+ REM
+ REM The Initial Developer of the Original Code is GoPivotal, Inc.
+ REM Copyright (c) 2007-2015 Pivotal Software, Inc. All rights reserved.
+ REM
+
+ REM Scopes the variables to the current batch file
+ setlocal
+
+ rem Preserve values that might contain exclamation marks before
+ rem enabling delayed expansion
+ set TDP0=%~dp0
+ set STAR=%*
+ setlocal enabledelayedexpansion
+
+ REM Get default settings with user overrides for (RABBITMQ_)<var_name>
+ REM Non-empty defaults should be set in rabbitmq-env
+ call "%TDP0%\rabbitmq-env.bat" %~n0
+
+ if not exist "!ERLANG_HOME!\bin\erl.exe" (
+ echo.
+ echo ******************************
+ echo ERLANG_HOME not set correctly.
+ echo ******************************
+ echo.
+ echo Please either set ERLANG_HOME to point to your Erlang installation or place the
+ echo RabbitMQ server distribution in the Erlang lib folder.
+ echo.
+ exit /B 1
+ )
+
+ REM Disable erl_crash.dump by default for control scripts.
+ if not defined ERL_CRASH_DUMP_SECONDS (
+ set ERL_CRASH_DUMP_SECONDS=0
+ )
+
+ "!ERLANG_HOME!\bin\erl.exe" +B ^
+ -boot !CLEAN_BOOT_FILE! ^
+ -noinput -noshell -hidden -smp enable ^
+ !RABBITMQ_CTL_ERL_ARGS! ^
+ -kernel inet_dist_listen_min !RABBITMQ_CTL_DIST_PORT_MIN! ^
+ -kernel inet_dist_listen_max !RABBITMQ_CTL_DIST_PORT_MAX! ^
+ -sasl errlog_type error ^
+ -mnesia dir \""!RABBITMQ_MNESIA_DIR:\=/!"\" ^
+ -nodename !RABBITMQ_NODENAME! ^
+ -run escript start ^
+ -escript main rabbitmqctl_escript ^
+ -extra "%RABBITMQ_HOME%\escript\rabbitmq-queues" !STAR!
+
+ if ERRORLEVEL 1 (
+ exit /B 1
+ )
+
+ endlocal
+ endlocal

scripts/rabbitmq-server

Lines changed: 1 addition & 0 deletions
@@ -311,6 +311,7 @@ start_rabbitmq_server() {
  -os_mon start_disksup false \
  -os_mon start_memsup false \
  -mnesia dir "\"${RABBITMQ_MNESIA_DIR}\"" \
+ -ra data_dir "\"${RABBITMQ_QUORUM_DIR}\"" \
  ${RABBITMQ_SERVER_START_ARGS} \
  ${RABBITMQ_DIST_ARG} \
  "$@"

scripts/rabbitmq-server.bat

Lines changed: 1 addition & 0 deletions
@@ -256,6 +256,7 @@ if "!ENV_OK!"=="false" (
  -os_mon start_disksup false ^
  -os_mon start_memsup false ^
  -mnesia dir \""!RABBITMQ_MNESIA_DIR:\=/!"\" ^
+ -ra data_dir \""!RABBITMQ_QUORUM_DIR:\=/!"\" ^
  !RABBITMQ_SERVER_START_ARGS! ^
  !RABBITMQ_DIST_ARG! ^
  !STAR!

scripts/rabbitmq-service.bat

Lines changed: 1 addition & 0 deletions
@@ -330,6 +330,7 @@ set ERLANG_SERVICE_ARGUMENTS= ^
  -os_mon start_disksup false ^
  -os_mon start_memsup false ^
  -mnesia dir \""!RABBITMQ_MNESIA_DIR:\=/!"\" ^
+ -ra data_dir \""!RABBITMQ_QUORUM_DIR:\=/!"\" ^
  !RABBITMQ_SERVER_START_ARGS! ^
  !RABBITMQ_DIST_ARG! ^
  !STARVAR!
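
All three start scripts pass the quorum directory to the Erlang VM as `-ra data_dir`, wrapped in an extra pair of escaped double quotes. `erl` treats `-Application Par Val` arguments as application configuration parameters and parses `Val` as an Erlang term, so the quotes need to survive the shell for the path to be read as a string. The sketch below shows the quoting used by the POSIX script in isolation; the helper name `ra_data_dir_arg` and the example path are assumptions for illustration and do not appear in the commit.

```shell
#!/bin/sh
# Hypothetical helper, not part of rabbitmq-server: builds the value that
# follows `-ra data_dir`, keeping literal double quotes around the path
# in the same way as "\"${RABBITMQ_QUORUM_DIR}\"" in the start script.
ra_data_dir_arg() {
    printf '%s' "\"$1\""
}

# Illustrative path only.
arg=$(ra_data_dir_arg "/tmp/example/quorum")
echo "-ra data_dir $arg"
```

The printed argument keeps its surrounding double quotes, which is the form the VM receives after the shell has removed the outer quoting layer.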

src/rabbit.erl

Lines changed: 2 additions & 9 deletions
@@ -25,7 +25,7 @@
  -export([start/0, boot/0, stop/0,
  stop_and_halt/0, await_startup/0, await_startup/1,
  status/0, is_running/0, alarms/0,
- is_running/1, environment/0, rotate_logs/0, force_event_refresh/1,
+ is_running/1, environment/0, rotate_logs/0,
  start_fhc/0]).

  -export([start/2, stop/1, prep_stop/1]).
@@ -225,7 +225,7 @@
  -include("rabbit_framing.hrl").
  -include("rabbit.hrl").

- -define(APPS, [os_mon, mnesia, rabbit_common, rabbit]).
+ -define(APPS, [os_mon, mnesia, rabbit_common, ra, rabbit]).

  -define(ASYNC_THREADS_WARNING_THRESHOLD, 8).

@@ -252,7 +252,6 @@
  -spec is_running(node()) -> boolean().
  -spec environment() -> [{param(), term()}].
  -spec rotate_logs() -> rabbit_types:ok_or_error(any()).
- -spec force_event_refresh(reference()) -> 'ok'.

  -spec log_locations() -> [log_location()].

@@ -941,12 +940,6 @@ start_logger() ->
  log_locations() ->
  rabbit_lager:log_locations().

- force_event_refresh(Ref) ->
- rabbit_direct:force_event_refresh(Ref),
- rabbit_networking:force_connection_event_refresh(Ref),
- rabbit_channel:force_event_refresh(Ref),
- rabbit_amqqueue:force_event_refresh(Ref).
-
  %%---------------------------------------------------------------------------
  %% misc
