---
Title: Test HA failover
alwaysopen: false
categories:
- docs
- integrate
- rs
- rdi
description: Learn how to perform HA failover testing for Redis Data Integration (RDI) to ensure high availability and reliability of your data integration setup.
group: di
hideListLinks: false
linkTitle: Test HA failover
summary: How to perform HA failover testing
type: integration
weight: 100
---

## Setup

1. Ensure that RDI is up and running on both the primary and secondary nodes.
   Run the following command and verify that each instance shows healthy, running `rdi-api` and `rdi-operator` pods:
```
kubectl -n rdi get pods

# Example output:
NAME                                   READY   STATUS      RESTARTS   AGE
collector-api-577d95bfd8-5wbg6         1/1     Running     0          12m
collector-source-95f45bcf7-vwn5l       1/1     Running     0          12m
fluentd-zq2lc                          1/1     Running     0          72m
logrotate-29530445-j729x               0/1     Completed   0          14m
logrotate-29530450-dprr2               0/1     Completed   0          9m40s
logrotate-29530455-mfmzw               0/1     Completed   0          4m40s
processor-f66655469-h7nw2              1/1     Running     0          12m
rdi-api-f75df6796-qwqjw                1/1     Running     0          72m
rdi-metrics-exporter-d57cdf8c8-wjzb5   1/1     Running     0          72m
rdi-operator-7f7f6c7dfd-5qmjd          1/1     Running     0          71m
rdi-reloader-77df5f7854-lwmvz          1/1     Running     0          71m
```

2. Identify the leader node: the one with a running `collector-source` pod.

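The leader check in step 2 can be scripted. The sketch below runs against the sample `kubectl` output from the previous step, stored in a variable for illustration; in practice, pipe the live output of `kubectl -n rdi get pods` instead:

```shell
# Sample output from `kubectl -n rdi get pods` (abbreviated, taken from
# the example above); replace this variable with the live command output.
pods='collector-api-577d95bfd8-5wbg6 1/1 Running 0 12m
collector-source-95f45bcf7-vwn5l 1/1 Running 0 12m
rdi-operator-7f7f6c7dfd-5qmjd 1/1 Running 0 71m'

# The node hosting a running collector-source pod is the leader.
leader_pod=$(printf '%s\n' "$pods" | awk '$1 ~ /^collector-source/ && $3 == "Running" {print $1}')
echo "leader pod: $leader_pod"
```

Adding `-o wide` to `kubectl get pods` also prints the `NODE` column, which shows directly which node is currently hosting the `collector-source` pod.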
## Perform the HA failover test

To test HA failover, simulate a connection failure between the leader and the RDI database by blocking the network traffic. To do this, run the following commands on the leader node:

1. Identify the RDI database IP (replace `<hostname>` with your own hostname):
```
dig +short <hostname>

# Example:
# dig +short my.redis.hostname.com

# Example output:
54.78.220.161
```

2. For each of the IPs returned by the command above, run the following command to block the traffic:

```
sudo iptables -I FORWARD -d <database_ip> -j DROP

# With the IP from the example above, the command would be:
sudo iptables -I FORWARD -d 54.78.220.161 -j DROP
```
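Steps 1 and 2 above can be combined into a short loop. The sketch below uses a placeholder variable in place of live `dig +short <hostname>` output; note that `dig +short` may print a CNAME target before the A records, so only IPv4 addresses are kept. The rules are printed for review rather than executed:

```shell
# Placeholder for `dig +short <hostname>` output; `dig +short` can list a
# CNAME target before the A records, so filter for IPv4 addresses only.
dig_out='redirect.example.net.
54.78.220.161'

db_ips=$(printf '%s\n' "$dig_out" | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$')

# Print one blocking rule per IP so you can review the commands before
# running them with sudo on the leader node.
for ip in $db_ips; do
  printf 'sudo iptables -I FORWARD -d %s -j DROP\n' "$ip"
done
```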

The default leader lock duration is 60 seconds, so it may take up to 2 minutes for the failover to occur.
Meanwhile, you can follow the operator logs to watch the failover process (replace the pod name with the one from your deployment):

```
kubectl -n rdi logs rdi-operator-7f7f6c7dfd-5qmjd -f
```

After about 10 seconds, you will start seeing log entries from the leader saying that it could not acquire the leadership.
When the leader lock expires, the second node acquires the leadership, and its log entries indicate that it has become the leader.

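If you prefer to detect the failover programmatically rather than by watching logs, you can poll until a `collector-source` pod is Running again. The loop below is a minimal sketch: the `check` function is a hypothetical stub that succeeds on the third poll, so the logic is self-contained; in practice, replace it with a test against live `kubectl` output:

```shell
# Stub simulating the cluster state: the new collector-source pod
# becomes Running on the 3rd poll. In practice, replace with e.g.:
#   check() { kubectl -n rdi get pods | grep -q 'collector-source.*Running'; }
i=0
check() { i=$((i + 1)); [ "$i" -ge 3 ]; }

polls=0
while :; do
  polls=$((polls + 1))
  check && break
  # sleep 10   # real polling interval between checks
done
echo "failover detected after $polls polls"
```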
## Cleanup

To clean up after the test, remove the `iptables` rule that you added to block the traffic:

```
sudo iptables -D FORWARD -d <database_ip> -j DROP
```

Use `sudo iptables -S | grep <database_ip>` to verify that the rule has been removed.
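The verification step can be sketched as follows. The `before` and `after` variables are hypothetical `iptables -S` listings standing in for the live command output; note that `iptables -S` prints destinations in CIDR form (e.g. `54.78.220.161/32`), so grep for the bare IP:

```shell
# Hypothetical `sudo iptables -S FORWARD` output before and after the
# `iptables -D` cleanup command; substitute the live output in practice.
before='-P FORWARD ACCEPT
-A FORWARD -d 54.78.220.161/32 -j DROP'
after='-P FORWARD ACCEPT'

db_ip='54.78.220.161'

# grep exits 0 while the rule is present and non-zero once it is gone.
printf '%s\n' "$before" | grep -q "$db_ip" && echo "rule still present"
printf '%s\n' "$after"  | grep -q "$db_ip" || echo "rule removed"
```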