qsync testing plan
Sergey Bronnikov edited this page Jun 15, 2020
current state of RFC - 15.06.2020 (https://github.com/tarantool/tarantool/commit/a0236e5891f97426a62634557560c4adf32fc967)
- [RFC, summary] switch async replicas into sync ones and vice versa, expected success and data consistency on a leader and replicas
- [RFC, summary] switch from leader to replica and vice versa, expected success and data consistency on a leader and replicas
- [RFC, quorum commit] happy path: write/read data to a leader in sync cluster, expected data consistency on a leader and replicas
- happy path: read/write data to a sync cluster with the maximum allowed number of replicas, expected success and data consistency on a leader and replicas
- [RFC, quorum commit] no quorum achieved, expected transaction rollback and data consistency on a leader and replicas
- [RFC, quorum commit] check behaviour with no answer from a replica during write, expected to set failure answer
- [RFC, quorum commit] check behaviour with failure answer from a replica during write, expected disconnect from the replication
- [RFC, quorum commit] attempt to write multiple transactions, expected the same order as on client in case of achieved quorum
- [RFC, quorum commit] attempt to write multiple transactions, expected that latest transaction that collects the quorum is considered as complete, as well as all transactions prior to it
- [RFC, quorum commit] failure during transaction confirmation on a leader, expected rollback and data consistency on a leader and replicas
- a leader gets a quorum, but one replica that participated in the quorum leaves the cluster right after answering the leader, expected (TBD)
- [RFC, quorum commit] check the situation where a transaction was written to the WAL and SUCCESS was returned, but the WAL was lost afterwards
- read the code for rollback ("guarantee of rollback on leader and sync replicas")
- consistency on replicas on enabling and disabling sync replication (TBD)
- [RFC, connection liveness] `replication_connect_timeout` works as expected with sync cluster (see documentation)
- [RFC, connection liveness] `replication_sync_lag` works as expected with sync cluster (see documentation)
- [RFC, connection liveness] `replication_sync_timeout` works as expected with sync cluster (see documentation)
- [RFC, connection liveness] `replication_timeout` works as expected with sync cluster (see documentation)
- [RFC, connection liveness] `replication_synchro_quorum_timeout`
- [RFC, connection liveness] `replication_synchro_quorum`
- [RFC, connection liveness] when Leader has no response for another heartbeat interval, it should consider the replica is lost
- [RFC, connection liveness] when a leader does not have enough replicas to achieve a quorum, it should stop accepting write requests
- [RFC, connection liveness] a leader that has stopped accepting write requests can be switched back to write mode when the cluster configuration is updated
- [RFC, connection liveness] some of the replicas become unavailable during the quorum collection, expected: a leader should wait at most for `replication_synchro_quorum_timeout`, after which it issues a rollback pointing to the oldest TXN in the waiting list
- fault injections at different steps to fail "WAL Ok" from a replica: network, disk, etc. (TBD)
- test with a time difference between the leader and replicas, expected success
- test with a leader and a single replica in a cluster, expected ??? (TBD)
- test new cluster cli
- Testing should be done with both engines: memtx and vinyl
- How many nodes should be in a cluster?
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems (Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Rodrigues, Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm):
  "Almost all (98%) of the failures are guaranteed to manifest on no more than 3 nodes. 84% will manifest on no more than 2 nodes…. It is not necessary to have a large cluster to test for and reproduce failures."
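
The quorum commit and timeout cases above can be reasoned about with a small toy model before writing real cluster tests. The sketch below is not Tarantool code and does not use any Tarantool API. `ToyLeader`, `Txn`, and all method names are hypothetical, and it assumes that a replica acknowledgement is cumulative (it covers every transaction up to the given LSN), that the leader counts its own WAL write toward the quorum, and that a timeout on the oldest waiting transaction rolls back the whole waiting list.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    lsn: int                      # position of the transaction in the log
    created_at: float             # time the leader accepted the write
    acks: set = field(default_factory=set)   # nodes that confirmed the write


class ToyLeader:
    """Toy model of a leader's waiting list (hypothetical, for test design only)."""

    def __init__(self, quorum, timeout):
        self.quorum = quorum      # acks required, counting the leader itself
        self.timeout = timeout    # seconds to wait for a quorum
        self.waiting = []         # pending transactions, oldest first
        self.confirmed = []       # LSNs committed, in commit order
        self.rolled_back = []     # LSNs rolled back
        self.next_lsn = 1

    def write(self, now):
        """Accept a write; the leader's own WAL write counts as one ack."""
        txn = Txn(lsn=self.next_lsn, created_at=now, acks={"leader"})
        self.next_lsn += 1
        self.waiting.append(txn)
        self._try_confirm()
        return txn.lsn

    def ack(self, replica, lsn):
        """Assumption: a replica ack covers every transaction up to `lsn`."""
        for txn in self.waiting:
            if txn.lsn <= lsn:
                txn.acks.add(replica)
        self._try_confirm()

    def _try_confirm(self):
        # The latest transaction that collected a quorum is complete,
        # together with all transactions prior to it.
        last = -1
        for i, txn in enumerate(self.waiting):
            if len(txn.acks) >= self.quorum:
                last = i
        if last >= 0:
            self.confirmed.extend(t.lsn for t in self.waiting[:last + 1])
            del self.waiting[:last + 1]

    def tick(self, now):
        """Roll back from the oldest waiting transaction once it times out."""
        if self.waiting and now - self.waiting[0].created_at >= self.timeout:
            self.rolled_back.extend(t.lsn for t in self.waiting)
            self.waiting.clear()


# Usage: three writes, one replica acknowledges up to the second one.
leader = ToyLeader(quorum=2, timeout=5.0)
leader.write(now=0.0)
second = leader.write(now=1.0)
leader.write(now=2.0)
leader.ack("replica1", lsn=second)   # confirms LSN 1 and 2, LSN 3 keeps waiting
leader.tick(now=7.0)                 # LSN 3 waited 5 seconds, so it is rolled back
```

A model like this only helps to enumerate expected outcomes (ordering, confirm of prior transactions, rollback on timeout); the actual checks still have to run against a real cluster with both memtx and vinyl.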