Commit ffb2f2f

refactor(sharding): enforce explicit rollback commands in queue
- Remove RollbackCommandGenerator and legacy fallback code
- Make rollback SQL parameter required for all queue operations
- Update queue schema and docs to mandate rollback commands
- Add rollback commands for shard and cluster operations
- Improve test coverage with explicit rollback parameters
- Simplify rollback handling by assuming always-on support
1 parent 38bb129 commit ffb2f2f
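
The commit message says the rollback SQL parameter becomes required for every queue operation. A minimal sketch of what such a contract could look like is shown here. `InMemoryQueueSketch` is a hypothetical in-memory stand-in, not the Buddy `Queue` class. Only the argument order (node, forward SQL, rollback SQL, operation group) is taken from the documentation changed in this commit; rejecting an empty rollback string is an assumption about how "required" might be enforced.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for the sharding queue (illustration only).
 */
final class InMemoryQueueSketch {
	/** @var array<int, array<string, int|string>> queued rows, in insertion order */
	private array $rows = [];

	/**
	 * Queue a forward command together with its mandatory rollback command.
	 * Assumption: an empty rollback string is treated as a caller error.
	 */
	public function add(string $node, string $query, string $rollbackQuery, string $operationGroup): int {
		if (trim($rollbackQuery) === '') {
			throw new InvalidArgumentException('Rollback SQL is required for every queued command');
		}

		$id = count($this->rows) + 1;
		$this->rows[] = [
			'id' => $id,
			'node' => $node,
			'query' => $query,
			'rollback_query' => $rollbackQuery,
			'operation_group' => $operationGroup,
		];
		return $id;
	}

	/** @return array<int, array<string, int|string>> */
	public function rows(): array {
		return $this->rows;
	}
}
```

With this shape, a caller that forgets the rollback statement fails when queuing the command rather than later, when a rollback is actually needed.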

12 files changed: +177 -647 lines changed

doc/sharding/01-components.md

Lines changed: 14 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -287,17 +287,18 @@ Generates reverse SQL commands for rollback operations.
287287

288288
**Usage Pattern:**
289289
```php
290-
// Automatic generation
291-
$rollback = RollbackCommandGenerator::generate("CREATE TABLE users");
292-
// Returns: "DROP TABLE IF EXISTS users"
293-
294-
// Specific generators
295-
$rollback = RollbackCommandGenerator::forCreateTable("users");
296-
$rollback = RollbackCommandGenerator::forAlterClusterAdd("c1", "users");
297-
298-
// Safety check
299-
if (RollbackCommandGenerator::isSafeToRollback($command)) {
300-
$rollback = RollbackCommandGenerator::generate($command);
290+
// Direct rollback command provision
291+
$forwardSql = "CREATE TABLE users (id bigint, name string)";
292+
$rollbackSql = "DROP TABLE IF EXISTS users";
293+
$queue->add($nodeId, $forwardSql, $rollbackSql, $operationGroup);
294+
295+
// Common rollback patterns:
296+
"CREATE TABLE users" → "DROP TABLE IF EXISTS users"
297+
"CREATE CLUSTER c1" → "DELETE CLUSTER c1"
298+
"ALTER CLUSTER c1 ADD t1" → "ALTER CLUSTER c1 DROP t1"
299+
300+
// All operations require explicit rollback commands
301+
$queue->add($node, $sql, $rollbackSql, $operationGroup);
301302
}
302303
```
303304

@@ -357,7 +358,7 @@ $health = $monitor->performHealthCheck();
357358
if ($health['overall_status'] !== 'healthy') {
358359
// Automatic recovery
359360
$recovery = $monitor->performAutoRecovery();
360-
361+
361362
// Or manual intervention based on recommendations
362363
foreach ($health['recommendations'] as $recommendation) {
363364
echo $recommendation;
@@ -370,7 +371,7 @@ if ($health['overall_status'] !== 'healthy') {
370371
### Rollback Flow
371372

372373
```
373-
Table.shard()
374+
Table.shard()
374375
├── Creates operation_group
375376
├── Queue.addWithRollback() [multiple times]
376377
├── On failure:

doc/sharding/04-queue-system.md

Lines changed: 7 additions & 7 deletions
@@ -13,21 +13,21 @@ The Queue class manages distributed command execution with the following key fea
1313
- **Node Targeting**: Routes commands to specific cluster nodes
1414
- **Parallel Execution**: Supports concurrent operations where safe
1515
- **Synchronization Points**: Ensures critical operations complete before proceeding
16-
- **Rollback Support**: Stores rollback commands alongside forward commands
16+
- **Rollback Support**: Stores rollback commands alongside forward commands (REQUIRED)
1717
- **Operation Groups**: Groups related commands for atomic execution
1818
- **Automatic Rollback**: Executes rollback sequence on failure
1919

20-
### Enhanced Queue Table Structure
20+
### Queue Table Structure
2121

2222
```sql
2323
CREATE TABLE system.sharding_queue (
2424
`id` bigint, -- Primary key
2525
`node` string, -- Target node
2626
`query` string, -- Forward command
27-
`rollback_query` string, -- Rollback command (NEW)
27+
`rollback_query` string, -- Rollback command (REQUIRED)
2828
`wait_for_id` bigint, -- Forward dependency
29-
`rollback_wait_for_id` bigint, -- Rollback dependency (NEW)
30-
`operation_group` string, -- Operation group ID (NEW)
29+
`rollback_wait_for_id` bigint, -- Rollback dependency
30+
`operation_group` string, -- Operation group ID
3131
`tries` int, -- Retry count
3232
`status` string, -- Command status
3333
`created_at` bigint, -- Creation timestamp
@@ -136,7 +136,7 @@ When a rollback is triggered:
136136
```php
137137
protected function executeRollbackSequence(array $rollbackCommands): bool {
138138
$allSuccess = true;
139-
139+
140140
foreach ($rollbackCommands as $command) {
141141
try {
142142
$this->client->sendRequest($command['rollback_query']);
@@ -147,7 +147,7 @@ protected function executeRollbackSequence(array $rollbackCommands): bool {
147147
// Continue with other rollback commands
148148
}
149149
}
150-
150+
151151
return $allSuccess;
152152
}
153153
```
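
The `executeRollbackSequence()` excerpt above keeps running after a failed rollback command, and the rollback documentation in this commit describes sorting completed commands by ID in descending order. Those two ideas can be sketched together as follows. `rollbackGroupSketch()` and the `$send` callable are hypothetical names for illustration; the real method sends each query through `$this->client->sendRequest()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Buddy): undo the completed commands of one
 * operation group, newest first, and keep going when a single rollback fails.
 *
 * @param array<int, array{id:int, rollback_query:string}> $commands
 * @param callable(string):void $send executes one SQL statement, throws on failure
 */
function rollbackGroupSketch(array $commands, callable $send): bool {
	// Reverse order of execution: the highest queue id is rolled back first
	usort($commands, static fn(array $a, array $b): int => $b['id'] <=> $a['id']);

	$allSuccess = true;
	foreach ($commands as $command) {
		try {
			$send($command['rollback_query']);
		} catch (\Throwable $t) {
			// Remember the failure, but continue with the remaining rollbacks
			$allSuccess = false;
		}
	}

	return $allSuccess;
}
```

Continuing after a failure trades a clean abort for maximum cleanup: the return value only tells the caller whether every rollback command succeeded.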

doc/sharding/07-error-handling.md

Lines changed: 10 additions & 5 deletions
@@ -16,12 +16,12 @@ All related operations are grouped together for atomic execution:
1616
public function shard(Queue $queue, int $shardCount, int $replicationFactor = 2): Map {
1717
// Create unique operation group
1818
$operationGroup = "shard_create_{$this->name}_" . uniqid();
19-
19+
2020
try {
2121
// All operations in this group
2222
$queue->addWithRollback($node, $createSql, $rollbackSql, $operationGroup);
2323
// ... more operations
24-
24+
2525
return $result;
2626
} catch (\Throwable $t) {
2727
// Automatic rollback of entire group
@@ -31,12 +31,17 @@ public function shard(Queue $queue, int $shardCount, int $replicationFactor = 2)
3131
}
3232
```
3333

34-
### Rollback Command Generation
34+
### Rollback Command Handling
3535

36-
The system automatically generates reverse commands for common operations:
36+
The system requires explicit rollback commands for all operations:
3737

3838
```php
39-
// RollbackCommandGenerator examples
39+
// Rollback commands must be provided when queuing operations
40+
$forwardSql = "CREATE TABLE users (id bigint, name string)";
41+
$rollbackSql = "DROP TABLE IF EXISTS users";
42+
$queue->add($nodeId, $forwardSql, $rollbackSql, $operationGroup);
43+
44+
// Common rollback patterns:
4045
"CREATE TABLE users" → "DROP TABLE IF EXISTS users"
4146
"CREATE CLUSTER c1" → "DELETE CLUSTER c1"
4247
"ALTER CLUSTER c1 ADD t1" → "ALTER CLUSTER c1 DROP t1"

doc/sharding/11-rollback-system.md

Lines changed: 35 additions & 19 deletions
@@ -1,16 +1,16 @@
1-
# Sharding Rollback and Recovery System
1+
# Sharding Rollback System
22

33
## Overview
44

5-
The ManticoreSearch Buddy sharding system now includes a comprehensive rollback and recovery mechanism that ensures system consistency and reliability by automatically reversing failed operations and providing tools for health monitoring and resource cleanup.
5+
The ManticoreSearch Buddy sharding system includes a simplified rollback mechanism that ensures system consistency by storing rollback commands directly when operations are queued. Rollback is always enabled and commands are provided upfront.
66

77
## Key Features
88

9-
### 1. Automatic Rollback
10-
- **Operation Groups**: Related commands grouped for atomic execution
11-
- **Rollback Commands**: Automatic generation of reverse SQL commands
12-
- **Failure Detection**: Automatic rollback trigger on operation failure
13-
- **Queue-Based Execution**: Leverages existing queue infrastructure
9+
### 1. Always-On Rollback
10+
- **Required Rollback Commands**: Every queued operation must provide its rollback command
11+
- **Direct Storage**: Rollback commands stored immediately when operation is queued
12+
- **No Auto-Generation**: Rollback commands are explicitly provided by the caller
13+
- **Operation Groups**: Related commands grouped for atomic rollback
1414

1515
### 2. Rebalancing Control
1616
- **Stop/Pause/Resume**: Full control over rebalancing operations
@@ -32,17 +32,17 @@ The ManticoreSearch Buddy sharding system now includes a comprehensive rollback
3232

3333
## Architecture
3434

35-
### Enhanced Queue Table Structure
35+
### Queue Table Structure
3636

3737
```sql
3838
CREATE TABLE system.sharding_queue (
3939
`id` bigint, -- Primary key
4040
`node` string, -- Target node
4141
`query` string, -- Forward command
42-
`rollback_query` string, -- Rollback command (NEW)
42+
`rollback_query` string, -- Rollback command (REQUIRED)
4343
`wait_for_id` bigint, -- Forward dependency
44-
`rollback_wait_for_id` bigint, -- Rollback dependency (NEW)
45-
`operation_group` string, -- Operation group ID (NEW)
44+
`rollback_wait_for_id` bigint, -- Rollback dependency
45+
`operation_group` string, -- Operation group ID
4646
`tries` int, -- Retry count
4747
`status` string, -- Command status
4848
`created_at` bigint, -- Creation timestamp
@@ -58,7 +58,7 @@ Table Operations
5858
├── Create operation_group
5959
├── Queue.addWithRollback() [multiple commands]
6060
├── On Success: Mark complete
61-
└── On Failure:
61+
└── On Failure:
6262
└── Queue.rollbackOperationGroup()
6363
├── Get completed commands
6464
├── Sort by ID DESC (reverse)
@@ -116,7 +116,7 @@ if ($health['overall_status'] !== 'healthy') {
116116
foreach ($health['issues'] as $issue) {
117117
echo "Issue: {$issue['type']} - {$issue['count']} affected";
118118
}
119-
119+
120120
// Auto-recovery
121121
$recovery = $monitor->performAutoRecovery();
122122
echo "Recovered: " . count($recovery['recovered_tables']) . " tables";
@@ -139,9 +139,9 @@ $cleanup->cleanupExpiredQueueItems();
139139
$cleanup->cleanupStaleStateEntries();
140140
```
141141

142-
## Rollback Command Generation
142+
## Rollback Command Examples
143143

144-
The system automatically generates rollback commands for common operations:
144+
Common rollback patterns used in the system:
145145

146146
| Forward Command | Rollback Command |
147147
|----------------|------------------|
@@ -151,13 +151,29 @@ The system automatically generates rollback commands for common operations:
151151
| `ALTER CLUSTER c1 DROP t1` | `ALTER CLUSTER c1 ADD t1` |
152152
| `JOIN CLUSTER c1` | `DELETE CLUSTER c1` |
153153

154-
Commands that cannot be safely rolled back (like `DROP TABLE`) return null and require manual intervention.
154+
All rollback commands must be provided when queuing operations. The system no longer auto-generates rollback commands.
155+
156+
## Usage Examples
157+
158+
### Adding Operations with Rollback
159+
160+
```php
161+
// Create table with explicit rollback
162+
$forwardSql = "CREATE TABLE users (id bigint, name string)";
163+
$rollbackSql = "DROP TABLE IF EXISTS users";
164+
$queue->add($nodeId, $forwardSql, $rollbackSql, $operationGroup);
165+
166+
// Distributed table creation
167+
$forwardSql = $this->getCreateShardedTableSQL($shards);
168+
$rollbackSql = "DROP TABLE IF EXISTS {$this->name}";
169+
$queue->add($node, $forwardSql, $rollbackSql, $operationGroup);
170+
```
155171

156172
## Production Deployment
157173

158-
### Migration
174+
### Queue Table Setup
159175

160-
For existing systems, run the migration to add rollback support:
176+
The queue table is automatically created with rollback support:
161177

162178
```php
163179
$queue = new Queue($cluster, $client);
@@ -234,4 +250,4 @@ Configure alerts for:
234250

235251
## Conclusion
236252

237-
The rollback and recovery system transforms the ManticoreSearch Buddy sharding system from a basic distributed system into a production-ready platform with comprehensive error handling, automatic recovery, and resource management capabilities. This ensures high availability, data consistency, and operational reliability in production environments.
253+
The rollback and recovery system transforms the ManticoreSearch Buddy sharding system from a basic distributed system into a production-ready platform with comprehensive error handling, automatic recovery, and resource management capabilities. This ensures high availability, data consistency, and operational reliability in production environments.

doc/sharding/README.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ The Manticore Buddy Sharding system provides automatic distribution of data acro
1212
- **Data Safety**: Ensures no data loss during rebalancing operations
1313
- **Concurrent Operation Control**: Prevents conflicting rebalancing operations
1414
- **Queue-Based Processing**: Asynchronous command execution with proper ordering
15-
- **Automatic Rollback**: Comprehensive rollback system for failed operations
15+
- **Automatic Rollback**: Simplified rollback system with required rollback commands
1616
- **Graceful Stop Control**: Ability to stop/pause/resume rebalancing operations
1717
- **Resource Cleanup**: Automatic cleanup of orphaned resources and failed operations
1818
- **Health Monitoring**: Built-in health checks and auto-recovery mechanisms
