Commit bd44971

feat: implement sharding with unified connection pool
- Implement sharding feature with range, hash, and modulo strategies
  Automatically routes queries to appropriate shards based on shard key
- Add ShardConfig, ShardRouter, and ShardConfigBuilder classes
  Provides fluent API for configuring sharding
- Add useConnections() method to use existing connections from pool
  Connections are added via addConnection() and reused for sharding
- Add getConnection() method to PdoDb for retrieving connections by name
- Add ShardStrategyInterface with RangeShardStrategy, HashShardStrategy, and ModuloShardStrategy implementations
- Add extractShardKeyValue() to ConditionBuilder for automatic shard routing
- Update QueryBuilder to support automatic shard switching based on queries
- Add comprehensive tests and examples for all sharding strategies
- Add documentation for sharding feature
- Replace self with static in interfaces for better inheritance support
  Applied to ConditionBuilderInterface, DmlQueryBuilderInterface, ExecutionEngineInterface, FileLoaderInterface, JoinBuilderInterface, JsonQueryBuilderInterface, ParameterManagerInterface, QueryBuilderInterface, SelectQueryBuilderInterface
1 parent 28c328a commit bd44971

33 files changed

+2525 -151 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ Built on top of PDO with **zero external dependencies**, it offers:
 - **Query Performance Profiling** - Built-in profiler for tracking execution times, memory usage, and slow query detection
 - **Prepared Statement Pool** - Automatic statement caching with LRU eviction (20-50% faster repeated queries)
 - **Read/Write Splitting** - Horizontal scaling with master-replica architecture and load balancing
+- **Sharding** - Horizontal partitioning across multiple databases with automatic query routing (range, hash, modulo strategies)
 - **Window Functions** - Advanced analytics with ROW_NUMBER, RANK, LAG, LEAD, running totals, moving averages
 - **Common Table Expressions (CTEs)** - WITH clauses for complex queries, recursive CTEs for hierarchical data, materialized CTEs for performance optimization
 - **LATERAL JOINs** - Correlated subqueries in FROM clause for PostgreSQL and MySQL
@@ -65,6 +66,7 @@ Inspired by [ThingEngineer/PHP-MySQLi-Database-Class](https://github.com/ThingEn
 - [SQLite Configuration](#sqlite-configuration)
 - [Connection Pooling](#connection-pooling)
 - [Read/Write Splitting](#readwrite-splitting)
+- [Sharding](#sharding)
 - [Window Functions](#window-functions)
 - [Common Table Expressions (CTEs)](#common-table-expressions-ctes)
 - [ActiveRecord Pattern](#activerecord-pattern)

documentation/05-advanced-features/sharding.md

Lines changed: 241 additions & 0 deletions
# Sharding

Sharding is a database scaling technique that horizontally partitions data across multiple database instances (shards). Distributing a large dataset this way spreads both storage and query load across several databases.

## Overview

PdoDb's sharding feature routes each query to the appropriate shard based on the shard key value in its WHERE conditions. Routing is transparent to application code: you write queries as usual, and PdoDb selects the connection.

## Configuration

Sharding is configured using a fluent API. First, add your shard connections to the connection pool, then configure sharding to use them:

```php
// Add connections to the pool
$db->addConnection('shard1', [
    'driver' => 'mysql',
    'host' => 'shard1.example.com',
    'dbname' => 'users_db',
    'user' => 'user',
    'password' => 'password',
]);

$db->addConnection('shard2', [
    'driver' => 'mysql',
    'host' => 'shard2.example.com',
    'dbname' => 'users_db',
    'user' => 'user',
    'password' => 'password',
]);

$db->addConnection('shard3', [
    'driver' => 'mysql',
    'host' => 'shard3.example.com',
    'dbname' => 'users_db',
    'user' => 'user',
    'password' => 'password',
]);

// Configure sharding to use existing connections
$db->shard('users')
    ->shardKey('user_id')
    ->strategy('range') // or 'hash', 'modulo'
    ->ranges([
        'shard1' => [0, 1000],
        'shard2' => [1001, 2000],
        'shard3' => [2001, 3000],
    ])
    ->useConnections(['shard1', 'shard2', 'shard3'])
    ->register();
```

## Sharding Strategies

### Range Strategy

Distributes data based on numeric ranges. Each shard handles a specific range of shard key values.

```php
$db->shard('users')
    ->shardKey('user_id')
    ->strategy('range')
    ->ranges([
        'shard1' => [0, 1000],
        'shard2' => [1001, 2000],
        'shard3' => [2001, 3000],
    ])
    ->useConnections(['shard1', 'shard2', 'shard3'])
    ->register();
```

**Use when:**
- Shard key values are numeric and sequential
- You want predictable distribution
- You need to query ranges of values

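To make the range lookup concrete, here is a minimal sketch of how a value could be matched against the configured ranges. `resolveRangeShard()` is a hypothetical helper written for illustration, not part of PdoDb, and it assumes both bounds are inclusive and that a value outside every range resolves to `null`; the library's actual behavior may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not PdoDb API): returns the name of the shard whose
 * inclusive [min, max] range contains $value, or null if no range matches.
 *
 * @param array<string, array{0: int, 1: int}> $ranges
 */
function resolveRangeShard(array $ranges, int $value): ?string
{
    foreach ($ranges as $shard => [$min, $max]) {
        if ($value >= $min && $value <= $max) {
            return $shard;
        }
    }

    return null; // value is not covered by any configured range
}

$ranges = [
    'shard1' => [0, 1000],
    'shard2' => [1001, 2000],
    'shard3' => [2001, 3000],
];

var_dump(resolveRangeShard($ranges, 500));  // string(6) "shard1"
var_dump(resolveRangeShard($ranges, 1001)); // string(6) "shard2"
var_dump(resolveRangeShard($ranges, 3001)); // NULL
```

The last call shows why ranges should cover every value the shard key can take: a key such as 3001 has no home until a range is added for it.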
### Hash Strategy

Distributes data based on a hash of the shard key value. Uses CRC32, so routing is deterministic: the same value always hashes to the same shard.

```php
$db->shard('users')
    ->shardKey('user_id')
    ->strategy('hash')
    ->useConnections(['shard1', 'shard2', 'shard3'])
    ->register();
```

**Use when:**
- You want even distribution across shards
- Shard key values are not sequential
- You want consistent routing (same value always goes to same shard)

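As an illustration of hash-based routing, the sketch below hashes the key with PHP's built-in `crc32()` and reduces it to a shard index. `resolveHashShard()` is a hypothetical helper, and reducing the hash with `% count($shards)` is an assumption about how a hash is typically mapped to a shard list, not a description of PdoDb's internals.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not PdoDb API): maps a shard key value to one of the
 * given shard names by hashing it with CRC32.
 *
 * @param list<string> $shards
 * @param int|string   $value
 */
function resolveHashShard(array $shards, $value): string
{
    // crc32() is non-negative on 64-bit PHP; abs() guards 32-bit builds,
    // where the result can be negative.
    $index = abs(crc32((string) $value)) % count($shards);

    return $shards[$index];
}

$shards = ['shard1', 'shard2', 'shard3'];

// The same key always resolves to the same shard.
$first  = resolveHashShard($shards, 'user-42');
$second = resolveHashShard($shards, 'user-42');
var_dump($first === $second); // bool(true)
```

Because the result depends on `count($shards)`, a scheme like this keeps routing stable only while the shard list stays the same.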
### Modulo Strategy

Distributes data based on modulo operation: `value % shard_count`.

```php
$db->shard('users')
    ->shardKey('user_id')
    ->strategy('modulo')
    ->useConnections(['shard1', 'shard2', 'shard3'])
    ->register();
```

**Use when:**
- Shard key values are numeric
- You want simple, predictable distribution
- The number of shards is fixed or changes rarely (changing the shard count remaps most keys)

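One property of `value % shard_count` worth checking before choosing this strategy is what happens when the shard count changes. The sketch below is plain arithmetic and does not use PdoDb; `moduloShardIndex()` is a hypothetical helper. It counts how many of 1200 sequential keys would land on a different shard index when going from 3 to 4 shards.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper (not PdoDb API): shard index for a numeric key. */
function moduloShardIndex(int $value, int $shardCount): int
{
    return $value % $shardCount;
}

$total = 1200;
$moved = 0;

for ($id = 1; $id <= $total; $id++) {
    // A key "moves" if its index differs between the 3-shard and 4-shard layouts.
    if (moduloShardIndex($id, 3) !== moduloShardIndex($id, 4)) {
        $moved++;
    }
}

printf("%d of %d keys change shard (%.0f%%)\n", $moved, $total, 100 * $moved / $total);
// prints "900 of 1200 keys change shard (75%)"
```

Only keys whose remainder is the same for both divisors stay in place, so resizing a modulo-sharded table generally means migrating most of its rows.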
## Usage

Once configured, sharding works transparently:

```php
// Insert - automatically routed to appropriate shard
$db->find()->table('users')->insert([
    'user_id' => 500,
    'name' => 'Alice',
    'email' => 'alice@example.com'
]);

// Query - automatically routed to appropriate shard
$user = $db->find()
    ->from('users')
    ->where('user_id', 500)
    ->getOne();

// Update - automatically routed to appropriate shard
$db->find()
    ->table('users')
    ->where('user_id', 500)
    ->update(['name' => 'Alice Updated']);

// Delete - automatically routed to appropriate shard
$db->find()
    ->table('users')
    ->where('user_id', 500)
    ->delete();
```

## Shard Key Requirements

For sharding to work, the shard key must be present in WHERE conditions:

```php
// ✅ Works - shard key in WHERE
$user = $db->find()
    ->from('users')
    ->where('user_id', 500)
    ->getOne();

// ❌ Falls back to normal routing - shard key not in WHERE
$users = $db->find()
    ->from('users')
    ->where('name', 'Alice')
    ->get();
```

If the shard key is not found in WHERE conditions, the query will fall back to normal routing (using the default connection or read/write splitting if enabled).

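The routing decision described above can be pictured as a search through the query's conditions. The following sketch is illustrative only: `findShardKeyValue()` and the `[column, operator, value]` condition format are assumptions made for this example and do not reflect how PdoDb's `extractShardKeyValue()` is implemented.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not PdoDb API): returns the value compared to
 * $shardKey with "=", or null when no such condition exists.
 *
 * @param list<array{0: string, 1: string, 2: mixed}> $conditions
 * @return mixed
 */
function findShardKeyValue(array $conditions, string $shardKey)
{
    foreach ($conditions as [$column, $operator, $value]) {
        if ($column === $shardKey && $operator === '=') {
            return $value;
        }
    }

    return null; // no shard key: the caller falls back to normal routing
}

$routed   = findShardKeyValue([['user_id', '=', 500]], 'user_id');
$fallback = findShardKeyValue([['name', '=', 'Alice']], 'user_id');

var_dump($routed);   // int(500)
var_dump($fallback); // NULL
```

In this simplified model only an equality condition identifies a single shard; a condition such as `user_id > 500` could match rows on several shards, so it is treated the same as a missing key.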
## Advanced Usage

### Using Existing Connections

You can register connections from the connection pool with the shard router directly:

```php
// Add connections to pool
$db->addConnection('shard1', [...]);
$db->addConnection('shard2', [...]);

// Retrieve each pooled connection by name and register it with the shard router
$shardRouter = $db->getShardRouter();

foreach (['shard1', 'shard2'] as $shard) {
    $shardRouter->addShardConnection('users', $shard, $db->getConnection($shard));
}
```

### Manual Shard Resolution

You can manually resolve which shard a value belongs to:

```php
$shardRouter = $db->getShardRouter();
$config = $shardRouter->getShardConfig('users');

// $strategy is a ShardStrategyInterface implementation created from $config
// (its construction is not shown here)
$shardName = $strategy->resolveShard(500);
```

## Best Practices

1. **Choose an appropriate shard key**: The shard key should be:
   - Present in most queries
   - Evenly distributed (for hash/modulo) or sequential (for range)
   - Never changed after creation

2. **Keep table structure consistent**: All shards must have the same table structure.

3. **Handle shard key absence**: If a query doesn't include the shard key, it will fall back to normal routing. Consider:
   - Adding the shard key to WHERE conditions when possible
   - Using a different query strategy for queries without the shard key
   - Implementing cross-shard queries in application code if needed

4. **Monitor shard distribution**: Regularly check that data is evenly distributed across shards.

5. **Plan for shard migration**: Adding or removing shards usually requires migrating data, because existing keys may map to a different shard afterwards.

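For the monitoring point, one simple measure is the ratio between the largest shard and the average shard size. The sketch below assumes the per-shard row counts have already been collected (for example with a `COUNT(*)` on each shard); `shardSkew()` and the sample numbers are hypothetical and only illustrate the calculation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not PdoDb API): ratio of the largest shard to the
 * average shard size. 1.0 means perfectly even; larger means more skew.
 *
 * @param array<string, int> $rowCounts rows per shard, keyed by shard name
 */
function shardSkew(array $rowCounts): float
{
    if ($rowCounts === []) {
        throw new InvalidArgumentException('At least one shard is required.');
    }

    $total = array_sum($rowCounts);
    if ($total === 0) {
        return 1.0; // all shards empty: nothing is skewed
    }

    $average = $total / count($rowCounts);

    return (float) (max($rowCounts) / $average);
}

// Illustrative numbers only.
$counts = ['shard1' => 1200, 'shard2' => 950, 'shard3' => 850];

printf("skew: %.2f\n", shardSkew($counts)); // prints "skew: 1.20"
```

What counts as too much skew depends on the workload; the useful part is tracking the value over time so that a drifting shard is noticed early.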
## Limitations

- Sharding only works when the shard key is present in WHERE conditions
- Cross-shard queries are not supported (queries that need data from multiple shards)
- Transactions spanning multiple shards are not supported
- JOINs between sharded tables are not supported

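Since cross-shard queries are not supported, a lookup that cannot include the shard key has to be fanned out in application code. The sketch below shows the idea with plain PDO and two in-memory SQLite databases standing in for shards; it deliberately does not use PdoDb's API, and `findByNameOnAllShards()` is a hypothetical helper. It requires the `pdo_sqlite` extension.

```php
<?php
declare(strict_types=1);

// Two independent in-memory SQLite databases stand in for two shards.
$shards = [
    'shard1' => new PDO('sqlite::memory:'),
    'shard2' => new PDO('sqlite::memory:'),
];

foreach ($shards as $pdo) {
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)');
}

$shards['shard1']->exec("INSERT INTO users VALUES (500, 'Alice')");
$shards['shard2']->exec("INSERT INTO users VALUES (1500, 'Alice'), (1600, 'Bob')");

/**
 * Hypothetical helper: runs the same query on every shard and merges the
 * rows, tagging each row with the shard it came from.
 *
 * @param array<string, PDO> $shards
 * @return list<array<string, mixed>>
 */
function findByNameOnAllShards(array $shards, string $name): array
{
    $rows = [];

    foreach ($shards as $shardName => $pdo) {
        $stmt = $pdo->prepare('SELECT user_id, name FROM users WHERE name = :name');
        $stmt->execute([':name' => $name]);

        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $rows[] = $row + ['shard' => $shardName];
        }
    }

    // Sorting must happen after merging, since each shard only orders its own rows.
    usort($rows, fn (array $a, array $b): int => $a['user_id'] <=> $b['user_id']);

    return $rows;
}

$alices = findByNameOnAllShards($shards, 'Alice');
echo count($alices), " rows found\n"; // prints "2 rows found"
```

A fan-out like this costs one query per shard, and ordering, limits, and aggregates have to be re-applied to the merged result, which is why keeping the shard key in WHERE conditions is preferable where possible.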
## Testing

For testing, you can use multiple SQLite in-memory databases to simulate shards:

```php
$db->addConnection('shard1', ['driver' => 'sqlite', 'path' => ':memory:']);
$db->addConnection('shard2', ['driver' => 'sqlite', 'path' => ':memory:']);

// Configure sharding to use the in-memory connections added above
$db->shard('users')
    ->shardKey('user_id')
    ->strategy('range')
    ->ranges([
        'shard1' => [0, 1000],
        'shard2' => [1001, 2000],
    ])
    ->useConnections(['shard1', 'shard2'])
    ->register();
```

This allows you to test sharding functionality without setting up multiple database instances.

documentation/README.md

Lines changed: 2 additions & 0 deletions
@@ -51,6 +51,7 @@ Complete documentation for the PDOdb library - a lightweight, framework-agnostic
 - [Query Caching](05-advanced-features/query-caching.md) - PSR-16 result caching
 - [Pagination](05-advanced-features/pagination.md) - Full, simple, and cursor-based pagination
 - [Read/Write Splitting](05-advanced-features/read-write-splitting.md) - Master-replica architecture
+- [Sharding](05-advanced-features/sharding.md) - Horizontal partitioning across multiple databases
 - [Database Migrations](05-advanced-features/migrations.md) - Version-controlled schema changes
 - [ActiveRecord](05-advanced-features/active-record.md) - Lightweight ORM pattern for object-based database operations
 - [ActiveRecord Relationships](05-advanced-features/active-record-relationships.md) - hasOne, hasMany, belongsTo relationships with lazy and eager loading
@@ -128,6 +129,7 @@ $users = $db->find()
 - **Cross-Database Support** - Works with MySQL, MariaDB, PostgreSQL, SQLite
 - **Query Caching** - PSR-16 integration for 10-1000x faster queries
 - **Read/Write Splitting** - Horizontal scaling with master-replica architecture
+- **Sharding** - Horizontal partitioning across multiple databases with automatic query routing
 - **Window Functions** - Advanced analytics with ROW_NUMBER, RANK, LAG, LEAD
 - **Common Table Expressions (CTEs)** - WITH clauses for complex queries, recursive CTEs
 - **Full-Text Search** - Cross-database FTS with unified API

examples/18-set-operations/README.md

Lines changed: 1 addition & 0 deletions
@@ -79,3 +79,4 @@ $db->find()
+