All write operations MUST go through Raft consensus to ensure strong consistency across the cluster. The system is designed as a CP (Consistent + Partition-tolerant) system where:

- Writes require quorum acknowledgment before returning success
- Reads default to leader consistency but support configurable consistency levels (`:eventual`, `:leader`, `:strong`)
- No operation may sacrifice consistency for availability during network partitions
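The consistency levels above might surface in the read API roughly as follows. This is a hypothetical sketch: `Concord.get/2` and the `:consistency` option are illustrative names, not confirmed API; only the levels `:eventual`, `:leader`, and `:strong` come from this document.

```elixir
# Hypothetical read API sketch — function and option names are assumptions.

# Fast local read: served from the node's own ETS table, may lag the leader.
{:ok, value} = Concord.get("feature_flags/dark_mode", consistency: :eventual)

# Default: routed to the current Raft leader.
{:ok, value} = Concord.get("feature_flags/dark_mode", consistency: :leader)

# Linearizable: leader confirms its leadership with a quorum before replying.
{:ok, value} = Concord.get("feature_flags/dark_mode", consistency: :strong)
```

The tradeoff is latency for freshness: `:eventual` stays on the microsecond ETS path, while `:strong` pays a quorum round trip.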

**Rationale**: As a distributed coordination system, incorrect data is worse than unavailable data. Applications relying on Concord for configuration, feature flags, or coordination require absolute certainty about data accuracy.

### II. Embedded by Design

Concord MUST function as an embedded library that starts with the host application. This means:

- No separate infrastructure or external processes required
- Zero operational overhead for single-node development
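In practice, "embedded" means Concord ships as an ordinary Hex dependency and starts alongside the host application. A minimal sketch, assuming the package is published as `:concord` (the name and version are illustrative, not confirmed here):

```elixir
# mix.exs — add Concord like any other dependency (package name assumed).
# Because it starts with the host application, no separate process,
# daemon, or infrastructure needs to be deployed.
defp deps do
  [
    {:concord, "~> 0.1"}
  ]
end
```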

**Rationale**: Lowering the barrier to entry enables adoption. Developers should be able to add distributed coordination to their apps as easily as adding any other dependency.

### III. Performance Without Compromise

The system MUST maintain microsecond-level performance for reads and low-millisecond performance for writes:

- Read operations: target <10μs for ETS lookups
- Write operations: target <20ms for quorum commits
- Throughput: maintain 600K+ ops/sec under load
- All performance-critical paths MUST avoid blocking operations

**Rationale**: A coordination layer that introduces latency becomes a bottleneck. Performance MUST be a feature, not an afterthought.

### IV. Observability as Infrastructure

Every operation MUST emit telemetry events. Observability is not optional:

- All API operations emit `[:concord, :api, :*]` events
- All internal operations emit `[:concord, :operation, :*]` events
- State changes emit `[:concord, :state, :*]` events
- OpenTelemetry tracing MUST be available for distributed debugging
- Prometheus metrics MUST be exportable
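Since the events follow standard `:telemetry` conventions, a host application can subscribe with an ordinary handler. The concrete event name `[:concord, :api, :get]` is an assumption for illustration; only the `[:concord, :api, :*]` prefix is specified above.

```elixir
# Sketch: attach a handler to one API event using the standard Erlang
# :telemetry library (`:telemetry.attach/4`). The event name and the
# shape of measurements/metadata are assumptions, not confirmed API.
:telemetry.attach(
  "log-concord-reads",
  [:concord, :api, :get],
  fn _event_name, measurements, metadata, _config ->
    IO.inspect({measurements, metadata}, label: "concord read")
  end,
  nil
)
```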

**Rationale**: Distributed systems are inherently harder to debug. Without comprehensive observability, production issues become impossible to diagnose.

### V. Secure Defaults

Security MUST be enabled by default in production environments:

- Authentication required for all operations when `auth_enabled: true`
- Token-based authentication with cryptographically secure token generation
- RBAC (Role-Based Access Control) for fine-grained permissions
- TLS support for transport security
- Audit logging for compliance requirements
**Rationale**: Security vulnerabilities in coordination systems can compromise entire application fleets. Secure-by-default prevents accidental exposure.
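A configuration sketch for the secure defaults above. Of these keys, only `auth_enabled: true` appears in this document; the `config :concord` namespace and the `:tls` and `:audit_log` keys are illustrative assumptions.

```elixir
# config/prod.exs — hypothetical shape; only :auth_enabled is confirmed above.
import Config

config :concord,
  # Require a token for every operation.
  auth_enabled: true,
  # Transport security (key name assumed).
  tls: [enabled: true],
  # Compliance audit logging (key name assumed).
  audit_log: true
```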

Performance varies significantly depending on hardware, cluster size, network topology, and consistency level. ETS-backed reads are inherently fast, but actual throughput and latency depend on your deployment. Run `mix run benchmarks/run_benchmarks.exs` on your own hardware to get representative numbers.

- **Bootstrap ETS Fallback**: Auth, RBAC, and tenant data written via the ETS fallback during the bootstrap window (before a Raft cluster forms) are not replicated. Once the cluster establishes quorum, subsequent writes go through Raft consensus normally.
- **Node-Local Rate Limiting**: Multi-tenancy rate limiting is tracked per node. A tenant can exceed its configured quota by up to N× across N nodes in the cluster.
- **Query TTL Clock Sensitivity**: TTL expiration checks in queries use wall-clock time (`System.system_time`), which may differ from leader-assigned time (`meta.system_time`) when node clocks drift.