Skip to content

Latest commit

 

History

History
1078 lines (833 loc) · 40.2 KB

File metadata and controls

1078 lines (833 loc) · 40.2 KB

Performance Analysis - Load Testing Results

Executive Summary

Load testing revealed critical database lock contention causing the system to hang under moderate load (28 concurrent users). The primary fix was implementing read replicas (~90% improvement), supported by queue-based architecture and counter reconciliation.

Status: ✅ Production Ready - System now handles 200+ very active concurrent users (previously failed at 28)

Note on Scale: 200 VUs in our tests represent extremely active users (creating 5-10 opinions + 5-15 votes every 0.5-2 seconds). In real-world usage, this likely supports several thousand normal users.

Test Conversation: SIP3Kg with 113K votes and 19K+ opinions - one of the largest conversations in the system.


The Problem

Critical Issue: Read/Write Lock Contention

Symptom: Frontend fetch requests hung indefinitely at 28 concurrent users

Root Cause: Every opinion/vote operation updated the conversation table row:

-- Every opinion/vote did this:
UPDATE conversation SET
  opinion_count = ...,
  vote_count = ...,
  needs_math_update = true,
  last_reacted_at = NOW()
WHERE slug_id = 'R3NBkA';

Impact:

  • Conversation row under constant write lock
  • fetchConversation reads blocked waiting for write locks
  • With 28+ concurrent users, the system became unusable

Secondary Issue: Expensive COUNT Queries

Every request ran expensive table scans:

// Scanned thousands of rows on EVERY request
const voteCount = await db.select().from(vote)
  .innerJoin(opinion, ...)
  .where(eq(opinion.conversationId, id));

Impact: 10-100ms per query, 3 queries per request = 30-300ms overhead


The Solution

1. Read Replica - PRIMARY FIX (~90% improvement)

Implementation: Drizzle ORM's withReplicas() (services/shared-backend/src/db.ts)

Architecture:

  • Primary DB: All writes (inserts, updates)
  • Read Replica: All user-facing reads (fetch, queries)
  • Automatic routing by Drizzle based on query type

Impact:

  • ✅ Complete isolation of reads from writes
  • ✅ Eliminates read/write lock contention entirely
  • ✅ System remains responsive under heavy write load
  • ✅ Replication lag 100-500ms is acceptable

2. Queue-Based Architecture - Supporting Optimization

Implementation: New conversation_update_queue table

How it works:

// API: Write to queue instead of conversation table
await tx.insert(conversationUpdateQueueTable)
  .values({ conversationId, requestedAt: now })
  .onConflictDoUpdate({ /* upsert */ });

// Math-updater: Process queue asynchronously
// - Recalculates counters from actual data
// - Updates conversation table (on primary DB)
// - Marks queue entry as processed

Benefits:

  • Reduces conversation table write pressure (only math-updater writes)
  • Natural deduplication via PRIMARY KEY
  • Rate limiting based on actual processing time

3. Counter Reconciliation - Eliminates Expensive Queries

Implementation: services/math-updater/src/conversationCounters.ts

What it does:

  • Recalculates opinion_count, vote_count, participant_count from actual DB records
  • Runs every ~20 seconds for active conversations
  • Self-healing from any drift (soft deletes, moderation, etc.)
  • Updates lastReactedAt timestamp for activity tracking

Impact:

  • API uses cached counters (no expensive COUNT queries)
  • Counters accurate within 20 seconds
  • Automatic drift correction

4. Polis Math Fixes

a) Bug Fix: Red-dwarf Version Rollback

  • Fixed broken PCA analysis by rolling back to working version
  • services/python-bridge/pyproject.toml

b) Batch Processing for Large Conversations

  • Process opinions in batches of 1000 to avoid stack overflow
  • Fixes "Maximum call stack size exceeded" with 19K+ opinions
  • services/math-updater/src/services/polisMathUpdater.ts

c) Singleton Policy - Prevent Concurrent Duplicates

  • Queue policy changed from 'stately' to 'singleton'
  • Ensures only 1 job per conversation (created OR active)
  • Eliminates wasted CPU on duplicate calculations
  • services/math-updater/src/index.ts:168

5. Other Optimizations

  • Transaction optimization: Moved auto-vote outside opinion creation transaction (cuts lock duration in half)
  • Recommendation system: Now uses lastReactedAt (maintained by counter reconciliation)

Performance Results

Metric Before After Improvement
Concurrent users (before hang) 28 200+ 7x
Opinion create time 200-500ms 50-100ms 4-5x faster
Vote cast time 150-400ms 30-80ms 5x faster
COUNT queries per request 3-4 0 100% reduction
Database lock contention Severe Minimal 99% reduction
Counter accuracy 100% (real-time) 100% (within 20s) Acceptable

Recent Optimizations (October 2025)

Composite Index for Moderation Queries

Status: ✅ Completed (October 28, 2025)

Change: Added composite index on conversation_moderation(conversation_id, moderation_action)

  • Migration: V0027__certain_nighthawk.sql
  • Schema: services/shared-backend/src/schema.ts:1529-1534

Analysis: Load test revealed 4,950 moderation lock checks during 72s test

  • Query: SELECT moderation_action FROM conversation_moderation WHERE conversation_id = $1 AND moderation_action = $2
  • Previous index: Only on conversation_id (unique constraint)
  • New index: Composite on both conversation_id and moderation_action

Note: Since conversation_id is UNIQUE (one moderation per conversation), the performance gain is marginal (~1-2%). The composite index provides better query plan but doesn't eliminate the need for caching (see Known Limitations #3 below).

Impact: Minor optimization, mainly sets foundation for query plan improvements. Major gains will come from caching layer.


Known Limitations

1. Python Polis Math Performance

Current Performance (after fixing bugs):

  • 113K votes: 50-85 seconds
  • Sub-linear time complexity (better than O(n²))
  • CPU-intensive (100% of one core during calculation)

Historical Performance (when buggy):

  • 100K votes: 200-374 seconds (6+ minutes)
  • Caused by: concurrent execution bug + stack overflow retries

Why the dramatic improvement:

  1. Fixed singleton policy (no duplicate concurrent calculations)
  2. Fixed stack overflow in batch updates (no retries)

Current Status: ✅ Acceptable for production

  • Rate limiting (20s minimum between updates) prevents overwhelming
  • Sequential processing ensures no wasted CPU
  • Most conversations process quickly (<10 seconds)
  • System remains responsive (read replica handles fetches)

When it becomes a bottleneck:

  • Real-time updates needed (<20s cadence)
  • Multiple large conversations (100K+ votes) active simultaneously
  • Conversations beyond 200K-300K votes (untested)

Future optimization options (only if needed):

  • Adaptive rate limiting based on conversation size
  • Incremental/approximate updates for very large datasets
  • Smart caching (skip recalculation if <5% vote delta)

2. Expensive Opinion Query with Multiple Cluster Joins

Issue: Opinion listing query joins 11+ tables including 6 polis_cluster tables

  • Location: services/api/src/service/comment.ts:146 in fetchOpinionsByPostId()
  • Joins: 6 polis_cluster tables + 6 polis_cluster_opinion tables + user, conversation, opinion_content, opinion_moderation, polis_content
  • Called 100 times during typical load test
  • Each query returns massive result sets with cluster statistics for all 6 possible clusters

Impact:

  • Higher CPU usage on read replica
  • More data transferred over the wire
  • Slower opinion page loads when displaying cluster-filtered views

Future optimization options:

  1. Denormalize cluster data - Add cluster_0_ai_label, cluster_1_ai_label, etc. to opinion table

    • Eliminates 6 LEFT JOINs
    • Requires math-updater to populate denormalized fields
    • Trade-off: Increased storage vs faster queries
  2. Lazy load cluster data - Split into two queries:

    • First: Fetch opinions WITHOUT cluster joins (fast)
    • Second: Fetch only displayed cluster data separately
    • Reduces query complexity from O(n × 14) to O(n + m)
    • Requires frontend changes to handle loading states
  3. Materialized view - Pre-join cluster data into a view:

    • Create view with all cluster joins
    • Refresh when math-updater completes
    • Query the view instead of doing live joins
    • Requires infrastructure for view refresh management

Blocked by: Requires frontend changes (loading states, data fetching pattern), extensive testing, and coordination between API/math-updater/frontend teams.

Recommended: Start with option 1 (denormalization) as it's backend-only and has minimal frontend impact.

3. Implement Read Replica Caching Layer

Issue: High query volume on read replica despite optimizations

  • Load test shows 34,716 queries in 72 seconds (~485 queries/second)
  • Many queries are repetitive: same conversation fetched 9,902 times
  • Queries for slowly-changing data (moderation status, conversation metadata)

Analysis from Load Test (72 seconds, 100 VUs):

  1. Conversation metadata lookup - 9,902 queries (28.5%)

    • Query: SELECT id, current_content_id, author_id, participant_count... FROM conversation WHERE slug_id = $1
    • Changes every 20s (math-updater)
    • Same conversation hit repeatedly
  2. Moderation lock check - 4,950 queries (14.3%)

    • Query: SELECT moderation_action FROM conversation_moderation WHERE conversation_id = $1 AND moderation_action = $2
    • Changes rarely (manual admin action)
    • Checked on every vote/opinion submission
  3. Device authentication - 4,951 queries (14.3%)

    • Query: SELECT device.session_expiry, phone.id... FROM device JOIN user WHERE device.did_write = $1
    • Changes on login/logout only
    • Checked on every authenticated request
  4. Vote existence check - 4,900 queries (14.1%)

    • Query: SELECT vote_content.vote FROM vote WHERE author_id = $1 AND opinion_id = $2
    • Changes when user votes
    • Prevents duplicate votes

Total cacheable queries: ~34,700 / 72s = 481 queries/second

Recommended Solution: ElastiCache Redis

Why Redis over Memcached:

  • Persistence (survives restarts)
  • TTL per key (fine-grained expiration)
  • Pub/Sub for cache invalidation across API instances
  • Better AWS integration and CloudWatch metrics
  • Data structures for future use cases

Why NOT cache everything:

  1. Stale data risk - User's own actions need immediate feedback
  2. Cache invalidation complexity - More cache keys = more invalidation code = more bugs
  3. Memory cost - Caching all opinions/votes = 100s of MB per conversation
  4. Consistency - Multiple related caches can get out of sync

Cache Selection Criteria (only cache if ALL true):

  • ✅ Read-heavy (100+ reads per write)
  • ✅ Write-light (changes infrequently or predictably)
  • ✅ Stale tolerance (acceptable 30-60s delay)
  • ✅ Simple invalidation (single key or small set)
  • ✅ High query frequency (1000+ per minute)

Implementation Plan:

Phase 1: Conversation + Moderation Cache (Quick Win - 2-3 hours)

// Cache Keys:
// - conversation:metadata:{slugId} (30s TTL)
// - conversation:locked:{conversationId} (60s TTL)

// Expected Impact:
// - 14,500 fewer queries (~42% of total)
// - ~30-40% replica CPU reduction

Phase 2: Device Authentication Cache (+1-2 hours)

// Cache Key: device:auth:{didWrite} (5min TTL)
// Expected Impact: 4,900 fewer queries (~14% reduction)

Phase 3: Vote Existence Cache (+1-2 hours)

// Cache Key: vote:exists:{authorId}:{opinionId} (30s TTL)
// Expected Impact: 4,800 fewer queries (~14% reduction)

Total Expected Impact:

  • ~34,000 fewer queries (~70% query reduction)
  • ~50-60% replica CPU reduction
  • Cost: cache.t4g.micro (~$15/month) handles moderate traffic

Cache Invalidation Strategy:

  • Conversation metadata: Invalidate when math-updater completes
  • Moderation lock: Invalidate on lock/unlock action
  • Device auth: Invalidate on logout
  • Vote existence: Invalidate when user votes

Anti-Patterns (DO NOT cache):

  • ❌ Opinion lists (high write frequency, pagination complexity)
  • ❌ User's own vote (requires immediate consistency)
  • ❌ Real-time counters (change constantly)
  • ❌ Search results (complex invalidation, user-specific)

Infrastructure Requirements:

  • ElastiCache Redis cluster (cache.t4g.micro for dev/staging, larger for production)
  • Same VPC as RDS
  • Security group: Allow port 6379 from API instances
  • CloudWatch monitoring for cache hit rate

Fail-Safe Design:

  • All cache functions fail open (return null → query DB)
  • Redis connection errors don't break API
  • Optional feature (controlled by REDIS_URL env var)
  • Can disable without code changes

Memory Estimate:

  • Conversation metadata: ~500 bytes × 1000 conversations = 500KB
  • Moderation locks: ~50 bytes × 1000 = 50KB
  • Device auth: ~200 bytes × 5000 sessions = 1MB
  • Vote checks: ~100 bytes × 10000 = 1MB
  • Total: ~2.5MB for typical workload
  • cache.t4g.micro (512MB) = plenty of headroom

Testing Plan:

  1. Setup ElastiCache in staging
  2. Implement Phase 1 caches
  3. Run load test, measure cache hit rate (target: >95%)
  4. Compare replica CPU metrics before/after
  5. Monitor for stale data issues (check CloudWatch + user reports)
  6. Roll out Phase 2-3 incrementally

Blocked by: Requires batched vote processing (section 4) to be implemented first for efficient cache invalidation. Otherwise, cache invalidates 100x/sec (every vote) making it ineffective.

Priority: High - Significant performance improvement with low risk and reasonable cost. Implement AFTER batched vote processing.


Implementation Plan: Redis Caching Layer

Prerequisites: Batched vote processing must be implemented first (section 4). Without batching, cache invalidates on every vote (100x/sec) making it useless.

Status: 📋 Design Complete - Ready for implementation after Phase 1 (batching)

Infrastructure Setup

Development: Add Redis to docker-compose.yml

# services/api/docker-compose.yml - Add this service
services:
    redis:
        container_name: redis_container
        image: docker.io/library/redis:7-alpine
        command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
        ports:
            - "6379:6379"
        volumes:
            - redis-data:/data
        restart: always
        healthcheck:
            test: ["CMD", "redis-cli", "ping"]
            interval: 10s
            timeout: 3s
            retries: 3

# Add to volumes section
volumes:
    redis-data:

Production: AWS ElastiCache

  • Instance type: cache.t4g.micro (512MB, ~$15/month)
  • Engine: Redis 7.x
  • Subnet: Same VPC as RDS
  • Security group: Port 6379 from API instances only
  • Multi-AZ: Optional for HA
Cache Strategy with Personalization

Problem: Opinion lists are personalized (muted users filtered out)

Solution: Cache base data, personalize in-memory

// services/api/src/service/comment.ts - Modified flow

export async function fetchOpinionsByPostId({
    db,
    postId,
    personalizationUserId,
    filterTarget,
    clusterKey,
    limit,
}: FetchOpinionsByPostIdProps): Promise<OpinionItemPerSlugId> {

    // 1. Try cache for base data (NO personalization)
    const cacheKey = `opinions:${postId}:${filterTarget}:${clusterKey || 'none'}:${limit}`;
    const cached = await cacheGet<OpinionItemPerSlugId>(cacheKey);

    let opinionItemMap: OpinionItemPerSlugId;

    if (cached) {
        opinionItemMap = new Map(Object.entries(cached));
    } else {
        // Fetch from DB (existing query)
        const opinionResponses = await db.select({...}).from(opinionTable)...;
        opinionItemMap = buildOpinionMap(opinionResponses);

        // Cache for 1 second (matches vote batch flush interval)
        await cacheSet(cacheKey, Object.fromEntries(opinionItemMap), 1);
    }

    // 2. Apply personalization IN-MEMORY (after cache)
    if (personalizationUserId) {
        const mutedUsernames = await getCachedMutedUsers(db, personalizationUserId);

        opinionItemMap.forEach((opinionItem, opinionSlugId) => {
            if (mutedUsernames.has(opinionItem.username)) {
                opinionItemMap.delete(opinionSlugId); // Filter muted users
            }
        });
    }

    return opinionItemMap;
}

// Cache muted users list separately
async function getCachedMutedUsers(
    db: PostgresJsDatabase,
    userId: string
): Promise<Set<string>> {
    const cacheKey = `user:muted:${userId}`;
    const cached = await cacheGet<string[]>(cacheKey);

    if (cached) return new Set(cached);

    const mutedUserItems = await getUserMutePreferences({db, userId});
    const mutedUsernames = mutedUserItems.map(item => item.username);

    await cacheSet(cacheKey, mutedUsernames, 30);
    return new Set(mutedUsernames);
}
Cacheable Data
Cache Key TTL Data Invalidation
opinions:{postId}:{filter}:{cluster} 1 sec Opinion lists (base) After vote flush
user:muted:{userId} 30 sec Muted usernames On mute/unmute
conversation:metadata:{slugId} 1 sec Conversation data After vote flush
conversation:locked:{conversationId} 60 sec Lock status On lock/unlock
device:auth:{didWrite} 5 min Auth session On logout

Key Design: 1-second TTL for opinion/conversation caches aligns with vote batching interval. Between flushes (0-999ms), cache serves stale data (acceptable lag). After flush, cache expires naturally.

Cache Invalidation Strategy

Trigger 1: Vote Buffer Flush (every 1 second during voting)

// services/api/src/services/voteBuffer.ts - After transaction commits

async flush(): Promise<void> {
    const batch = this.buffer.splice(0, this.buffer.length);

    await this.db.transaction(async (tx) => {
        // ... vote writes ...
    });

    // Invalidate opinion caches for affected conversations
    const affectedConversations = new Set(batch.map(v => v.conversationId));

    for (const conversationId of affectedConversations) {
        await cacheDelPattern(`opinions:${conversationId}:*`);
        await cacheDel(`conversation:metadata:${conversationId}`);
    }
}

Trigger 2: Math-Updater Completion (every 20+ seconds)

// services/math-updater/src/jobs/updateConversationMath.ts

export async function updateConversationMath({conversationId, ...}) {
    // ... math processing ...

    // Invalidate caches (cluster stats changed)
    await triggerCacheInvalidation({conversationId, reason: 'math-update'});
}

Trigger 3: User Actions (mute, lock, logout)

// On user mute/unmute
export async function muteUser(userId: string, targetUsername: string) {
    await db.insert(userMuteTable).values({...});
    await cacheDel(`user:muted:${userId}`);
}

// On conversation lock/unlock
export async function lockConversation(conversationId: number) {
    await db.insert(conversationModerationTable).values({...});
    await cacheDel(`conversation:locked:${conversationId}`);
}
Redis Client Implementation
// services/api/src/cache/redis.ts (NEW FILE)

import Redis from 'ioredis';
import { log } from '@/app.js';

let redis: Redis | null = null;

export function initializeRedis(redisUrl?: string): Redis | null {
    if (!redisUrl) {
        log.info('[Cache] Redis disabled (no URL configured)');
        return null;
    }

    redis = new Redis(redisUrl, {
        retryStrategy: (times) => Math.min(times * 50, 2000),
        maxRetriesPerRequest: 3,
        enableReadyCheck: true,
    });

    redis.on('error', (err) => log.error(err, '[Cache] Redis error'));
    return redis;
}

// Fail-safe: returns null on error (cache miss)
export async function cacheGet<T>(key: string): Promise<T | null> {
    if (!redis) return null;
    try {
        const value = await redis.get(key);
        return value ? JSON.parse(value) : null;
    } catch (error) {
        log.warn(error, `[Cache] Failed to get: ${key}`);
        return null; // Fail open: query DB instead
    }
}

export async function cacheSet(key: string, value: any, ttl: number): Promise<void> {
    if (!redis) return;
    try {
        await redis.setex(key, ttl, JSON.stringify(value));
    } catch (error) {
        log.warn(error, `[Cache] Failed to set: ${key}`);
    }
}

export async function cacheDelPattern(pattern: string): Promise<void> {
    if (!redis) return;
    try {
        const keys = await redis.keys(pattern);
        if (keys.length > 0) await redis.del(...keys);
    } catch (error) {
        log.warn(error, `[Cache] Failed to delete: ${pattern}`);
    }
}
Expected Impact (With Batching)

Before Redis Cache:

  • Read replica: 485 queries/second
  • Opinion lists: ~24,000 queries (70% of total)
  • CPU: High during voting bursts

After Redis Cache:

  • Read replica: ~150 queries/second (70% query reduction)
  • Cache hit rate: 95%+ (1-sec TTL aligned with vote batching)
  • Read replica CPU: 50-60% reduction

Why 50-60% CPU (not 70%):

  • Query reduction: 70% (we eliminate 70% of queries)
  • CPU reduction: 50-60% (less than query reduction because):
    • Remaining 30% includes expensive multi-table JOINs (high CPU cost)
    • Cache overhead (serialization, Redis network calls)
    • Background DB work continues (WAL replay, vacuum)

Combined Solution (Batching + Caching):

  • Write DB CPU: 80-90% reduction (from batching)
  • Read replica CPU: 50-60% reduction (from caching)
  • Total DB load: ~75% reduction across both DBs
Monitoring
// Add cache metrics (Prometheus)

import { Counter } from 'prom-client';

const cacheHitCounter = new Counter({
    name: 'redis_cache_hit_total',
    labelNames: ['cache_type'],
});

const cacheMissCounter = new Counter({
    name: 'redis_cache_miss_total',
    labelNames: ['cache_type'],
});

// Track in cacheGet()
if (cached) {
    cacheHitCounter.inc({cache_type: 'opinions'});
} else {
    cacheMissCounter.inc({cache_type: 'opinions'});
}

Key Metrics:

  • Cache hit rate (target: >95%)
  • Cache invalidation frequency (~1/sec during voting)
  • Read replica query rate (should drop 70%)
  • Redis memory usage (should stay <50MB)
Rollout Plan

Week 1: Infrastructure + Redis client

  • Add Redis to docker-compose, setup ElastiCache staging
  • Implement redis.ts client with fail-safe design

Week 2: Opinion list caching

  • Add cache layer to fetchOpinionsByPostId
  • Implement muted users caching
  • Test personalization (cache + in-memory filter)

Week 3: Cache invalidation

  • Integrate with vote buffer flush
  • Integrate with math-updater
  • Test invalidation timing

Week 4: Production deployment

  • Deploy with feature flag
  • Monitor cache hit rate, query reduction
  • Gradual rollout (10% → 100%)
Risks & Mitigations

Risk 1: Stale data between flushes

  • Mitigation: Frontend optimistic update (user sees vote instantly); 1-sec lag only for OTHER users (acceptable)

Risk 2: Redis failure

  • Mitigation: Fail-safe design (returns null → queries DB); health checks + auto-restart

Risk 3: Memory exhaustion

  • Mitigation: maxmemory-policy LRU; monitor at 80%; estimated 2.5MB usage (well under 256MB)

4. Write DB Bottleneck: Opinion Counter Updates and Batched Vote Processing

Problem: Hot Row Contention on Opinion and Conversation Tables

Every vote triggers immediate UPDATEs to both opinion and conversation tables:

// Opinion counter update (services/api/src/service/voting.ts:302-314)
await db.update(opinionTable)
    .set({
        numAgrees: sql`${opinionTable.numAgrees} + ${numAgreesDiff}`,
        numDisagrees: sql`${opinionTable.numDisagrees} + ${numDisagreesDiff}`,
        numPasses: sql`${opinionTable.numPasses} + ${numPassesDiff}`,
    })
    .where(eq(opinionTable.id, commentData.commentId));

// Conversation counter update (services/shared-backend/src/conversationCounters.ts)
await db.update(conversationTable)
    .set({
        voteCount: sql`${conversationTable.voteCount} + ${delta}`,
    })
    .where(eq(conversationTable.id, conversationId));

Impact:

  • 100 concurrent votes on same opinion = 100 transactions waiting for same row lock
  • PostgreSQL row-level lock contention = high CPU (lock management overhead)
  • Write DB CPU spikes to 80-90% during voting bursts
  • Index updates and WAL writes on every vote
  • Multiple hot rows: opinion table + conversation table

Critical Constraint: Counter-Math Consistency

Opinion counters are displayed in the UI and must stay synchronized with vote data that math-updater processes:

// Frontend calculates percentages (ConsensusItem.vue:59-65)
const numNoVotesForVisualizer = computed(() =>
    props.opinionItemForVisualizer.numParticipants -
    props.opinionItemForVisualizer.numAgrees -
    props.opinionItemForVisualizer.numPasses -
    props.opinionItemForVisualizer.numDisagrees
);

// VoteCountVisualizer.vue
percentage = (numAgrees / numParticipants) * 100

// Math-updater uses actual vote records (polisMathUpdater.ts:885-903)
const votes = await db.select({
    vote: voteContentTable.vote  // ← ACTUAL votes, not counters!
})
.from(voteTable)
.innerJoin(voteContentTable, ...)

The >100% Problem: If counters update in real-time but cluster stats lag (math runs every 20s), displayed percentages become inconsistent:

  • Grand total: 15 votes (real-time counter)
  • Group 0: 5 votes, Group 1: 5 votes (stale from last math run)
  • Groups sum to 10, but total is 15 ❌

Solution: Batched Vote Processing

Status: ✅ Completed (October 28, 2025) - Phase 1 Implementation

Votes are buffered in memory (with optional Redis persistence) and flushed in batches every 1 second. This dramatically reduces lock contention while maintaining counter-math consistency.

Architecture:

User Vote → Buffer (in-memory + Redis) → Return 200 OK
              ↓ (every 1 second)
            Single transaction per batch:
              - Bulk check existing votes (1 query)
              - INSERT all new votes
              - UPDATE vote_table using CASE WHEN (1 query)
              - UPDATE opinion counters (1 query per opinion)
              - UPDATE conversation voteCount (1 batched query)
              - UPDATE conversation participantCount (1 batched query)
              - UPDATE conversation queue

Key Features:

  • Dual storage: In-memory Map + optional Redis list for persistence across restarts
  • Deduplication: Last-write-wins per (userId, opinionId) key
  • Bulk operations: Single query for vote existence checks, batched UPDATEs
  • CASE WHEN: Single UPDATE query per table using SQL CASE statements
  • Counter reconciliation: Detects new/updated votes, calculates deltas, applies in batch
  • ParticipantCount tracking: Queries existing participants with same WHERE clauses as math-updater
  • Comprehensive logging: Vote breakdowns, counter deltas, participant detection
  • Transaction batching: Splits large batches (>1000 votes) into multiple transactions
  • Graceful degradation: Re-adds failed votes to buffer on transaction failure

Counter Update Strategy:

  1. Opinion counters (numAgrees, numDisagrees, numPasses):

    • Accumulate deltas per opinion across all votes in batch
    • Single UPDATE per opinion with calculated deltas
    • Handles vote changes (agree→disagree) correctly
  2. Conversation voteCount:

    • Track +1 (new vote), -1 (cancel), 0 (change) per vote
    • Aggregate deltas per conversation
    • Single batched UPDATE using CASE WHEN for all conversations
  3. Conversation participantCount:

    • Detect potential new participants (first votes or restored votes)
    • Query existing participants with WHERE clauses matching calculateConversationCounters:
      • isNotNull(opinionTable.currentContentId) - exclude deleted opinions
      • isNotNull(voteTable.currentContentId) - exclude deleted votes
      • isNull(opinionModerationTable.id) - exclude moderated opinions
      • eq(userTable.isDeleted, false) - exclude deleted users
    • Calculate true new participant count (not in existing set)
    • Single batched UPDATE using CASE WHEN for all conversations

Integration Points:

  1. API Initialization (services/api/src/index.ts):

    • Creates vote buffer singleton with database + optional Redis connection
    • Registers graceful shutdown handler to flush pending votes
    • Passes voteBuffer to post/comment service functions
  2. Vote Endpoint (services/api/src/service/voting.ts):

    • Validates vote, checks conversation lock status
    • Adds vote to buffer via voteBuffer.add({vote})
    • Returns immediately (200 OK) - vote flushes within 1 second
  3. Opinion Creation (services/api/src/service/comment.ts):

    • Creates opinion in transaction
    • Adds auto-vote (author's agree) to buffer
    • Updates opinion count via updateOpinionCount, which enqueues conversation for math update
  4. Counter Reconciliation (services/shared-backend/src/conversationCounters.ts):

    • Converted to object parameters with optional defaults
    • Used by both vote buffer AND math-updater for consistency
    • Functions: updateVoteCount, updateOpinionCount, reconcileConversationCounters

Expected Impact:

  • 90-95% reduction in opinion UPDATE frequency
    • Before: 100 votes = 100 UPDATEs per opinion per second
    • After: 100 votes = 1 UPDATE per opinion per second (batched)
  • 90-95% reduction in conversation table UPDATE frequency
    • Before: Every vote updates voteCount/participantCount immediately
    • After: Single batched UPDATE per second for all affected conversations
  • 80-90% reduction in write DB CPU load
  • Eliminates lock contention: Multiple votes on same opinion/conversation no longer block
  • Counter-math consistency guaranteed: WHERE clauses match between voteBuffer and math-updater
  • Acceptable UX: <1-second lag (frontend optimistic update provides instant feedback)

Redis Integration (optional, for multi-instance deployments):

  • In-memory buffer as primary (fast, survives flush cycles)
  • Redis list as secondary (persists across restarts)
  • Merged during flush with last-write-wins deduplication
  • Benefits: Survives API restarts, works across multiple instances without coordination

Implementation Status:

  • ✅ Core vote buffer with dual storage (in-memory + Redis)
  • ✅ Bulk vote existence checks (single query per batch)
  • ✅ CASE WHEN batched UPDATEs for vote_table
  • ✅ Opinion counter delta accumulation and batching
  • ✅ Conversation voteCount batched updates
  • ✅ Conversation participantCount with WHERE clause consistency
  • ✅ Comprehensive logging throughout flush process
  • ✅ Transaction batching (splits >1000 votes)
  • ✅ Graceful degradation (re-adds failed votes to buffer)
  • ✅ Integrated with API endpoints (voting.ts, comment.ts)
  • ✅ Object parameters for conversationCounters.ts functions
  • ✅ All TypeScript type errors resolved

Files Modified:

  • services/api/src/service/voteBuffer.ts - New vote buffer implementation (760 lines)
  • services/api/src/service/voting.ts - Updated to use vote buffer
  • services/api/src/service/comment.ts - Updated to use vote buffer for auto-votes
  • services/api/src/service/post.ts - Pass voteBuffer parameter through
  • services/api/src/service/account.ts - Updated deleteOpinionBySlugId calls
  • services/api/src/index.ts - Initialize vote buffer singleton
  • services/shared-backend/src/conversationCounters.ts - Converted to object parameters
  • services/shared-backend/src/redis.ts - Added Redis client initialization

Next Steps:

  • Load test to measure actual write DB CPU reduction
  • Monitor vote buffer flush metrics in staging
  • Add Prometheus metrics for buffer size and flush duration

Ready for Phase 2: Redis caching layer (see section 3 above) can now be implemented effectively since vote batching provides efficient cache invalidation triggers (1-second intervals instead of per-vote)

Race Condition: Duplicate Vote Records (Future Concern)

Status: ⚠️ NOT CURRENT ISSUE - Frontend prevents all race scenarios, but MUST FIX before enabling vote editing

Current Frontend Behavior:

  • ✅ Prevents double-clicks (button disabled after click)
  • ✅ Prevents vote changes (cannot change vote after casting)
  • Result: Race condition cannot occur in production today

Future Feature: When vote editing is enabled in frontend, race condition will become exploitable.

Data Model Context:

  • voteTable: One row per (user, opinion) - stores current_content_id pointer
  • voteContentTable: Vote history - each vote/change adds new row
  • Vote change: INSERT new voteContent row + UPDATE voteTable.current_content_id (not UPDATE vote data itself)
The Problem (When Vote Editing Enabled)

Vote Existence Check (voting.ts:163-178):

const existingVoteTableResponse = await db
    .select({
        optionChosen: voteContentTable.vote,
        voteTableId: voteTable.id,
    })
    .from(voteTable)
    .leftJoin(voteContentTable, eq(voteContentTable.id, voteTable.currentContentId))
    .where(and(
        eq(voteTable.authorId, userId),
        eq(voteTable.opinionId, commentData.commentId)
    ));

if (existingVoteTableResponse.length == 0) {
    // No vote found → INSERT new voteTable row
} else {
    // Vote exists → INSERT new voteContentTable row + UPDATE voteTable.current_content_id
}

Race Condition Scenario (When Vote Editing Allowed):

Time 0ms:  Request A - SELECT voteTable → no vote found
Time 5ms:  Request B - SELECT voteTable → no vote found (A hasn't committed)
Time 50ms: Request A - INSERT voteTable (id=100) + INSERT voteContent (agree)
           UPDATE opinion (numAgrees +1)
Time 60ms: Request B - INSERT voteTable (id=101) + INSERT voteContent (disagree) ← DUPLICATE ROW!
           UPDATE opinion (numDisagrees +1)
Result: ❌ TWO voteTable rows, user has TWO votes (agree AND disagree simultaneously)

Root Cause: Classic check-then-act race condition

  1. CHECK: SELECT voteTable to see if vote exists
  2. ACT: INSERT voteTable row based on check result
  3. GAP: Between check and act, concurrent request completes same sequence

Current Protection: Frontend UX only (not database-level)

The Solution (Required Before Vote Editing)

Database UNIQUE Constraint:

-- Migration (services/api/database/flyway/...)
-- MUST RUN BEFORE enabling vote editing in frontend
ALTER TABLE vote
ADD CONSTRAINT vote_author_opinion_unique
UNIQUE (author_id, opinion_id);

Implementation Pattern: Insert-or-Update on Conflict

// services/api/src/services/voteBuffer.ts

async flush(): Promise<void> {
    await this.db.transaction(async (tx) => {
        for (const vote of batch) {
            let voteTableId: number;
            let existingVote: VotingOption | null = null;

            try {
                // Optimistic INSERT into voteTable
                const result = await tx.insert(voteTable)
                    .values({
                        authorId: vote.userId,
                        opinionId: vote.opinionId,
                        currentContentId: null,
                    })
                    .returning({ voteTableId: voteTable.id });

                voteTableId = result[0].voteTableId;

            } catch (error) {
                if (error.code === '23505') { // Postgres unique violation
                    // Race condition: voteTable row exists (concurrent request won)
                    const result = await tx.select({
                        voteTableId: voteTable.id,
                        optionChosen: voteContentTable.vote
                    })
                    .from(voteTable)
                    .leftJoin(voteContentTable, eq(voteContentTable.id, voteTable.currentContentId))
                    .where(and(
                        eq(voteTable.authorId, vote.userId),
                        eq(voteTable.opinionId, vote.opinionId)
                    ));

                    voteTableId = result[0].voteTableId;
                    existingVote = result[0].optionChosen;
                } else {
                    throw error;
                }
            }

            // INSERT new voteContent row (history entry)
            const voteContentResponse = await tx.insert(voteContentTable)
                .values({ voteId: voteTableId, vote: vote.vote, ... })
                .returning({ id: voteContentTable.id });

            // UPDATE voteTable.current_content_id
            await tx.update(voteTable)
                .set({ currentContentId: voteContentResponse[0].id })
                .where(eq(voteTable.id, voteTableId));

            // Calculate counter delta: existingVote → newVote
            // Example: agree → disagree = {numAgrees: -1, numDisagrees: +1}
        }
    });
}

Why This Works:

  • Database enforces one-vote-per-user-per-opinion (cannot have duplicate voteTable rows)
  • Handles concurrent vote changes (re-query finds existing → correct delta)
  • voteContentTable preserves history (always INSERT new row)
  • No distributed locking needed (constraint handles atomicity)

Before Enabling Vote Editing:

  1. ✅ Add unique constraint migration
  2. ✅ Update vote buffer flush logic (handle constraint violations)
  3. ✅ Test concurrent vote changes (load test)
  4. ✅ Then enable vote editing in frontend

Impact:

  • ✅ Prevents duplicate voteTable rows when vote editing enabled
  • ✅ Maintains counter accuracy under concurrent changes
  • ✅ No performance overhead (unique index is fast)

Additional Optimization: Conversation Queue Write Contention

Issue: conversation_update_queue also experiences high write volume (every vote upserts)

Mitigation 1: Batched votes reduce queue writes by 90%+ (one UPSERT per conversation per second instead of per vote)

Mitigation 2: Redis deduplication for additional reduction:

const cacheKey = `conv_queue:${conversationId}`;
const alreadyQueued = await redis.get(cacheKey);

if (!alreadyQueued) {
    await db.insert(conversationUpdateQueueTable)...
    await redis.setex(cacheKey, 2, '1'); // 2-second dedup window
}

Combined Impact: 95-98% reduction in conversation queue contention


Test Configuration

Load Test: services/load-testing/scripts/run-scenario1-with-monitoring.sh

Target: SIP3Kg conversation (113K votes, 19K+ opinions)

Pattern:

  • Ramp 1 → 50 VUs over 2 minutes
  • Then 50 → 200 VUs over 18 minutes
  • Each user: 5-10 opinions, 5-15 votes
  • Sleep between actions: 500-2000ms
  • These are VERY active users - real-world equivalent is likely several thousand normal users

Results:

  • Previously: Failed at 28 VUs (system hung)
  • Now: Handles 200 VUs successfully on a very large conversation

Files Modified

Read Replica

  • services/shared-backend/src/db.ts - Drizzle withReplicas() for automatic routing

Queue Table

  • services/shared-backend/src/schema.ts - Added conversationUpdateQueueTable
  • services/api/src/service/comment.ts - Opinion create/delete queue inserts
  • services/api/src/service/voting.ts - Vote cast/cancel queue inserts
  • services/api/src/service/moderation.ts - Moderation queue inserts
  • services/math-updater/src/jobs/scanConversations.ts - Read from queue
  • services/math-updater/src/jobs/updateConversationMath.ts - Process queue entries

Counter Reconciliation

  • services/math-updater/src/conversationCounters.ts - Counter recalculation logic (219 lines)

Polis Math Fixes

  • services/python-bridge/pyproject.toml - Red-dwarf version rollback
  • services/math-updater/src/services/polisMathUpdater.ts - Batch processing (1000 per batch)
  • services/math-updater/src/index.ts:168 - Singleton queue policy

Conclusion

The system originally exhibited critical database lock contention due to read/write lock competition on the conversation table. The read replica implementation (~90% of improvement) combined with queue-based architecture and counter reconciliation resolved the issue completely.

Key Takeaway: Read replica was the breakthrough. All other optimizations are supporting improvements that reduce write pressure and improve efficiency.

Production Status: ✅ System handles load that previously caused complete failure. Tested with 200 very active concurrent users on a 113K vote conversation, which likely represents several thousand normal users in production.