
NexusPay Documentation

This document provides detailed architectural insights, design decisions, and implementation details for the NexusPay digital wallet platform.

Table of Contents

  • Core Architectural Decisions
  • Technology Stack & Justification
  • Data Flow & Transaction Lifecycle
  • Error Handling & Recovery
  • Performance Considerations
  • Production Deployment
  • Observability & Monitoring
  • Scalability Patterns
  • Testing Strategy
  • Future Enhancements

Core Architectural Decisions

1. Microservices Architecture with Domain-Driven Design

Decision: Split the application into distinct services based on business domains:

  • Account Service: Manages user accounts, balances, and account-related operations
  • Transaction Service: Handles transaction initiation, validation, and logging
  • API Gateway: Single entry point for all client requests
  • Discovery Server: Service registry for dynamic service discovery

Rationale:

  • Isolation: Each service can fail independently without affecting others
  • Scalability: Services can be scaled independently based on load patterns
  • Technology Diversity: Different services can use different technologies if needed
  • Team Autonomy: Different teams can own and develop services independently
  • Data Ownership: Each service owns its data, preventing tight coupling

2. Event-Driven Architecture with Apache Kafka

Decision: Use Kafka as the central nervous system for asynchronous communication between services.

Why Kafka over alternatives:

  • Durability: Messages are persisted to disk and replicated, ensuring no data loss
  • Ordering: Kafka guarantees message ordering within partitions
  • Replay Capability: Can replay events for recovery or new consumers
  • High Throughput: Can handle millions of messages per second
  • Stream Processing: Enables real-time analytics and event streaming

KRaft Mode: Eliminates Zookeeper dependency, simplifying deployment and reducing operational overhead.

3. Two-Layer Idempotency Strategy

Problem: In distributed systems, network failures can cause duplicate requests or message deliveries.

Solution: Implement idempotency at two critical layers:

API Layer (HTTP Idempotency)

  • Mechanism: Client sends Idempotency-Key header with each request
  • Storage: Redis stores processed keys with TTL
  • Behavior: If duplicate key detected, return original response without processing
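The check-then-store behavior above can be sketched in plain Java. This is a minimal illustration, not the production implementation: a ConcurrentHashMap stands in for Redis (production code would use a single atomic Redis SET key value NX EX call plus a TTL), and the class and method names are invented for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative stand-in for the Redis-backed idempotency store.
// In production, the map is Redis and the putIfAbsent is SET ... NX EX <ttl>.
public class IdempotencyStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    /**
     * Returns the cached response if the key was seen before;
     * otherwise runs the handler once and caches its response.
     */
    public String process(String idempotencyKey, Supplier<String> handler) {
        String cached = store.get(idempotencyKey);
        if (cached != null) {
            return cached; // duplicate request: replay the original response
        }
        String response = handler.get();
        // putIfAbsent mirrors SET NX: the first writer wins under concurrency
        String prior = store.putIfAbsent(idempotencyKey, response);
        return prior != null ? prior : response;
    }
}
```

A retried request with the same key gets the original response back without re-executing the handler.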

Consumer Layer (Message Idempotency)

  • Mechanism: Database unique constraint on transaction IDs
  • Storage: PostgreSQL table tracks processed events
  • Behavior: If duplicate event detected, skip processing silently

Why Both Layers:

  • API layer prevents duplicate client requests (user double-clicks, network retries)
  • Consumer layer prevents duplicate Kafka message delivery (at-least-once semantics)
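The consumer-layer guard can be sketched the same way. Here a HashSet stands in for the PostgreSQL processed-events table; in production the duplicate is rejected by the database's unique constraint on the transaction ID, and the names below are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative consumer-side dedup guard. In production, "processedTransactionIds"
// is a PostgreSQL table with a unique constraint on transaction_id.
public class ProcessedEventGuard {
    private final Set<String> processedTransactionIds = new HashSet<>();

    /** Returns true if the event was processed now, false if it was a duplicate. */
    public boolean processOnce(String transactionId, Runnable handler) {
        if (!processedTransactionIds.add(transactionId)) {
            return false; // duplicate Kafka delivery: skip silently
        }
        handler.run();
        return true;
    }
}
```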

4. Transactional Outbox Pattern

Problem: Need to ensure that database updates and event publishing happen atomically.

Solution:

  1. Save the transaction intent and a corresponding event record to an outbox table in the same local database transaction
  2. A separate relay process polls the outbox table and publishes pending events to Kafka
  3. Events are marked as published only after Kafka acknowledges them, so delivery can be retried

Benefits:

  • Atomicity: Either both succeed or both fail
  • Consistency: No orphaned database records or missed events
  • Reliability: Event publishing is guaranteed (at least once) once the database write commits
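One common realization writes the event row in the same database transaction as the business update and lets a relay publish it later. A minimal sketch, with in-memory lists standing in for the outbox table and Kafka (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative outbox relay. In production, OutboxRow is a database row written
// in the same transaction as the business update, and publish() is a Kafka send.
public class OutboxRelay {
    static final class OutboxRow {
        final String payload;
        boolean published;
        OutboxRow(String payload) { this.payload = payload; }
    }

    private final List<OutboxRow> outbox = new ArrayList<>();

    /** Step 1: business write + outbox write, atomically in one DB transaction. */
    public void saveWithEvent(String eventPayload) {
        outbox.add(new OutboxRow(eventPayload));
    }

    /** Step 2: relay polls the outbox; rows are marked only after a successful publish. */
    public int publishPending(Consumer<String> publish) {
        int count = 0;
        for (OutboxRow row : outbox) {
            if (!row.published) {
                publish.accept(row.payload); // if this throws, the row stays pending
                row.published = true;
                count++;
            }
        }
        return count;
    }
}
```

If the relay crashes mid-publish, unmarked rows are simply retried on the next poll, which is why the consumer-layer idempotency guard matters.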

5. Pessimistic Locking for Concurrency Control

Problem: Multiple concurrent transactions could cause race conditions in balance updates.

Solution: Use PESSIMISTIC_WRITE locks when updating account balances.

Trade-offs:

  • Pros: Prevents data corruption, ensures consistency
  • Cons: Reduces throughput, potential for deadlocks
  • Justification: In financial systems, correctness is more important than performance
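The deadlock risk noted above is usually mitigated by acquiring account locks in a consistent global order. A plain-Java sketch of the idea, where ReentrantLocks stand in for the row-level PESSIMISTIC_WRITE locks (the real implementation locks rows via JPA / SELECT ... FOR UPDATE):

```java
import java.math.BigDecimal;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative ordered locking for a balance transfer. ReentrantLock stands in
// for a row-level PESSIMISTIC_WRITE lock on the account row.
public class LockedAccount {
    final String id;
    final ReentrantLock lock = new ReentrantLock();
    BigDecimal balance;

    LockedAccount(String id, BigDecimal balance) { this.id = id; this.balance = balance; }

    /** Locks both accounts in id order, so two opposite transfers cannot deadlock. */
    static void transfer(LockedAccount from, LockedAccount to, BigDecimal amount) {
        LockedAccount first = from.id.compareTo(to.id) < 0 ? from : to;
        LockedAccount second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                if (from.balance.compareTo(amount) < 0) {
                    throw new IllegalStateException("insufficient funds");
                }
                from.balance = from.balance.subtract(amount);
                to.balance = to.balance.add(amount);
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Because both threads always take the lower-id lock first, a transfer A→B and a concurrent B→A cannot hold one lock each while waiting for the other.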

6. Database-per-Service Pattern

Decision: Each service has its own dedicated database instance.

Benefits:

  • Data Isolation: Services cannot directly access other services' data
  • Technology Choice: Each service can choose optimal database technology
  • Independent Scaling: Databases can be scaled independently
  • Fault Isolation: Database failures affect only one service

7. API Gateway Pattern

Decision: Use Spring Cloud Gateway as single entry point.

Responsibilities:

  • Routing: Route requests to appropriate microservices
  • Load Balancing: Distribute requests across service instances
  • Cross-cutting Concerns: Authentication, rate limiting, logging (in production)
  • Protocol Translation: Terminate external HTTPS and forward requests to services over internal protocols

8. Service Discovery with Eureka

Decision: Use Netflix Eureka for service registration and discovery.

Benefits:

  • Dynamic Routing: Services register themselves automatically
  • Health Monitoring: Unhealthy instances are removed from registry
  • Load Balancing: Client-side load balancing via Spring Cloud LoadBalancer (the successor to Ribbon, which is no longer supported in Spring Cloud)
  • Resilience: No single point of failure for service location

Technology Stack & Justification

Category Technology Justification
Backend Framework Java 17, Spring Boot 3 Industry standard for enterprise microservices. Excellent ecosystem, mature tooling, and strong community support.
API Gateway Spring Cloud Gateway Reactive, non-blocking gateway with excellent Spring ecosystem integration. Supports load balancing and service discovery.
Service Discovery Netflix Eureka Battle-tested service registry with health monitoring and automatic failover capabilities.
Message Broker Apache Kafka (KRaft) Provides durability, ordering, and high throughput needed for financial transactions. KRaft mode eliminates Zookeeper complexity.
Primary Database PostgreSQL ACID compliance essential for financial data. Excellent performance, reliability, and advanced features like row-level locking.
Cache/Session Store Redis High-performance in-memory store perfect for idempotency keys and session management. Sub-millisecond latency.
Containerization Docker & Docker Compose Consistent environments across development and production. Simplified dependency management and deployment.
Build Tool Apache Maven Mature dependency management with excellent Spring Boot integration and plugin ecosystem.

Data Flow & Transaction Lifecycle

Transaction Processing Flow

  1. Client Request: User initiates a P2P transfer via mobile app
  2. API Gateway: Routes request to Transaction Service with load balancing
  3. Validation: Transaction Service validates request and checks business rules
  4. Persistence: Transaction intent saved to PostgreSQL with PENDING status
  5. Event Publishing: TransactionInitiated event published to Kafka topic
  6. Async Processing: Account Service consumes event from Kafka
  7. Balance Updates: Account Service updates balances with pessimistic locking
  8. Completion: Transaction status updated to COMPLETED or FAILED

Sequence Diagram

sequenceDiagram
    participant Client
    participant Gateway as API Gateway
    participant TxnSvc as Transaction Service
    participant Redis
    participant Kafka
    participant AcctSvc as Account Service
    participant DB as PostgreSQL

    Client->>Gateway: POST /api/transactions (Idempotency-Key)
    Gateway->>TxnSvc: Route request
    TxnSvc->>Redis: Check idempotency key
    Redis-->>TxnSvc: Key not found
    TxnSvc->>DB: Save transaction (PENDING)
    TxnSvc->>Kafka: Publish TransactionInitiated event
    TxnSvc->>Redis: Store idempotency key + response
    TxnSvc-->>Gateway: Return transaction details
    Gateway-->>Client: 201 Created

    Kafka->>AcctSvc: Deliver event
    AcctSvc->>DB: Check if already processed
    AcctSvc->>DB: Lock accounts (PESSIMISTIC_WRITE)
    AcctSvc->>DB: Update balances
    AcctSvc->>DB: Mark transaction as processed
    AcctSvc->>Kafka: Publish TransactionCompleted event

Error Handling & Recovery

Scenario 1: Account Service Down

Flow:

  1. Transaction Service continues accepting requests
  2. Events accumulate in Kafka topics
  3. When Account Service recovers, it processes all missed events
  4. Kafka's durability ensures no transaction is lost

Recovery Process:

  • Kafka retains messages based on retention policy
  • Consumer groups track processing offsets
  • Service restart resumes from last committed offset
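The offset-based resume above can be sketched with a plain event list and a committed offset. The names are illustrative; in Kafka, the consumer group coordinator plays the role of the committed-offset store.

```java
import java.util.List;
import java.util.function.Consumer;

// Illustrative offset-based resume: after a restart, processing continues
// from the last committed offset, so accumulated events are not lost.
public class OffsetResume {
    private long committedOffset = 0; // stands in for the group's committed offset

    /** Processes events from the committed offset onward, committing after each. */
    public int poll(List<String> log, Consumer<String> handler) {
        int processed = 0;
        for (long i = committedOffset; i < log.size(); i++) {
            handler.accept(log.get((int) i));
            committedOffset = i + 1; // commit
            processed++;
        }
        return processed;
    }
}
```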

Scenario 2: Database Connection Failure

Flow:

  1. Service immediately returns HTTP 503 (Service Unavailable)
  2. No partial state changes due to transactional boundaries
  3. Client can safely retry with same Idempotency-Key

Resilience Mechanisms:

  • Connection pooling with health checks
  • Automatic connection retry with exponential backoff
  • Circuit breaker pattern to prevent cascade failures
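The exponential backoff schedule can be sketched as follows. The constants are illustrative, and production schedules usually add random jitter; frameworks such as Spring Retry or Resilience4j provide this out of the box.

```java
// Illustrative exponential backoff: the delay doubles per attempt, capped at a maximum.
public class Backoff {
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis * (1L << Math.min(attempt, 30)); // 2^attempt growth
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // Prints 100, 200, 400, 800, 1600, 2000 ms for attempts 0..5
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                + delayMillis(attempt, 100, 2000) + " ms");
        }
    }
}
```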

Scenario 3: Kafka Unavailable

Flow:

  1. Transaction Service returns HTTP 503 error (fails fast)
  2. No transaction is accepted unless it can be reliably processed
  3. Prevents data inconsistency between services

Monitoring:

  • Kafka broker health checks
  • Producer acknowledgment timeouts
  • Consumer lag monitoring

Scenario 4: Partial Network Failure

Problem: Transaction Service can reach database but not Kafka

Solution:

  • Use transactional outbox pattern
  • Store events in database table first
  • Separate process publishes events to Kafka
  • Ensures eventual consistency

Performance Considerations

Throughput Optimization

Kafka Partitioning Strategy

Partition Key: hash(accountId) % partition_count

Benefits:

  • Transactions for same account processed in order
  • Parallel processing across different accounts
  • Even distribution of load
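The assignment above can be sketched directly. Note that Kafka's default partitioner actually applies murmur2 to the serialized key bytes rather than Java's hashCode, but the modulo principle is the same.

```java
// Illustrative partition assignment: the same account id always maps to the
// same partition, so that account's transactions are consumed in order.
public class Partitioner {
    static int partitionFor(String accountId, int partitionCount) {
        // Mask the sign bit rather than Math.abs (abs(Integer.MIN_VALUE) is negative)
        return (accountId.hashCode() & 0x7fffffff) % partitionCount;
    }
}
```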

Database Connection Pooling

# HikariCP Configuration
spring.datasource.hikari:
  maximum-pool-size: 20
  minimum-idle: 5
  connection-timeout: 30000
  idle-timeout: 600000
  max-lifetime: 1800000

Async Processing Benefits

  • Non-blocking I/O prevents thread starvation
  • Reactive streams handle backpressure
  • Event-driven eliminates polling overhead

Latency Optimization

Redis Caching Strategy

Key Pattern: "idempotency:{key}"
TTL: 24 hours

Cache Hit Scenarios:

  • Duplicate API requests (immediate response)
  • Recently accessed account data
  • Transaction status lookups

Database Indexing Strategy

-- Primary indexes for fast lookups
CREATE INDEX idx_account_user_id ON accounts(user_id);
CREATE INDEX idx_transaction_from_account ON transactions(from_account_id);
CREATE INDEX idx_transaction_to_account ON transactions(to_account_id);
CREATE INDEX idx_transaction_status ON transactions(status);
CREATE INDEX idx_transaction_timestamp ON transactions(created_at);

-- Composite indexes for complex queries
CREATE INDEX idx_account_balance_lookup ON accounts(user_id, status) WHERE status = 'ACTIVE';

Connection Optimization

  • Keep-alive connections reduce handshake overhead
  • Connection multiplexing for HTTP/2
  • Service mesh for inter-service communication

Production Deployment

Security Hardening

Implementation Details:

  • OAuth 2.0 with PKCE for mobile clients
  • JWT tokens with short expiration (15 minutes)
  • Refresh token rotation for security
  • Role-based access control (RBAC)

Network Security

# Security Groups (AWS Example)
api-gateway:
  ingress:
    - port: 443
      source: 0.0.0.0/0  # HTTPS only
  egress:
    - port: 8081-8082
      source: backend-sg

backend-services:
  ingress:
    - port: 8081-8082
      source: gateway-sg
  egress:
    - port: 5432
      source: database-sg
    - port: 9092
      source: kafka-sg

Observability & Monitoring

Distributed Tracing Implementation

// Example trace correlation (Micrometer Tracing API, the Spring Boot 3 default)
@RestController
public class TransactionController {

    private final Tracer tracer;
    private final TransactionService transactionService;

    public TransactionController(Tracer tracer, TransactionService transactionService) {
        this.tracer = tracer;
        this.transactionService = transactionService;
    }

    @PostMapping("/transactions")
    public ResponseEntity<Transaction> createTransaction(
            @RequestBody TransactionRequest request,
            @RequestHeader("Idempotency-Key") String idempotencyKey) {

        Span span = tracer.nextSpan()
            .name("create-transaction")
            .tag("transaction.amount", String.valueOf(request.getAmount()))
            .tag("idempotency.key", idempotencyKey)
            .start();

        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            return transactionService.createTransaction(request, idempotencyKey);
        } finally {
            span.end();
        }
    }
}

Metrics Collection

# Prometheus metrics
metrics:
  - name: transaction_requests_total
    type: counter
    labels: [method, status, service]
  
  - name: transaction_duration_seconds
    type: histogram
    labels: [service, endpoint]
    
  - name: account_balance_updates_total
    type: counter
    labels: [account_type, currency]
    
  - name: kafka_messages_processed_total
    type: counter
    labels: [topic, partition, status]

Alerting Rules

# Critical Alerts
alerts:
  - name: HighTransactionFailureRate
    condition: |
      rate(transaction_requests_total{status="error"}[5m]) / 
      rate(transaction_requests_total[5m]) > 0.05
    severity: critical
    
  - name: DatabaseConnectionPoolExhausted
    condition: hikari_connections_active >= hikari_connections_max * 0.9
    severity: warning
    
  - name: KafkaConsumerLag
    condition: kafka_consumer_lag_sum > 1000
    severity: warning

Scalability Patterns

Horizontal Scaling Strategy

# Kubernetes HPA Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transaction-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transaction-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Database Scaling

Read Replicas:

# PostgreSQL Read Replicas
datasource:
  primary:
    url: jdbc:postgresql://primary-db:5432/nexuspay
    username: ${DB_USER}
    password: ${DB_PASS}
  
  read-replicas:
    - url: jdbc:postgresql://replica-1:5432/nexuspay
    - url: jdbc:postgresql://replica-2:5432/nexuspay

Partitioning Strategy:

-- Partition transactions by date
CREATE TABLE transactions_2025_01 PARTITION OF transactions
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE transactions_2025_02 PARTITION OF transactions
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

Testing Strategy

Integration Testing

Test Containers for Realistic Testing

@SpringBootTest
@Testcontainers
class TransactionIntegrationTest {
    
    @Container
    static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:14")
            .withDatabaseName("testdb")
            .withUsername("test")
            .withPassword("test");
    
    @Container
    static KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"));
    
    @Container
    static GenericContainer<?> redis = new GenericContainer<>("redis:alpine")
            .withExposedPorts(6379);
    
    @Test
    void shouldProcessTransactionEndToEnd() {
        // Given: Two accounts with sufficient balance
        Account fromAccount = createAccount("user1", new BigDecimal("1000"));
        Account toAccount = createAccount("user2", new BigDecimal("500"));
        
        // When: Transaction is initiated
        TransactionRequest request = TransactionRequest.builder()
            .fromAccountId(fromAccount.getId())
            .toAccountId(toAccount.getId())
            .amount(new BigDecimal("100"))
            .build();
            
        String idempotencyKey = UUID.randomUUID().toString();

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        headers.set("Idempotency-Key", idempotencyKey);

        ResponseEntity<Transaction> response = restTemplate.postForEntity(
            "/api/transactions",
            new HttpEntity<>(request, headers),
            Transaction.class
        );
        
        // Then: Transaction should be successful
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.CREATED);
        
        // And: Balances should be updated
        await().atMost(Duration.ofSeconds(10)).untilAsserted(() -> {
            Account updatedFrom = accountRepository.findById(fromAccount.getId()).orElseThrow();
            Account updatedTo = accountRepository.findById(toAccount.getId()).orElseThrow();
            
            // compareTo-based assertion ignores scale differences (900 vs 900.00)
            assertThat(updatedFrom.getBalance()).isEqualByComparingTo("900");
            assertThat(updatedTo.getBalance()).isEqualByComparingTo("600");
        });
    }
}

Load Testing

Performance Test Scenarios

// k6 Load Test Script
import http from 'k6/http';
import { check } from 'k6';
import { uuidv4 } from 'https://jslib.k6.io/k6-utils/1.4.0/index.js';

export let options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up
    { duration: '5m', target: 100 }, // Steady state
    { duration: '2m', target: 200 }, // Spike test
    { duration: '5m', target: 200 }, // Steady high load
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<1000'], // 95% under 1s
    http_req_failed: ['rate<0.1'],     // Error rate under 10%
  },
};

export default function() {
  // Create transaction
  const transactionPayload = JSON.stringify({
    fromAccountId: 'acc-123',
    toAccountId: 'acc-456',
    amount: '10.00'
  });
  
  const headers = {
    'Content-Type': 'application/json',
    'Idempotency-Key': uuidv4()
  };
  
  const response = http.post(
    'http://localhost:8080/api/transactions',
    transactionPayload,
    { headers }
  );
  
  check(response, {
    'transaction created': (r) => r.status === 201,
    'response time < 1000ms': (r) => r.timings.duration < 1000,
  });
}

Chaos Engineering

Failure Injection Tests

# Chaos Monkey for Spring Boot configuration
chaos:
  monkey:
    enabled: true
    watcher:
      rest-controller: true
      service: true
    assaults:
      level: 5                        # attack roughly 1 in 5 eligible requests
      latency-active: true
      latency-range-start: 1000       # ms
      latency-range-end: 5000         # ms
      kill-application-active: true
      memory-active: true
      memory-fill-increment-fraction: 0.15

Future Enhancements

Additional Services

Fraud Detection Service

Purpose: Real-time fraud detection using machine learning

Implementation:

graph LR
    TxnService --> FraudML[Fraud Detection ML]
    FraudML --> RiskScore[Risk Score]
    RiskScore --> Decision{Risk Level}
    Decision -->|Low| Approve[Auto Approve]
    Decision -->|Medium| Review[Manual Review]
    Decision -->|High| Block[Auto Block]

Features:

  • Real-time scoring (< 100ms)
  • Machine learning models for pattern detection
  • Rule-based checks for known fraud patterns
  • Integration with external fraud databases

Notification Service

Purpose: Real-time push notifications for transaction events

Architecture:

notification-service:
  inputs:
    - kafka-topic: transaction-events
    - kafka-topic: account-events
  outputs:
    - push-notifications: Firebase/APNS
    - email: SendGrid/SES
    - sms: Twilio

Advanced Features

Multi-Currency Support

Database Schema:

-- Enhanced account table
ALTER TABLE accounts ADD COLUMN currency VARCHAR(3) NOT NULL DEFAULT 'USD';
ALTER TABLE accounts ADD COLUMN exchange_rate DECIMAL(10,6);

-- Exchange rates table
CREATE TABLE exchange_rates (
    id BIGSERIAL PRIMARY KEY,
    from_currency VARCHAR(3) NOT NULL,
    to_currency VARCHAR(3) NOT NULL,
    rate DECIMAL(10,6) NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    UNIQUE(from_currency, to_currency, timestamp)
);

Real-time Exchange Rates:

@Service
public class ExchangeRateService {

    private final ExternalRateProvider externalRateProvider;
    private final ExchangeRateRepository exchangeRateRepository;

    public ExchangeRateService(ExternalRateProvider externalRateProvider,
                               ExchangeRateRepository exchangeRateRepository) {
        this.externalRateProvider = externalRateProvider;
        this.exchangeRateRepository = exchangeRateRepository;
    }

    @Scheduled(fixedRate = 60000) // Update every minute
    public void updateExchangeRates() {
        // Fetch from external API (e.g., Fixer.io)
        List<ExchangeRate> rates = externalRateProvider.getCurrentRates();
        exchangeRateRepository.saveAll(rates);
    }
    
    public BigDecimal convertAmount(BigDecimal amount, String fromCurrency, String toCurrency) {
        if (fromCurrency.equals(toCurrency)) {
            return amount;
        }
        
        ExchangeRate rate = exchangeRateRepository
            .findLatestRate(fromCurrency, toCurrency)
            .orElseThrow(() -> new ExchangeRateNotFoundException());
            
        return amount.multiply(rate.getRate());
    }
}
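One detail the conversion above glosses over is rounding: monetary amounts should be scaled to the target currency's minor unit explicitly rather than left at whatever scale multiply produces. A minimal sketch, assuming a two-decimal minor unit (true for most currencies, not for JPY):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative conversion with explicit rounding to 2 decimal places.
// HALF_EVEN (banker's rounding) is the conventional choice for money.
public class CurrencyMath {
    static BigDecimal convert(BigDecimal amount, BigDecimal rate) {
        return amount.multiply(rate).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```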

Operational Improvements

Blue-Green Deployment

# Kubernetes Blue-Green Deployment
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: transaction-service
spec:
  replicas: 5
  strategy:
    blueGreen:
      activeService: transaction-service-active
      previewService: transaction-service-preview
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 30
      prePromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: transaction-service-preview
      postPromotionAnalysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: transaction-service-active

Circuit Breaker Implementation

@Service
public class AccountServiceClient {

    private static final Logger log = LoggerFactory.getLogger(AccountServiceClient.class);

    private final CircuitBreaker circuitBreaker;
    private final RestTemplate restTemplate;

    public AccountServiceClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
        this.circuitBreaker = CircuitBreaker.ofDefaults("accountService");
        circuitBreaker.getEventPublisher()
            .onStateTransition(event ->
                log.info("Circuit breaker state transition: {}", event));
    }
    
    public Account getAccount(String accountId) {
        Supplier<Account> decoratedSupplier = CircuitBreaker
            .decorateSupplier(circuitBreaker, () -> {
                return restTemplate.getForObject(
                    "/accounts/" + accountId, 
                    Account.class
                );
            });
            
        return Try.ofSupplier(decoratedSupplier)
            .recover(throwable -> {
                log.error("Failed to get account {}: {}", accountId, throwable.getMessage());
                throw new ServiceUnavailableException("Account service is down");
            })
            .get();
    }
}

This comprehensive documentation provides the detailed architectural insights, implementation patterns, and operational considerations needed to understand and extend the NexusPay platform.