A distributed consensus system that simulates AWS Lambda functions running locally using Docker containers with AWS Runtime Interface Emulator (RIE). The system implements a simplified Raft-like consensus algorithm for maintaining a consistent global count across five Java-based Lambda functions that communicate exclusively through SQS queues.
- Overview
- Architecture
- Prerequisites
- Quick Start
- Configuration
- Usage
- Monitoring
- Testing
- Troubleshooting
- Performance
- Development
The Lambda Consensus Federation demonstrates:
- Distributed Consensus: Five Lambda functions maintain consensus on a global count value
- Fault Tolerance: Automatic recovery when individual Lambda containers are restarted
- Message-Based Communication: All communication happens through local SQS queues
- Quorum-Based Recovery: Minimum 3 out of 5 nodes required for recovery operations
- Performance Optimization: Connection pooling, retry logic, and graceful shutdown handling
- ✅ 5-Node Consensus: Distributed agreement across multiple Lambda instances
- ✅ SQS Communication: Local SQS queues for inter-Lambda messaging
- ✅ Automatic Recovery: Failed nodes recover state when restarted
- ✅ Comprehensive Logging: Structured JSON logging for all operations
- ✅ Performance Monitoring: Built-in metrics and performance tracking
- ✅ CLI Tools: Command-line utilities for testing and monitoring
- ✅ Docker Integration: Complete containerized environment
┌─────────────────────────────────────────────────────────────────┐
│ Docker Environment │
├─────────────┬─────────────┬─────────────┬─────────────┬─────────┤
│ Lambda 1 │ Lambda 2 │ Lambda 3 │ Lambda 4 │Lambda 5 │
│ Java + RIE │ Java + RIE │ Java + RIE │ Java + RIE │Java+RIE │
└─────────────┴─────────────┴─────────────┴─────────────┴─────────┘
│ │ │ │ │
└─────────────┼─────────────┼─────────────┼─────────────┘
│ │ │
┌─────────────────────────────────────────────────────────────────┐
│ Local SQS (ElasticMQ) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Queue-1 │ │Queue-2 │ │Queue-3 │ │Queue-4 │ │Queue-5 │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────────┘
- Increment Request: External trigger initiates count increment
- Proposal Phase: Receiving Lambda proposes new count to all peers
- Voting Phase: Each Lambda validates and votes on the proposal
- Commit Phase: If majority accepts, all Lambdas update their count
- Recovery Phase: Restarted Lambdas request current state from peers
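In the commit phase, the decision rule is a plain majority across the five nodes. A minimal, self-contained sketch of that rule, with hypothetical names (VoteTally, shouldCommit); the project's actual logic lives in ConsensusManagerImpl and may be structured differently:

```java
import java.util.Map;

// Sketch only: counts accept votes and commits once a simple majority is reached.
public class VoteTally {
    private static final int TOTAL_NODES = 5;
    private static final int QUORUM = TOTAL_NODES / 2 + 1; // 3 of 5

    /** Returns true once a majority of the known nodes has accepted the proposal. */
    public boolean shouldCommit(Map<String, Boolean> votesByNode) {
        long accepts = votesByNode.values().stream().filter(Boolean::booleanValue).count();
        return accepts >= QUORUM;
    }
}
```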
- Docker: Version 20.0 or higher
- Docker Compose: Version 2.0 or higher
- Java: JDK 21 (for local development)
- Maven: Version 3.8 or higher (for building)
- Memory: Minimum 4GB RAM (8GB recommended)
- CPU: 2+ cores recommended
- Disk: 2GB free space
- Network: Docker networking enabled
# Clone the repository
git clone <repository-url>
cd lambda-consensus-federation
# Build the project
mvn clean package
# Verify the build
ls -la target/lambda-consensus-federation-1.0.0.jar
# Start all services
docker-compose up -d
# Verify all containers are running
docker-compose ps
# Check logs
docker-compose logs -f
# Send increment request to any Lambda
./scripts/consensus-cli.sh increment lambda-node-1
# Check current count across all nodes
./scripts/consensus-cli.sh status
# Monitor logs in real-time
./scripts/view-logs.sh
# Restart a single node
./scripts/restart-node.sh lambda-node-3
# Verify recovery
./scripts/consensus-cli.sh status
Each Lambda container supports the following environment variables:
Variable | Description | Default |
---|---|---|
NODE_ID | Unique identifier for the Lambda instance | lambda-node-{timestamp} |
SQS_ENDPOINT | Local SQS endpoint URL | http://localhost:9324 |
KNOWN_NODES | Comma-separated list of peer node IDs | lambda-node-1,lambda-node-2,... |
LOG_LEVEL | Logging verbosity level | INFO |
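As a rough illustration, a handler could resolve these variables with the defaults listed above along the following lines (a sketch with a hypothetical NodeConfig class; the project's actual configuration loading may differ):

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: reads the documented environment variables, falling back to the defaults above.
public final class NodeConfig {
    public final String nodeId;
    public final String sqsEndpoint;
    public final List<String> knownNodes;
    public final String logLevel;

    private NodeConfig(String nodeId, String sqsEndpoint, List<String> knownNodes, String logLevel) {
        this.nodeId = nodeId;
        this.sqsEndpoint = sqsEndpoint;
        this.knownNodes = knownNodes;
        this.logLevel = logLevel;
    }

    public static NodeConfig fromEnvironment() {
        String known = getOrDefault("KNOWN_NODES",
                "lambda-node-1,lambda-node-2,lambda-node-3,lambda-node-4,lambda-node-5");
        return new NodeConfig(
                getOrDefault("NODE_ID", "lambda-node-" + System.currentTimeMillis()),
                getOrDefault("SQS_ENDPOINT", "http://localhost:9324"),
                Arrays.asList(known.split(",")),
                getOrDefault("LOG_LEVEL", "INFO"));
    }

    private static String getOrDefault(String key, String fallback) {
        String value = System.getenv(key);
        return (value == null || value.isBlank()) ? fallback : value;
    }
}
```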
The docker-compose.yml file defines:
- 5 Lambda containers: Each with unique NODE_ID
- ElasticMQ service: Local SQS implementation
- Trigger service: Sends random increment requests
- Shared network: For inter-container communication
- Volume mounts: For log aggregation
ElasticMQ configuration in elasticmq.conf:
include classpath("application.conf")

node-address {
  protocol = http
  host = "*"
  port = 9324
  context-path = ""
}

rest-sqs {
  enabled = true
  bind-port = 9324
  bind-hostname = "0.0.0.0"
  sqs-limits = strict
}
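On the Lambda side, the SQS client only needs to be pointed at this endpoint; ElasticMQ does not validate credentials. A sketch using the AWS SDK for Java v2, assuming the endpoint comes from SQS_ENDPOINT (the project's real client setup lives in the messaging package and may differ):

```java
import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sqs.SqsClient;

// Sketch: build an SQS client against the local ElasticMQ endpoint.
SqsClient sqs = SqsClient.builder()
        .endpointOverride(URI.create(
                System.getenv().getOrDefault("SQS_ENDPOINT", "http://localhost:9324")))
        .region(Region.US_EAST_1) // arbitrary; ElasticMQ ignores the region
        .credentialsProvider(StaticCredentialsProvider.create(
                AwsBasicCredentials.create("local", "local"))) // placeholder credentials
        .build();
```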
The system includes several CLI utilities:
# Send increment request
./scripts/consensus-cli.sh increment <node-id>
# Check status of all nodes
./scripts/consensus-cli.sh status
# Send batch requests for load testing
./scripts/consensus-cli.sh batch 10
# Monitor specific node
./scripts/consensus-cli.sh monitor lambda-node-1
# Start the consensus system
./scripts/start-consensus.sh
# Stop the consensus system
./scripts/stop-consensus.sh
# Restart specific node
./scripts/restart-node.sh <node-id>
# View aggregated logs
./scripts/view-logs.sh
# Run integration tests
./scripts/run-integration-tests.sh
// Create a consensus request
ConsensusRequest request = new ConsensusRequest(
    MessageType.INCREMENT_REQUEST,   // message type
    "source-node",                   // ID of the node issuing the request
    "target-node",                   // ID of the node the request is addressed to
    42L,                             // proposed count value
    "proposal-123",                  // proposal identifier
    Map.of("metadata", "value")      // optional metadata attached to the proposal
);
// Send via SQS handler
SQSMessageHandler handler = new SQSMessageHandlerImpl("node-1");
boolean success = handler.sendMessage("lambda-node-2", request);
# Increment count
curl -X POST http://localhost:8080/increment \
-H "Content-Type: application/json" \
-d '{"targetNode": "lambda-node-1"}'
# Get status
curl http://localhost:8080/status
All operations are logged in structured JSON format:
{
"timestamp": "2024-01-15T10:30:45.123Z",
"level": "INFO",
"logger": "ConsensusLambdaHandler",
"nodeId": "lambda-node-1",
"operation": "CONSENSUS_OPERATION",
"proposalId": "prop-123",
"proposedValue": 42,
"phase": "commit",
"metadata": {
"success": true,
"duration": 1250
}
}
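As an illustration, such an entry could be assembled and serialized roughly as follows, assuming Jackson is on the classpath; the field names mirror the example above and the StructuredLog class is hypothetical:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch only: builds a log entry shaped like the example above.
// The project's real logger lives in the logging package.
public class StructuredLog {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String consensusEntry(String nodeId, String proposalId, long proposedValue,
                                         String phase, boolean success, long durationMs) throws Exception {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("timestamp", Instant.now().toString());
        entry.put("level", "INFO");
        entry.put("logger", "ConsensusLambdaHandler");
        entry.put("nodeId", nodeId);
        entry.put("operation", "CONSENSUS_OPERATION");
        entry.put("proposalId", proposalId);
        entry.put("proposedValue", proposedValue);
        entry.put("phase", phase);
        entry.put("metadata", Map.of("success", success, "duration", durationMs));
        return MAPPER.writeValueAsString(entry);
    }
}
```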
Built-in performance tracking includes:
- Consensus Duration: Time to complete consensus operations
- Message Counts: Sent/received message statistics
- Success Rates: Operation success/failure ratios
- Recovery Times: Node recovery duration metrics
Use the built-in log analyzer:
# Analyze consensus performance
java -cp target/lambda-consensus-federation-1.0.0.jar \
com.example.consensus.logging.LogAnalyzerCLI \
--analyze-consensus logs/
# Generate performance report
java -cp target/lambda-consensus-federation-1.0.0.jar \
com.example.consensus.logging.LogAnalyzerCLI \
--performance-report logs/
# Monitor all nodes
./scripts/view-logs.sh | grep "CONSENSUS_OPERATION"
# Monitor specific operations
docker-compose logs -f lambda-node-1 | jq '.operation'
# Watch consensus success rate
watch -n 5 './scripts/consensus-cli.sh status'
# Run all unit tests
mvn test
# Run specific test class
mvn test -Dtest=ConsensusLambdaHandlerTest
# Run with coverage
mvn test jacoco:report
# Run integration tests
mvn test -Dtest="*IntegrationTest"
# Run specific integration test
mvn test -Dtest=MultiNodeConsensusIntegrationTest
# Run with Docker containers
./scripts/run-integration-tests.sh
# Full system test
mvn test -Dtest=EndToEndSystemTest
# Chaos testing
mvn test -Dtest=AdvancedChaosTest
# Performance testing
mvn test -Dtest=ConsensusPerformanceIntegrationTest
# Generate load with CLI
./scripts/consensus-cli.sh batch 100
# Concurrent load testing
for i in {1..5}; do
./scripts/consensus-cli.sh batch 20 &
done
wait
Symptoms: Docker containers exit immediately or fail to start
Solutions:
# Check Docker daemon
docker info
# Verify image build
docker-compose build --no-cache
# Check resource limits
docker system df
docker system prune
Symptoms: "Connection refused" or SQS timeout errors
Solutions:
# Verify ElasticMQ is running
docker-compose ps elasticmq
# Check SQS endpoint
curl http://localhost:9324/
# Restart SQS service
docker-compose restart elasticmq
Symptoms: Nodes don't reach consensus or have different count values
Solutions:
# Check node status
./scripts/consensus-cli.sh status
# Verify all nodes are reachable
for i in {1..5}; do
docker-compose exec lambda-node-$i echo "Node $i OK"
done
# Check for network partitions
docker network ls
docker network inspect lambda-consensus-federation_consensus-network
Symptoms: Restarted nodes don't recover or get stuck in recovery state
Solutions:
# Check quorum availability
./scripts/consensus-cli.sh status | grep -c "IDLE"
# Manually trigger recovery
./scripts/restart-node.sh lambda-node-1
# Verify recovery logs
docker-compose logs lambda-node-1 | grep "RECOVERY"
Enable debug logging:
# Set debug level
export LOG_LEVEL=DEBUG
# Restart with debug logging
docker-compose down
docker-compose up -d
# View debug logs
docker-compose logs -f | grep "DEBUG"
# Check system resources
docker stats
# Monitor message queue lengths
curl http://localhost:9324/ | grep "ApproximateNumberOfMessages"
# Analyze consensus timing
grep "consensus_duration" logs/*.log | sort -n
# Check memory usage
docker stats --format "table {{.Container}}\t{{.MemUsage}}\t{{.MemPerc}}"
# Increase container memory limits
# Edit docker-compose.yml:
# mem_limit: 1g
# Find consensus failures
grep "consensus.*failed" logs/*.log
# Analyze recovery patterns
grep "RECOVERY" logs/*.log | cut -d' ' -f1-3 | sort | uniq -c
# Check message flow
grep "MESSAGE.*SENT\|MESSAGE.*RECEIVED" logs/*.log | head -20
Typical performance characteristics:
Metric | Value | Notes |
---|---|---|
Consensus Latency | < 5 seconds | 95th percentile |
Throughput | 1 op/10 seconds | Sequential operations |
Recovery Time | < 30 seconds | With 3+ nodes available |
Memory Usage | ~512MB/node | Including JVM overhead |
CPU Usage | < 10% idle | Burst during consensus |
# Increase message batch size
export SQS_MAX_MESSAGES=10
# Reduce polling wait time for faster response
export SQS_WAIT_TIME=5
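These variables are assumed to feed the SDK's receive call roughly as sketched below (SQS_MAX_MESSAGES and SQS_WAIT_TIME as exported above; the actual wiring lives in the SQS message handler):

```java
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

// Sketch: apply the tuning variables above to a long-polling receive request (AWS SDK for Java v2).
static ReceiveMessageRequest tunedReceive(String queueUrl) {
    int maxMessages = Integer.parseInt(System.getenv().getOrDefault("SQS_MAX_MESSAGES", "10"));
    int waitSeconds = Integer.parseInt(System.getenv().getOrDefault("SQS_WAIT_TIME", "5"));
    return ReceiveMessageRequest.builder()
            .queueUrl(queueUrl)               // queue URL resolved elsewhere (e.g. via the queue-URL cache)
            .maxNumberOfMessages(maxMessages) // SQS caps this at 10 messages per call
            .waitTimeSeconds(waitSeconds)     // long polling reduces empty receives
            .build();
}
```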
# In Dockerfile, add JVM options:
ENV JAVA_OPTS="-Xmx512m -Xms256m -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
The system applies several connection and resource optimizations out of the box:
- SQS Client: Reused connections with timeout management
- Thread Pool: Configurable pool size for message processing
- Queue Caching: URL caching to reduce SQS API calls
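The queue-URL caching idea fits in a few lines (hypothetical QueueUrlCache class; the real cache sits inside the SQS message handler):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.GetQueueUrlRequest;

// Sketch: resolve each queue URL once and reuse it for every subsequent send/receive.
public class QueueUrlCache {
    private final SqsClient sqs;
    private final Map<String, String> urlsByQueueName = new ConcurrentHashMap<>();

    public QueueUrlCache(SqsClient sqs) {
        this.sqs = sqs;
    }

    public String urlFor(String queueName) {
        return urlsByQueueName.computeIfAbsent(queueName, name ->
                sqs.getQueueUrl(GetQueueUrlRequest.builder().queueName(name).build()).queueUrl());
    }
}
```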
The current implementation supports exactly 5 nodes. To scale:
- Update the KNOWN_NODES environment variable for every container
- Modify the quorum calculation in ConsensusManagerImpl (see the sketch below)
- Add additional containers to docker-compose.yml
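The quorum change can be as small as deriving the majority threshold from the configured node list instead of hard-coding 3-of-5, for example (a sketch; QuorumConfig is a hypothetical name):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: quorum follows the size of KNOWN_NODES, so adding containers only requires config changes.
public class QuorumConfig {
    public static int quorumSize() {
        String known = System.getenv().getOrDefault("KNOWN_NODES",
                "lambda-node-1,lambda-node-2,lambda-node-3,lambda-node-4,lambda-node-5");
        List<String> nodes = Arrays.asList(known.split(","));
        return nodes.size() / 2 + 1; // simple majority: 3 of 5, 4 of 7, ...
    }
}
```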
# In docker-compose.yml
services:
  lambda-node-1:
    mem_limit: 1g
    cpus: '0.5'
# Full build with tests
mvn clean package
# Skip tests for faster build
mvn clean package -DskipTests
# Build Docker images
docker-compose build
src/
├── main/java/com/example/consensus/
│ ├── handler/ # Lambda function handlers
│ ├── manager/ # Consensus algorithm implementation
│ ├── messaging/ # SQS message handling
│ ├── model/ # Data models and serialization
│ ├── state/ # Node state management
│ ├── logging/ # Structured logging and metrics
│ ├── cli/ # Command-line utilities
│ └── trigger/ # Trigger service for testing
└── test/java/com/example/consensus/
├── integration/ # Integration tests
├── handler/ # Handler unit tests
├── manager/ # Manager unit tests
└── ... # Other test packages
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
- Java: Follow Google Java Style Guide
- Logging: Use structured logging with appropriate levels
- Testing: Maintain >80% code coverage
- Documentation: Update README for new features
# Update version
mvn versions:set -DnewVersion=1.1.0
# Build and test
mvn clean package
# Create release
git tag -a v1.1.0 -m "Release version 1.1.0"
git push origin v1.1.0
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, issues, or contributions:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Happy Consensus Building! 🚀