Skip to content

A CUDA-accelerated blockchain miner that parallelizes Merkle tree computation and Proof of Work for dramatic performance gains.

Notifications You must be signed in to change notification settings

robertpaulp/blockchain

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

banner

CUDA Merkel Root Tree & Proof of Work

Introduction

This project implements a blockchain mining simulation using CUDA to accelerate two critical components: Merkle Root computation and Proof of Work nonce finding. The implementation demonstrates how GPU parallelization can significantly improve the performance of blockchain operations compared to traditional CPU-based approaches.

Project Structure

The project consists of two main components:

  • CPU Miner (cpu_miner/): Reference implementation showing the serial CPU approach
  • GPU Miner (gpu_miner/): Optimized CUDA implementation with parallelized Merkle Root computation and nonce finding

Algorithm Overview

Blockchain Consensus Mechanism

  • Nodes/Miners: Participate in the consensus process without trusting each other
  • Mathematical Problem: Miners solve computationally complex problems to gain trust
  • Dynamic Difficulty: Problem complexity increases with more participants

Block Structure

Block Structure

  • Previous Block Hash: Links to the previous block in the chain
  • Merkle Root: Hash of all transactions in the block using a Merkle Tree
  • Nonce: 32-bit integer found through Proof of Work to meet difficulty requirements

CUDA Implementation Details

Merkle Root Computation

The Merkle Root is computed using two CUDA kernels:

  1. hash_kernel: Computes SHA-256 hashes for each initial transaction
  2. merge_kernel: Concatenates and hashes pairs of hashes to build the tree

Tree Construction Process:

Level 0 (initial):   [H0] [H1] [H2] [H3] [H4] [H5] [H6] [H7]
                      ↓↓   ↓↓   ↓↓   ↓↓
Level 1:            [H01] [H23] [H45] [H67]
                     ↓↓    ↓↓
Level 2:            [H0123] [H4567]
                      ↓↓
Level 3 (root):     [H01234567]

Nonce Finding (Proof of Work)

The implementation uses batch processing for nonce discovery:

  • Batch Division: INT32_MAX is divided into batches processed by different threads
  • Parallel Search: Multiple threads search different nonce ranges simultaneously
  • Early Termination: Stops searching once a valid nonce is found
  • Atomic Operations: Uses atomicMin to track the smallest valid nonce found

Memory Optimization Evolution:

  • Started with unified memory (cudaMallocManaged)
  • Experimented with shared memory for constant data
  • Final implementation uses cudaMalloc for optimal performance

Input/Output Format

Input File (<TEST_NAME>.in)

First line contains (in order):

  • number_of_transactions: Total transactions in file
  • prev_block_hash: Hash of the previous block
  • difficulty_zeros: Number of leading zeros required
  • max_nonce: Maximum nonce values to check
  • max_transactions_in_a_block: Maximum transactions per block
  • transaction_size: Size of each transaction in bytes

Subsequent lines contain the actual transactions.

Output File (<TEST_NAME>.out)

For each block:

BLOCK_ID,NONCE,BLOCK_HASH,TIME_FOR_MERKLE_ROOT_COMPUTATION,TIME_FOR_NONCE_COMPUTATION,TIME_SUM

Final line shows total times:

TOTAL_TIME_FOR_MERKLE_ROOT_COMPUTATION,TOTAL_TIME_FOR_NONCE_COMPUTATION,TOTAL_TIME_SUM

Processing Pipeline

  1. Transaction Grouping: Transactions are grouped into blocks based on max_transactions_in_a_block
  2. Merkle Root Calculation: Compute Merkle Root for each block's transactions
  3. Block Formation: Concatenate previous block hash and Merkle Root
  4. Nonce Search: Find valid nonce that meets difficulty requirements
  5. Validation: Block is invalidated if no valid nonce is found within max_nonce

Build and Execution

Environment Setup

This project requires:

  • CUDA Toolkit (tested with 12.2.2)
  • NVIDIA GPU with compute capability 3.5+
  • Slurm workload manager
# Compile
make    

# Compile with specific compiler
make COMPILER=gcc build

# Compile with specific compiler and locally
make COMPILER=gcc build LOCAL=y

# Run specific test
make run TEST=<TEST_NAME>  # Example: make run TEST=test4

# Clean build artifacts
make clean

Performance Comparison

CPU vs GPU Execution

Performance Comparison Speedup Chart

Detailed Results

Test Component CPU Time (s) GPU Time (s) Speedup
Test 1 Merkle Root 0.00043 0.00484 0.09x
Test 1 Nonce Finding 12.86962 0.04931 261x
Test 1 Total 12.87005 0.05415 238x
Test 2 Merkle Root 0.00055 0.00790 0.07x
Test 2 Nonce Finding 12.04284 0.04940 244x
Test 2 Total 12.04339 0.05730 210x
Test 3 Merkle Root - 0.02439 -
Test 3 Nonce Finding - 32.15315 -
Test 3 Total - 32.17754
Test 4 Merkle Root 0.83722 0.04266 19.6x
Test 4 Nonce Finding 2.08098 0.01596 130x
Test 4 Total 2.91820 0.05862 49.8x

Speedup Breakdown

Test 3 results are not available for CPU due to excessive computation time.

The CUDA implementation demonstrates significant speedup over the CPU version:

  • Merkle Root: Parallel tree construction vs sequential hashing
  • Nonce Finding: Massive parallel search vs iterative trial-and-error

Key Optimizations

  • Parallel Tree Construction: Multiple levels of Merkle tree built concurrently
  • Batch Nonce Processing: Divides search space across GPU threads
  • Memory Management: Optimized data transfer between host and device
  • Early Termination: Stops unnecessary computation once solution is found

Resources

About

A CUDA-accelerated blockchain miner that parallelizes Merkle tree computation and Proof of Work for dramatic performance gains.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published