Optimized Dilithium Implementation

Overview

This repository attempts at creating a highly optimized implementation of the CRYSTALS-Dilithium post-quantum digital signature scheme.

Currently, it achieves the following metrics (tested on standard hardware):

There's still a lot of room to improve on performance using GPU acceleration.

Performance Summary:
------------------
Key Generation:    65.80 ms
Signing Time:      81.17 ms
Verify Time:        0.28 ms
Total Time:       147.25 ms

Note: There are inaccuracies, the bounds are very extremely relaxed for learning purposes.

Mathematical Foundation

Ring Structure

We work in the polynomial ring Rq = Zq[X]/(X^n + 1) where:

q = 2^23 - 2^13 + 1 = 8,380,417 (Carefully chosen prime for efficient NTT)
n = 256 (Power of 2 for efficient NTT operations)

The choice of these parameters enables:

Efficient modular reduction
Fast NTT-based polynomial multiplication
Optimal security-performance tradeoff

Parameter Sets

We implement three security levels following NIST standards:

PARAMS = {
    2: {"k": 4, "l": 4, "eta": 2},  # NIST Security Level 2
    3: {"k": 6, "l": 5, "eta": 4},  # NIST Security Level 3
    5: {"k": 8, "l": 7, "eta": 2},  # NIST Security Level 5
}

Core Operations

The scheme consists of three main algorithms:

Key Generation:
- Generate random seed rho
- Derive matrix A in Rq^(k×l) from rho
- Sample small polynomials s1 in S_eta^l, s2 in S_eta^k
- Compute t = As1 + s2
- Output (rho,t) as public key and (s1,s2) as secret key
Signing:
- Sample y with coefficients in [-gamma1, gamma1]
- Compute w = Ay
- Generate challenge c using w
- Compute z = y + cs1
- Output signature (z,c,w)
Verification:
- Check that infinity norm of z < gamma1 - beta
- Verify challenge reconstruction
- Check bounds on w

Optimizations

1. Polynomial Operations

class Polynomial:
    N = 256
    Q = 8380417  # 2^23 - 2^13 + 1

Key optimizations:

Vectorized operations using NumPy for all polynomial arithmetic
Pre-allocated numpy arrays for coefficient storage
Cached helper arrays for multiplication
Efficient modular reduction using the special form of q

2. Matrix Operations

Matrix-vector multiplication optimization:

Cached matrix generation using LRU cache
Vectorized coefficient generation
Pre-allocated result vectors
Batch processing of polynomial multiplications

3. SHAKE128 Optimizations

class OptimizedHasher:
    def __init__(self):
        self._shake = SHAKE128.new()
        self._matrix_cache = {}

Improvements:

Single SHAKE instance reuse
Cached seed expansion
Vectorized coefficient extraction
Batch processing of challenge generation
Pre-allocated buffers for hash operations

4. Memory Management

Use of @lru_cache for frequently accessed computations
Pre-allocated numpy arrays
Efficient buffer reuse in hash operations
Minimized memory allocations in hot paths

5. Coefficient Generation

Vectorized generation of random coefficients
Efficient modular reduction using bit operations
Batch processing of polynomial coefficients
Optimized challenge polynomial generation

Performance Critical Sections

Polynomial Multiplication:

def __mul__(self, other):
    indices, neg_ones = self._get_multiplication_helper()
    result = np.zeros(self.N, dtype=np.int32)
    # Vectorized multiplication using numpy
    for i in range(self.N):
        prods = a[i] * b
        positions = (i + indices) % self.N
        signs = np.where(i + indices >= self.N, -1, 1)
        result = (result + signs * np.roll(prods, -i)) % self.Q

Matrix Generation:

def generate_matrix(self, seed: bytes, k: int, l: int) -> list:
    total_coeffs = k * l * Polynomial.N
    total_bytes = total_coeffs * CHUNK_SIZE
    all_data = np.frombuffer(shake.read(total_bytes), dtype=np.uint8)
    coeffs_array = all_data.reshape(-1, CHUNK_SIZE)

Challenge Generation:

def generate_challenge(self, message: bytes, public_key: tuple, w1: list) -> Polynomial:
    challenge_bytes = np.frombuffer(self._shake.read(TAU * 2), dtype=np.uint8)
    positions = (challenge_bytes[::2].astype(np.uint16) |
                (challenge_bytes[1::2].astype(np.uint16) << 8)) % Polynomial.N

Security Considerations

Implementation follows constant-time principles where possible
Careful bounds checking on all operations
Secure random number generation using secrets module
Protection against timing attacks in critical operations

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
dilithium		dilithium
examples		examples
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Optimized Dilithium Implementation

Overview

Mathematical Foundation

Ring Structure

Parameter Sets

Core Operations

Optimizations

1. Polynomial Operations

2. Matrix Operations

3. SHAKE128 Optimizations

4. Memory Management

5. Coefficient Generation

Performance Critical Sections

Security Considerations

About

Uh oh!

Releases

Packages

Uh oh!

Languages

0xahzam/dilithium

Folders and files

Latest commit

History

Repository files navigation

Optimized Dilithium Implementation

Overview

Mathematical Foundation

Ring Structure

Parameter Sets

Core Operations

Optimizations

1. Polynomial Operations

2. Matrix Operations

3. SHAKE128 Optimizations

4. Memory Management

5. Coefficient Generation

Performance Critical Sections

Security Considerations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages