
Caissa Chess Engine


(Logo image generated with DALL·E 2)

Overview

Caissa is a strong, UCI-compatible chess engine, written from scratch in C++ and developed since early 2021. It features a custom neural network evaluation system trained on over 17 billion self-play positions, and it achieves ratings of 3600+ Elo on major chess engine rating lists, placing it around the top 10.

The engine is optimized for:

  • Regular Chess - Standard chess rules
  • FRC (Fischer Random Chess) - Chess960 variant
  • DFRC (Double Fischer Random Chess) - Extended FRC variant


Playing Strength

Caissa consistently ranks among the top chess engines on major rating lists:

CCRL (Computer Chess Rating Lists)

List             Rating  Rank  Version  Notes
CCRL 40/2 FRC    4022    #6    1.23     Fischer Random Chess
CCRL Chess324    3770    #6    1.23     Chess324 variant
CCRL 40/15       3622    #9    1.23     4 CPU
CCRL Blitz       3755    #10   1.22     8 CPU

SPCC (Stefan Pohl Computer Chess)

List            Rating  Rank  Version
SPCC UHO-Top15  3697    #10   Caissa 1.24 avx512

IpMan Chess

List              Rating  Rank  Version  Architecture
10+1 (R9-7945HX)  3542    #16   1.24     AVX-512
10+1 (i9-7980XE)  3526    #14   1.21     AVX-512
10+1 (i9-13700H)  3544    #17   1.22     AVX2-BMI2

CEGT (Chess Engines Grand Tournament)

List       Rating  Rank  Version
CEGT 40/20 3576    #8    1.24
CEGT 40/4  3614    #8    1.22
CEGT 5+3   3618    #5    1.22

Note: The rankings above may be outdated.

Features

General

  • UCI Protocol - Full Universal Chess Interface support
  • Neural Network Evaluation - Custom NNUE-style evaluation
  • Endgame Tablebases - Syzygy and Gaviota support
  • Chess960 Support - Fischer Random Chess (FRC) and Double FRC

Search Algorithm

  • Negamax with alpha-beta pruning
  • Iterative Deepening with aspiration windows
  • Principal Variation Search (PVS)
  • Quiescence Search for tactical positions
  • Transposition Table with large pages support
  • Multi-PV Search - Analyze multiple lines simultaneously
  • Multithreaded Search - Parallel search with shared TT

Neural Network Evaluation

  • Architecture: (32×768→1024)×2→1
  • Incremental Updates - Efficiently updated first layer
  • Vectorized Code - Manual SIMD optimization for:
    • AVX-512 (fastest)
    • AVX2
    • SSE2
    • ARM NEON
  • Activation: Clipped-ReLU
  • Variants: 8 variants of last layer weights (piece count dependent)
  • Features: Absolute piece coordinates with horizontal symmetry, 32 king buckets
  • Special Endgame Routines - Enhanced endgame evaluation

Neural Network Trainer

  • Custom CPU-based Trainer using Adam algorithm
  • Highly Optimized - Exploits AVX instructions, multithreading, and network sparsity
  • Self-Play Training - Trained on 17+ billion positions from self-generated games
  • Progressive Training - Older games purged, networks trained on latest engine versions

Performance Optimizations

  • Magic Bitboards - Efficient move generation
  • Large Pages - Transposition table uses large pages for better performance
  • Node Caching - Evaluation result caching
  • Accumulator Caching - Neural network accumulator caching
  • Ultra-Fast - Outstanding performance at ultra-short time controls (sub-second games)

Quick Start

Using Pre-built Binaries

  1. Download the appropriate executable from the Releases page
  2. Choose the version matching your CPU:
    • AVX-512: Latest Intel Xeon/AMD EPYC (fastest)
    • BMI2: Most modern CPUs (recommended)
    • AVX2: Older CPUs with AVX2 support
    • POPCNT: Older CPUs with SSE4.2
    • Legacy: Very old x64 CPUs
  3. Run the engine with any UCI-compatible chess GUI
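A minimal UCI exchange looks roughly like this (lines marked > are sent by the GUI or user, < are engine replies; the exact output is illustrative and varies by version):

```
> uci
< id name Caissa ...
< option name Hash type spin ...
< uciok
> isready
< readyok
> position startpos moves e2e4 e7e5
> go movetime 1000
< info depth 20 score cp 25 pv g1f3 ...
< bestmove g1f3
```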

Running from Source

See the Compilation section below for detailed build instructions.

Compilation

Prerequisites

  • C++ Compiler with C++20 support:
    • GCC 10+ or Clang 12+ (Linux)
    • Visual Studio 2022 (Windows)
  • CMake 3.15 or later
  • Make (Linux) or Visual Studio (Windows)

Linux

Using Makefile (Quick Build)

cd src
make -j$(nproc)

Note: This compiles the default AVX2/BMI2 version.

Using CMake (Recommended)

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Final ..
make -j$(nproc)

Build Configurations:

  • Final - Production build, no asserts, maximum optimizations
  • Release - Development build with asserts, optimizations enabled
  • Debug - Development build with asserts, optimizations disabled

Architecture Selection:

To build for a specific architecture, set the TARGET_ARCH variable:

# AVX-512 (requires AVX-512 support)
cmake -DTARGET_ARCH=x64-avx512 -DCMAKE_BUILD_TYPE=Final ..
# BMI2 (recommended for modern CPUs)
cmake -DTARGET_ARCH=x64-bmi2 -DCMAKE_BUILD_TYPE=Final ..
# AVX2
cmake -DTARGET_ARCH=x64-avx2 -DCMAKE_BUILD_TYPE=Final ..
# SSE4-POPCNT
cmake -DTARGET_ARCH=x64-sse4-popcnt -DCMAKE_BUILD_TYPE=Final ..
# Legacy (fallback)
cmake -DTARGET_ARCH=x64-legacy -DCMAKE_BUILD_TYPE=Final ..

Windows

  1. Run GenerateVisualStudioSolution.bat to generate the Visual Studio solution
  2. Open build_<arch>/caissa.sln in Visual Studio 2022
  3. Select the desired configuration (Debug/Release/Final)
  4. Build the solution (Ctrl+Shift+B)

Note: Visual Studio 2022 is the only tested version. Building with CMake from within Visual Studio has not been tested.

Post-Compilation

After compilation, copy the appropriate neural network file from data/neuralNets/ to:

  • Linux: build/bin/
  • Windows: build\bin\x64\<Configuration>\

Architecture Variants

Variant  CPU Requirements         Performance  Recommended For
AVX-512  AVX-512 instruction set  Fastest      Latest Intel Xeon, AMD EPYC
BMI2     AVX2 + BMI2              Fast         Most modern CPUs (2015+)
AVX2     AVX2 instruction set     Fast         Intel Haswell, AMD Ryzen
POPCNT   SSE4.2 + POPCNT          Moderate     Older CPUs (2008-2014)
Legacy   x64 only                 Slowest      Very old x64 CPUs

Tip: If unsure, try BMI2 first. It's supported by most modern CPUs and offers excellent performance.

UCI Options

The engine supports the following UCI options:

Search Options

  • Hash (int) - Transposition table size in megabytes
  • Threads (int) - Number of search threads
  • MultiPV (int) - Number of principal variation lines to search
  • Ponder (bool) - Enable pondering mode

Time Management

  • MoveOverhead (int) - Move overhead in milliseconds (increase if engine loses time)

Evaluation

  • EvalFile (string) - Path to neural network evaluation file (.pnn)
  • EvalRandomization (int) - Evaluation randomization range (weakens engine, introduces non-determinism)

Tablebases

  • SyzygyPath (string) - Semicolon-separated paths to Syzygy tablebases
  • SyzygyProbeLimit (int) - Maximum number of pieces for tablebase probing

Display Options

  • UCI_AnalyseMode (bool) - Analysis mode (full PV lines, no depth constraints)
  • UCI_Chess960 (bool) - Enable Chess960 mode (castling as "king captures rook")
  • UCI_ShowWDL (bool) - Show win/draw/loss probabilities with evaluation
  • UseSAN (bool) - Use Standard Algebraic Notation (FIDE standard)
  • ColorConsoleOutput (bool) - Enable colored console output
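Options are configured before searching with standard `setoption` commands, for example (values and paths are illustrative):

```
setoption name Hash value 1024
setoption name Threads value 8
setoption name SyzygyPath value /path/to/syzygy
setoption name UCI_ShowWDL value true
```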

History & Originality

Caissa has been written from the ground up since early 2021. The development journey:

  1. Early Versions - Used simple PeSTO evaluation
  2. Version 0.6 - Temporarily used Stockfish NNUE
  3. Version 0.7+ - Custom neural network evaluation system

Neural Network Development

The engine's neural network has evolved significantly:

  • Initial Network: Based on Stockfish's architecture, trained on a few million positions
  • Current Network (v1.24+): Trained on 17+ billion positions from self-play
  • Progressive Training: Older games are purged, ensuring networks are trained only on the latest engine versions

Key Components

  • Runtime Evaluation: PackedNeuralNetwork.cpp

    • Inspired by nnue.md
    • Highly optimized with manual SIMD vectorization
  • Network Trainer: NetworkTrainer.cpp, NeuralNetwork.cpp

    • Written completely from scratch
    • CPU-based, heavily optimized with AVX and multithreading
    • Exploits network sparsity for performance
  • Self-Play Generator: SelfPlay.cpp

    • Generates games with fixed nodes/depth
    • Custom binary format for efficient storage
  • Uses Stefan Pohl's UHO books or DFRC openings

Project Structure

The project is organized into three main modules:

src/
├── backend/     # Core engine library
│   ├── Search.*           # Search algorithms
│   ├── Position.*         # Position representation
│   ├── MoveGen.*          # Move generation
│   ├── PackedNeuralNetwork.*  # Neural network evaluation
│   ├── TranspositionTable.*   # Position caching
│   └── ...
│
├── frontend/    # UCI interface executable
│   ├── Main.cpp           # Entry point
│   └── UCI.*              # UCI protocol implementation
│
└── utils/       # Development and training tools
    ├── NetworkTrainer.*    # Neural network training
    ├── SelfPlay.*          # Self-play game generation
    ├── Tests.*            # Unit tests
    └── ...

Module Descriptions

  • backend (library) - Engine core: search, evaluation, move generation, position management
  • frontend (executable) - UCI wrapper providing command-line interface
  • utils (executable) - Utilities: network trainer, self-play generator, unit tests, performance tests

License

This project is licensed under the MIT License - see the LICENSE file for details.


Author: Michał Witanowski
Started: Early 2021
Language: C++20
License: MIT