Important note: This project is still a work in progress and does not yet contain an optimal RL implementation. It may also contain bugs in the game mechanics!
An advanced implementation of a reinforcement learning agent for the card game Skyjo using PPO (Proximal Policy Optimization) with action masking and sophisticated state representation.
This project implements a complete Skyjo game environment with multiple agent types, focusing on a reinforcement learning approach using the PPO algorithm. The implementation features a custom Gymnasium (OpenAI Gym API) environment, action masking to enforce valid moves, and a detailed state representation.
- Core game logic implementation
- Manages game state transitions
- Handles player actions and card operations
- Implements state tracking and validation
The RL environment extends the Gymnasium (OpenAI Gym) Env interface and implements:
- Field representation: 12x18 one-hot encoded matrix
  - Encodes card values (-2 to 12)
  - Special symbols (♦ for stars)
  - Hidden/visible card states
- Last action encoding: 5-dimensional one-hot vector
Total observation space:
Box(low=0, high=1, shape=(12 * 18 + 5,), dtype=float32)
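The exact channel layout of the 18-dimensional per-card encoding is defined in the environment code; the sketch below is only an illustration, assuming 15 channels for the values -2 to 12 plus flags for star, hidden, and removed cards (the removed flag and the channel indices are assumptions), with a hypothetical helper `encode_card`.

```python
import numpy as np

N_CHANNELS = 18  # assumed layout: 15 value channels + star + hidden + removed

def encode_card(value=None, is_star=False, hidden=False, removed=False):
    """One-hot encode a single field position into an 18-dim vector (illustrative)."""
    vec = np.zeros(N_CHANNELS, dtype=np.float32)
    if hidden:
        vec[16] = 1.0           # hidden card (assumed channel index)
    elif removed:
        vec[17] = 1.0           # removed/empty slot (assumed channel index)
    elif is_star:
        vec[15] = 1.0           # star symbol (assumed channel index)
    else:
        vec[value + 2] = 1.0    # card values -2..12 map to indices 0..14
    return vec

# Full observation: 12 field positions plus a 5-dim last-action one-hot.
field = np.concatenate([encode_card(hidden=True) for _ in range(12)])
last_action = np.zeros(5, dtype=np.float32)
observation = np.concatenate([field, last_action])
assert observation.shape == (12 * 18 + 5,)
```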
- 16 discrete actions combining:
  - Card positions (0-11)
  - Game actions (pull, discard, change)
- Action masking ensures that only valid moves can be selected (see the sketch below)
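The concrete index-to-action mapping lives in the environment code; the following sketch only illustrates how a 16-entry boolean mask restricts the discrete action space to legal moves (the example mask itself is hypothetical).

```python
import numpy as np

# Hypothetical mask: only the 12 position actions are legal in this state.
action_mask = np.array([True] * 12 + [False] * 4)

# Sampling only among legal indices is what masking guarantees during training:
# the probabilities of masked actions are forced to zero.
legal_indices = np.flatnonzero(action_mask)
action = int(np.random.choice(legal_indices))
assert action_mask[action]
```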
Multiple reward components:
- Point-based rewards
if current_points[self.rl_name] < self.points_in_running[-2]:
    reward += 1
if current_points[self.rl_name] < min(self.points_in_running):
    reward += 2
- Normalized score rewards
reward += (1 - self.theoretical_normalize(current_points[self.rl_name])) * 2
- Line completion rewards (see the sketch after this list)
  - Additional rewards for completing rows/columns
  - Rewards for matching card combinations
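A hedged sketch of what a line-completion check might look like, assuming the 12-card field is stored as a 3x4 value array with a visibility mask; `count_completed_columns` is a hypothetical helper, not the repository's GameField logic, and the reward weighting at the end is illustrative.

```python
import numpy as np

def count_completed_columns(grid, visible):
    """Count columns whose three visible cards share the same value (illustrative)."""
    completed = 0
    for col in range(grid.shape[1]):
        if visible[:, col].all() and len(set(grid[:, col])) == 1:
            completed += 1
    return completed

grid = np.array([[5, 5, -2, 7],
                 [5, 1,  0, 7],
                 [5, 9,  3, 7]])
visible = np.ones_like(grid, dtype=bool)
bonus = count_completed_columns(grid, visible)  # columns 0 and 3 -> 2
# e.g. reward += bonus * 1.0  (the actual weighting is project-specific)
```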
- Implements PPO with action masking
- Uses a neural network (MLP) policy
- Maintains game state history
- Rule-based decision making
- Card value thresholds
- Position-based strategies
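For illustration only, here is a minimal threshold rule of the kind such an agent might use; `decide_on_drawn_card` and its threshold are assumptions, not the repository's exact policy.

```python
def decide_on_drawn_card(drawn_value, grid_values, threshold=4):
    """Illustrative threshold rule: keep a drawn card if it is low enough
    or improves the worst visible position, otherwise discard it."""
    worst_position = max(range(len(grid_values)), key=lambda i: grid_values[i])
    if drawn_value <= threshold or drawn_value < grid_values[worst_position]:
        return ("swap", worst_position)
    return ("discard", None)

print(decide_on_drawn_card(2, [9, 4, 12, 7, 0, 3, 8, 5, 11, 6, 1, 10]))
# ('swap', 2) - replace the 12 with the drawn 2
```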
- Baseline implementation
- Random action selection
- Used for comparison
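A minimal random-rollout sketch for such a baseline, assuming the environment follows the Gymnasium reset/step API and exposes an action_masks() method (as in the masking code later in this README); `rollout_random` is a hypothetical helper.

```python
import numpy as np

def rollout_random(env):
    """Play one episode with uniformly random legal actions (illustrative)."""
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        mask = env.action_masks()
        action = int(np.random.choice(np.flatnonzero(mask)))
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        done = terminated or truncated
    return total
```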
- Manages card distribution
- Card value mappings
- Special card handling (stars)
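For reference, the standard Skyjo deck composition is 150 cards; the repository's deck, which also includes star cards, may differ from these counts.

```python
from collections import Counter

def build_standard_deck():
    """Standard Skyjo deck: 5x -2, 10x -1, 15x 0, and 10 of each value 1-12."""
    deck = [-2] * 5 + [-1] * 10 + [0] * 15
    for value in range(1, 13):
        deck += [value] * 10
    return deck

deck = build_standard_deck()
assert len(deck) == 150
print(Counter(deck))
```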
- Grid-based card layout
- Line completion detection
- Score calculation
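A hedged sketch of a Skyjo-style score calculation (the sum of card values still on the field); `field_score` is a hypothetical helper, and the repository's GameField performs the authoritative calculation, including star handling.

```python
import numpy as np

def field_score(grid, removed_mask):
    """Sum of all card values still on the field; removed columns count as zero."""
    return int(np.where(removed_mask, 0, grid).sum())

grid = np.array([[5, -2, 7, 0],
                 [5,  1, 7, 3],
                 [5,  9, 7, 12]])
removed = np.zeros_like(grid, dtype=bool)
removed[:, 2] = True               # the completed column of 7s was removed
print(field_score(grid, removed))  # 15 + 8 + 0 + 15 = 38
```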
The state is represented as a flat vector combining the one-hot field matrix and the last-action encoding:
self.observation_space = gym.spaces.Box(
    low=0,
    high=1,
    shape=(12 * 18 + 5,),
    dtype=np.float32
)
Implemented using MaskablePPO from sb3-contrib (the Stable-Baselines3 contrib package):
def action_masks(self):
    legal_actions = self._legal_actions()
    action_mask = [False] * 16
    # Mask based on game state (abridged)
    if self.game_state == "running":
        if self.last_action == "pull deck":
            action_mask[12:14] = [True, True]
    return np.array(action_mask, dtype=bool)
from sb3_contrib import MaskablePPO

model = MaskablePPO(
    "MlpPolicy",
    rl_env,
    verbose=1,
    device='cuda',
    learning_rate=0.0001
)
model.learn(
    total_timesteps=100000,
    progress_bar=True,
    callback=logging_callback
)
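After training, the policy can be evaluated with explicit masks, for example via get_action_masks from sb3-contrib's maskable utilities; the loop below is a sketch assuming rl_env follows the Gymnasium API and exposes action_masks().

```python
from sb3_contrib.common.maskable.utils import get_action_masks

# Evaluation sketch: roll out the trained policy with explicit action masks.
obs, info = rl_env.reset()
done = False
while not done:
    masks = get_action_masks(rl_env)
    action, _ = model.predict(obs, action_masks=masks, deterministic=True)
    obs, reward, terminated, truncated, info = rl_env.step(action)
    done = terminated or truncated
```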
- Episode rewards tracking
- Wrong action counting
- Performance visualization
- Learning curves
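The LoggingCallback used in the usage example is defined in this repository; below is a minimal sketch of what such a callback could look like on top of Stable-Baselines3's BaseCallback (the tracked quantities here are illustrative, and a single training environment is assumed).

```python
from stable_baselines3.common.callbacks import BaseCallback

class LoggingCallback(BaseCallback):
    """Minimal episode-reward logger; the repository's own callback may also
    track wrong actions and other statistics."""

    def __init__(self, verbose=0):
        super().__init__(verbose)
        self.episode_rewards = []
        self._running_reward = 0.0

    def _on_step(self) -> bool:
        # self.locals holds the rollout variables of the training loop;
        # index 0 assumes a single (non-vectorized) environment.
        self._running_reward += float(self.locals["rewards"][0])
        if self.locals["dones"][0]:
            self.episode_rewards.append(self._running_reward)
            self._running_reward = 0.0
        return True  # returning False would abort training
```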
- Clone the repository
- Install dependencies:
pip install torch gymnasium stable-baselines3 sb3-contrib numpy matplotlib
import matplotlib.pyplot as plt
from sb3_contrib import MaskablePPO
# Carddeck, RLAgent, GameField, Environment, RLEnvironment and
# LoggingCallback come from this repository's modules

# Initialize environment
carddeck = Carddeck()
agent = RLAgent("RLBOT", carddeck, (4, 3))
gamefield = GameField(4, 3, [agent], carddeck)
env = Environment(gamefield)
rl_env = RLEnvironment(env)
# Training with monitoring
logging_callback = LoggingCallback()
model = MaskablePPO("MlpPolicy", rl_env, verbose=1, device='cuda')
model.learn(total_timesteps=100000, callback=logging_callback)
# Visualize results
plt.plot(rl_env.points, "-o", label="Points of agent")
plt.title("Points over time")
plt.grid()
plt.legend()
plt.show()
- Complete game state tracking
- Efficient state updates
- Memory-optimized representations
- Comprehensive move validation
- Legal action enforcement
- Game rule compliance
- Multi-component reward system
- Progress-based incentives
- Strategic play encouragement
- GPU acceleration support
- Efficient state representations
- Optimized action masking
- Advanced Features
  - Multi-agent training capabilities
  - Self-play implementation
  - Advanced reward shaping
- Optimizations
  - Enhanced state representation
  - Improved action space efficiency
  - Advanced policy architectures
Contributions are welcome! Please refer to the contribution guidelines for more information.
MIT License - see LICENSE file for details