Tic-tac-toe (American English), noughts and crosses (British English), or Xs and Os is a paper-and-pencil game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal row wins the game. (Wikipedia)
In practice, tabular Q-learning is a better fit for training a tic-tac-toe agent, given the game's small state space and the stability of tabular methods. This repo, however, aims to build a deep Q-learning agent for the game.
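For reference, the tabular approach this repo deliberately skips can be sketched in a few lines. The state encoding, function names, and hyperparameter values below are illustrative assumptions, not part of this repo:

```python
# Minimal sketch of a tabular Q-learning update for tic-tac-toe.
# States are 9-character board strings ('X', 'O', or '.'); actions are cell indices 0-8.
from collections import defaultdict

q_table = defaultdict(float)  # maps (state, action) -> Q-value, default 0.0
alpha, gamma = 0.1, 0.9       # learning rate and reward discount

def q_update(state, action, reward, next_state, next_actions):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])

# Example: X completes a row (terminal state, so no next actions) and gets reward 1.0.
q_update("X.X......", 1, 1.0, "XXX......", [])
```

Because tic-tac-toe has only a few thousand reachable states, this table stays small; the deep Q-network in this repo replaces `q_table` with a neural network that approximates the same function.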
Required packages:
- numpy
- keras
You may also install them via `pip install -r requirements.txt`.
Before training the model, open `train_rl.py` and adjust the model config. Then simply run `python train_rl.py` to start training.
- `player_name`: str, feel free to make a cool name :)
- `batch_size`: int, batch size
- `learning_rate`: float, learning rate
- `ini_epsilon`: float, initial epsilon
- `epsilon_decay`: float, every episode the current epsilon is multiplied by this factor
- `epsilon_min`: float, minimum epsilon
- `gamma`: float, reward discount
- `hidden_layers_size`: list of int, only relevant if `load_trained_model_path` is None
- `is_double_dqns`: bool, whether to use double DQN
- `optimizer`: anything that is a Keras optimizer, e.g. `keras.optimizers.Adam(lr=0.0001)`
- `loss`: anything that is a Keras loss
- `load_trained_model_path`: str or None
- `is_train`: set it to True when training
- `p2_player_type`: str, `'random'` or `'q_player'`
- `p2_load_trained_model_path`: if `p2_player_type = 'random'`, set it to None; otherwise set it to a str (saved model path)
- `episode`: int, number of episodes
- `memory_size`: int, replay memory size
- `episode_switch_q_target`: int, every this many episodes, copy the q_value model parameters to q_target
- `is_special_sample`: bool, if True, focus on sampling terminal states when building the training batch
- `save_model_path`: str, path to save the final trained model
- `win_reward`: float
- `lose_reward`: float
- `draw_reward`: float
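As a concrete illustration, the settings above might be filled in as follows. All values here are hypothetical examples, not recommendations, and the exact layout inside `train_rl.py` may differ:

```python
# Hypothetical config values for train_rl.py -- illustrative only.
import keras

player_name = "MiniMaxine"            # any string you like
batch_size = 32
learning_rate = 0.001
ini_epsilon = 1.0                     # start fully exploratory
epsilon_decay = 0.995                 # epsilon *= epsilon_decay each episode
epsilon_min = 0.05
gamma = 0.9                           # reward discount
hidden_layers_size = [64, 64]         # used only when load_trained_model_path is None
is_double_dqns = True
optimizer = keras.optimizers.Adam(lr=0.0001)
loss = "mse"
load_trained_model_path = None        # train from scratch
is_train = True
p2_player_type = "random"
p2_load_trained_model_path = None     # must be None when p2_player_type is 'random'
episode = 10000
memory_size = 2000
episode_switch_q_target = 100         # copy q_value weights to q_target this often
is_special_sample = True              # oversample terminal states in each batch
save_model_path = "models/q_player.h5"
win_reward = 1.0
lose_reward = -1.0
draw_reward = 0.5
```

With these values, epsilon would decay from 1.0 toward 0.05 over roughly 600 episodes (1.0 × 0.995^600 ≈ 0.05), so most of the 10,000 episodes train a mostly greedy policy.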