Wait, what?
The entire system was programmed in Python 2.7 and used The Pac-Man AI Projects, by UC Berkeley, as the game simulator.
The system comprises several modules, which are presented below with their roles.
The controller script implements logic to control learning and action selection for each agent by receiving simulator messages, routing them to the appropriate agent, and sending agents' actions to the simulator. In order to communicate with the simulator module via messages, the script instantiates a server object.
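The exact message format is not shown in this excerpt; the following hypothetical sketch (invented field names and class names, not the project's actual protocol) illustrates the routing logic described above:

```python
# Hypothetical sketch of the controller's routing loop. The message
# fields ('agent', 'state', 'legal_actions') are illustrative
# assumptions, not the project's actual protocol.
class Controller(object):
    def __init__(self, server, agents):
        self.server = server  # communication server with recv/send methods
        self.agents = agents  # maps an agent identifier to its instance

    def step(self):
        # Receive a simulator message, route it to the right agent,
        # and send the chosen action back to the simulator.
        msg = self.server.recv()
        agent = self.agents[msg['agent']]
        action = agent.choose_action(msg['state'], msg['legal_actions'])
        self.server.send(action)
        return action
```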
The simulator script runs the Pac-Man simulator, extracts the relevant state from the current game state, sends it to the controller process through a client instance, receives actions from the controller, and saves experiment results. This script is tightly coupled to the Pac-Man simulator and must be modified to use this project in new scenarios.
The agents module contains the agent implementations for action selection. By implementing the choose_action method, an agent instance must return a valid action according to its execution environment.
The module also contains behaviors, which are pre-defined reactive action selection processes. For instance, the flee behavior always selects the action that moves the agent away from its enemies, whereas a random behavior, such as the one presented below, selects any legal action for the given state.
An agent can, therefore, use behaviors to select actions and even learn to select the most appropriate behavior for the given state.
```python
import random

class RandomBehavior(object):
    def __call__(self, state, legal_actions):
        return random.choice(legal_actions)

class BehaviorAgent(object):
    def __init__(self):
        self.behavior = RandomBehavior()

    def choose_action(self, state, legal_actions):
        return self.behavior(state, legal_actions)
```

The learning module stores general-purpose reinforcement learning algorithms. Every RL algorithm must inherit from the LearningAlgorithm class and implement two methods:
- learn(self, state, action, reward): adapts according to the current state representation, the last performed action, and a numerical reward value.
- act(self, state): selects an action for the current state.
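As a concrete illustration of this interface, here is a hedged sketch of a tabular Q-learning algorithm. LearningAlgorithm is the project's base class, redefined minimally here so the example is self-contained; the update-rule details (storing the previous state between learn calls) are an assumption, not the project's actual implementation:

```python
import random
from collections import defaultdict

# Minimal stand-in for the project's base class.
class LearningAlgorithm(object):
    def learn(self, state, action, reward):
        raise NotImplementedError
    def act(self, state):
        raise NotImplementedError

class QLearning(LearningAlgorithm):
    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.last_state = None

    def learn(self, state, action, reward):
        # One-step Q-learning update, relating the previously seen state
        # and the last performed action to the newly observed state.
        if self.last_state is not None:
            best_next = max(self.q[(state, a)] for a in self.actions)
            key = (self.last_state, action)
            self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        self.last_state = state

    def act(self, state):
        # Epsilon-greedy action selection over the learned Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])
```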
The communication module implements two classes: Server and Client. Using the ZeroMQ package, a client-server architecture is easily incorporated into the decision process cycle, with recv and send methods to receive and send strings.
A server, configured with a TCP/IP address, may receive and answer messages from any number of clients. However, a client can only connect to a single server. Due to a ZeroMQ restriction, in this architecture the client must send a message first and then receive the server's reply. Should the server be unable to reply to the client, communication is lost.
The following code implements a client-server architecture where the client sends 'Client data' and the server replies 'Server data':
```python
# Server-side script
import communication as comm

server = comm.Server()
recv_data = server.recv()
print 'Received "{}"'.format(recv_data)
send_data = 'Server data'
server.send(send_data)
print 'Sent "{}"'.format(send_data)
```

```python
# Client-side script
import communication as comm

client = comm.Client()
send_data = 'Client data'
client.send(send_data)
print 'Sent "{}"'.format(send_data)
recv_data = client.recv()
print 'Received "{}"'.format(recv_data)
```

Server output:

```
Received "Client data"
Sent "Server data"
```

Client output:

```
Sent "Client data"
Received "Server data"
```
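The Server and Client implementations are not shown in this excerpt; under the hood they wrap ZeroMQ sockets. The following self-contained sketch (assuming pyzmq and a hypothetical address; the real module's internals may differ) demonstrates the same send-first, reply-second REQ/REP protocol in a single process:

```python
import threading
import zmq  # pyzmq

ADDRESS = 'tcp://127.0.0.1:5556'  # hypothetical address for this demo
requests = []

def serve_once():
    # REP sockets must receive a request before they can send a reply.
    sock = zmq.Context.instance().socket(zmq.REP)
    sock.bind(ADDRESS)
    requests.append(sock.recv_string())
    sock.send_string('Server data')
    sock.close()

server_thread = threading.Thread(target=serve_once)
server_thread.start()

# REQ sockets must send first, then wait for exactly one reply.
client = zmq.Context.instance().socket(zmq.REQ)
client.connect(ADDRESS)
client.send_string('Client data')
reply = client.recv_string()
client.close()
server_thread.join()
```

If the server fails before sending its reply, the REQ client blocks in recv_string, which is the "communication is lost" failure mode noted above.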
The messages module stores all kinds of messages used in the Pac-Man application. All messages inherit from BaseMessage and have a respective type.
For instance, AckMessage is used to communicate that the server has received the client's message but has no special reply.
```python
ACK = 'Ack'

class AckMessage(BaseMessage):
    def __init__(self):
        super(AckMessage, self).__init__(msg_type=ACK)
```

The state module contains the GameState class, which holds information about the current state of the Pac-Man simulation.
In order to incorporate stochastic information, the Map class stores a probability in each cell and allows Bayesian approaches through its observe and predict methods, following Bayesian Programming theory. observe incorporates new measurements into the map probabilities, whereas predict updates the probabilities without using sensor measurements. Map also implements access graphs to take obstacles into account when predicting movements and calculating distances.
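The observe/predict cycle can be sketched as follows. This is a hypothetical, obstacle-free illustration (class name, likelihood-function parameter, and diffusion model are all assumptions); the real Map additionally uses access graphs to respect walls:

```python
# Hypothetical grid of cell probabilities, illustrating the
# observe (measurement) / predict (motion) cycle described above.
class ProbabilityMap(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
        uniform = 1.0 / (width * height)
        self.cells = [[uniform] * width for _ in range(height)]

    def predict(self, spread=0.25):
        # Motion update: diffuse part of each cell's probability mass
        # to its orthogonal neighbors, without any sensor input.
        new = [[0.0] * self.width for _ in range(self.height)]
        for y in range(self.height):
            for x in range(self.width):
                p = self.cells[y][x]
                neighbors = [(x + dx, y + dy)
                             for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= x + dx < self.width and 0 <= y + dy < self.height]
                new[y][x] += p * (1.0 - spread)
                for nx, ny in neighbors:
                    new[ny][nx] += p * spread / len(neighbors)
        self.cells = new

    def observe(self, likelihood):
        # Measurement update (Bayes rule): weight each cell by the
        # measurement likelihood, then renormalize.
        total = 0.0
        for y in range(self.height):
            for x in range(self.width):
                self.cells[y][x] *= likelihood(x, y)
                total += self.cells[y][x]
        for y in range(self.height):
            for x in range(self.width):
                self.cells[y][x] /= total
```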
The plot script allows visualization of simulation data. It plots the scores, the probability of selecting each behavior, and the game duration.
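The plot script's internals are not shown in this excerpt; a minimal sketch of the score plot with matplotlib (hypothetical helper name, dummy data in place of saved experiment results) might look like:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt

# Hypothetical helper; the real plot script loads the experiment
# results saved by the simulator script instead of dummy scores.
def plot_scores(scores, filename='scores.png'):
    fig, ax = plt.subplots()
    ax.plot(range(1, len(scores) + 1), scores, marker='o')
    ax.set_xlabel('Game')
    ax.set_ylabel('Score')
    ax.set_title('Score per game')
    fig.savefig(filename)
    plt.close(fig)

plot_scores([120, 340, 200, 560])
```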