SUL Interface, or How to Learn Your Systems

SUL Interface

The system under learning (SUL) is a basic abstract class that all systems that you would like to learn have to implement. When inferring loaded or randomly generated automata, one only has to pass it to the appropriate SUL found in aalpy/SULs/AutomataSUL.py. However, if you would like to learn any other system, you have to implement a system reset and step method.

Reset is divided in

pre() : initialization/startup of the system
post() : graceful shutdown of the system/memory cleanup

The step method received an element of the input alphabet. Based on the received element certain action is performed. Membership query is a sequence of steps.

For further examples on how to implement SUL refer to: aalpy/SULs/

Input Alphabet

Input alphabet is a list of any hashable object, most often strings or integers. In the step method of the SUL interface receives an element from the input alphabet and performs a system action based on it.

From standard automatons (loaded from file/randomly generated), input alphabet can be obtained with:

automaton.get_input_alphabet()

To see how class methods and their arguments can be used as the elements of the input alphabet, check out aalpy/SULs/PyMethodSUL and learnPythonClass in Examples.py.

Examples

Example - RegexSUL

Suppose that you would like to learn a regular expression. More precisely, you would like to learn a grammar that accepts or rejects the same language as the regular expression.

In Python, to check if the regular expression is matched one can use re module and its match method. So how do we go from re.match(some_regex, some_string) to the SUL that can be used to learn a DFA that represents such regex?

Let us examine the following code snippet that achieves such a task. In the constructor of the RegexSUL we pass a regular expression we would like to learn.

In pre method, we initialize the system. In our case, we will simply reset the string that we will use to form membership queries to the empty string.

In the step method, an element of the input alphabet, denoted by 'letter' will be received as an argument. When learning DFAs and Moore machines, we have to ensure that we account for the None input. For Mealy machines, non-deterministic or stochastic formalisms None will never be passed to the step method, unless it is implicitly declared in the input alphabet.

In our example, we will simply append an element of the input alphabet, a character, to the self. string and return whether the resulting string is accepted by the regular expression. Step method is called for all inputs in the membership query.

import re

class RegexSUL(SUL):
    """
    An example implementation of a system under learning that can be used to learn any regex expression.
    Note that the $ is added to the expression as in this SUL only exact matches are learned.
    """
    def __init__(self, regex: str):
        super().__init__()
        self.regex = regex if regex[-1] == '$' else regex + '$'
        self.string = ""

    def pre(self):
        self.string = ""
        pass

    def post(self):

        pass

    def step(self, letter):
        if letter is not None:
            self.string += str(letter)
        return True if re.match(self.regex, self.string) else False

Once such SUL is implemented, let us connect learn a concrete regular expression. Lets define random regex over ['a', 'b', 'c'] input alphabet. Furthermore we use StatePrefixEqOracle, as it ensures good test coverage.

regex = 'abc(b|c)+(ab|c)*'
alphabet = ['a', 'b', 'c']

regex_sul = RegexSUL(regex)

eq_oracle = StatePrefixEqOracle(alphabet, regex_sul, walks_per_state=100,
                                walk_len=20)

learned_regex = run_Lstar(alphabet, regex_sul, eq_oracle, automaton_type='dfa', print_level=2)

visualize_automaton(learned_regex)

When we run the presented example, our result is:

Hypothesis 1 has 1 states.
Hypothesis 2 has 7 states.
Hypothesis 3 has 8 states.
-----------------------------------
Learning Finished.
Learning Rounds:  3
Number of states: 8
Time (in seconds)
  Total                : 0.05
  Learning algorithm   : 0.0
  Conformance checking : 0.05
Learning Algorithm
 # Membership Queries  : 85
 # MQ Saved by Caching : 120
 # Steps               : 582
Equivalence Query
 # Membership Queries  : 800
 # Steps               : 18173
-----------------------------------

and

Example - MQTT

Suppose that you would like to learn an MQTT protocol. What would be the input alphabet, and how would you implement the SUL interface? You downloaded some clients, and want to learn the behavior of the broker. Usual client setup would have the following signature:

class MockMqttExample:

    def __init__(self):
        self.state = 'CONCLOSED'
        self.topics = set()

    def subscribe(self, topic: str):
        ...

    def unsubscribe(self, topic):
        ...

    def connect(self):
        ...

    def disconnect(self):
        ...

    def publish(self, topic):
        ...

We can approach setting up the SUL in two ways.

Standard Way

Usually, we would define our input alphabet to be the abstract representation of possible actions. Input alphabet for this experiment would be: {connect, disconnect, publish, subscribe, unsubscribe}. Each element of the abstract input alphabet will be mapped to the concrete method. For example:

'connect' -> self.connect()
'publish' -> self.publish(topic='testTopic')
...

Complete learning procedure would then look like this:

from aalpy.base import SUL
from aalpy.oracles import RandomWalkEqOracle
from aalpy.learning_algs import run_Lstar
from aalpy.utils import visualize_automaton

class MQTT_SUL(SUL):
    def __init__(self):
        super().__init__()
        self.mqtt = MockMqttExample()

        self.abstract_to_concrete_mapping = {
            'connect': self.mqtt.connect,
            'disconnect': self.mqtt.disconnect,
            'publish': self.mqtt.publish,
            'subscribe': self.mqtt.subscribe,
            'unsubscribe': self.mqtt.unsubscribe
        }

    def pre(self):
        self.mqtt.state = 'CONCLOSED'

    def post(self):
        self.mqtt.topics.clear()

    def step(self, letter):
        if letter in {"publish", "subscribe", "unsubscribe"}:
            return self.abstract_to_concrete_mapping[letter](topic='testTopic')
        return self.abstract_to_concrete_mapping[letter]()

sul = MQTT_SUL()
input_al = list(sul.abstract_to_concrete_mapping.keys())

eq_oracle = RandomWalkEqOracle(input_al, sul, num_steps=2000, reset_after_cex=True, reset_prob=0.15)

learned_mqtt= run_Lstar(input_al, sul, eq_oracle=eq_oracle, automaton_type='mealy', cache_and_non_det_check=True,
                  print_level=2)

visualize_automaton(learned_mqtt)

When run, following output is observed:

Hypothesis 1 has 3 states.
-----------------------------------
Learning Finished.
Learning Rounds:  1
Number of states: 3
Time (in seconds)
  Total                : 0.01
  Learning algorithm   : 0.01
  Conformance checking : 0.0
Learning Algorithm
 # Membership Queries  : 75
 # MQ Saved by Caching : 5
 # Steps               : 225
Equivalence Query
 # Membership Queries  : 324
 # Steps               : 2000
-----------------------------------
Visualized model saved to Model_Visualization.pdf.

and

Learning Python Class

Let us now consider this example as a challenge where we would like to learn the behavior of a Python class. In this case, we would simply wrap class methods and their arguments in the FunctionDecorator and use PyMethodSUL. Reset is then implemented by reinitializing the class.

from aalpy.SULs import PyMethodSUL, FunctionDecorator
from aalpy.oracles import RandomWalkEqOracle
from aalpy.learning_algs import run_Lstar
from aalpy.utils import visualize_automaton

# class 
mqtt = MockMqttExample

# methods weapped in the function decorators
input_al = [FunctionDecorator(mqtt.connect), FunctionDecorator(mqtt.disconnect),
            FunctionDecorator(mqtt.subscribe, 'topic'), FunctionDecorator(mqtt.unsubscribe, 'topic'),
            FunctionDecorator(mqtt.publish, 'topic')]

# SUL
sul = PyClassSUL(mqtt)

eq_oracle = RandomWalkEqOracle(input_al, sul, num_steps=2000, reset_after_cex=True, reset_prob=0.15)

learned_mqtt = run_Lstar(input_al, sul, eq_oracle=eq_oracle, automaton_type='mealy', cache_and_non_det_check=True)

visualize_automaton(learned_mqtt)