-
Notifications
You must be signed in to change notification settings - Fork 32
SUL Interface, or How to Learn Your Systems
The system under learning (SUL) is a basic abstract class that all systems that you want to learn have to implement. When inferring loaded or randomly generated automata, one only has to pass it to the appropriate SUL found in aalpy/SULs/AutomataSUL.py. However, if you want to learn any other system, you have to implement a system reset and step method.
Reset is divided in
- pre() : initialization/startup of the system
- post() : gracefull shutdown of the system/memory cleanup
The step method received an element of the input alphabet. Based on the received element certain action is performed. Membership query is a sequence of steps.
For further examples on how to implement SUL refer to: aalpy/SULs/
Suppose that you want learn a regular expression. More precisely, you want to learn a grammar that accepts or rejects the same language as the regular expression.
In Python, to check if the regular expression is matched one can use re
module and its match
method.
So how do we go from re.match(some_regex, some_string)
to the SUL that can be used to learn a DFA that represents such regex?
Let us examine the following code snippet that achieves such a task. In the constructor of the RegexSUL we pass a regular expression we want to learn.
In pre
method, we initialize the system. In our case, we will simply reset the string that we will use to form membership queries to the empty string.
In the step method, an element of the input alphabet, denoted by 'letter' will be received as an argument. When learning DFAs and Moore machines, we have to ensure that we account for the None
input. For Mealy machines, non-deterministic or stochastic formalisms None
will never be passed to the step method, unless it is implicitly declared in the input alphabet.
In our example, we will simply append an element of the input alphabet, a character, to the self. string
and return whether the resulting string is accepted by the regular expression.
Step method is called for all inputs in the membership query.
import re
class RegexSUL(SUL):
"""
An example implementation of a system under learning that can be used to learn any regex expression.
Note that the $ is added to the expression as in this SUL only exact matches are learned.
"""
def __init__(self, regex: str):
super().__init__()
self.regex = regex if regex[-1] == '$' else regex + '$'
self.string = ""
def pre(self):
self.string = ""
pass
def post(self):
pass
def step(self, letter):
if letter is not None:
self.string += str(letter)
return True if re.match(self.regex, self.string) else False
Once such SUL is implemented, let us connect learn a concrete regular expression. Lets define random regex over ['a', 'b', 'c'] input alphabet. Furthermore we use StatePrefixEqOracle, as it ensures good test coverage.
regex = 'abc(b|c)+(ab|c)*'
alphabet = ['a', 'b', 'c']
regex_sul = RegexSUL(regex)
eq_oracle = StatePrefixEqOracle(alphabet, regex_sul, walks_per_state=100,
walk_len=20)
learned_regex = run_Lstar(alphabet, regex_sul, eq_oracle, automaton_type='dfa', print_level=2)
visualize_automaton(learned_regex)
When we run the presented example, our result is:
Hypothesis 1 has 1 states.
Hypothesis 2 has 7 states.
Hypothesis 3 has 8 states.
-----------------------------------
Learning Finished.
Learning Rounds: 3
Number of states: 8
Time (in seconds)
Total : 0.05
Learning algorithm : 0.0
Conformance checking : 0.05
Learning Algorithm
# Membership Queries : 85
# MQ Saved by Caching : 120
# Steps : 582
Equivalence Query
# Membership Queries : 800
# Steps : 18173
-----------------------------------
and
Suppose now that you want to learn a MQTT protocol. What would be the input alphabet, and how would you implement the SUL interface? You downloaded some client, and want to learn the behavior of the broker.