-
Notifications
You must be signed in to change notification settings - Fork 32
Generalized State‐Merging Framework
AALpy implements a general state-merging framework for passive automata learning via the GeneralizedStateMerging
class. This framework follows the red-blue framework presented in the Abbadingo competition. The framework can be easily instantiated which streamlines implementing existing passive automata learning algorithms and decreases effort when experimenting with novel ideas in the field. Due to the general nature of the framework it can be used to implement learning algorithms for many different automaton types ranging from DFAs to (observably nondeterministic) stochastic Mealy machines. Examples showing how this framework is used can be found in the files Examples.py and GsmAlgorithms.py.
AALpy provides a few algorithms based on its generalized state-merging framework that can be used out of the box (see GsmAlgorithms.py). This includes EDSM (run_EDSM
), k-Tails (run_k_tails
) and IOAlergia with EDSM (run_Alergia_EDSM
).
We categorize automaton types using two orthogonal categorization schemes:
- Output Behavior:
- Moore Behavior: Outputs are associated with states
- Mealy Behavior: Outputs are associated with transitions
- Transition Behavior:
- Deterministic: Transitions are uniquely determined by their source state and the applied input. Consequently, applying a sequence of inputs in the initial state will always give the same sequence of outputs.
- Observably Nondeterministic: Transitions are uniquely determined by their source state, the applied input and the observed output. Given a state and an input symbol, several transitions might exist and no knowledge is available about how the actual transition is chosen, when applying an input in a given state. When applying a sequence of inputs in the initial state, the reached (intermediate) states are uniquely determined by the observed outputs.
- Stochastic: Similar to observably nondeterministic behavior except that transitions are taken according to a probability distribution.
The table below shows the corresponding classes used in AALpy for encoding the six different automaton types resulting from this categorization.
Deterministic | Nondeterministic | Stochastic | |
---|---|---|---|
Moore | MooreMachine |
NDMooreMachine |
Mdp |
Mealy | MealyMachine |
Onfsm |
StochasticMealyMachine |
Once an algorithm is defined using an instance of the GeneralizedStateMerging
class, it can be run on data
using the instance run(data)
method.
The framework supports three kinds of data formats:
- Input-output traces: Each element of the training data is a full IO trace. For automata with Moore-like output behavior, this is a sequence where the first element is the initial output and each subsequent element is an input-output pair, i.e.,
[[o0, (i1, o1), (i2, o2), ...], ...]
. For Mealy-like output behavior, each element of the training data is a sequence of input-output pairs, i.e.,[[(i0, o0), (i1, o1), (i2, o2), ...], ...]
. - Labeled traces: Each element of the training data is a tuple consisting of a sequence of inputs followed by a single output, i.e.,
[([i0, i1, i2, ...], o), ...]
. Data in this format should only be provided when learning with deterministic transition behavior. - Prefix Tree Automata (PTA): The training data can be also provided as a pre-computed PTA, which has to be given as a tree-shaped automaton using the internal representation (
GsmNode
class).
The used data format can be specified using the data_format
argument of the run
method, which should be set to "io_traces"
, "labeled_sequences"
or "tree"
respectively.
New algorithms can instantiated using the GeneralizedStateMerging
class. The parameters output_behavior
and transition_behavior
govern the returned automaton type as outlined above, using the values "moore"
and "mealy"
for output behavior, as well as the values "deterministic"
, "nondeterministic"
and "stochastic"
for transition behavior. The most important parameter is score_calc
, which governs important notions such as compatibility of states and scores of merge candidates. This behavior is encapsulated in a ScoreCalculation
object, which provides two methods for compatibility and scoring respectively.
To understand compatibility and scoring, some knowledge about state-merging algorithms and the red-blue framework is required. In its default configuration, whenever our framework considers two states for merging, it creates a partitioning of states that is implied by merging the original two states. During the creation of this partitioning, score_calc.local_compatibility(a, b)
is used to assess the local compatibility of two states a
and b
, where a
represents a partition in which b
is to be inserted. The states are represented using the internal GsmNode
class. If this predicate returns False
for any state pair encountered during partition creation, the original merge candidate is rejected immediately. After the partitioning is complete, score_calc.score_function(partitioning)
is used to assign a score to the resulting partitioning
, which is a dictionary mapping each merged state to the partition it was merged into. Scores are used to rank potential merge candidates. The values True
and False
are used to immediately accept or reject a potential merge candidate without further consideration.
Apart from the previously mentioned parameters, the GeneralizedStateMerging
class also provides several other parameters that change the algorithm's behavior, such as pre- and post-processing, details in compatibility checking and the order in which merge candidates are considered.