Generalized State‐Merging Framework

Generalized State Merging in the Red-Blue Framework

AALpy implements a general state-merging framework for passive automata learning via the GeneralizedStateMerging class. This framework follows the red-blue framework presented in the Abbadingo competition. The framework can be easily instantiated which streamlines implementing existing passive automata learning algorithms and decreases effort when experimenting with novel ideas in the field. Due to the general nature of the framework it can be used to implement learning algorithms for many different automaton types ranging from DFAs to (observably nondeterministic) stochastic Mealy machines. Examples showing how this framework is used can be found in the files Examples.py and GsmAlgorithms.py.

Predefined Algorithms

AALpy provides a few algorithms based on its generalized state-merging framework that can be used out of the box (see GsmAlgorithms.py). This includes EDSM (run_EDSM), k-Tails (run_k_tails) and IOAlergia with EDSM (run_Alergia_EDSM).

Support for different Automaton Types

We categorize automaton types using two orthogonal categorization schemes:

Output Behavior:
- Moore Behavior: Outputs are associated with states
- Mealy Behavior: Outputs are associated with transitions
Transition Behavior:
- Deterministic: Transitions are uniquely determined by their source state and the applied input. Consequently, applying a sequence of inputs in the initial state will always give the same sequence of outputs.
- Observably Nondeterministic: Transitions are uniquely determined by their source state, the applied input and the observed output. Given a state and an input symbol, several transitions might exist and no knowledge is available about how the actual transition is chosen, when applying an input in a given state. When applying a sequence of inputs in the initial state, the reached (intermediate) states are uniquely determined by the observed outputs.
- Stochastic: Similar to observably nondeterministic behavior except that transitions are taken according to a probability distribution.

The table below shows the corresponding classes used in AALpy for encoding the six different automaton types resulting from this categorization.

	Deterministic	Nondeterministic	Stochastic
Moore	`MooreMachine`	`NDMooreMachine`	`Mdp`
Mealy	`MealyMachine`	`Onfsm`	`StochasticMealyMachine`

Running Algorithms

Once an algorithm is defined using an instance of the GeneralizedStateMerging class, it can be run on data using the instance run(data) method.

Data Formats

The framework supports three kinds of data formats:

Input-output traces: Each element of the training data is a full IO trace. For automata with Moore-like output behavior, this is a sequence where the first element is the initial output and each subsequent element is an input-output pair, i.e., [[o0, (i1, o1), (i2, o2), ...], ...]. For Mealy-like output behavior, each element of the training data is a sequence of input-output pairs, i.e., [[(i0, o0), (i1, o1), (i2, o2), ...], ...].
Labeled traces: Each element of the training data is a tuple consisting of a sequence of inputs followed by a single output, i.e., [([i0, i1, i2, ...], o), ...]. Data in this format should only be provided when learning with deterministic transition behavior.
Prefix Tree Automata (PTA): The training data can be also provided as a pre-computed PTA, which has to be given as a tree-shaped automaton using the internal representation (GsmNode class).

The used data format can be specified using the data_format argument of the run method, which should be set to "io_traces", "labeled_sequences" or "tree" respectively.

Instantiating Algorithms

New algorithms can instantiated using the GeneralizedStateMerging class. The parameters output_behavior and transition_behavior govern the returned automaton type as outlined above, using the values "moore" and "mealy" for output behavior, as well as the values "deterministic", "nondeterministic" and "stochastic" for transition behavior. The most important parameter is score_calc, which governs important notions such as compatibility of states and scores of merge candidates. This behavior is encapsulated in a ScoreCalculation object, which provides two methods for compatibility and scoring respectively.

To understand compatibility and scoring, some knowledge about state-merging algorithms and the red-blue framework is required. In its default configuration, whenever our framework considers two states for merging, it creates a partitioning of states that is implied by merging the original two states. During the creation of this partitioning, score_calc.local_compatibility(a, b) is used to assess the local compatibility of two states a and b, where a represents a partition in which b is to be inserted. The states are represented using the internal GsmNode class. If this predicate returns False for any state pair encountered during partition creation, the original merge candidate is rejected immediately. After the partitioning is complete, score_calc.score_function(partitioning) is used to assign a score to the resulting partitioning, which is a dictionary mapping each merged state to the partition it was merged into. Scores are used to rank potential merge candidates. The values True and False are used to immediately accept or reject a potential merge candidate without further consideration.

Apart from the previously mentioned parameters, the GeneralizedStateMerging class also provides several other parameters that change the algorithm's behavior, such as pre- and post-processing, details in compatibility checking and the order in which merge candidates are considered.