
Component_1 (Advanced Metrics Extractor)

LeoCal4 edited this page Dec 15, 2018 · 47 revisions

Advanced Metrics Extractor

Component 1 (Advanced Metrics Extractor from now on) extracts more complex information from the BPMN model given as input. Within the scope of this project, a metric is considered advanced if it is derived by computation and/or aggregation over basic metrics and the model's elements.

Metrics of this type support an in-depth analysis of the model and expose a wide variety of information, ranging from graph-theory metrics to proportions between model elements.

Because all of these metrics are derived from the work of BPM experts from all over the world, we group them according to their authors.

Metrics from "Applying software metrics to evaluate business process models"

by Rolón E, Ruiz F, García F, Piattini M (2006)

Some of Rolón's metrics coincide with metrics extracted by the Basic Metrics Extractor but, for the sake of completeness, we list them here anyway. They are self-explanatory.

  • TNT: total number of Tasks
  • TNCS: total number of Collapsed Subprocesses
  • TNA: total number of Activities
  • TNDO: total number of Data Objects
  • TNG: total number of Gateways
  • TNEE: total number of End Events
  • TNIE: total number of Intermediate Events
  • TNSE: total number of Start Events
  • TNE: total number of Events
  • TNSF: total number of Sequence Flows

Two of Rolón's metrics measure the connectivity level of specific elements in the model, namely activities and participants (pools). Each value is the ratio between the number of those elements and the number of flows connecting them.

  • CLA: connectivity level between activities (TNA/NSFA)
  • CLP: connectivity level between participants (NMF/NP)

The last four of Rolón's metrics measure various kinds of proportions between elements of the model.

  • PDOPin: proportion of data objects as incoming products and total data objects (NDOIn/TNDO)
  • PDOPout: proportion of data objects as outgoing products and total data objects (NDOOut/TNDO)
  • PDOTOut: proportion of data objects as outgoing product of activities of the model (NDOOut/TNT)
  • PLT: proportion of pools/lanes and activities (NL/TNT)
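
The ratio-based metrics above (CLA, CLP and the four proportions) can be sketched as follows. This is a minimal illustration, not the extractor's actual code: the `ratio` helper, the `counts` dictionary and its values are hypothetical, and returning 0.0 for a zero denominator is an assumption made here to keep the example total.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is 0.

    The 0.0 fallback is an assumption of this sketch, not a rule from the paper.
    """
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a small model; the keys follow the formulas above.
counts = {
    "TNA": 6, "NSFA": 8,   # activities and the flows between them
    "NMF": 4, "NP": 2,     # flows between participants and participants
    "NDOIn": 3, "NDOOut": 1, "TNDO": 4,
    "TNT": 5, "NL": 2,
}

metrics = {
    "CLA": ratio(counts["TNA"], counts["NSFA"]),
    "CLP": ratio(counts["NMF"], counts["NP"]),
    "PDOPin": ratio(counts["NDOIn"], counts["TNDO"]),
    "PDOPout": ratio(counts["NDOOut"], counts["TNDO"]),
    "PDOTOut": ratio(counts["NDOOut"], counts["TNT"]),
    "PLT": ratio(counts["NL"], counts["TNT"]),
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```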

Metrics from "Control-flow complexity measurement of processes and Weyuker’s properties"

by J. Cardoso (2007, doi = {10.1007/11837862_13})

Cardoso's first metric is the Control-Flow Complexity (CFC), a weighted sum over all the connectors used in a process model. Every Exclusive (split) Gateway contributes its number of outgoing flows; every Inclusive (split) Gateway contributes 2^n - 1, where n is its number of outgoing flows; every Parallel (split) Gateway contributes 1. Other gateway types are not covered in the original source, so they are not considered. The complexity value affects the readability, maintainability, reliability and other properties of the model.

  • CFC: Control-Flow Complexity
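
The weighting rules for CFC can be sketched as below. The function name, the type labels (`"exclusive"`, `"inclusive"`, `"parallel"`) and the input shape are assumptions chosen for illustration; a real extractor would read split gateways and their outgoing flows from the parsed BPMN model.

```python
def control_flow_complexity(split_gateways):
    """Sum the CFC contribution of each split gateway.

    split_gateways: iterable of (gateway_type, outgoing_flows) pairs.
    The string labels are hypothetical names used only in this sketch.
    """
    total = 0
    for gateway_type, n in split_gateways:
        if gateway_type == "exclusive":
            total += n              # one state per outgoing flow
        elif gateway_type == "inclusive":
            total += 2 ** n - 1     # every non-empty subset of outgoing flows
        elif gateway_type == "parallel":
            total += 1              # all flows are always taken
        # any other gateway type is ignored, as in the text above
    return total


gateways = [
    ("exclusive", 3),
    ("inclusive", 2),
    ("parallel", 4),
    ("eventBased", 2),  # not covered by the metric, contributes nothing
]
cfc = control_flow_complexity(gateways)
print(cfc)  # 3 + (2**2 - 1) + 1 = 7
```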

Metrics from "A Discourse on Complexity of Process Models"

by J. Cardoso, J. Mendling, G. Neumann, H.A. Reijers (2006, doi = {10.1007/11837862_13}), contained in Business Process Management Workshops by Johann Eder, Schahram Dustdar (Eds.)

Three more of Cardoso's metrics are based on the number of Activities and Gateways in the model.

  • NOA: number of Activities
  • NOAC: number of Activities and Control-Flow
  • NOAJS: number of Activities, Joins and Splits

Three of Cardoso's metrics build on the work of Halstead, whose measures are among the most important in the field of software complexity. They rely on four values; we report both their original meaning and their meaning in the BPMN field:

  • n1 = number of unique operators => number of unique activities and control-flow elements
  • n2 = number of unique operands => number of unique data variables
  • N1 = total number of operator occurrences => total number of activities and control-flow elements
  • N2 = total number of operand occurrences => total number of data variables

From these values we obtain the Halstead-based Process Complexity (HPC) measures for process length, volume and difficulty. They are calculated as follows:

  • Process Length: N = n1*log2(n1) + n2*log2(n2)
  • Process Volume: V = (N1+N2)*log2(n1+n2)
  • Process Difficulty: D = (n1/2)*(N2/n2)

Thus we get three metrics:

  • HPC_D: Halstead-based Process Complexity (process difficulty)
  • HPC_N: Halstead-based Process Complexity (process length)
  • HPC_V: Halstead-based Process Complexity (process volume)
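
As a worked example, the three HPC formulas can be evaluated as in the sketch below. The function name and the sample values are hypothetical; the sketch also assumes that n1 and n2 are strictly positive, since log2(0) is undefined and n2 appears as a divisor.

```python
import math


def halstead_process_complexity(n1, n2, N1, N2):
    """Evaluate the three HPC formulas given above.

    Assumes n1 > 0 and n2 > 0; handling of empty models is left out
    of this sketch.
    """
    length = n1 * math.log2(n1) + n2 * math.log2(n2)
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2) * (N2 / n2)
    return {"HPC_N": length, "HPC_V": volume, "HPC_D": difficulty}


# Hypothetical values: 4 unique operators, 4 unique operands,
# 10 operator occurrences and 6 operand occurrences.
hpc = halstead_process_complexity(n1=4, n2=4, N1=10, N2=6)
print(hpc)
```

With these sample values, HPC_N is 4*2 + 4*2 = 16, HPC_V is 16*log2(8) = 48 and HPC_D is 2 * 1.5 = 3.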

The paper also discusses a software complexity metric based on the impact of information flow on a program’s structure. Adapted to evaluate the complexity of processes in BPM, it yields the Interface Complexity (IC), defined as:

IC = Length * (number of inputs * number of outputs)^2

When calculating a software's complexity, length represents the number of lines of code (LOC), and the numbers of inputs and outputs represent the flows of local information entering and leaving. For BPM models, instead, the length of an activity is 1 if it is a black box, while it is given by its LOC if it is a white box. We always treat activities as black boxes, so the length equals the number of activities in the model. The fan-in and fan-out are the numbers of Data Input and Data Output Associations. The four metrics that we obtain are:

  • NoI: number of Activities inputs (Fan-In)
  • NoO: number of Activities outputs (Fan-Out)
  • Length: Activities length (number of activities)
  • IC: Interface Complexity of Activities
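
A minimal sketch of how these four values could be computed is shown below. The activity dictionaries and their keys (`data_inputs`, `data_outputs`) are hypothetical, and summing the associations over all activities is one reading of the description above rather than a confirmed detail of the extractor.

```python
def interface_complexity(activities):
    """Return NoI, NoO, Length and IC for a list of activities.

    Each activity is a dict with the hypothetical keys "data_inputs" and
    "data_outputs", holding its number of Data Input/Output Associations.
    """
    noi = sum(a["data_inputs"] for a in activities)
    noo = sum(a["data_outputs"] for a in activities)
    length = len(activities)  # every activity counts as length 1
    ic = length * (noi * noo) ** 2
    return {"NoI": noi, "NoO": noo, "Length": length, "IC": ic}


# Hypothetical model with three activities.
activities = [
    {"name": "Check order", "data_inputs": 1, "data_outputs": 1},
    {"name": "Ship goods", "data_inputs": 2, "data_outputs": 0},
    {"name": "Send invoice", "data_inputs": 0, "data_outputs": 1},
]
result = interface_complexity(activities)
print(result)  # IC = 3 * (3 * 2) ** 2 = 108
```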

The last metric discussed in the paper is NOF, the number of arcs present in the model.

  • NOF: number of Control Flow connections (number of arcs)

Metrics from "Prediction Models for BPMN Usability and Maintainability"

by Rolón E, Sanchez L, Garcia F, Ruiz F, Piattini M, Caivano D, Visaggio G (2009, doi = {10.1007/11837862_13})

This paper presents the number of Sequence Flows metric, which is equal to the NOF metric and is already extracted by the Basic Metrics Extractor.

  • TNSF: total number of Sequence Flows

Metrics from "On a quest for good process models: the cross-connectivity metric"

by Vanderfeesten I, Reijers HA, Mendling J, van der Aalst WM, Cardoso J (2008, doi = {10.1007/978-3-540-69534-9_36})

The Cross-Connectivity metric is used to "measure the strength of the links between process model elements", that is, to measure the complexity of the mental operations a reader must perform to understand the model. It is based on the "weakest-link metaphor": what counts the most is the hardest part of the model to understand. Models with a lower CC value are more prone to errors, because they are harder to understand. To obtain this value, we first calculate the weight of every node in the model. Let d be the degree of the node (the number of its incoming and outgoing flows):

  • if the node is an Exclusive Gateway, its weight is 1/d;
  • if it's an Inclusive Gateway, its weight is 1 / (2^d - 1) + ((2^d - 2) / (2^d - 1)) * (1 / d);
  • otherwise, it is 1

The paper does not explicitly consider every kind of BPM element, so we assign a weight of 1 to the node types it does not mention. Once every node has a weight, we calculate the weight of the arcs: the weight of an arc is the product of the weights of its source node and its target node.

W(a) = w(src(a)) · w(dest(a))

With the weight of every arc in the model, we can obtain the value of every path. A path is the sequence of arcs that should be followed to get from a node n1 to a node n2. Its value is the product of the weights of every arc in the path.

v(p) = W(a1) · W(a2) · ... · W(ax)

The value of a connection between any given pair of nodes n1 and n2 is the maximum value over the set of paths from node n1 to node n2. If the nodes are not connected, the value of the connection is 0.

V(n1, n2) = max[p ∈ P(n1,n2)] v(p)

Finally, with the values of the connections between every pair of nodes in the model, we can obtain the Cross-Connectivity value. It is defined as follows:

CC = Sum[n1,n2 ∈ N] V(n1, n2) / (|N| · (|N| − 1))

  • CC: Cross-Connectivity
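
The steps above (node weights, arc weights, best path value per pair, final average) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the extractor's implementation: the function names and type labels are hypothetical, isolated nodes are given a weight of 1, only pairs of distinct nodes are summed, and the best path value is found with a Floyd-Warshall style closure, which is valid here because every arc weight lies in (0, 1].

```python
from itertools import product


def node_weight(node_type, degree):
    """Weight of a node from its type and degree (incoming + outgoing flows)."""
    if degree == 0:
        return 1.0  # assumption of this sketch: isolated nodes weigh 1
    if node_type == "exclusive":
        return 1.0 / degree
    if node_type == "inclusive":
        combos = 2 ** degree - 1
        return 1.0 / combos + ((2 ** degree - 2) / combos) * (1.0 / degree)
    return 1.0  # every other node type


def cross_connectivity(node_types, arcs):
    """node_types: dict node -> type label; arcs: list of (source, target)."""
    nodes = list(node_types)
    if len(nodes) < 2:
        return 0.0
    degree = {n: 0 for n in nodes}
    for src, dst in arcs:
        degree[src] += 1
        degree[dst] += 1
    weight = {n: node_weight(node_types[n], degree[n]) for n in nodes}

    # value[(a, b)] is the best path value found so far from a to b (0 = none).
    value = {pair: 0.0 for pair in product(nodes, repeat=2)}
    for src, dst in arcs:
        value[(src, dst)] = max(value[(src, dst)], weight[src] * weight[dst])

    # Max-product closure: k is the outermost loop, as in Floyd-Warshall.
    for k, i, j in product(nodes, repeat=3):
        via_k = value[(i, k)] * value[(k, j)]
        if via_k > value[(i, j)]:
            value[(i, j)] = via_k

    total = sum(v for (a, b), v in value.items() if a != b)
    return total / (len(nodes) * (len(nodes) - 1))


# Hypothetical model: task A -> exclusive gateway X -> tasks B and C.
node_types = {"A": "task", "X": "exclusive", "B": "task", "C": "task"}
arcs = [("A", "X"), ("X", "B"), ("X", "C")]
cc = cross_connectivity(node_types, arcs)
print(round(cc, 4))
```

In this example the gateway has degree 3, so each arc weighs 1/3, the two-arc paths from A to B and from A to C are worth 1/9 each, and CC = (3 * 1/3 + 2 * 1/9) / (4 * 3) = 11/108.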

Metrics from "Quality metrics for business process modeling"

by Khlif W, Makni L, Zaaboub N, Ben-Abdallah H (2009)
