
Component_1 (Advanced Metrics Extractor)

LeoCal4 edited this page Dec 16, 2018 · 47 revisions

Advanced Metrics Extractor

Component 1 (Advanced Metrics Extractor from now on) extracts more or less complex information from the BPMN model given as input. Within the scope of this project, a metric is considered advanced if it is derived by computing and/or aggregating basic metrics and the model's elements.

Metrics of this type support an in-depth analysis of the model, exposing a wide variety of information, ranging from graph-theory metrics to proportions between model elements.

Because all of these metrics are derived from the work of BPM experts from all over the world, we group them by author.

Metrics from "Applying software metrics to evaluate business process models"

by Rolón E, Ruiz F, García F, Piattini M (2006)

Some of Rolón's metrics coincide with metrics extracted by the Basic Metrics Extractor but, for the sake of completeness, we list them anyway. There is not much to say about them, as they are self-explanatory.

  • TNT: total number of Tasks
  • TNCS: total number of Collapsed Subprocesses
  • TNA: total number of Activities
  • TNDO: total number of Data Objects
  • TNG: total number of Gateways
  • TNEE: total number of End Events
  • TNIE: total number of Intermediate Events
  • TNSE: total number of Start Events
  • TNE: total number of Events
  • TNSF: total number of Sequence Flows

Two of Rolón's metrics measure the connectivity level of specific elements in the model, namely activities and participants (pools). The value is a ratio between the number of elements and the number of flows that connect them.

  • CLA: connectivity level between activities (TNA/NSFA)
  • CLP: connectivity level between participants (NMF/NP)

The last four of Rolón's metrics measure various kinds of proportions between elements of the model.

  • PDOPin: proportion of data objects as incoming products and total data objects (NDOIn/TNDO)
  • PDOPout: proportion of data objects as outgoing products and total data objects (NDOOut/TNDO)
  • PDOTOut: proportion of data objects as outgoing product of activities of the model (NDOOut/TNT)
  • PLT: proportion of pools/lanes and activities (NL/TNT)

Metrics from "Control-flow complexity measurement of processes and weyuker’s properties"

by J. Cardoso (2007, doi = {10.1007/11837862_13})

Cardoso's first metric is the Control-Flow Complexity. It is a weighted sum of all the connectors used in a process model. In particular, every Exclusive (split) Gateway contributes the number of its outgoing flows; every Inclusive (split) Gateway contributes 2^n - 1, where n is the number of its outgoing flows; every Parallel (split) Gateway contributes 1. The other types of Gateway are not covered in the original source, so they have not been considered. The complexity value affects the readability, the maintainability, the reliability and other properties of the model.

  • CFC: control-flow Complexity
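As an illustration of the weighted sum described above, here is a minimal Python sketch. It is not the extractor's actual code: the function name, the `(kind, outgoing_flows)` input format and the kind labels are assumptions made for this example.

```python
from typing import Iterable, Tuple


def control_flow_complexity(split_gateways: Iterable[Tuple[str, int]]) -> int:
    """Sum the CFC contribution of each split gateway.

    Each item is (gateway_kind, number_of_outgoing_flows). The kind labels
    are hypothetical names chosen for this sketch.
    """
    total = 0
    for kind, outgoing in split_gateways:
        if kind == "exclusive":
            total += outgoing            # one state per outgoing flow
        elif kind == "inclusive":
            total += 2 ** outgoing - 1   # every non-empty subset of flows
        elif kind == "parallel":
            total += 1                   # a single possible state
        # Other gateway kinds are not covered by the source and add nothing.
    return total


example = [("exclusive", 3), ("inclusive", 2), ("parallel", 4), ("complex", 5)]
print(control_flow_complexity(example))  # 3 + 3 + 1 = 7
```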

Metrics from "A Discourse on Complexity of Process Models"

by J. Cardoso, J. Mendling, G. Neumann, H.A. Reijers (2006, doi = {10.1007/11837862_13}), contained in Business Process Management Workshops by Johann Eder, Schahram Dustdar (Eds.)

Three more of Cardoso's metrics are based on the number of Activities and Gateways in the model.

  • NOA: number of Activities
  • NOAC: number of Activities and Control-Flow
  • NOAJS: number of Activities, Joins and Splits

Three of Cardoso's metrics are based on the work of Halstead, whose measures are among the most important in the field of software complexity. They rely on four values, listed here with their original meaning and their meaning in the BPMN field:

  • n1 = number of unique operators => number of unique activities and control-flow elements
  • n2 = number of unique operands => number of unique data variables
  • N1 = total number of operator occurrences => total number of activities and control-flow elements
  • N2 = total number of operand occurrences => total number of data variables

From these values we can derive the Halstead-based Process Complexity (HPC) measures for process length, volume and difficulty. They are calculated as follows:

  • Process Length: N = n1*log2(n1) + n2*log2(n2)
  • Process Volume: V = (N1+N2)*log2(n1+n2)
  • Process Difficulty: D = (n1/2)*(N2/n2)

Thus we get three metrics:

  • HPC_D: Halstead-based Process Complexity (process difficulty)
  • HPC_N: Halstead-based Process Complexity (process length)
  • HPC_V: Halstead-based Process Complexity (process volume)
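The three formulas can be evaluated directly once the four counts are known. The sketch below assumes the counts are already available and strictly positive (the logarithms are undefined otherwise); the function name is hypothetical.

```python
import math
from typing import Tuple


def halstead_process_complexity(n1: int, n2: int, N1: int, N2: int) -> Tuple[float, float, float]:
    """Return (length, volume, difficulty) from the four Halstead counts.

    n1, n2: unique operators / operands; N1, N2: total occurrences.
    Assumes n1 > 0 and n2 > 0.
    """
    length = n1 * math.log2(n1) + n2 * math.log2(n2)
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2) * (N2 / n2)
    return length, volume, difficulty


# Worked example: n1 = 4, n2 = 4, N1 = 10, N2 = 6
# length = 4*2 + 4*2 = 16, volume = 16 * log2(8) = 48, difficulty = 2 * 1.5 = 3
print(halstead_process_complexity(4, 4, 10, 6))
```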

The paper also discusses a software complexity metric based on the impact of information flow on a program's structure. Adapted to evaluate the complexity of processes in BPM, it yields the Interface Complexity (IC), defined as:

IC = Length * (number of inputs * number of outputs)^2

When computing a software's complexity, length is the number of lines of code (LOC), and the numbers of inputs and outputs are the flows of local information entering and leaving. For BPM models, instead, the length of an activity is 1 if it is a black box, while it is given by its LOC if it is a white box (we always consider activities as black boxes, so the length is actually the same as the number of activities in the model); the fan-in and fan-out are the numbers of Data Input and Data Output Associations. The four metrics that we obtain are:

  • NoI: number of Activities inputs (Fan-In)
  • NoO: number of Activities outputs (Fan-Out)
  • Lenght: Activities length (number of activities)
  • IC: Interface Complexity of Activities
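A short worked example of the IC formula, written as a Python sketch. The function and parameter names are assumptions for illustration; the three inputs correspond to the Length, Fan-In and Fan-Out values listed above.

```python
def interface_complexity(length: int, inputs: int, outputs: int) -> int:
    """IC = Length * (number of inputs * number of outputs)^2."""
    return length * (inputs * outputs) ** 2


# A model with 5 activities, 2 data input associations and 3 data output
# associations: 5 * (2 * 3)^2 = 5 * 36 = 180
print(interface_complexity(5, 2, 3))
```

Because the fan-in and fan-out are multiplied, the metric is 0 whenever the model has no data inputs or no data outputs.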

The last metric discussed in the paper is NOF, the number of arcs present in the model.

  • NOF: number of Control Flow connections (number of arcs)

Metrics from "Prediction Models for BPMN Usability and Maintainability"

by Rolón E, Sanchez L, Garcia F, Ruiz F, Piattini M, Caivano D, Visaggio G (2009, doi = {10.1007/11837862_13})

This paper presents the number of Sequence Flows metric, which is equal to the NOF metric and is already extracted by the Basic Metrics Extractor.

  • TNSF: total number of Sequence Flows

Metrics from "On a quest for good process models: the cross-connectivity metric"

by Vanderfeesten I, Reijers HA, Mendling J, van der Aalst WM, Cardoso J (2008, doi = {10.1007/978-3-540-69534-9_36})

The Cross-Connectivity metric is used to "measure the strength of the links between process model elements", that is, to measure the complexity of the mental operations the reader of the model must perform in order to understand it. It is based on the "weakest-link metaphor": what counts the most is the part of the model that is hardest to understand. Models with a lower CC value are more prone to contain errors, because they are harder to understand. To obtain this value, we first calculate the weight of every node in the model. Let d be the degree of the node (the number of its incoming and outgoing flows):

  • if the node is an Exclusive Gateway, its weight is 1/d;
  • if it's an Inclusive Gateway, its weight is 1 / (2^d - 1) + ((2^d - 2) / (2^d - 1)) * (1 / d);
  • otherwise, it is 1

The paper does not explicitly consider every kind of BPM element, so we decided to give a weight of 1 to the node types it does not mention. Once the weight of every node is known, we calculate the weight of the arcs. The weight of an arc is the product of the weight of its source node and the weight of its target node.

W(a) = w(src(a)) · w(dest(a))

With the weight of every arc in the model, we can obtain the value of every path. A path is the sequence of arcs that should be followed to get from a node n1 to a node n2. Its value is the product of the weights of every arc in the path.

v(p) = W(a1) · W(a2) · ... · W(ax)

The value of a connection between any given pair of nodes n1 and n2 is the maximum value over the set of paths from n1 to n2. If the nodes are not connected, the value of the connection is 0.

V(n1, n2) = max[p∈P(n1,n2)] v(p)

Finally, with the values of the connections between every pair of nodes in the model, we can obtain the Cross-Connectivity value. It is defined as follows:

CC = Sum[n1,n2∈N]V(n1, n2) / (|N| · (|N| − 1))

  • CC: Cross-Connectivity
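The steps above (node weights, arc weights, best path per pair, final average) can be sketched in Python as follows. This is an illustration under stated assumptions, not the extractor's implementation: the function names and the kind labels are hypothetical, only pairs of distinct nodes are summed, and isolated gateways (degree 0) are given weight 1. Since every weight is at most 1, going around a cycle can never increase a path's value, so a Floyd-Warshall style pass that keeps the maximum product finds the best path for every pair.

```python
from typing import Dict, Hashable, List, Tuple


def node_weight(kind: str, degree: int) -> float:
    """Weight of a node given its kind and its degree d."""
    if degree == 0:
        return 1.0  # assumption: an isolated node gets the default weight
    if kind == "exclusive":
        return 1 / degree
    if kind == "inclusive":
        combos = 2 ** degree - 1
        return 1 / combos + ((2 ** degree - 2) / combos) * (1 / degree)
    return 1.0  # every other kind of node


def cross_connectivity(kinds: Dict[Hashable, str],
                       arcs: List[Tuple[Hashable, Hashable]]) -> float:
    """CC of a model given node kinds and directed arcs (source, target)."""
    ids = list(kinds)
    if len(ids) < 2:
        return 0.0
    degree = {n: 0 for n in ids}
    for src, dst in arcs:
        degree[src] += 1
        degree[dst] += 1
    weight = {n: node_weight(kinds[n], degree[n]) for n in ids}

    # best[(a, b)] holds the highest path value found so far from a to b.
    best = {(a, b): 0.0 for a in ids for b in ids}
    for src, dst in arcs:
        best[(src, dst)] = max(best[(src, dst)], weight[src] * weight[dst])
    for k in ids:
        for i in ids:
            for j in ids:
                via_k = best[(i, k)] * best[(k, j)]
                if via_k > best[(i, j)]:
                    best[(i, j)] = via_k

    total = sum(best[(a, b)] for a in ids for b in ids if a != b)
    return total / (len(ids) * (len(ids) - 1))


# Task A feeds an exclusive gateway X that splits to tasks B and C.
kinds = {"A": "task", "X": "exclusive", "B": "task", "C": "task"}
arcs = [("A", "X"), ("X", "B"), ("X", "C")]
print(cross_connectivity(kinds, arcs))
```

In this example X has degree 3 and weight 1/3, each arc weighs 1/3, the paths A to B and A to C are worth 1/9 each, and the result is (3 · 1/3 + 2 · 1/9) / 12 = 11/108.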

Metrics from "Quality metrics for business process modeling"

by Khlif W, Makni L, Zaaboub N, Ben-Abdallah H (2009)

The aim of the paper is to adapt OO software metrics to BPMN models. Besides some metrics that we have already covered, such as the Halstead-based ones or the IC, the paper defines the Imported Coupling of a Process and the Exported Coupling of a Process. These metrics provide a quality value that represents the coupling of a model. The ICP and the ECP are given by the sum of the outgoing/incoming flows of each task and/or of each task contained in the process (in the case of subprocesses).

  • ICP: Imported Coupling of a Process
  • ECP: Exported Coupling of a Process

Metrics from "Adopting the Cognitive Complexity Measure for Business Process Models"

by Gruhn V, Laue R (2006, doi = {10.1109/COGINF.2006.365702})

Like others we have covered, this paper aims to provide a metric that measures the understandability and the maintainability of a Business Process Model. To this end, the authors start from studies on the cognitive weights of the basic control structures of programming and adapt them to BPMN structures, obtaining the Cognitive Weight metric. According to the paper, there are 8 types of structures that can be found in a model, each with a different weight:

  • Sequence: a sequence of simple consecutive steps -> Weight: 1
  • Exclusive Choice 1: Exclusive split Gateways with 2 branches -> Weight: 2
  • Exclusive Choice 2: Exclusive split Gateways with more than 2 branches -> Weight: 3
  • Parallel Split and Synchronization: Parallel Gateways -> Weight: 4
  • Multiple Choice and Synchronizing Merge: Inclusive Gateways -> Weight: 7
  • User-defined Function: Subprocesses -> Weight: 2
  • Multiple Instances Patterns: Multiple Instance Loop Characteristics -> Weight: 6
  • Cancel Activity: Cancel Events -> Weight: 1

According to the paper, there is also another type of structure, the Cancel Case, a cancellation that deactivates all elements within another part of the model. We couldn't find anything like this in the BPMN notation, so we decided not to implement it. The sum of the weights of every structure present in the model is its Cognitive Weight value.

  • W: Cognitive Weight
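As a small worked example, the sum can be sketched in Python. The dictionary keys and the function name are hypothetical labels chosen for this example; the weights are the ones listed above, and the sketch assumes the structures have already been counted.

```python
from typing import Dict

# Weight of each structure, as listed above (keys are hypothetical labels).
COGNITIVE_WEIGHTS: Dict[str, int] = {
    "sequence": 1,
    "exclusive_choice_two_branches": 2,
    "exclusive_choice_many_branches": 3,
    "parallel": 4,
    "inclusive": 7,
    "subprocess": 2,
    "multiple_instances": 6,
    "cancel_activity": 1,
}


def cognitive_weight(structure_counts: Dict[str, int]) -> int:
    """Sum weight * occurrences over every structure found in the model."""
    return sum(COGNITIVE_WEIGHTS[name] * count
               for name, count in structure_counts.items())


# Two sequences, one exclusive split with 3 branches, one inclusive gateway:
# 2*1 + 1*3 + 1*7 = 12
print(cognitive_weight({"sequence": 2,
                        "exclusive_choice_many_branches": 1,
                        "inclusive": 1}))
```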

Metrics from "Complexity metrics for business process models"

by Gruhn V, Laue R (2006)

The Nesting Depth of a node is "the number of decisions in the control flow that are necessary to perform this action". The authors of the paper state that this value can affect the overall complexity of the model: the greater the nesting depth, the greater the complexity. This leads to the two metrics presented in the paper: the Maximum Nesting Depth and the Mean Nesting Depth. In our implementation, the Nesting Depth is incremented only by Exclusive and Complex Gateways. Since the paper does not cover the case of a node reachable through several paths that yield different Nesting Depth values, we decided to adopt the minimum among them.

  • MaxND: Maximum nesting depth
  • MeanND: Mean nesting depth

Metrics from "Cohesion and coupling metrics for workflow process design"

by Reijers HA, Vanderfeesten IT (2004, doi = {10.1007/978-3-540-25970-1_19})

"The coupling metric determines the number of related activities for each activity." It is given by the number of activities that are connected by a sequence flow to another activity, divided by the number of activities in the model times the maximum number of activities each one can be coupled with (number of activities - 1):

Sum[s,t∈T]connected(s, t) / (|T|*(|T|-1)), where T is the set of the activities in the model

  • CP: Coupling

Metrics from "Finding a complexity measure for business process models"

by Latva-Koivisto AM (2001)

  • CNC: Coefficient of Network Complexity or Connectivity coefficient

Metrics from "Metrics for Process Models"

by Jan Mendling (2008), chapter 4

This book is probably the most complete and precise source on the analysis and metrics of BPMN models; some of what we could consider the "main" metrics indeed come from it. It is mostly based on viewing and analysing BPMN models as graphs, so some degree of Graph Theory is involved. We divide the metrics following the structure of the book (and thus of our classes).

Size Metrics

The Size of a model is simply the number of nodes it contains. A larger model is more likely to contain errors than a smaller one.

  • Sn: size

The Diameter of a model is "the length of the longest path from a start node to an end node". As with Size, a model with a greater diameter is more likely to contain errors than one with a smaller diameter.

  • diam: diameter

Density Metrics

A higher Density of a model is associated with a higher error probability. It is obtained by dividing the number of arcs (flows) by the number of nodes times the number of nodes minus 1:

Δ(G) = |A| / (|N| * (|N| - 1)), where A is the set of arcs and N the set of nodes

  • Δ(G): density
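A minimal Python sketch of the density formula, assuming the node and arc counts are already known. The function name is hypothetical, and returning 0 for models with fewer than two nodes is a choice made here to avoid dividing by zero.

```python
def density(node_count: int, arc_count: int) -> float:
    """|A| / (|N| * (|N| - 1)): arcs over the maximum number of arcs."""
    if node_count < 2:
        return 0.0  # assumption: the ratio is undefined, report 0
    return arc_count / (node_count * (node_count - 1))


# 4 nodes connected by 3 arcs: 3 / (4 * 3) = 0.25
print(density(4, 3))
```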

Partitionability Metrics

The Sequentiality represents the presence of simple consecutive nodes in the model, the easiest structure a model can present. A process with high Sequentiality should be less prone to errors. It is obtained as follows:

Ξ(G) = |A ∩ (T × T)| / |A|, that is, the number of arcs between non-connector nodes divided by the total number of arcs

If every arc connects only non-connector nodes, the Sequentiality is 1.

  • Ξ(G): sequentiality
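The ratio can be sketched in Python as below. The function name and the input format (a list of `(source, target)` arcs plus the set of connector node ids) are assumptions for this example, as is the choice of returning 0 for a model without arcs.

```python
from typing import Hashable, Iterable, Set, Tuple


def sequentiality(arcs: Iterable[Tuple[Hashable, Hashable]],
                  connectors: Set[Hashable]) -> float:
    """Share of arcs whose source and target are both non-connector nodes."""
    arcs = list(arcs)
    if not arcs:
        return 0.0  # assumption: no arcs, report 0 instead of dividing by zero
    plain = sum(1 for src, dst in arcs
                if src not in connectors and dst not in connectors)
    return plain / len(arcs)


# a -> b -> g -> c, where g is a gateway: only a -> b counts, so 1/3
print(sequentiality([("a", "b"), ("b", "g"), ("g", "c")], {"g"}))
```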

The Depth of a node n is based on two values: the in-depth λin(n) and the out-depth λout(n). They represent, respectively, the number of split nodes minus the number of join nodes, and vice versa, that have to be visited in order to get to n. The in-depth of a node is based on the in-depth of its predecessors, and the out-depth on the out-depth of its successors. The depth λ(n) of a node is the minimum of its two depths, and the depth ^ of a model is the maximum depth among its nodes. The higher the depth, the higher the probability of errors in the model.

  • ^: depth

Connector Interplay Metrics

  • MM: connector mismatch
  • CH: connector heterogeneity
  • CYC: cyclicity
  • TS: concurrency
