
2.2. Behavioral Anomaly Detection functionality

eliseu edited this page Apr 24, 2020 · 4 revisions

Behavioral Anomaly Detection

The Behavioral Anomaly Detection functionality embedded in DINASORE analyzes all interactions among function blocks and raises an alert when anything deviates from normal behavior. This is useful in two main situations. The first is detecting malfunctions in a function block implementation during the development phase, where some behavioral errors only appear after a certain execution time. A classic example is poor management of internal resources, such as variables that accumulate so much data that the function block slows down. The second is when the function block behavior has already been validated and is deployed in relevant conditions. Malfunctions might occur when there is an integration with third-party applications that, for some reason, stop working or send an unforeseen input that causes unexpected behavior. Note that this functionality is ONLY applicable to cyclic behaviors, not to trigger-based configurations such as on-demand parameter optimization in a decision support system.

To detect whether a function block is misbehaving, it is first necessary to know what normal behavior looks like. Hence, this functionality is divided into two phases: 1) data collection for unsupervised model training; 2) data collection for online classification of behavior using the trained algorithms. As specified in the Installation Manual, this functionality is activated with the -m argument when executing DINASORE. This argument takes two parameters: 1) the number of samples to collect for training and 2) the window size, in seconds, over which the behavior is analyzed for one single sample. For example, -m 5 10 means that 5 samples are collected for training, each one covering 10 seconds of behavioral statistical data. If you inspect the csv files yourself, you will actually see more samples than the number you specified. This is because the feature extraction strategy uses lag features, meaning that it uses several features with time windows shorter than the one specified in the -m argument (don't worry about these details, but if you want to know how it works, just knock at my door).

For behavioral classification, three algorithms are used: DBSCAN, Empirical Covariance, and One-Class SVM. For each function block and algorithm pair, a csv file is created where all samples are registered and classified, suitable for showing the results in an external application. Additionally, there is one csv file per function block with the number of input and output events received during the specified time window. This is particularly useful to know not only which function block changed its behavior, but also which function block failed in the first place. As configurations tend to be pipeline-style (a series of function blocks), if the first function block changes its behavior, the following ones will also be affected. This is why it is important to understand which function block misbehaved first.

To better inspect what is happening, a monitoring tool with a GUI was built. This tool is available in DINASORE in the folder dinasore_ua/resources/ and is called monitoring.py. You can run it simply with python monitoring.py.
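To make the meaning of the window size more concrete, here is a minimal sketch of how events could be counted per time window. It is not DINASORE's actual implementation: the function name count_events_per_window and the input format (a plain list of event timestamps in seconds) are assumptions made only for illustration.

```python
from collections import Counter


def count_events_per_window(timestamps, window_size, start=0.0):
    """Count events in consecutive windows of `window_size` seconds.

    `timestamps` is a hypothetical input: event times in seconds.
    `window_size` plays the role of the second parameter of -m.
    Returns one count per window, from the first window up to the
    last window that contains an event.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    counts = Counter()
    for t in timestamps:
        if t < start:
            continue  # ignore events before the observation start
        counts[int((t - start) // window_size)] += 1
    if not counts:
        return []
    last_window = max(counts)
    # Windows without events are reported as 0, not skipped.
    return [counts.get(i, 0) for i in range(last_window + 1)]


# Illustrative input: one event every 2 seconds for 50 seconds.
timestamps = [float(t) for t in range(0, 50, 2)]
print(count_events_per_window(timestamps, 10))  # [5, 5, 5, 5, 5]
```

With a 10 second window, 50 seconds of observation yields 5 samples, which corresponds to the -m 5 10 example above.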

To better understand this functionality, the configuration depicted in the following figure is used as an example:

Normal behavior

This is an example of two different sensors, Temperature and Voltage, that control a set of LEDs to show the state of the system. If the temperature is above a max value, LED 1 turns on, and if it is below a min value, LED 2 turns on. Additionally, if the voltage is above a certain max value, LED 3 turns on. If you use the default parameters for the -m argument, execute the monitoring tool, and wait for a bit, you'll see the following GUI:

Normal Behavior GUI

As can be observed, each row is a behavioral analysis and each column is a function block of the configuration. Due to space limitations, please be careful about the number of function blocks you use, otherwise it will be impossible to inspect anything in the GUI. The first three rows are 2D scatter plots, where green dots are good behaviors and red dots are bad ones. Please don't try to interpret the x and y axes, as they are the result of a PCA decomposition. The bottom two rows are more readable outputs of what is happening. The Health Monitoring plot shows the result of the anomaly classification, so you can see both the current state of the function block and its historical state. The Events plot depicts two graphs, one for the number of input events and the other for the number of output events. This lets you inspect whether more or fewer events are triggered in each function block and detect anomalies. This graph is important because it tells you which function block failed in the first place.
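The idea of reading the Events plot to find the function block that failed first can be sketched in code. This is only an illustration under stated assumptions: it uses a simple relative threshold against the training mean as a stand-in, not the DBSCAN, Empirical Covariance, or One-Class SVM models used by DINASORE, and the function names and the event counts below are made up for the example.

```python
from statistics import mean


def first_deviation(counts, n_train, tolerance=0.5):
    """Index of the first window after training whose event count
    differs from the training mean by more than `tolerance`
    (relative to the mean), or None if no window deviates."""
    if n_train <= 0 or len(counts) < n_train:
        raise ValueError("need at least n_train training windows")
    baseline = mean(counts[:n_train])
    allowed = tolerance * max(baseline, 1.0)
    for i in range(n_train, len(counts)):
        if abs(counts[i] - baseline) > allowed:
            return i
    return None


def rank_by_first_deviation(counts_by_block, n_train, tolerance=0.5):
    """Return (window_index, block_name) pairs for deviating blocks,
    earliest first. The first entry is the likely origin."""
    hits = []
    for name, counts in counts_by_block.items():
        index = first_deviation(counts, n_train, tolerance)
        if index is not None:
            hits.append((index, name))
    return sorted(hits)


# Made-up event counts per window, for illustration only.
events = {
    "VOLTAGE": [5, 5, 5, 5, 5, 1, 1, 1],
    "CC_Voltage": [5, 5, 5, 5, 5, 5, 1, 1],
    "LED3": [5, 5, 5, 5, 5, 5, 5, 1],
    "TEMPERATURE": [5, 5, 5, 5, 5, 5, 5, 5],
}
print(rank_by_first_deviation(events, n_train=5))
# [(5, 'VOLTAGE'), (6, 'CC_Voltage'), (7, 'LED3')]
```

In a pipeline-style configuration, the block with the earliest deviating window is the one to inspect first, since the blocks downstream of it are expected to deviate afterwards.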

The following image depicts what a failure looks like in the monitoring GUI:

Failure one GUI

In this figure, you can clearly see that the failing function blocks are CC_Voltage, VOLTAGE and LED3. This matches the anomaly injected in the configuration via 4DIAC by changing the RATE input of the VOLTAGE function block from 2 to 10, as you can see:

Failure one 4DIAC

If we change it back to 2 and instead change the RATE input of the TEMPERATURE function block from 2 to 10, we get the following GUI:

Failure two GUI

Finally, if we set the RATE input back to 2 on the TEMPERATURE function block, we get back the normal behavior, as can be seen:

Normal two GUI

If you want to implement your own GUI, feel free to use the csv files available in the dinasore_ua/resources/monitoring folder.
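As a possible starting point, the following sketch loads every csv file in a folder using only the Python standard library. The function name load_monitoring_csvs is hypothetical, and since the column layout of the files is not described here, the sketch makes no assumption about it and returns the raw rows as lists of strings.

```python
import csv
from pathlib import Path


def load_monitoring_csvs(folder):
    """Map each csv file name (without extension) to its rows.

    Rows are returned as lists of strings, exactly as stored in the
    file, because the column layout is not assumed here.
    """
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            tables[path.stem] = list(csv.reader(handle))
    return tables
```

For example, load_monitoring_csvs("dinasore_ua/resources/monitoring") would return one entry per csv file found in that folder, which you can then inspect to decide how to plot each column.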
