Commit 1b2d8fe

Merge pull request #1 from CESNET/readme
WIF - README
2 parents 49b9d64 + 907dabf commit 1b2d8fe

1 file changed

+182
-1
lines changed

README.md

Lines changed: 182 additions & 1 deletion
@@ -1 +1,182 @@
1-
WIF (Weak Indication Framework)
# WIF - Weak Indication Framework
## Description
C++ library for fast development of (heterogeneous) detection and classification modules for (Encrypted) Network Traffic Analysis - (E)NTA. The library contains the most commonly used methods for ENTA, so WIF aims to minimize the time between the detection of a new threat and the deployment of a tailored module for its detection.

WIF contains the following components:
- Classifiers
  - Pattern-matching via Regex
  - IP blocklists
  - Machine Learning via scikit-learn
  - Possible interconnection with [ALF](https://github.com/CESNET/ALF)
- Combinators
  - Average
  - Dempster-Shafer Theory
  - Majority
  - Sum
- Reporters
  - Unirec
- Data storage classes
  - Classification result (*ClfResult*)
  - IP address
  - Network IP flow (*FlowFeatures*)
- Utils
  - IP prefix (range or subnet)
  - Timer

Classifiers perform traffic classification and threat detection. Combinators fuse weak results together to obtain a more robust and accurate result. Reporters export additional data from modules for increased explainability. Utils are mainly used by other parts of the library but are available to users as well. For more information, see the section **Using WIF for Development**.

Note that the Unirec Reporter and the ALF Classifier are only available when `BUILD_WITH_UNIREC` is enabled, as described in the section **Build & Installation** below.

## Requirements
- Python3.6 or Python3.8 devel with numpy:
  1. python36-devel, python36-numpy
  1. python38-devel, python38-numpy

Optionally, for additional features (`BUILD_WITH_UNIREC` option must be enabled manually):
- [CESNET/Nemea](https://github.com/CESNET/Nemea), mainly libtrap, libunirec, and libunirec++

## Build & Installation
### Build from Source
```
git clone https://github.com/CESNET/WIF.git
cd WIF
make
# For setting installation folders etc.
ccmake build
sudo make install
```

### Build and Install RPM
```
git clone https://github.com/CESNET/WIF.git
cd WIF
make rpm
sudo rpm -i <pathToRpmOutputtedByPreviousCommand>
```

## Documentation
Doxygen documentation can be generated by calling:
```
make docs
```

## Using WIF for Development

### Preparation
A WIF-based module should first transform its input data into a *FlowFeatures* object, which is the input type of the classifiers. *FlowFeatures* is a wrapper over **std::vector** of allowed types, see the *DataVariant* class. The recommended use is to receive network flows periodically in a loop and run each of them through the implemented classification method. Therefore, *FlowFeatures* should keep its layout: it should contain the same features at the same indexes throughout the processing.
```
#include <wif/storage/flowFeatures.hpp>
WIF::FlowFeatures flow(NUMBER_OF_FLOW_ELEMENTS);
```
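
The fixed-layout idea can be illustrated without the library. The sketch below is not the WIF implementation: `FlowRecord`, `Value`, `makeFlow()` and the three feature IDs are hypothetical stand-ins that only mimic a vector of variant values addressed by constant indexes (C++17 is assumed for `std::variant`).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins, NOT the WIF types: a value restricted to a few
// allowed types and a fixed-size container addressed by feature ID.
using Value = std::variant<std::monostate, std::uint64_t, double, std::string>;
using FeatureId = std::size_t;

class FlowRecord {
public:
    explicit FlowRecord(std::size_t size) : m_values(size) {}

    // Stores the value as alternative T; throws std::out_of_range for a bad ID.
    template <typename T>
    void set(FeatureId id, T value) {
        m_values.at(id).emplace<T>(std::move(value));
    }

    // Throws std::bad_variant_access if the element holds another type.
    template <typename T>
    const T& get(FeatureId id) const {
        return std::get<T>(m_values.at(id));
    }

    std::size_t size() const { return m_values.size(); }

private:
    std::vector<Value> m_values;
};

// The layout is declared once and reused for every received flow.
constexpr FeatureId PACKETS_ID = 0;
constexpr FeatureId DURATION_ID = 1;
constexpr FeatureId SNI_ID = 2;
constexpr std::size_t FEATURE_COUNT = 3;

FlowRecord makeFlow(std::uint64_t packets, double duration, const std::string& sni) {
    FlowRecord flow(FEATURE_COUNT);
    flow.set<std::uint64_t>(PACKETS_ID, packets);
    flow.set<double>(DURATION_ID, duration);
    flow.set<std::string>(SNI_ID, sni);
    return flow;
}
```

Because every flow is built by the same function with the same constants, a consumer configured with `PACKETS_ID` or `SNI_ID` reads the same kind of value from every record.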

### Classification
All classifiers share a common interface defined by the abstract *Classifier* class. The *classify()* method takes either *FlowFeatures* or *std::vector\<WIF::FlowFeatures\>* and performs the classification. However, *setSourceFeatureIDs()* must be called before the first *classify()* call. This method sets the indexes of the *FlowFeatures* elements that the classifier will process.

The return type of *classify()* is *ClfResult*, a variant holding either a *double* (e.g. the result of IP-blocklist-based detection: 0 or 1) or a *std::vector\<double\>* (e.g. the result of Machine Learning: an array with the probability of each class). Each classifier defines which value type its *ClfResult* holds and what it means. The value can be obtained by one of the following calls:
```
clfResult.get<double>();
clfResult.get<std::vector<double>>();
```
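
To show why the caller has to know which alternative is stored, here is a minimal sketch built on a plain `std::variant`. It is an assumption-laden illustration, not `WIF::ClfResult`: the `Result` alias and the `toScore()` helper are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical stand-in for a classification result, NOT WIF::ClfResult.
using Result = std::variant<double, std::vector<double>>;

// Collapses either kind of result into a single score:
//  - a plain double is returned unchanged (e.g. a blocklist hit: 0 or 1),
//  - for a probability vector, the probability of one chosen class is used
//    (0.0 if that class index does not exist).
double toScore(const Result& result, std::size_t positiveClass) {
    if (std::holds_alternative<double>(result)) {
        return std::get<double>(result);
    }
    const auto& probabilities = std::get<std::vector<double>>(result);
    return positiveClass < probabilities.size() ? probabilities[positiveClass] : 0.0;
}
```

Reducing both result kinds to one `double` is one possible way to prepare heterogeneous classifier outputs for a later combination step.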

The code example below shows how to correctly use *IpPrefixClassifier*:
```
#include <wif/classifiers/ipPrefixClassifier.hpp>
#include <iostream>
#include <vector>

// Positions of the source and destination address inside FlowFeatures
constexpr WIF::FeatureID SRC_IP_ID = 0;
constexpr WIF::FeatureID DST_IP_ID = 1;

...

std::vector<WIF::IpPrefix> blocklist = { WIF::IpPrefix("10.0.0.0/28") };
WIF::IpPrefixClassifier clf(blocklist);
// Must be called before the first classify() call
clf.setSourceFeatureIDs({
    SRC_IP_ID,
    DST_IP_ID
});

...

// receiveRecord(), extractSrcIp() and extractDstIp() are placeholders for the module's own input handling
while (record = receiveRecord()) {
    WIF::FlowFeatures flow(2);
    flow.set<WIF::IpAddress>(SRC_IP_ID, extractSrcIp(record));
    flow.set<WIF::IpAddress>(DST_IP_ID, extractDstIp(record));

    WIF::ClfResult res = clf.classify(flow);
    if (res.get<double>() > 0) {
        std::cout << "Blocklisted communication detected!" << std::endl;
    }
}
```
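
For readers unfamiliar with prefix matching, the following self-contained sketch shows what a check against `10.0.0.0/28` amounts to for IPv4. It does not use WIF and makes no claim about how *IpPrefixClassifier* works internally: `parseIpv4()`, `parsePrefix()`, `contains()` and `blocklistScore()` are hypothetical helpers, and IPv6 is left out.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Parses dotted-quad IPv4 text into a host-order 32-bit integer.
std::optional<std::uint32_t> parseIpv4(const std::string& text) {
    std::istringstream in(text);
    std::uint32_t address = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned octet = 0;
        if (!(in >> octet) || octet > 255) return std::nullopt;
        address = (address << 8) | octet;
        if (i < 3 && in.get() != '.') return std::nullopt;
    }
    // Reject trailing characters such as "1.2.3.4x".
    if (in.peek() != std::char_traits<char>::eof()) return std::nullopt;
    return address;
}

struct Prefix {
    std::uint32_t network;  // base address with host bits cleared
    std::uint32_t mask;     // e.g. 0xFFFFFFF0 for /28
};

// Parses CIDR notation such as "10.0.0.0/28".
std::optional<Prefix> parsePrefix(const std::string& cidr) {
    const auto slash = cidr.find('/');
    if (slash == std::string::npos) return std::nullopt;
    const auto base = parseIpv4(cidr.substr(0, slash));
    if (!base) return std::nullopt;
    std::istringstream in(cidr.substr(slash + 1));
    int length = -1;
    if (!(in >> length) || length < 0 || length > 32) return std::nullopt;
    const std::uint32_t mask = length == 0 ? 0 : ~std::uint32_t{0} << (32 - length);
    return Prefix{*base & mask, mask};
}

bool contains(const Prefix& prefix, std::uint32_t address) {
    return (address & prefix.mask) == prefix.network;
}

// Returns 1.0 if either endpoint falls into any blocklisted prefix, else 0.0.
double blocklistScore(const std::vector<Prefix>& blocklist,
                      std::uint32_t src, std::uint32_t dst) {
    for (const Prefix& prefix : blocklist) {
        if (contains(prefix, src) || contains(prefix, dst)) return 1.0;
    }
    return 0.0;
}
```

A `/28` prefix fixes the upper 28 bits, so `10.0.0.0/28` covers the 16 addresses from `10.0.0.0` to `10.0.0.15`.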

### Scikit-learn Interconnection
The Python C API is used to perform ML-based classification in *ScikitMlClassifier* and *AlfClassifier*. Two files are required for this task. The first one is the actual ML model in the [pickle](https://docs.python.org/3/library/pickle.html) format. The second one is called the *bridge*. This file must contain two functions:
1. `init(model_path)`
   - Receives a string with the path to the ML model, loads the model, and returns it
2. `classify(classifier, features)`
   - Receives the loaded ML model and a 2D array of features to classify, calls `predict_proba()`, and returns the output

An example `bridge.py` is shown below and can be used for many tasks:
```
import pickle


def init(model_path):
    with open(model_path, 'rb') as f:
        return pickle.load(f)


def classify(classifier, features):
    try:
        return classifier.predict_proba(features).tolist()
    except Exception as e:
        print(e)
        return []
```
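
The example bridge returns one row of class probabilities per classified flow, or an empty list after an exception. As a rough sketch of what a caller might do with such output on the C++ side, assume the rows have already been converted to `std::vector<std::vector<double>>`; how WIF performs that conversion is not covered here, and `mostProbableClass()` and `decide()` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <optional>
#include <vector>

// Index of the most probable class in one row, or nothing for an empty row.
std::optional<std::size_t> mostProbableClass(const std::vector<double>& row) {
    if (row.empty()) return std::nullopt;
    const auto best = std::max_element(row.begin(), row.end());
    return static_cast<std::size_t>(std::distance(row.begin(), best));
}

// One decision per classified flow. An empty batch, which is what the example
// bridge returns after an exception, simply yields no decisions.
std::vector<std::optional<std::size_t>> decide(const std::vector<std::vector<double>>& batch) {
    std::vector<std::optional<std::size_t>> decisions;
    decisions.reserve(batch.size());
    for (const auto& row : batch) {
        decisions.push_back(mostProbableClass(row));
    }
    return decisions;
}
```

Since an empty result is indistinguishable from "nothing was classified", a module that needs to tell failures apart may want to compare the number of decisions with the number of submitted flows.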

### Combination
Combinators perform data combination and fusion. The interface of this object group is defined by the abstract *Combinator* class. No method needs to be called before use, as all initialization is performed in the constructors. The *combine()* method then performs the actual combination.
```
#include <wif/combinators/averageCombinator.hpp>
#include <iostream>

constexpr double THRESHOLD_FOR_POSITIVE_DETECTION = 0.65;

WIF::AverageCombinator avgCom;
...

double averageScore = avgCom.combine({tlsSniScore, mlProba, blocklistScore});
if (averageScore >= THRESHOLD_FOR_POSITIVE_DETECTION) {
    std::cout << "Positive detection!" << std::endl;
}
```
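
The arithmetic behind the listed combinator kinds can be sketched independently of the library. The functions below are illustrative only and are not the WIF classes: all names are hypothetical, the majority vote assumes a caller-supplied threshold, and the Dempster-Shafer part is restricted to two sources over the frame {threat, benign}, with `unknown` holding the mass assigned to the whole frame.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

double sumScores(const std::vector<double>& scores) {
    return std::accumulate(scores.begin(), scores.end(), 0.0);
}

double averageScores(const std::vector<double>& scores) {
    if (scores.empty()) throw std::invalid_argument("no scores to combine");
    return sumScores(scores) / static_cast<double>(scores.size());
}

// Majority vote: 1.0 if more than half of the scores reach the threshold.
double majorityVote(const std::vector<double>& scores, double threshold) {
    std::size_t positive = 0;
    for (double score : scores) {
        if (score >= threshold) ++positive;
    }
    return 2 * positive > scores.size() ? 1.0 : 0.0;
}

// Basic belief assignment; the three masses are expected to sum to 1.
struct Mass {
    double threat;
    double benign;
    double unknown;
};

// Dempster's rule of combination for two sources.
Mass dempsterCombine(const Mass& a, const Mass& b) {
    // Mass that the two sources assign to contradicting hypotheses.
    const double conflict = a.threat * b.benign + a.benign * b.threat;
    if (conflict >= 1.0) throw std::invalid_argument("sources are in total conflict");
    const double norm = 1.0 - conflict;
    return {
        (a.threat * b.threat + a.threat * b.unknown + a.unknown * b.threat) / norm,
        (a.benign * b.benign + a.benign * b.unknown + a.unknown * b.benign) / norm,
        (a.unknown * b.unknown) / norm,
    };
}
```

For example, combining the masses (0.6, 0.1, 0.3) and (0.7, 0.0, 0.3) gives a threat mass of 0.81 / 0.93, which is higher than either source reports on its own. This is the sense in which several weak indications can add up to a stronger one.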

### Reporters
The common interface of reporters is defined by the abstract *Reporter* class. Currently, the only available reporter is *UnirecReporter*, which is built on top of [libunirec](https://github.com/CESNET/Nemea-Framework/tree/master/unirec). A reporter behaves like a finite-state machine (FSM). First, *onRecordStart()* must be called to indicate the start of a new record. Then, *report(DataVariant)* can be called repeatedly. Finally, *onRecordEnd()* is called to indicate that the record can be sent to the output. However, buffering may be used, so the record is not necessarily sent right away. Use *flush()* to send out pending records.

In the case of *UnirecReporter*, the *report(DataVariant)* method extracts the value held by the passed *DataVariant* and uses it as the value of the next Unirec field. The field ID is then incremented, so the next call of *report()* sets the value of the following field. Therefore, values must be reported in the correct field order.

The example code for using *UnirecReporter* is shown below. For the documentation and usage of `Nemea`, see the [Nemea](https://github.com/CESNET/Nemea) and [libunirec](https://github.com/CESNET/Nemea-Framework/tree/master/unirec) repositories.
```
const std::string REPORTER_TEMPLATE = "double TLS_SNI_SCORE,uint8 DETECTION_RESULT";

Nemea::UnirecOutputInterface reporterIfc = unirec.buildOutputInterface();
reporterIfc.changeTemplate(REPORTER_TEMPLATE);
WIF::UnirecReporter unirecReporter(reporterIfc);

...

// Notice: the values are reported in the same order as defined in the REPORTER_TEMPLATE
unirecReporter.onRecordStart();
unirecReporter.report(tlsSniScore);
unirecReporter.report(detectionResult);
unirecReporter.onRecordEnd();
unirecReporter.flush();

...
```
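
The start, report, end, flush protocol described above can be modelled with a small state machine. The class below is a hypothetical illustration and not `WIF::Reporter` or `WIF::UnirecReporter`: it stores values as strings, takes the number of fields as a constructor argument, and hands buffered records back to the caller instead of sending them anywhere.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reporter: enforces the start -> report* -> end call order
// and buffers finished records until flush() is called.
class BufferedReporter {
public:
    explicit BufferedReporter(std::size_t fieldCount) : m_fieldCount(fieldCount) {}

    void onRecordStart() {
        if (m_open) throw std::logic_error("record already started");
        m_current.clear();
        m_open = true;
    }

    // Each call fills the next field, so the call order must match the field order.
    void report(const std::string& value) {
        if (!m_open) throw std::logic_error("report() called outside a record");
        if (m_current.size() == m_fieldCount) throw std::logic_error("too many fields");
        m_current.push_back(value);
    }

    void onRecordEnd() {
        if (!m_open || m_current.size() != m_fieldCount) {
            throw std::logic_error("record is incomplete");
        }
        m_pending.push_back(std::move(m_current));
        m_current.clear();  // leave the moved-from vector in a known state
        m_open = false;
    }

    // Hands over all buffered records and empties the buffer.
    std::vector<std::vector<std::string>> flush() {
        return std::exchange(m_pending, {});
    }

private:
    std::size_t m_fieldCount;
    bool m_open = false;
    std::vector<std::string> m_current;
    std::vector<std::vector<std::string>> m_pending;
};
```

Rejecting out-of-order calls early makes a missing *onRecordStart()* or a skipped field visible at the call site rather than as a malformed record downstream.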

## Contact
If you have any questions or problems, you are welcome to send an email to [[email protected]](mailto:[email protected]).

## License
This project is distributed under the [BSD-3-Clause license](LICENSE).
<br>
&copy; 2024, CESNET z.s.p.o.
