|
36 | 36 | - [Cymru CAP Program](#cymru-cap-program) |
37 | 37 | - [Cymru Full Bogons](#cymru-full-bogons) |
38 | 38 | - [HTML Table Parser](#html-table-parser) |
| 39 | + - [Key-Value Parser](#key-value-parser) |
39 | 40 | - [Twitter](#twitter) |
40 | 41 | - [Shadowserver](#shadowserver) |
41 | 42 | - [Shodan](#shodan) |
|
74 | 75 | - [RipeNCC Abuse Contact](#ripencc-abuse-contact) |
75 | 76 | - [Sieve](#sieve) |
76 | 77 | - [Taxonomy](#taxonomy) |
| 78 | + - [Threshold](#threshold) |
77 | 79 | - [Tor Nodes](#tor-nodes) |
78 | 80 | - [Url2FQDN](#url2fqdn) |
79 | 81 | - [Wait](#wait) |
@@ -267,6 +269,12 @@ Zipped files are automatically extracted if detected. |
267 | 269 |
|
268 | 270 | For extracted files, every extracted file is sent in its own report. Every report has a field named `extra.file_name` with the file name in the archive the content was extracted from. |
269 | 271 |
|
| 272 | +#### HTTP Response status code checks |
| 273 | + |
| 274 | +If the HTTP response's status code is not 2xx, this is treated as an error. |
| 275 | + |
| 276 | +At the debug logging level, the request's and response's headers and body are logged for further inspection. |
| 277 | + |
270 | 278 | * * * |
271 | 279 |
|
272 | 280 | ### Generic URL Stream Fetcher |
@@ -979,6 +987,8 @@ Events with the Malware "TestSinkholingLoss" are ignored, as they are for the fe |
979 | 987 |
|
980 | 988 | * `use_malware_familiy_as_classification_identifier`: default: `true`. Use the `malw.family` field as `classification.type`. If `false`, check if the same as `malw.variant`. If it is the same, it is ignored. Otherwise saved as `extra.malware.family`. |
981 | 989 |
|
| 990 | +* * * |
| 991 | + |
982 | 992 | ### Generic CSV Parser |
983 | 993 |
|
984 | 994 | Lines starting with `'#'` will be ignored. Headers won't be interpreted. |
@@ -1011,6 +1021,22 @@ Lines starting with `'#'` will be ignored. Headers won't be interpreted. |
1011 | 1021 | - parse a value and ignore if it fails `"columns": "source.url|__IGNORE__"` |
1012 | 1022 |
|
1013 | 1023 | * `"column_regex_search"`: Optional. A dictionary mapping field names (as given per the columns parameter) to regular expressions. The field is evaluated using `re.search`. E.g. to get the ASN out of `AS1234` use: `{"source.asn": "[0-9]+"}`. Make sure to properly escape any backslashes in your regular expression (see also [#1579](https://github.com/certtools/intelmq/issues/1579)). |
| 1024 | + * `"compose_fields"`: Optional, dictionary. Create fields from columns, e.g. with data like this: |
| 1025 | + ```csv |
| 1026 | + # Host,Path |
| 1027 | + example.com,/foo/ |
| 1028 | + example.net,/bar/ |
| 1029 | + ``` |
| 1030 | + using this compose_fields parameter: |
| 1031 | + ```json |
| 1032 | + {"source.url": "http://{0}{1}"} |
| 1033 | + ``` |
| 1034 | + You get: |
| 1035 | + ``` |
| 1036 | + http://example.com/foo/ |
| 1037 | + http://example.net/bar/ |
| 1038 | + ``` |
| 1039 | + in the respective `source.url` fields. Each value in the dictionary is a format string in which the columns are available by their index. |
1014 | 1040 | * `"default_url_protocol"`: For URLs you can give a default protocol which will be prepended to the data. |
1015 | 1041 | * `"delimiter"`: separation character of the CSV, e.g. `","` |
1016 | 1042 | * `"skip_header"`: Boolean, skip the first line of the file, optional. Lines starting with `#` are skipped in addition, so make sure you do not skip more lines than needed! |
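
The `compose_fields` behaviour described above can be illustrated with a minimal sketch. This is not the parser's actual code: the helper name `compose_fields` and the way comment lines are filtered are assumptions made for illustration. It only shows how a format string such as `http://{0}{1}` can be filled with the columns of each row by index.

```python
import csv

def compose_fields(row, templates):
    """Fill each format string with the row's columns as positional arguments.

    Hypothetical helper for illustration, not part of IntelMQ's API.
    """
    return {field: template.format(*row) for field, template in templates.items()}

data = "# Host,Path\nexample.com,/foo/\nexample.net,/bar/\n"
# Lines starting with '#' are ignored, as the parser documentation states.
lines = [line for line in data.splitlines() if not line.startswith("#")]
templates = {"source.url": "http://{0}{1}"}

events = [compose_fields(row, templates) for row in csv.reader(lines)]
for event in events:
    print(event["source.url"])
```

Because the columns are passed positionally, `{0}` refers to the first column and `{1}` to the second, independent of any header names.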
@@ -1225,6 +1251,45 @@ Parses breaches and pastes and creates one event per e-mail address. The e-mail |
1225 | 1251 |
|
1226 | 1252 | * * * |
1227 | 1253 |
|
| 1254 | +### Key-Value Parser |
| 1255 | + |
| 1256 | +#### Information: |
| 1257 | +* `name:` intelmq.bots.parsers.key_value.parser |
| 1258 | +* `lookup:` no |
| 1259 | +* `public:` no |
| 1260 | +* `cache (redis db):` none |
| 1261 | +* `description:` Parses text lines in key=value format, for example FortiGate firewall logs. |
| 1262 | + |
| 1263 | +#### Configuration Parameters: |
| 1264 | + |
| 1265 | +* `pair_separator`: String separating key=value pairs, default "` `" (space). |
| 1266 | +* `kv_separator`: String separating key and value, default `=`. |
| 1267 | +* `keys`: Dictionary of string->string, mapping the names of keys to propagate to IntelMQ event fields. Example: |
| 1268 | + ```json |
| 1269 | + "keys": { |
| 1270 | + "srcip": "source.ip", |
| 1271 | + "dstip": "destination.ip" |
| 1272 | + } |
| 1273 | + ``` |
| 1274 | + The value mapped to `time.source` is parsed. If the value is numeric, it is interpreted. Otherwise, or if that fails, it is parsed fuzzily with dateutil. |
| 1275 | + If the value cannot be parsed, a warning is logged per line. |
| 1276 | +* `strip_quotes`: Boolean, remove opening and closing quotes from values, default true. |
| 1277 | + |
| 1278 | +#### Parsing limitations |
| 1279 | + |
| 1280 | +The input must not have (quoted) occurrences of the separator in the values. For example, this is not parsable (with space as separator): |
| 1281 | + |
| 1282 | +``` |
| 1283 | +key="long value" key2="other value" |
| 1284 | +``` |
| 1285 | + |
| 1286 | +In firewall logs like FortiGate, this does not occur. These logs usually look like: |
| 1287 | +``` |
| 1288 | +srcip=192.0.2.1 srcmac="00:00:5e:00:17:17" |
| 1289 | +``` |
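
The splitting logic and its limitation can be sketched as follows. This is a simplified illustration under assumptions, not the bot's implementation: the function name `parse_key_value` and the field name `extra.srcmac` are hypothetical, and `time.source` handling is omitted.

```python
def parse_key_value(line, keys, pair_separator=" ", kv_separator="=", strip_quotes=True):
    """Split a line into key/value pairs and map selected keys to event fields.

    Hypothetical helper for illustration only.
    """
    event = {}
    for pair in line.split(pair_separator):
        if kv_separator not in pair:
            continue  # fragment without a key, e.g. the tail of a quoted value
        key, _, value = pair.partition(kv_separator)
        # Only strip when the value both starts and ends with a quote.
        if strip_quotes and len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if key in keys:
            event[keys[key]] = value
    return event

keys = {"srcip": "source.ip", "srcmac": "extra.srcmac"}
event = parse_key_value('srcip=192.0.2.1 srcmac="00:00:5e:00:17:17"', keys)

# The separator inside the quoted value cuts the value short.
broken = parse_key_value('key="long value" key2="other value"', {"key": "extra.key"})
```

In the second call the line is split at every space, so `key` only receives the fragment `"long` and the rest of the quoted value is lost, which is why such input is not parsable.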
| 1290 | + |
| 1291 | +* * * |
| 1292 | + |
1228 | 1293 | ### McAfee Advanced Threat Defense File |
1229 | 1294 |
|
1230 | 1295 | #### Information: |
@@ -2500,6 +2565,42 @@ For brevity, "type" means `classification.type` and "taxonomy" means `classifica |
2500 | 2565 |
|
2501 | 2566 | * * * |
2502 | 2567 |
|
| 2568 | +### Threshold |
| 2569 | + |
| 2570 | +#### Information: |
| 2571 | + |
| 2572 | +* **Cache parameters** (see in section [common parameters](#common-parameters)) |
| 2573 | +* `name`: threshold |
| 2574 | +* `lookup`: redis cache |
| 2575 | +* `public`: no |
| 2576 | +* `cache (redis db)`: 11 |
| 2577 | +* `description`: Check if the number of similar messages during a specified time interval exceeds a set value. |
| 2578 | + |
| 2579 | +#### Configuration Parameters: |
| 2580 | + |
| 2581 | +* `filter_keys`: String, comma-separated list of field names to consider or ignore when determining which messages are similar. |
| 2582 | +* `filter_type`: String, `whitelist` (consider only the fields in `filter_keys`) or `blacklist` (consider everything but the fields in `filter_keys`). |
| 2583 | +* `timeout`: Integer, number of seconds before threshold counter is reset. |
| 2584 | +* `threshold`: Integer, number of messages required before propagating one. In forwarded messages, the threshold is saved in the message as `extra.count`. |
| 2585 | +* `add_keys`: Dictionary of string->string, optional, fields and values to add to (or update in) propagated messages. Example: |
| 2586 | + ```json |
| 2587 | + "add_keys": { |
| 2588 | + "classification.type": "spam", |
| 2589 | + "comment": "Started more than 10 SMTP connections" |
| 2590 | + } |
| 2591 | + ``` |
| 2592 | + |
| 2593 | +#### Limitations |
| 2594 | + |
| 2595 | +This bot has certain limitations and is not a true threshold filter (yet). It works like this: |
| 2596 | +1. Every incoming message is hashed according to the `filter_*` parameters. |
| 2597 | +2. The hash is looked up in the cache and the count is incremented by 1, and the TTL of the key is (re-)set to the timeout. |
| 2598 | +3. If the new count matches the threshold exactly, the message is forwarded. Otherwise it is dropped. |
| 2599 | + |
| 2600 | +Please note: Even after a message has been forwarded, any further identical messages are dropped as long as the time difference to the previous message is less than the timeout. The counter is not reset when the threshold is reached. |
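
The three steps and the caveat above can be sketched with an in-memory dictionary standing in for the Redis cache. This is an illustration under assumptions, not the bot's code: the class name `ThresholdCounter`, the SHA-256/JSON hashing and the injectable `clock` are all hypothetical choices.

```python
import hashlib
import json
import time

class ThresholdCounter:
    """In-memory stand-in for the cache-based counting (illustrative only)."""

    def __init__(self, threshold, timeout, filter_keys, filter_type="whitelist",
                 clock=time.monotonic):
        self.threshold = threshold
        self.timeout = timeout
        self.filter_keys = set(filter_keys)
        self.filter_type = filter_type
        self.clock = clock
        self._cache = {}  # hash -> (count, expires_at)

    def _hash(self, message):
        # Step 1: hash only the fields selected by the filter_* parameters.
        keep = self.filter_type == "whitelist"
        selected = {k: v for k, v in message.items()
                    if (k in self.filter_keys) == keep}
        return hashlib.sha256(json.dumps(selected, sort_keys=True).encode()).hexdigest()

    def process(self, message):
        now = self.clock()
        key = self._hash(message)
        count, expires_at = self._cache.get(key, (0, 0.0))
        if now >= expires_at:
            count = 0  # the key has expired, start counting again
        # Step 2: increment the count and (re-)set the TTL.
        count += 1
        self._cache[key] = (count, now + self.timeout)
        # Step 3: forward only on an exact match, otherwise drop.
        if count == self.threshold:
            return {**message, "extra.count": count}
        return None

now = [0.0]  # fake clock so the example is deterministic
bot = ThresholdCounter(threshold=3, timeout=10, filter_keys=["source.ip"],
                       clock=lambda: now[0])
msg = {"source.ip": "192.0.2.1", "source.port": 25}
results = [bot.process(msg) for _ in range(4)]
```

With these settings only the third message is forwarded. The fourth is dropped and refreshes the TTL, so the count keeps growing until identical messages pause for longer than the timeout.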
| 2601 | + |
| 2602 | +* * * |
| 2603 | + |
2503 | 2604 | ### Tor Nodes |
2504 | 2605 |
|
2505 | 2606 | #### Information: |
|