Commit 9648e87

Merge branch 'develop' into develop
2 parents 6fb682f + ab6c7c7 commit 9648e87

44 files changed

+1357
-156
lines changed

.travis.yml

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
sudo: required
1+
os: linux
22
dist: xenial
33
language: python
44
env:
@@ -9,7 +9,7 @@ python:
99
- 3.6
1010
- 3.7
1111
- 3.8
12-
matrix:
12+
jobs:
1313
include:
1414
- python: 3.5
1515
env: mode=debian
@@ -25,12 +25,12 @@ before_install:
2525
- if [[ -v requirements ]]; then sudo systemctl start elasticsearch; fi
2626
install:
2727
- set -e
28-
- if [[ -v requirements ]]; then sudo apt-get install polipo lighttpd; fi
2928
- if [[ $mode == debian ]]; then sudo apt-get install dpkg-dev dh-python python-setuptools python3-setuptools python3-all debhelper quilt fakeroot dh-systemd safe-rm; pip3 install requests; pip3 install redis; pip3 install dnspython; pip3 install psutil; pip3 install python-dateutil; pip3 install termstyle; pip3 install pytz; pip3 install typing; fi
3029
- if [[ $requirements == true ]]; then for file in intelmq/bots/*/*/REQUIREMENTS.txt; do pip install -r $file; done; fi
3130
- if [[ -v requirements ]]; then pip install Cerberus!=1.3 codecov pyyaml requests_mock; fi
3231
- if [[ $mode == codestyle ]]; then pip install pycodestyle; fi
3332
- if [[ -v requirements ]]; then sudo sed -i '/^Defaults\tsecure_path.*$/ d' /etc/sudoers; fi
33+
- if [[ $TRAVIS_PYTHON_VERSION < '3.6' ]]; then rm intelmq/bin/intelmq_gen_docs.py intelmq/tests/bin/test_gen_docs.py; fi # file requires Python 3.6
3434
- if [[ -v requirements ]]; then sudo pip install .; fi
3535
- if [[ -v requirements ]]; then sudo intelmqsetup --skip-ownership; fi
3636
before_script:
@@ -49,17 +49,16 @@ before_script:
4949
- if [[ $mode == debian ]]; then tar -xzf ../intelmq_$version.orig.tar.gz; fi
5050
- if [[ $mode == debian ]]; then tar -xzf ../intelmq_$debversion.debian.tar.gz; fi
5151
- if [[ $mode == debian ]]; then popd; fi
52-
- if [[ -v requirements ]]; then sudo cp intelmq/tests/assets/* /var/www/html/ && sudo touch /var/www/html/$(date +%Y).txt; fi
5352
- if [[ -v requirements ]]; then gpg --import intelmq/tests/assets/key-public.pgp; fi
5453
- if [[ $requirements == true ]]; then sudo bash -c 'echo "[rabbitmq_management]." > /etc/rabbitmq/enabled_plugins' && sudo systemctl restart rabbitmq-server; fi
5554
script:
56-
- if [[ $requirements == true ]]; then TZ=utc INTELMQ_TEST_DATABASES=1 INTELMQ_TEST_LOCAL_WEB=1 INTELMQ_TEST_EXOTIC=1 nosetests --with-coverage --cover-package=intelmq --cover-branches; find contrib/ -name "test*.py" -exec nosetests {} \+; elif [[ $requirements == false ]]; then INTELMQ_TEST_LOCAL_WEB=1 nosetests --with-coverage --cover-package=intelmq --cover-branches; fi
55+
- if [[ $requirements == true ]]; then TZ=utc INTELMQ_TEST_DATABASES=1 INTELMQ_TEST_EXOTIC=1 nosetests --with-coverage --cover-package=intelmq --cover-branches; find contrib/ -name "test*.py" -exec nosetests {} \+; elif [[ $requirements == false ]]; then nosetests --with-coverage --cover-package=intelmq --cover-branches; fi
5756
- if [[ $mode == codestyle ]]; then pycodestyle intelmq/{bots,lib,bin}; fi
5857
- if [[ $mode == debian ]]; then pushd ../build; fi
5958
- if [[ $mode == debian ]]; then DEB_BUILD_OPTIONS='nocheck' dpkg-buildpackage -us -uc -d; fi
6059
- if [[ $mode == debian ]]; then popd; fi
6160
services:
62-
- redis-server
61+
- redis
6362
- postgresql
6463
- mongodb
6564
- rabbitmq

CHANGELOG.md

Lines changed: 23 additions & 2 deletions
@@ -19,19 +19,25 @@ CHANGELOG
1919
- `create_request_session_from_bot`: Changed bot argument to optional, uses defaults.conf as fallback, renamed to `create_request_session`. Name `create_request_session_from_bot` will be removed in version 3.0.0.
2020

2121
### Development
22+
- `intelmq.bin.intelmq_gen_docs`: Add bot name to the `Feeds.md` documentation (PR#1617 by Birger Schacht).
2223

2324
### Harmonization
2425

2526
### Bots
2627
#### Collectors
2728
- `intelmq.bots.collectors.eset.collector`: Added (PR#1554 by Mikk Margus Möll).
28-
- `intelmq.bots.collectors.http.collector_http`: Added PGP signature check functionality (PR#1602 by sinus-x).
29+
- `intelmq.bots.collectors.http.collector_http`:
30+
- Added PGP signature check functionality (PR#1602 by sinus-x).
31+
- If status code is not 2xx, the request's and response's headers and body are logged in debug logging level (#1615).
2932

3033
#### Parsers
3134
- `intelmq.bots.parsers.eset.parser`: Added (PR#1554 by Mikk Margus Möll).
3235
- Ignore invalid "NXDOMAIN" IP addresses (PR#1573 by Mikk Margus Möll).
3336
- `intelmq.bots.parsers.hphosts`: Removed, feed is unavailable (#1559).
34-
- `intelmq.bots.parsers.cznic.parser_haas`: Added (PR#1560 by Filip Pokorný and Edvard Rejthar)
37+
- `intelmq.bots.parsers.cznic.parser_haas`: Added (PR#1560 by Filip Pokorný and Edvard Rejthar).
38+
- `intelmq.bots.parsers.cznic.parser_proki`: Added (PR#1599 by sinus-x).
39+
- `intelmq.bots.parsers.key_value.parser`: Added (PR#1607 by Karl-Johan Karlsson).
40+
- `intelmq.bots.parsers.generic.parser_csv`: Added new parameter `compose_fields`.
3541

3642
#### Experts
3743
- `intelmq.bots.experts.rfc1918.expert`:
@@ -56,6 +62,7 @@ CHANGELOG
5662
- Added `--update-database` option. (PR#1524 by Filip Pokorný)
5763
- Added `api_token` parameter. (PR#1524 by Filip Pokorný)
5864
- The script `update-rfiprisk-data` is now deprecated and will be removed in version 3.0.
65+
- Added `intelmq.bots.experts.threshold` (PR#1608 by Karl-Johan Karlsson).
5966

6067
#### Outputs
6168

@@ -64,6 +71,8 @@ CHANGELOG
6471
- Add ESET URL and Domain feeds
6572
- Remove unavailable *HPHosts Hosts file* feed (#1559).
6673
- Added CZ.NIC HaaS feed (PR#1560 by Filip Pokorný and Edvard Rejthar).
74+
- Added CZ.NIC Proki feed (PR#1599 by sinus-x).
75+
- Added CERT-BUND CB-Report Malware infections feed (PR#1598 by sinus-x).
6776
- Bots:
6877
- Enhanced documentation of RFC1918 Expert.
6978
- Updated documentation for Maxmind GeoIP, ASN Lookup, TOR Nodes and Recorded Future experts to reflect new `--update-database` option. (PR#1524 by Filip Pokorný)
@@ -74,6 +83,15 @@ CHANGELOG
7483

7584
### Tests
7685
- Added tests for `intelmq.lib.exceptions.PipelineError`.
86+
- `intelmq.tests.bots.collectors.http_collector.test_collector`: Use requests_mock to mock all requests and do not require a local webserver.
87+
- `intelmq.tests.bots.outputs.restapi.test_output`:
88+
- Use requests_mock to mock all requests and do not require a local webserver.
89+
- Add a test for checking the response status code.
90+
- `intelmq.tests.bots.collectors.mail.test_collector_url`: Use requests_mock to mock all requests and do not require a local webserver.
91+
- `intelmq.tests.bots.experts.ripe.test_expert`: Use requests_mock to mock all requests and do not require a local webserver.
92+
- The test flag (environment variable) `INTELMQ_TEST_EXOTIC` is no longer used.
93+
- Travis:
94+
- Remove installation of local web-server (not necessary anymore) and HTTP proxy (no tests anymore).
7795

7896
### Tools
7997
- `intelmqdump`:
@@ -113,6 +131,9 @@ CHANGELOG
113131
- `intelmq/bots/parsers/danger_rulez/parser`: correctly skip malformed rows by defining variables before referencing (PR#1601 by Tomas Bellus).
114132

115133
#### Experts
134+
- `intelmq.bots.experts.cymru_whois`:
135+
- Fix cache key calculation which previously led to duplicate keys and therefore wrong results in rare cases. The cache key calculation is intentionally not backwards-compatible (#1592, PR#1606).
136+
- The bot now caches and logs (as level INFO) empty responses from Cymru (PR#1606).
116137

117138
#### Outputs
118139
- `intelmq.bots.outputs.rt`: Added Request Tracker output bot (PR#1589 by Marius Urkis).

NEWS.md

Lines changed: 5 additions & 0 deletions
@@ -39,6 +39,11 @@ See the changelog for a full list of changes.
3939
2.2.2 Bugfix release (unreleased)
4040
---------------------------------
4141

42+
### Bots
43+
#### Cymru Whois Lookup
44+
The cache key calculation has been fixed. It previously led to duplicate keys for different IP addresses and therefore wrong results in rare cases. The cache key calculation is intentionally not backwards-compatible. Therefore, this bot may take longer processing events than usual after applying this update.
45+
More details can be found in [issue #1592](https://github.com/certtools/intelmq/issues/1592).
46+
4247
### Requirements
4348

4449
### Tools

contrib/eventdb/apply_mapping_eventdb.py

Lines changed: 3 additions & 3 deletions
@@ -55,9 +55,9 @@ def eventdb_apply(malware_name_column, malware_family_column, host, port,
5555

5656
cur.execute('SELECT DISTINCT "classification.identifier", "malware.name" FROM {table} '
5757
'WHERE "classification.taxonomy" = \'malicious code\' {where}'
58-
''.format(table=table, where=where))
58+
''.format(table=table, where=where))
5959
if dry_run:
60-
execute = lambda x, y: print(cur.mogrify(x, y).decode())
60+
execute = lambda x, y: print(cur.mogrify(x, y).decode()) # noqa: E731
6161
else:
6262
execute = cur.execute
6363
for (identifier, malware_name) in cur.fetchall():
@@ -76,7 +76,7 @@ def eventdb_apply(malware_name_column, malware_family_column, host, port,
7676
'AND "classification.identifier" IS DISTINCT FROM %s AND '
7777
'"classification.taxonomy" = \'malicious code\' {where}'
7878
''.format(table=table, where=where),
79-
(rule[1], malware_name, rule[1]))
79+
(rule[1], malware_name, rule[1]))
8080
break
8181
else:
8282
print('missing mapping for', repr(malware_name))

contrib/malware_name_mapping/download_mapping.py

Lines changed: 5 additions & 5 deletions
@@ -19,9 +19,9 @@
1919
URL_MALPEDIA = 'https://raw.githubusercontent.com/certtools/malware_name_mapping/master/malpedia.csv'
2020
URL_MISP = 'https://raw.githubusercontent.com/MISP/misp-galaxy/main/clusters/threat-actor.json'
2121

22-
REGEX_FROM_HUMAN = re.compile(r"((?P<res1>[a-z])(?=[A-Z])|" # "fooBar"
23-
r"(?P<res2>.)(\\ )(?=[^\]])|" # "foo bar" but not "foo[-_ ]?bar"
24-
r"(?P<res3>[^\[-])\\-(?=[^-]))") # "foo-bar" but not "foo[-_ ]?bar"
22+
REGEX_FROM_HUMAN = re.compile(r"((?P<res1>[a-z])(?=[A-Z])|" # "fooBar"
23+
r"(?P<res2>.)(\\ )(?=[^\]])|" # "foo bar" but not "foo[-_ ]?bar"
24+
r"(?P<res3>[^\[-])\\-(?=[^-]))") # "foo-bar" but not "foo[-_ ]?bar"
2525
IDENTIFIER_FROM_HUMAN = re.compile(r"[^a-z0-9]+")
2626

2727

@@ -75,7 +75,7 @@ def generate_regex_from_human(*values):
7575
return "^(%s)$" % "|".join(newvalues)
7676

7777

78-
def download(url: str=URL, add_default=False, params=None, include_malpedia=False,
78+
def download(url: str = URL, add_default=False, params=None, include_malpedia=False,
7979
include_misp=False, mwnmp_ignore_adware=False):
8080
download = requests.get(url)
8181
download.raise_for_status()
@@ -96,7 +96,7 @@ def download(url: str=URL, add_default=False, params=None, include_malpedia=Fals
9696
names = [actor["value"]] + actor.get("meta", {}).get("synonyms", [])
9797
identifier = ("%s-generic"
9898
"" % IDENTIFIER_FROM_HUMAN.sub("-",
99-
actor["value"].lower()))
99+
actor["value"].lower()))
100100
rule_name = "misp-threat-actors-%s" % identifier
101101

102102
rules.append(generate_rule(generate_regex_from_human(*names),

contrib/malware_name_mapping/test_download_mapping.py

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ def test_download_add_default_constant(self):
6969
'then': {'classification.identifier': 'constant'}}]
7070
)
7171

72-
maxDiff=None
72+
maxDiff = None
7373

7474

7575
class TestParser(unittest.TestCase):

contrib/systemd/systemd.py

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
11
#!/usr/bin/python3
22
# -*- coding: utf-8 -*-
3+
# flake8: noqa
34
import collections
45
import grp
56
import json

docs/Bots.md

Lines changed: 101 additions & 0 deletions
@@ -36,6 +36,7 @@
3636
- [Cymru CAP Program](#cymru-cap-program)
3737
- [Cymru Full Bogons](#cymru-full-bogons)
3838
- [HTML Table Parser](#html-table-parser)
39+
- [Key-Value Parser](#key-value-parser)
3940
- [Twitter](#twitter)
4041
- [Shadowserver](#shadowserver)
4142
- [Shodan](#shodan)
@@ -74,6 +75,7 @@
7475
- [RipeNCC Abuse Contact](#ripencc-abuse-contact)
7576
- [Sieve](#sieve)
7677
- [Taxonomy](#taxonomy)
78+
- [Threshold](#threshold)
7779
- [Tor Nodes](#tor-nodes)
7880
- [Url2FQDN](#url2fqdn)
7981
- [Wait](#wait)
@@ -267,6 +269,12 @@ Zipped files are automatically extracted if detected.
267269

268270
For extracted files, every extracted file is sent in its own report. Every report has a field named `extra.file_name` with the file name in the archive the content was extracted from.
269271

272+
#### HTTP Response status code checks
273+
274+
If the HTTP response's status code is not 2xx, this is treated as an error.
275+
276+
In Debug logging level, the request's and response's headers and body are logged for further inspection.
277+
270278
* * *
271279

272280
### Generic URL Stream Fetcher
@@ -979,6 +987,8 @@ Events with the Malware "TestSinkholingLoss" are ignored, as they are for the fe
979987

980988
* `use_malware_familiy_as_classification_identifier`: default: `true`. Use the `malw.family` field as `classification.type`. If `false`, check if the same as `malw.variant`. If it is the same, it is ignored. Otherwise saved as `extra.malware.family`.
981989

990+
* * *
991+
982992
### Generic CSV Parser
983993

984994
Lines starting with `'#'` will be ignored. Headers won't be interpreted.
@@ -1011,6 +1021,22 @@ Lines starting with `'#'` will be ignored. Headers won't be interpreted.
10111021
- parse a value and ignore if it fails `"columns": "source.url|__IGNORE__"`
10121022
10131023
* `"column_regex_search"`: Optional. A dictionary mapping field names (as given per the columns parameter) to regular expression. The field is evaluated using `re.search`. Eg. to get the ASN out of `AS1234` use: `{"source.asn": "[0-9]*"}`. Make sure to properly escape any backslashes in your regular expression (See also [#1579](https://github.com/certtools/intelmq/issues/1579)).
1024+
* `"compose_fields"`: Optional, dictionary. Create fields from columns, e.g. with data like this:
1025+
```csv
1026+
# Host,Path
1027+
example.com,/foo/
1028+
example.net,/bar/
1029+
```
1030+
using this compose_fields parameter:
1031+
```json
1032+
{"source.url": "http://{0}{1}"}
1033+
```
1034+
You get:
1035+
```
1036+
http://example.com/foo/
1037+
http://example.net/bar/
1038+
```
1039+
in the respective `source.url` fields. Each value in the dictionary is a format string in which the columns are available by their index.
10141040
* `"default_url_protocol"`: For URLs you can give a default protocol which will be prepended to the data.
10151041
* `"delimiter"`: separation character of the CSV, e.g. `","`
10161042
* `"skip_header"`: Boolean, skip the first line of the file, optional. Lines starting with `#` will be skipped additionally, make sure you do not skip more lines than needed!
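The `compose_fields` behaviour described above can be illustrated with a minimal sketch. It assumes the parameter values are ordinary Python format strings applied with `str.format` to the columns of each row; the helper `compose` and the variable names are hypothetical and are not taken from the bot's source.

```python
import csv
import io

# Hypothetical stand-in for the bot parameter; mirrors the documented example.
compose_fields = {"source.url": "http://{0}{1}"}

raw = "# Host,Path\nexample.com,/foo/\nexample.net,/bar/\n"


def compose(row, mapping):
    """Build event fields by formatting each template with the row's columns.

    Assumption: columns are passed positionally, so {0} is the first column.
    """
    return {field: template.format(*row) for field, template in mapping.items()}


# Skip empty lines and comment lines starting with '#', as the parser does.
rows = [
    row
    for row in csv.reader(io.StringIO(raw))
    if row and not row[0].startswith("#")
]
events = [compose(row, compose_fields) for row in rows]

for event in events:
    print(event["source.url"])
```

Positional placeholders keep the configuration independent of header names, which fits a parser that does not interpret headers.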
@@ -1225,6 +1251,45 @@ Parses breaches and pastes and creates one event per e-mail address. The e-mail
12251251

12261252
* * *
12271253

1254+
### Key-Value Parser
1255+
1256+
#### Information:
1257+
* `name:` intelmq.bots.parsers.key_value.parser
1258+
* `lookup:` no
1259+
* `public:` no
1260+
* `cache (redis db):` none
1261+
* `description:` Parses text lines in key=value format, for example FortiGate firewall logs.
1262+
1263+
#### Configuration Parameters:
1264+
1265+
* `pair_separator`: String separating key=value pairs, default "` `" (space).
1266+
* `kv_separator`: String separating key and value, default `=`.
1267+
* `keys`: Array of string->string, names of keys to propagate mapped to IntelMQ event fields. Example:
1268+
```json
1269+
"keys": {
1270+
"srcip": "source.ip",
1271+
"dstip": "destination.ip"
1272+
}
1273+
```
1274+
The value mapped to `time.source` is parsed. If the value is numeric, it is interpreted. Otherwise, or if it fails, it is parsed fuzzy with dateutil.
1275+
If the value cannot be parsed, a warning is logged per line.
1276+
* `strip_quotes`: Boolean, remove opening and closing quotes from values, default true.
1277+
1278+
#### Parsing limitations
1279+
1280+
The input must not have (quoted) occurrences of the separator in the values. For example, this is not parsable (with space as separator):
1281+
1282+
```
1283+
key="long value" key2="other value"
1284+
```
1285+
1286+
In firewall logs like FortiGate, this does not occur. These logs usually look like:
1287+
```
1288+
srcip=192.0.2.1 srcmac="00:00:5e:00:17:17"
1289+
```
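The parsing limitation can be reproduced with a small sketch. It assumes the parser splits each line on `pair_separator` first and then on `kv_separator`, which is what the parameter descriptions suggest; the function `parse_line` and the target field `extra.srcmac` are hypothetical and do not come from the bot's code.

```python
def parse_line(line, keys, pair_separator=" ", kv_separator="=", strip_quotes=True):
    """Split a key=value line and map selected keys to event fields.

    Assumption: a plain split on the separators, with no quote-aware tokenizing.
    """
    event = {}
    for pair in line.split(pair_separator):
        if kv_separator not in pair:
            continue  # fragment without a key, e.g. the tail of a quoted value
        key, value = pair.split(kv_separator, 1)
        # Only strip quotes when the value both starts and ends with one.
        if strip_quotes and len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        if key in keys:
            event[keys[key]] = value
    return event


# A FortiGate-style line parses cleanly.
good = parse_line(
    'srcip=192.0.2.1 srcmac="00:00:5e:00:17:17"',
    keys={"srcip": "source.ip", "srcmac": "extra.srcmac"},
)

# A quoted value containing the separator is cut at the first space.
bad = parse_line('key="long value" key2="other value"', keys={"key": "extra.key"})

print(good)
print(bad)
```

In the second call the value is truncated to `"long` because the split happens before any quote handling, which is the failure mode the limitation describes.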
1290+
1291+
* * *
1292+
12281293
### McAfee Advanced Threat Defense File
12291294

12301295
#### Information:
@@ -2500,6 +2565,42 @@ For brevity, "type" means `classification.type` and "taxonomy" means `classifica
25002565

25012566
* * *
25022567

2568+
### Threshold
2569+
2570+
#### Information:
2571+
2572+
* **Cache parameters** (see in section [common parameters](#common-parameters))
2573+
* `name`: threshold
2574+
* `lookup`: redis cache
2575+
* `public`: no
2576+
* `cache (redis db)`: 11
2577+
* `description`: Check if the number of similar messages during a specified time interval exceeds a set value.
2578+
2579+
#### Configuration Parameters:
2580+
2581+
* `filter_keys`: String, comma-separated list of field names to consider or ignore when determining which messages are similar.
2582+
* `filter_type`: String, `whitelist` (consider only the fields in `filter_keys`) or `blacklist` (consider everything but the fields in `filter_keys`).
2583+
* `timeout`: Integer, number of seconds before threshold counter is reset.
2584+
* `threshold`: Integer, number of messages required before propagating one. In forwarded messages, the threshold is saved in the message as `extra.count`.
2585+
* `add_keys`: Array of string->string, optional, fields and values to add (or update) to propagated messages. Example:
2586+
```json
2587+
"add_keys": {
2588+
"classification.type": "spam",
2589+
"comment": "Started more than 10 SMTP connections"
2590+
}
2591+
```
2592+
2593+
#### Limitations
2594+
2595+
This bot has certain limitations and is not a true threshold filter (yet). It works like this:
2596+
1. Every incoming message is hashed according to the `filter_*` parameters.
2597+
2. The hash is looked up in the cache and the count is incremented by 1, and the TTL of the key is (re-)set to the timeout.
2598+
3. If the new count matches the threshold exactly, the message is forwarded. Otherwise it is dropped.
2599+
2600+
Please note: Even if a message is sent, any further identical messages are dropped, if the time difference to the last message is less than the timeout! The counter is not reset if the threshold is reached.
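The three steps and the note above can be sketched as follows. This is an illustration under stated assumptions, not the bot's implementation: an in-memory dictionary with explicit timestamps stands in for the Redis cache, and the class `ThresholdSketch`, its SHA-256 hashing of the filtered fields, and the `now` argument are all hypothetical.

```python
import hashlib
import json


class ThresholdSketch:
    """In-memory model of the documented counting behaviour."""

    def __init__(self, filter_keys, filter_type, threshold, timeout):
        self.filter_keys = set(filter_keys)
        self.filter_type = filter_type
        self.threshold = threshold
        self.timeout = timeout
        self.cache = {}  # hash -> (count, expires_at); stands in for Redis

    def _hash(self, message):
        # Step 1: hash the message according to the filter_* parameters.
        if self.filter_type == "whitelist":
            relevant = {k: v for k, v in message.items() if k in self.filter_keys}
        else:
            relevant = {k: v for k, v in message.items() if k not in self.filter_keys}
        payload = json.dumps(relevant, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def process(self, message, now):
        key = self._hash(message)
        count, expires_at = self.cache.get(key, (0, 0))
        if now >= expires_at:
            count = 0  # the key has expired, so counting starts over
        # Step 2: increment the count and (re-)set the TTL to the timeout.
        count += 1
        self.cache[key] = (count, now + self.timeout)
        # Step 3: forward only when the count matches the threshold exactly.
        if count == self.threshold:
            forwarded = dict(message)
            forwarded["extra.count"] = count
            return forwarded
        return None


bot = ThresholdSketch({"source.ip"}, "whitelist", threshold=3, timeout=60)
message = {"source.ip": "192.0.2.1", "source.port": 25}
results = [bot.process(message, now=t) for t in (0, 10, 20, 30, 40)]
after_expiry = bot.process(message, now=200)
print(results, after_expiry)
```

Only the third message is forwarded. The fourth and fifth are dropped because every message refreshes the TTL and the counter keeps growing past the threshold; counting starts over only once a full timeout passes without a similar message.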
2601+
2602+
* * *
2603+
25032604
### Tor Nodes
25042605

25052606
#### Information:

docs/Developers-Guide.md

Lines changed: 1 addition & 2 deletions
@@ -195,14 +195,13 @@ There are a bunch of environment variables which switch on/off some tests:
195195
* `INTELMQ_TEST_DATABASES`: databases such as postgres, elasticsearch, mongodb are not tested by default, set to 1 to test those bots. These tests need preparation, e.g. running databases with users and certain passwords etc. Have a look at the `.travis.yml` in IntelMQ's repository for steps to set databases up.
196196
* `INTELMQ_SKIP_INTERNET`: tests requiring internet connection will be skipped if this is set to 1.
197197
* `INTELMQ_SKIP_REDIS`: redis-related tests are run by default, set this to 1 to skip those.
198-
* `INTELMQ_TEST_LOCAL_WEB`: tests which connect to local web servers or proxies are active when set to 1. Running these tests assume a local webserverserving certain files and/or proxy. Example preparation steps can be found in `.travis.yml` again.
199198
* `INTELMQ_TEST_EXOTIC`: some bots and tests require libraries which may not be available, those are skipped by default. To run them, set this to 1.
200199
* `INTELMQ_TEST_REDIS_PASSWORD`: Set this value to the password for the local redis database if needed.
201200

202201
For example, to run all tests you can use:
203202

204203
```bash
205-
INTELMQ_TEST_DATABASES=1 INTELMQ_TEST_LOCAL_WEB=1 INTELMQ_TEST_EXOTIC=1 nosetests3
204+
INTELMQ_TEST_DATABASES=1 INTELMQ_TEST_EXOTIC=1 nosetests3
206205
```
207206

208207
### Configuration test files
