Commit 1deb3ed

Merge pull request #16 from fkie-cad/issue-13-add-datasets
Minor tweaks to related work
2 parents 1801ff5 + 3aaa50b commit 1deb3ed

1 file changed, +49 -48 lines changed

content/related_work.md

Lines changed: 49 additions & 48 deletions
@@ -143,12 +143,12 @@ Referenced datasets:
 
 Referenced collections:
 - CAIDA
-- [Digital Corpora Database](#digital-corpora-2023)
-- [IMPACT](#impact-2021)
-- [Malware Traffic Analysis](#malware-traffic-analysis-2024)
-- [NETRESEC](#netresec-2024)
-- [SecRepo](#secrepo---samples-of-security-related-data-2020)
-- [The Honeypot Project](#the-honeynet-project)
+- [Digital Corpora Database](#digital-corpora)
+- [IMPACT](#impact)
+- [Malware Traffic Analysis](#malware-traffic-analysis)
+- [NETRESEC](#netresec)
+- [SecRepo](#secrepo---samples-of-security-related-data)
+- [The Honeypot Project](#the-honeynet-project-challenges)
 
 ### A Survey of Network-based Intrusion Detection Data Sets (2019)
 
@@ -200,16 +200,16 @@ Referenced collections:
 - Contagiodump
 - covert.io
 - DEFCON CTF archive
-- [IMPACT](#impact-2021)
-- [Internet Traffic Archive](#the-internet-traffic-archive-2008)
+- [IMPACT](#impact)
+- [Internet Traffic Archive](#the-internet-traffic-archive)
 - Kaggle
-- [Malware Traffic Analysis](#malware-traffic-analysis-2024)
+- [Malware Traffic Analysis](#malware-traffic-analysis)
 - Mid-Atlantic CCDC
 - MAWILab
-- [NETRESEC](#netresec-2024)
+- [NETRESEC](#netresec)
 - OpenML
 - RIPE Data Repository
-- [SecRepo](#secrepo---samples-of-security-related-data-2020)
+- [SecRepo](#secrepo---samples-of-security-related-data)
 - Simple Web
 - UMass Trace Repository
 - Vast Challenges
@@ -242,7 +242,8 @@ Zheng, M., Robbins, H., Chai, Z., Thapa, P., & Moore, T. (2018). Cybersecurity r
 ```
 
 Tries to construct a taxonomy of the types of created and shared cybersecurity data(sets) by inspecting 965 related papers.
-Does not provide an actual list, rather aims to describe general observations, like the fact that only 6% of the surveyed papers created a dataset *and* made it publicly available.
+Does not provide an actual list, rather aims to describe general observations, like the fact that only 6% of the surveyed papers created a dataset
+*and* made it publicly available.
 
 ### A survey of deep learning-based network anomaly detection (2017)
 
@@ -310,25 +311,16 @@ Referenced collections:
 
 `Last updated` refers to the last time a new entry was added to the collection.
 
-### Malware Traffic Analysis
-
-```
-https://www.malware-traffic-analysis.net/
-(accessed 19.02.2024, last updated 14.02.2024)
-```
-
-Various pcaps and malware samples stemming from individual campaigns or attack instances, but without any overall categorization or even overview.
-They are available as blog posts named something like "DarkGate activity" or "GootLoader Infection", which each one listing some references and download links to any relevant files.
-
-### NETRESEC
+### Awesome Cybersecurity Datasets
 
 ```
-https://www.netresec.com/?page=PcapFiles
-(accessed 19.02.2024, last updated 04.01.2024)
+https://github.com/shramos/Awesome-Cybersecurity-Datasets
+(accessed 18.02.2024, last updated 23.01.2021)
 ```
 
-A large collection of pcap files and other repositories which are hosting pcaps themselves.
-They are categorized into CDX, Malware Traffic, Network Forensics, SCADA/ICS, CTF, Packet Injection/Man-on-the-Side, and Uncategorized.
+A "curated" personal collection of various cybersecurity-related datasets or collections, grouped into several categories such as "Network", "Software" or "Fraud".
+Each entry is described in only one or two sentences, and most datasets are not, or only partially, suitable for IDS research.
+The list is somewhat deprecated and does especially lack meaningful host-based datasets.
 
 ### Digital Corpora
 
@@ -341,17 +333,6 @@ A collection of datasets mostly designed for the use in forensics education.
 It consists of various disk images, memory dumps and pcaps, as well as a bunch of benign and malicious files.
 It does not seem to contain actual log data.
 
-### Awesome Cybersecurity Datasets
-
-```
-https://github.com/shramos/Awesome-Cybersecurity-Datasets
-(accessed 18.02.2024, last updated 23.01.2021)
-```
-
-A "curated" personal collection of various cybersecurity-related datasets or collections, grouped into several categories such as "Network", "Software" or "Fraud".
-Each entry is described in only one or two sentences, and most datasets are not, or only partially, suitable for IDS research.
-The list is somewhat deprecated and does especially lack meaningful host-based datasets.
-
 ### IMPACT
 
 ```
@@ -363,6 +344,36 @@ The "Information Marketplace for Policy and Analysis of Cyber-Risk and Trust" (I
 These are for the most part made up of network related files (pcaps and DNS logs) from a wide variety of scenarios (CTF events, IoT, corpo networks, etc.), as well as some miscellaneous things like network shapefiles.
 55 of these datasets were created by IMPACT, 15 are external (mostly CAIDA). Many datasets require prior authorization to access them.
 
+### Malware Traffic Analysis
+
+```
+https://www.malware-traffic-analysis.net/
+(accessed 19.02.2024, last updated 14.02.2024)
+```
+
+Various pcaps and malware samples stemming from individual campaigns or attack instances, but without any overall categorization or even overview.
+They are available as blog posts named something like "DarkGate activity" or "GootLoader Infection", which each one listing some references and download links to any relevant files.
+
+### NETRESEC
+
+```
+https://www.netresec.com/?page=PcapFiles
+(accessed 19.02.2024, last updated 04.01.2024)
+```
+
+A large collection of pcap files and other repositories which are hosting pcaps themselves.
+They are categorized into CDX, Malware Traffic, Network Forensics, SCADA/ICS, CTF, Packet Injection/Man-on-the-Side, and Uncategorized.
+
+### Public Security Log Sharing Site
+
+```
+https://log-sharing.dreamhosters.com/
+(accessed 18.02.2024, last updated 11.08.2010)
+```
+
+A collection which started as an effort to collect various log samples, but seems to have been discontinued after operating for about one year.
+Currently, it consists of nine entries containing Linux syslogs, firewall logs, apache logs, and web proxy logs.
+
 ### SecRepo - Samples of Security Related Data
 
 ```
@@ -384,16 +395,6 @@ https://www.honeynet.org/challenges/
 A collection of 14 forensic challenges related to pcaps, malware and log files.
 However, most resources, except for the two newest challenges, are no longer available.
 
-### Public Security Log Sharing Site
-
-```
-https://log-sharing.dreamhosters.com/
-(accessed 18.02.2024, last updated 11.08.2010)
-```
-
-A collection which started as an effort to collect various log samples, but seems to have been discontinued after operating for about one year.
-Currently, it consists of nine entries containing Linux syslogs, firewall logs, apache logs, and web proxy logs.
-
 ### The Internet Traffic Archive
 
 ```
