Commit 23742b5

Merge pull request #975 from splunk/mirror_compressed_archive_to_s3
mirror attack_data archive to s3
2 parents ef67bda + 0591992 commit 23742b5

32 files changed, +121 -127113 lines changed

.github/validate_dataset_ymls.py

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+import datetime
+import pathlib
+import sys
+from enum import StrEnum, auto
+from uuid import UUID
+
+from pydantic import BaseModel, Field, HttpUrl
+
+
+class Environment(StrEnum):
+    attack_range = auto()
+
+
+class AttackDataYml(BaseModel):
+    author: str = Field(..., min_length=5)
+    id: UUID
+    date: datetime.date
+    description: str = Field(..., min_length=5)
+    environment: Environment
+    dataset: list[HttpUrl] = Field(..., min_length=1)
+    sourcetypes: list[str] = Field(..., min_length=1)
+    references: list[HttpUrl] = Field(..., min_length=1)
+
+
+# Get all of the yml files in the datasets folder
+datasets_root = pathlib.Path("datasets/")
+
+
+# We only permit certain filetypes to be present in this directory.
+# This is to avoid the inclusion of unsupported file types and to
+# assist in the validation of the YML files
+ALLOWED_SUFFIXES = [".yml", ".log", ".json"]
+SPECIAL_GIT_FILES = ".gitkeep"
+bad_files = [
+    name
+    for name in datasets_root.glob(r"**/*.*")
+    if name.is_file()
+    and not (name.suffix in ALLOWED_SUFFIXES or name.name == SPECIAL_GIT_FILES)
+]
+
+if len(bad_files) > 0:
+    print(
+        f"Error, the following files were found in the {datasets_root} folder. Only files ending in {ALLOWED_SUFFIXES} or {SPECIAL_GIT_FILES} are allowed:"
+    )
+    print("\n".join([str(f) for f in bad_files]))
+    sys.exit(1)
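
The file-type check in this script runs at module level, which makes it awkward to exercise in isolation. A minimal sketch of the same rule as a function follows. The name `find_disallowed_files` and its parameters are hypothetical and not part of the commit. One deliberate difference, stated as an assumption about intent: the sketch walks every file with `rglob("*")`, whereas the script's `**/*.*` pattern only matches names containing a dot, so a file with no extension at all would not be reported by the original.

```python
import pathlib

# Same allow-list as the script in the diff, expressed as sets.
ALLOWED_SUFFIXES = {".yml", ".log", ".json"}
SPECIAL_FILES = {".gitkeep"}


def find_disallowed_files(root, allowed_suffixes=ALLOWED_SUFFIXES, special_names=SPECIAL_FILES):
    """Return sorted paths under root that have neither an allowed suffix nor a special name."""
    root = pathlib.Path(root)
    return sorted(
        path
        for path in root.rglob("*")  # assumption: also inspect files without any extension
        if path.is_file()
        and path.suffix not in allowed_suffixes
        and path.name not in special_names
    )
```

A caller could print the returned paths and exit non-zero when the list is non-empty, as the script does.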
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+name: mirror-archive-on-merge-to-default-branch
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  mirror-archive:
+    runs-on: ubuntu-latest
+    env:
+      BUCKET: attack-range-attack-data
+      ATTACK_DATA_ARCHIVE_FILE: attack_data.tar.zstd
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v4
+        # We must EXPLICITLY specify lfs: true. It defaults to false
+        with:
+          lfs: true
+
+      - name: Setup AWS CLI and Credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          aws-access-key-id: ${{ secrets.ACCESS_KEY }}
+          aws-secret-access-key: ${{ secrets.SECRET_ACCESS_KEY }}
+          aws-region: us-west-2
+
+      - name: Create archive of ONLY the datasets folder
+        run: |
+          # The structure of the tar + zstd archive should mirror that of checking out the repo directly
+          mkdir attack_data
+          mv datasets/ attack_data/.
+
+          # Build some metadata about the archive for documentation purposes
+          git rev-parse HEAD > attack_data/git_hash.txt
+          date -u > attack_data/cache_build_date.txt
+
+          # Compress with number of threads equal to number of CPU cores.
+          # Compression level 10 is a good compromise between speed and file size.
+          # File size reductions show diminishing returns beyond this level (determined experimentally).
+          tar -c attack_data | zstd --compress -T0 -10 -o $ATTACK_DATA_ARCHIVE_FILE
+
+      - name: Upload Attack data archive file to S3 Bucket
+        run: |
+          aws s3 cp $ATTACK_DATA_ARCHIVE_FILE s3://$BUCKET/
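
The archive layout produced by the "Create archive" step can be illustrated with a short Python sketch. It covers only the tar layout (a top-level `attack_data/` folder holding `datasets/`, `git_hash.txt`, and `cache_build_date.txt`), not the zstd compression, which the workflow delegates to the `zstd` CLI. The function name `build_tar` and its parameters are hypothetical, the commit hash is passed in rather than read from git, and the date is written in ISO 8601 rather than the `date -u` format.

```python
import datetime
import io
import pathlib
import tarfile


def build_tar(datasets_dir, git_hash, out_tar, top_level="attack_data"):
    """Write an uncompressed tar whose layout mirrors the workflow's staging folder."""
    datasets_dir = pathlib.Path(datasets_dir)
    # Assumption: ISO 8601 UTC timestamp; the workflow uses the output of `date -u`.
    build_date = datetime.datetime.now(datetime.timezone.utc).isoformat()
    metadata = {
        "git_hash.txt": git_hash + "\n",
        "cache_build_date.txt": build_date + "\n",
    }
    with tarfile.open(out_tar, "w") as tar:
        # Store datasets/ under the top-level folder, like `mv datasets/ attack_data/.`
        tar.add(datasets_dir, arcname=f"{top_level}/datasets")
        # Add the two metadata files directly from memory.
        for name, text in metadata.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"{top_level}/{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return pathlib.Path(out_tar)
```

The resulting tar file would still need to be compressed, for example with the same `zstd -T0 -10` invocation the workflow uses.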

datasets/attack_techniques/T1003.003/atomic_red_team/windows-sec-events.out renamed to datasets/attack_techniques/T1003.003/atomic_red_team/windows-sec-events.log

File renamed without changes.

datasets/attack_techniques/T1059/suspiciously_named_executables/suspiciously_named_executables.yaml renamed to datasets/attack_techniques/T1059/suspiciously_named_executables/suspiciously_named_executables.yml

File renamed without changes.

datasets/attack_techniques/T1078/aws_createloginprofile/aws_createloginprofile.yaml renamed to datasets/attack_techniques/T1078/aws_createloginprofile/aws_createloginprofile.yml

File renamed without changes.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ac2b4ab4628203e0fe7ee7a52d77bc9451f094c94e09f21e3add1e0cf406c7da
+size 2369
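
The three added lines are a Git LFS pointer: the repository stores this small text file, while the real content (2369 bytes here) lives in LFS storage. This is why the workflow checks out with `lfs: true`. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and the parser assumes only the three keys visible in this diff.

```python
def parse_lfs_pointer(text):
    """Parse the "key value" lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }
```

A checkout without LFS support would leave these pointer files in place of the datasets, so comparing a file's on-disk size against the pointer's `size` is one way to spot that situation.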

datasets/attack_techniques/T1112/firewall_modify_delete/firewall-mod-delete.log.txt

Lines changed: 0 additions & 2 deletions
This file was deleted.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:143193f1b69751b409aaa63f719dde73210769a8c946051898b9e4a11e9a26a9
+size 13418872

datasets/attack_techniques/T1203/search_activity.txt

Lines changed: 0 additions & 74161 deletions
This file was deleted.

datasets/attack_techniques/T1489/linux_auditd_sysmon_service_stop.log/linux_auditd_sysmon_service_stop.log renamed to datasets/attack_techniques/T1489/linux_auditd_sysmon_service_stop/linux_auditd_sysmon_service_stop.log

File renamed without changes.
