| 1 | +# Attack Data Validation Workflows |
| 2 | + |
| 3 | +This document explains the GitHub Actions workflows that automatically validate attack data YAML files on every pull request and push to ensure data quality and consistency. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The validation system consists of four main workflows that work together to ensure all attack data meets the required schema and quality standards: |
| 8 | + |
| 9 | +1. **validate-pr.yml** - Full validation on all PRs |
| 10 | +2. **validate-changed-files.yml** - Optimized validation for only changed files |
| 11 | +3. **validate-push.yml** - Validation on pushes to main branches |
| 12 | +4. **required-checks.yml** - Status checks and YAML linting |
| 13 | + |
| 14 | +## Workflow Descriptions |
| 15 | + |
| 16 | +### 1. Validate Attack Data on PR (`validate-pr.yml`) |
| 17 | + |
| 18 | +**Triggers:** Pull requests to `master` or `main` branches |
| 19 | +**Purpose:** Comprehensive validation of all dataset YAML files |
| 20 | + |
| 21 | +**Features:** |
| 22 | +- Runs on PR open, synchronize, and reopen events |
| 23 | +- Validates all YAML files in the `datasets/` directory |
| 24 | +- Uses the validation script at `bin/validate.py` |
| 25 | +- Comments on PR with success/failure status |
| 26 | +- Only triggers when relevant files are changed |
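As a rough illustration of the "validate all YAML files in `datasets/`" step, the sketch below collects candidate files with `pathlib`. It is not the code in `bin/validate.py`: the function name `find_dataset_files` is hypothetical, and the exclusion checks (a substring match on `old`, a case-insensitive match on `template`) are assumptions about how the exclusions listed under Validation Rules might be applied.

```python
from pathlib import Path

YAML_SUFFIXES = {".yml", ".yaml"}


def find_dataset_files(root):
    """Return YAML files under root, sorted, skipping excluded names.

    Hypothetical helper: the exclusion rules below are assumptions,
    not the actual logic of bin/validate.py.
    """
    selected = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in YAML_SUFFIXES:
            continue
        name = path.name.lower()
        # Assumed exclusions: template files and names containing 'old'.
        if "template" in name or "old" in name:
            continue
        selected.append(path)
    return selected


if __name__ == "__main__":
    for dataset_file in find_dataset_files("datasets"):
        print(dataset_file)
```

Note that a plain substring match on `old` would also skip a file such as `threshold.yml`, so the real script may use a stricter rule.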
| 27 | + |
| 28 | +**Path filters:** |
| 29 | +- `datasets/**/*.yml` |
| 30 | +- `datasets/**/*.yaml` |
| 31 | +- `bin/validate.py` |
| 32 | +- `bin/dataset_schema.json` |
| 33 | +- `bin/requirements.txt` |
| 34 | + |
| 35 | +### 2. Validate Changed Attack Data Files (`validate-changed-files.yml`) |
| 36 | + |
| 37 | +**Triggers:** Pull requests to `master` or `main` branches |
| 38 | +**Purpose:** Fast validation of only changed YAML files |
| 39 | + |
| 40 | +**Features:** |
| 41 | +- Optimized for performance - only validates changed files |
| 42 | +- Uses `tj-actions/changed-files` to detect modifications |
| 43 | +- Provides detailed feedback on which files passed/failed |
| 44 | +- Automatically skips if no YAML files were changed |
| 45 | +- Comments on PR with detailed results |
| 46 | + |
| 47 | +**Benefits:** |
| 48 | +- Faster execution for large repositories |
| 49 | +- Clear visibility into which specific files have issues |
| 50 | +- Reduces CI/CD time for PRs with few changes |
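The changed-files approach boils down to filtering a list of modified paths before validation. The sketch below shows that filtering step in isolation, assuming the changed paths arrive as repository-relative strings; the function name `select_dataset_yaml` and the sample paths are made up for illustration, and the real workflow relies on `tj-actions/changed-files` rather than this code.

```python
from pathlib import PurePosixPath

YAML_SUFFIXES = {".yml", ".yaml"}


def select_dataset_yaml(changed_paths, root="datasets"):
    """Keep only changed paths that are YAML files under the root directory."""
    selected = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # The first path component must be the dataset root itself.
        if path.parts and path.parts[0] == root and path.suffix in YAML_SUFFIXES:
            selected.append(raw)
    return selected


if __name__ == "__main__":
    # Illustrative paths only; they do not refer to real files.
    changed = [
        "datasets/example/sample.yml",
        "bin/validate.py",
        "datasets/notes.txt",
    ]
    print(select_dataset_yaml(changed))
```

An empty result corresponds to the case where the workflow skips validation because no YAML files were changed.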
| 51 | + |
| 52 | +### 3. Validate Attack Data on Push (`validate-push.yml`) |
| 53 | + |
| 54 | +**Triggers:** Pushes to `master` or `main` branches |
| 55 | +**Purpose:** Safety net to catch validation failures that reach main branches |
| 56 | + |
| 57 | +**Features:** |
| 58 | +- Validates all dataset files after merge |
| 59 | +- Creates GitHub issues automatically if validation fails |
| 60 | +- Provides detailed error reporting |
| 61 | +- Labels issues with appropriate tags for triage |
| 62 | + |
| 63 | +**Issue Creation:** |
| 64 | +- Creates issues labeled with `bug`, `validation-failure`, `high-priority` |
| 65 | +- Includes commit hash and workflow run links |
| 66 | +- Provides action items for resolution |
| 67 | + |
| 68 | +### 4. Required Status Checks (`required-checks.yml`) |
| 69 | + |
| 70 | +**Triggers:** Pull requests to `master` or `main` branches |
| 71 | +**Purpose:** Enforce validation requirements and provide additional checks |
| 72 | + |
| 73 | +**Features:** |
| 74 | +- Basic YAML syntax linting with `yamllint` |
| 75 | +- Status check requirement enforcement |
| 76 | +- Configuration for branch protection rules |
| 77 | + |
| 78 | +## Setup Instructions |
| 79 | + |
| 80 | +### 1. Branch Protection Rules |
| 81 | + |
| 82 | +To enforce these validations, configure branch protection rules in your GitHub repository: |
| 83 | + |
| 84 | +1. Go to **Settings** → **Branches** |
| 85 | +2. Add a rule for your main branch (`master` or `main`) |
| 86 | +3. Enable **Require status checks to pass before merging** |
| 87 | +4. Add these required status checks: |
| 88 | + - `validate-attack-data` (from validate-pr.yml) |
| 89 | + - `validate-changed-files` (from validate-changed-files.yml) |
| 90 | + - `validation-status` (from required-checks.yml) |
| 91 | + - `yaml-lint` (from required-checks.yml) |
| 92 | + |
| 93 | +### 2. Repository Secrets |
| 94 | + |
| 95 | +No additional secrets are required for the validation workflows. They use the default `GITHUB_TOKEN` for commenting on PRs and creating issues. |
| 96 | + |
| 97 | +### 3. Dependencies |
| 98 | + |
| 99 | +The workflows automatically install Python dependencies from `bin/requirements.txt`: |
| 100 | +- `pyyaml` |
| 101 | +- `jsonschema` |
| 102 | +- Any other packages listed in that file |
| 103 | + |
| 104 | +## Validation Rules |
| 105 | + |
| 106 | +The validation process checks: |
| 107 | + |
| 108 | +### Schema Validation |
| 109 | +- All YAML files must conform to the JSON schema in `bin/dataset_schema.json` |
| 110 | +- Required fields must be present and properly formatted |
| 111 | +- Data types must match schema specifications |
| 112 | + |
| 113 | +### Custom Validations |
| 114 | +- **UUID Format**: The `id` field must be a valid UUID |
| 115 | +- **Date Format**: The `date` field must follow YYYY-MM-DD format |
| 116 | +- **File Naming**: Template files and files with 'old' in the name are excluded from validation |
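The UUID and date checks can be approximated with the Python standard library. This is a minimal sketch under stated assumptions, not the implementation in `bin/validate.py`: the function names are hypothetical, both checks expect string input, and the UUID check assumes the canonical hyphenated form is required.

```python
import re
import uuid
from datetime import datetime

DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_uuid(value):
    """Return True if value is a UUID string in canonical hyphenated form."""
    try:
        # uuid.UUID also accepts braces and un-hyphenated hex, so compare
        # against the canonical rendering to reject those variants.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def is_valid_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    # strptime alone tolerates unpadded months and days, hence the pattern.
    if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

One caveat worth checking against the real script: some YAML loaders turn an unquoted `2024-01-15` into a date object rather than a string, in which case a string-only check like this one would need adjusting.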
| 117 | + |
| 118 | +### YAML Syntax |
| 119 | +- Valid YAML syntax |
| 120 | +- Proper indentation (2 spaces) |
| 121 | +- Line length limits (120 characters) |
| 122 | +- Consistent formatting |
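To make the indentation and line-length rules concrete, here is a rough standard-library approximation. It is not how `yamllint` works internally and will disagree with it in some cases (block scalars, for example); the function name `lint_lines` is hypothetical, and the 2-space and 120-character values are taken from the list above.

```python
def lint_lines(text, max_length=120):
    """Return (line_number, message) pairs for simple formatting problems."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_length:
            problems.append((number, f"line too long ({len(line)} > {max_length})"))
        if not line.strip():
            continue  # ignore indentation on blank lines
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            problems.append((number, "tab used for indentation"))
        elif len(indent) % 2 != 0:
            problems.append((number, "indentation is not a multiple of 2 spaces"))
    return problems


if __name__ == "__main__":
    sample = "name: example\n  nested: ok\n\tbad: tab\n   odd: indent\n"
    for line_number, message in lint_lines(sample):
        print(f"{line_number}: {message}")
```

A check like this is only useful as a quick local pre-check; the `yaml-lint` status check remains the authority on what passes.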
| 123 | + |
| 124 | +## Workflow Outputs |
| 125 | + |
| 126 | +### Success Scenarios |
| 127 | +- ✅ PR comments indicating successful validation |
| 128 | +- ✅ Green status checks in PR interface |
| 129 | +- ✅ Detailed file-by-file validation results |
| 130 | + |
| 131 | +### Failure Scenarios |
| 132 | +- ❌ PR comments with error details |
| 133 | +- ❌ Failed status checks blocking merge |
| 134 | +- 🚨 Automatic issue creation for main branch failures |
| 135 | +- 📝 Detailed error logs in workflow runs |
| 136 | + |
| 137 | +## Troubleshooting |
| 138 | + |
| 139 | +### Common Issues |
| 140 | + |
| 141 | +1. **Schema Validation Errors** |
| 142 | + - Check that all required fields are present |
| 143 | + - Verify field data types match schema |
| 144 | + - Ensure proper YAML formatting |
| 145 | + |
| 146 | +2. **UUID Format Errors** |
| 147 | + - Generate valid UUIDs using tools like `uuidgen` |
| 148 | + - Ensure no extra characters or formatting |
| 149 | + |
| 150 | +3. **Date Format Errors** |
| 151 | + - Use YYYY-MM-DD format (e.g., 2024-01-15) |
| 152 | + - Avoid time components or other formats |
| 153 | + |
| 154 | +4. **YAML Syntax Errors** |
| 155 | + - Use a YAML validator or linter |
| 156 | + - Check indentation (use spaces, not tabs) |
| 157 | + - Verify string quoting when needed |
| 158 | + |
| 159 | +### Debugging Workflows |
| 160 | + |
| 161 | +1. **Check Workflow Logs** |
| 162 | + - Go to Actions tab in GitHub |
| 163 | + - Click on the failed workflow run |
| 164 | + - Review step-by-step execution logs |
| 165 | + |
| 166 | +2. **Local Testing** |
| 167 | + ```bash |
| 168 | + cd bin |
| 169 | + python validate.py ../datasets |
| 170 | + ``` |
| 171 | + |
| 172 | +3. **File-Specific Testing** |
| 173 | + ```bash |
| 174 | + cd bin |
| 175 | + python validate.py path/to/specific/file.yml |
| 176 | + ``` |
| 177 | + |
| 178 | +## Best Practices |
| 179 | + |
| 180 | +### For Contributors |
| 181 | + |
| 182 | +1. **Test Locally First** |
| 183 | + - Run validation script before pushing |
| 184 | + - Use the same schema and validation rules |
| 185 | + |
| 186 | +2. **Keep Changes Small** |
| 187 | + - Smaller PRs are easier to validate and review |
| 188 | + - Changed-files workflow provides faster feedback |
| 189 | + |
| 190 | +3. **Follow Schema Requirements** |
| 191 | + - Always include required fields |
| 192 | + - Use proper data types and formats |
| 193 | + - Reference schema documentation |
| 194 | + |
| 195 | +### For Maintainers |
| 196 | + |
| 197 | +1. **Monitor Validation Health** |
| 198 | + - Review failed workflows regularly |
| 199 | + - Update schema as requirements evolve |
| 200 | + - Keep dependencies updated |
| 201 | + |
| 202 | +2. **Branch Protection** |
| 203 | + - Enforce status checks on main branches |
| 204 | + - Require reviews in addition to validation |
| 205 | + - Consider additional quality gates |
| 206 | + |
| 207 | +3. **Issue Triage** |
| 208 | + - Address validation failures on main branches quickly |
| 209 | + - Create hotfix procedures for critical issues |
| 210 | + - Maintain schema documentation |
| 211 | + |
| 212 | +## File Structure |
| 213 | + |
| 214 | +``` |
| 215 | +.github/ |
| 216 | +├── workflows/ |
| 217 | +│ ├── validate-pr.yml # Full PR validation |
| 218 | +│ ├── validate-changed-files.yml # Changed files validation |
| 219 | +│ ├── validate-push.yml # Push validation |
| 220 | +│ └── required-checks.yml # Status checks & linting |
| 221 | +└── VALIDATION_WORKFLOWS.md # This documentation |
| 222 | + |
| 223 | +bin/ |
| 224 | +├── validate.py # Main validation script |
| 225 | +├── dataset_schema.json # JSON schema definition |
| 226 | +└── requirements.txt # Python dependencies |
| 227 | + |
| 228 | +datasets/ # Attack data files |
| 229 | +└── **/*.yml, **/*.yaml # Files to validate |
| 230 | +``` |
| 231 | + |
| 232 | +## Support |
| 233 | + |
| 234 | +For issues with validation workflows: |
| 235 | + |
| 236 | +1. Check this documentation first |
| 237 | +2. Review workflow logs in GitHub Actions |
| 238 | +3. Test validation locally using the `validate.py` script |
| 239 | +4. Create an issue if problems persist |
| 240 | + |
| 241 | +For schema-related questions: |
| 242 | +- Review `bin/dataset_schema.json` |
| 243 | +- Check existing valid examples in `datasets/` |
| 244 | +- Refer to attack data documentation |
| 245 | + |