feat(C1): add distributed training data collection endpoint security control (1.2.8) #632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
@@ -13,7 +13,7 @@ Training data must be sourced, handled, and maintained in a way that preserves o

Maintain a verifiable inventory of all datasets, accept only trusted sources, and log every change for auditability.

| # | Description | Level |
-|:--------:|---------------------------------------------------------------------------------------------------------------------|:---:|
+| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.1.1** | **Verify that** an up-to-date inventory of every training-data source (origin, responsible party, license, collection method, intended use constraints, and processing history) is maintained. | 1 |
| **1.1.2** | **Verify that** training data processes exclude unnecessary features, attributes, or fields (e.g., unused metadata, sensitive PII, leaked test data). | 1 |
| **1.1.3** | **Verify that** all dataset changes are subject to a logged approval workflow. | 1 |
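To make 1.1.1 concrete, here is a minimal sketch of what one inventory record could capture; the field names and values are illustrative, not mandated by the requirement:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DatasetSource:
    """One inventory entry per training-data source (cf. 1.1.1)."""
    origin: str                 # where the data comes from
    responsible_party: str      # who owns / maintains it
    license: str                # usage license
    collection_method: str      # how it was gathered
    use_constraints: list       # intended-use restrictions
    processing_history: list = field(default_factory=list)

# Hypothetical inventory keyed by dataset identifier.
inventory = {
    "corpus-v3": DatasetSource(
        origin="https://example.org/corpus",
        responsible_party="data-team@example.org",
        license="CC-BY-4.0",
        collection_method="web crawl",
        use_constraints=["no PII", "research only"],
    )
}
```

A frozen dataclass keeps entries immutable once recorded, which pairs naturally with the logged approval workflow in 1.1.3.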
@@ -26,14 +26,15 @@ Maintain a verifiable inventory of all datasets, accept only trusted sources, an

Restrict access to training data, encrypt it at rest and in transit, and validate its integrity to prevent tampering, theft, or data poisoning.

| # | Description | Level |
-|:--------:|---------------------------------------------------------------------------------------------------------------------|:---:|
+| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.2.1** | **Verify that** access controls protect training data storage and pipelines. | 1 |
| **1.2.2** | **Verify that** all access to training data is logged, including user, time, and action. | 1 |
| **1.2.3** | **Verify that** training datasets are encrypted in transit and at rest, using current recommended cryptographic algorithms and key management practices. | 1 |
| **1.2.4** | **Verify that** cryptographic hashes or digital signatures are used to ensure data integrity during training data storage and transfer. | 2 |
| **1.2.5** | **Verify that** automated integrity monitoring is applied to guard against unauthorized modifications or corruption of training data. | 2 |
| **1.2.6** | **Verify that** obsolete training data is securely purged or anonymized. | 1 |
| **1.2.7** | **Verify that** all training dataset versions are uniquely identified, stored immutably, and auditable to support rollback and forensic analysis. | 3 |
+| **1.2.8** | **Verify that** distributed training data collection endpoints authenticate to the central aggregation system using mutual authentication, and that data received from those endpoints is integrity-verified (e.g., via cryptographic checksums or digital signatures generated at source) before being accepted into training pipelines. | 2 |
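The pre-ingestion integrity check that 1.2.4 and the proposed 1.2.8 call for can be sketched with a content digest recorded at the source and re-verified before acceptance; the payload and digest values here are illustrative:

```python
import hashlib
import hmac

def sha256_digest(data: bytes) -> str:
    # Content digest recorded at the source (cf. 1.2.4).
    return hashlib.sha256(data).hexdigest()

def verify_before_ingest(data: bytes, expected_digest: str) -> bool:
    # Reject any payload whose digest does not match the value
    # published by the source, before it reaches the training
    # pipeline (cf. 1.2.8).  compare_digest avoids timing leaks.
    return hmac.compare_digest(sha256_digest(data), expected_digest)

payload = b"label,text\n1,hello\n"
digest = sha256_digest(payload)
assert verify_before_ingest(payload, digest)
assert not verify_before_ingest(payload + b"poison", digest)
```

Digital signatures (rather than plain hashes) would additionally bind the data to a specific endpoint identity.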
Collaborator

Also, I think the requirement bundles two independent, separately testable things:

- mutual authentication of the collection endpoints to the aggregation system
- integrity verification of the data received from those endpoints

Of course, in both a completely bespoke end-to-end system and with a framework these might be implemented in a common codebase, but in practice mTLS is often implemented using infrastructure services (e.g. a service mesh or similar, which provides key rotation and related facilities) and integrity verification in the application code.
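To illustrate the split described above: transport authentication (mTLS) would live in infrastructure, while per-record integrity verification sits in application code and holds end-to-end regardless of transport. A minimal sketch assuming a shared per-endpoint key (the endpoint id and key store are hypothetical):

```python
import hashlib
import hmac

# Illustrative key store; in practice keys come from a secrets manager.
ENDPOINT_KEYS = {"edge-7": b"per-endpoint-secret"}

def sign_at_source(endpoint_id: str, record: bytes) -> str:
    # MAC generated where the data is collected, so integrity is
    # verifiable independently of how transport auth is provided.
    return hmac.new(ENDPOINT_KEYS[endpoint_id], record, hashlib.sha256).hexdigest()

def verify_in_pipeline(endpoint_id: str, record: bytes, sig: str) -> bool:
    # Application-level check at the aggregation system, before ingest.
    expected = hmac.new(ENDPOINT_KEYS[endpoint_id], record, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Each half is separately testable, which supports splitting the requirement as suggested.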
---
@@ -42,7 +43,7 @@ Restrict access to training data, encrypt it at rest and in transit, and validat

Ensure labeling and annotation processes are access-controlled, auditable, and protect sensitive information.

| # | Description | Level |
-|:--------:|---------------------------------------------------------------------------------------------------------------------|:---:|
+| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.3.1** | **Verify that** labeling interfaces and platforms enforce access controls that restrict who can create, modify, or approve annotations. | 1 |
| **1.3.2** | **Verify that** all labeling activities are recorded in audit logs, including the annotator identity, timestamp, and action performed. | 1 |
| **1.3.3** | **Verify that** annotator identity metadata is exported and retained alongside the dataset so that every annotation or preference pair can be attributed to a specific, verified human annotator throughout the training pipeline. | 1 |
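The audit entry shape implied by 1.3.2 and 1.3.3 can be sketched as an append-only log of who did what, when; field names are illustrative:

```python
import time

def record_annotation(log: list, annotator_id: str, action: str, item_id: str) -> dict:
    # Append-only audit entry: who, when, what (cf. 1.3.2).
    entry = {
        "annotator": annotator_id,   # verified human identity, retained
        "timestamp": time.time(),    # with the dataset (cf. 1.3.3)
        "action": action,
        "item": item_id,
    }
    log.append(entry)
    return entry

audit_log: list = []
record_annotation(audit_log, "annotator-42", "label:positive", "sample-001")
```

Exporting this log alongside the dataset is what keeps annotations attributable through the rest of the pipeline.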
@@ -57,7 +58,7 @@ Ensure labeling and annotation processes are access-controlled, auditable, and p

Combine automated validation, manual spot-checks, and logged remediation to guarantee dataset reliability.

| # | Description | Level |
-|:--------:|---------------------------------------------------------------------------------------------------------------------|:---:|
+| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.4.1** | **Verify that** automated tests catch format errors and nulls on every ingest or significant data transformation. | 1 |
| **1.4.2** | **Verify that** training and fine-tuning pipelines implement data integrity validation and poisoning detection techniques (e.g., statistical analysis, outlier detection, embedding analysis) to identify potential data poisoning or unintentional corruption in training data. | 2 |
| **1.4.3** | **Verify that** automatically generated labels (e.g., via models or weak supervision) are subject to confidence thresholds and consistency checks to detect misleading or low-confidence labels. | 2 |
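A toy sketch of the checks named in 1.4.2 and 1.4.3: a z-score outlier screen standing in for the statistical/embedding analysis, and a confidence gate on auto-generated labels; thresholds and function names are illustrative:

```python
from statistics import mean, stdev

def flag_outliers(values, z_thresh=3.0):
    # Simple z-score screen for anomalous samples (cf. 1.4.2);
    # real pipelines would operate on embeddings or richer features.
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > z_thresh]

def accept_auto_label(label, confidence, threshold=0.9):
    # Auto-generated labels below the confidence threshold are
    # routed to human review, not the training set (cf. 1.4.3).
    return confidence >= threshold

assert flag_outliers([1.0] * 20 + [50.0]) == [50.0]
assert accept_auto_label("spam", 0.95)
assert not accept_auto_label("spam", 0.40)
```

Consistency checks (e.g. agreement across multiple labelers or models) would complement the confidence gate.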
@@ -72,7 +73,7 @@ Combine automated validation, manual spot-checks, and logged remediation to guar

Track the full journey of each dataset from source to model input for auditability and incident response.

| # | Description | Level |
-|:--------:|---------------------------------------------------------------------------------------------------------------------|:---:|
+| :--------: | ------------------------------------------------------------------------------------------------------------------- | :---: |
| **1.5.1** | **Verify that** the lineage of each dataset and its components, including all transformations, augmentations, and merges, is recorded and can be reconstructed. | 1 |
| **1.5.2** | **Verify that** lineage records are immutable, securely stored, and accessible for audits. | 2 |
| **1.5.3** | **Verify that** lineage tracking covers synthetic data generated via augmentation, synthesis, or privacy-preserving techniques. | 2 |
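One way to get the immutability 1.5.2 asks for is to hash-chain lineage records, so tampering with any earlier entry is detectable; this is a sketch under that assumption, with illustrative event fields:

```python
import hashlib
import json

def append_lineage(chain: list, event: dict) -> dict:
    # Each record commits to its predecessor, so modifying an
    # earlier lineage entry breaks the chain (cf. 1.5.2).
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"event": event, "prev": prev_hash}
    body_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    record = {**body, "hash": body_hash}
    chain.append(record)
    return record

chain: list = []
append_lineage(chain, {"op": "ingest", "dataset": "corpus-v3"})
append_lineage(chain, {"op": "dedupe", "dataset": "corpus-v3"})
assert chain[1]["prev"] == chain[0]["hash"]
```

Synthetic-data generation steps (1.5.3) would simply be further events in the same chain.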
Hmm, I think this requirement combines two concerns already covered elsewhere:

- endpoint authentication
- pre-ingestion integrity verification

You wrote in the description that the existing C1.2 controls don't address endpoint authentication or pre-ingestion integrity, but I think 4.8.1 and 4.3.4 cover the former, and 1.2.4's "during transfer" scope covers the latter.

What specific gap remains after applying these four controls together?

If the gap is specifically about combining these in the federated learning context, that might be better addressed by adding federated/distributed training to the scope note in C12.6 (which already has a dedicated federated learning section) or C4.8 (Edge & Distributed AI Security), where the related controls already live, rather than in C1.2.