Commit f2d09f3

Schema validation ensuring data schema path matches what is set for the data ref
1 parent 791619d commit f2d09f3

19 files changed, +3083 -1 lines changed

scripts/config/vale/styles/config/vocabularies/words/accept.txt

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ Python
 quotingType
 rawContent
 repo
+Rollout
 rootDir
 sample_event_markdown
 sample_service_markdown

scripts/config/vale/vale.ini

Lines changed: 3 additions & 0 deletions
@@ -6,3 +6,6 @@ Vocab = words
 
 [*.md]
 BasedOnStyles = Vale
+
+[src/changelog/agents.md]
+BasedOnStyles =

scripts/githooks/check-todos.sh

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ EXCLUDED_DIRS=(
 "docs/"
 "node_modules/"
 ".devcontainer/"
+"src/changelog"
 )
 

Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
1+
# Schema Dataschema Consistency Validation
2+
3+
**Status**: 🚧 In Development
4+
**Last Updated**: 2025-11-13 15:20 GMT
5+
6+
## What It Does
7+
8+
This validation tool ensures consistency in CloudEvents event schemas by checking that the `dataschema` const value matches the `data` $ref value.
9+
10+
In CloudEvents schemas, these two properties should always reference the same schema file:
11+
12+
```yaml
dataschema:
  type: string
  const: ../data/digital-letter-base-data.schema.yaml # Must match
data:
  $ref: ../data/digital-letter-base-data.schema.yaml # Must match
```
19+
20+
The validator automatically detects mismatches and reports them clearly.
21+
22+
## Why It Matters
23+
24+
Mismatched `dataschema` and `data` references can cause:
25+
26+
- Runtime validation failures
27+
- Confusing error messages
28+
- Incorrect schema documentation
29+
- Integration issues with event consumers
30+
31+
This validation catches these issues early in development.
32+
33+
## Quick Start
34+
35+
### Validate Your Schemas
36+
37+
```bash
# Validate all event schemas in the current directory
npm run validate:dataschema-consistency

# Or use make
make validate-dataschema-consistency

# Validate a specific directory
npm run validate:dataschema-consistency -- /path/to/schemas
```
47+
48+
### Expected Output
49+
50+
**When all schemas are valid**:
51+
52+
```plaintext
✓ Validating event schemas...
✓ Found 22 schema files
✓ All schemas valid - no mismatches detected
```
57+
58+
**When mismatches are found**:
59+
60+
```plaintext
✗ Validation failed for 2 schemas:

File: uk.nhs.notify.digital.letters.event.v1.schema.yaml
Error: dataschema const does not match data $ref
Expected: ../data/schema-a.yaml
Actual: ../data/schema-b.yaml

File: another-event.v1.schema.yaml
Error: dataschema const does not match data $ref
Expected: ../data/correct-schema.yaml
Actual: ../data/wrong-schema.yaml

✗ 2 validation errors found
```
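
A report in the shape shown above could be assembled by a small formatter. The following is a minimal sketch, not the project's CLI code: the `SchemaFailure` interface and the `formatReport` function are hypothetical names. It assumes "Expected" carries the `dataschema` const and "Actual" carries the `data` $ref, as in the programmatic example in this document.

```typescript
// Hypothetical shape for one failing file; all field names are assumptions.
interface SchemaFailure {
  file: string;
  errorMessage: string;
  expected: string; // assumed to be the dataschema const value
  actual: string; // assumed to be the data $ref value
}

// Builds the success or failure report as a single string.
function formatReport(totalFiles: number, failures: SchemaFailure[]): string {
  if (failures.length === 0) {
    return [
      '✓ Validating event schemas...',
      `✓ Found ${totalFiles} schema files`,
      '✓ All schemas valid - no mismatches detected',
    ].join('\n');
  }

  const lines: string[] = [`✗ Validation failed for ${failures.length} schemas:`, ''];
  for (const failure of failures) {
    lines.push(
      `File: ${failure.file}`,
      `Error: ${failure.errorMessage}`,
      `Expected: ${failure.expected}`,
      `Actual: ${failure.actual}`,
      '',
    );
  }
  lines.push(`✗ ${failures.length} validation errors found`);
  return lines.join('\n');
}
```

Returning a string rather than printing directly keeps the formatter easy to unit test; the caller decides whether it goes to `console.log` or `console.error`.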
75+
76+
## Usage
77+
78+
### In Development
79+
80+
Run validation before committing schema changes:
81+
82+
```bash
# Add to your workflow
git add src/cloudevents/domains/*/events/*.yaml
make validate-dataschema-consistency
git commit -m "feat: add new event schema"
```
88+
89+
### In CI/CD
90+
91+
Once CI/CD integration is complete (it is still pending; see Development Status), the validation is intended to run automatically in the pipeline:
92+
93+
- **Pull Requests**: Validates all schema files
94+
- **Main Branch**: Runs on every commit
95+
- **Failure**: Pipeline fails if mismatches detected
96+
97+
### Programmatic Use
98+
99+
Use the validation function directly in your code:
100+
101+
```typescript
// The implementation plan places the function in dataschema-consistency-lib.ts,
// not in the existing validator-lib.ts.
import { validateDataschemaConsistency } from './dataschema-consistency-lib';

const schema = {
  properties: {
    dataschema: {
      const: '../data/schema.yaml'
    },
    data: {
      $ref: '../data/schema.yaml'
    }
  }
};

const result = validateDataschemaConsistency(schema);

// Report both values so the mismatch is easy to locate.
if (!result.valid) {
  console.error(result.errorMessage);
  console.log(`Expected: ${result.dataschemaValue}`);
  console.log(`Actual: ${result.dataRefValue}`);
}
```
123+
124+
## What Gets Validated
125+
126+
### Validated Schemas
127+
128+
The tool checks schemas that have BOTH:
129+
130+
- A `properties.dataschema.const` value
131+
- A `properties.data.$ref` value
132+
133+
### Skipped Schemas
134+
135+
Schemas are automatically skipped (no error) if they:
136+
137+
- Don't have a `dataschema` property
138+
- Don't have a `data` property
139+
- Are not CloudEvents event schemas
140+
141+
### Validation Rules
142+
143+
1. **Exact Match**: Values must match exactly (case-sensitive)
144+
2. **No Whitespace**: Trailing/leading spaces cause validation failure
145+
3. **String Only**: Both values must be strings
146+
4. **Not Null**: Null or undefined values fail validation
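
The skip conditions and the four rules above can be sketched as a single function. This is an illustrative sketch, not the project's `validateDataschemaConsistency`: the names `ConsistencyResult`, `isRecord` and `checkConsistency` are hypothetical, and the `skipped` flag is an assumption added for clarity. It also assumes that a schema is skipped only when the `dataschema` or `data` property object is absent, and that a present property with a missing or non-string value fails.

```typescript
// Hypothetical result shape. `valid`, `errorMessage`, `dataschemaValue` and
// `dataRefValue` mirror the usage example in this document; `skipped` is an
// assumption added for illustration.
interface ConsistencyResult {
  valid: boolean;
  skipped: boolean;
  errorMessage?: string;
  dataschemaValue?: unknown;
  dataRefValue?: unknown;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Sketch only: not the project's validateDataschemaConsistency.
function checkConsistency(schema: unknown): ConsistencyResult {
  const properties = isRecord(schema) ? schema['properties'] : undefined;
  const dataschema = isRecord(properties) ? properties['dataschema'] : undefined;
  const data = isRecord(properties) ? properties['data'] : undefined;

  // Skipped (no error): the pattern applies only when both properties exist.
  if (!isRecord(dataschema) || !isRecord(data)) {
    return { valid: true, skipped: true };
  }

  const dataschemaValue = dataschema['const'];
  const dataRefValue = data['$ref'];

  // Rules 3 and 4: both values must be strings, so null and undefined fail.
  if (typeof dataschemaValue !== 'string' || typeof dataRefValue !== 'string') {
    return {
      valid: false,
      skipped: false,
      errorMessage: 'dataschema const and data $ref must both be strings',
      dataschemaValue,
      dataRefValue,
    };
  }

  // Rules 1 and 2: strict equality, with no trimming and no case folding.
  if (dataschemaValue !== dataRefValue) {
    return {
      valid: false,
      skipped: false,
      errorMessage: 'dataschema const does not match data $ref',
      dataschemaValue,
      dataRefValue,
    };
  }

  return { valid: true, skipped: false, dataschemaValue, dataRefValue };
}
```

Using strict string equality with no normalisation is what makes case differences and stray whitespace surface as failures instead of being silently accepted.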
147+
148+
## Common Issues
149+
150+
### Mismatch Detected
151+
152+
**Problem**: Validator reports mismatch
153+
154+
**Solution**: Update schema to use consistent reference:
155+
156+
```yaml
# Before (incorrect)
dataschema:
  const: ../data/old-schema.yaml
data:
  $ref: ../data/new-schema.yaml

# After (correct)
dataschema:
  const: ../data/new-schema.yaml
data:
  $ref: ../data/new-schema.yaml
```
169+
170+
### Case Sensitivity
171+
172+
**Problem**: `Schema.yaml` vs `schema.yaml`
173+
174+
**Solution**: Ensure exact case match:
175+
176+
```yaml
# Both must use same case
dataschema:
  const: ../data/Schema.yaml # Capital S
data:
  $ref: ../data/Schema.yaml # Capital S
```
183+
184+
### Whitespace Issues
185+
186+
**Problem**: Hidden spaces cause validation failure
187+
188+
**Solution**: Remove trailing whitespace:
189+
190+
```yaml
# Before (incorrect - space after .yaml, inside the quotes)
# An unquoted YAML value has trailing spaces stripped by the parser,
# so this problem appears with quoted values.
dataschema:
  const: "../data/schema.yaml "

# After (correct)
dataschema:
  const: ../data/schema.yaml
```
199+
200+
## Where to Get Help
201+
202+
- **Documentation**: See `/src/changelog/2025-11-13/001-01-request-*.md` for background
203+
- **Requirements**: See `/src/changelog/2025-11-13/001-03-requirements-*.md` for detailed specs
204+
- **Issues**: Report problems in GitHub Issues
205+
- **Questions**: Ask in team channels
206+
207+
## Development Status
208+
209+
### Current Status: 🚧 In Development
210+
211+
- ✅ Validation logic implemented and tested
212+
- ⏳ CLI script in progress
213+
- ⏳ CI/CD integration pending
214+
- ⏳ Documentation being refined
215+
216+
### Upcoming
217+
218+
- Full CI/CD pipeline integration
219+
- Additional validation rules if needed
220+
- Performance optimizations
221+
- Enhanced error messages
222+
223+
---
224+
225+
**Note**: This document will be updated as the feature develops. Check the "Last Updated" timestamp above.
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
# Schema Consistency Validation Enhancement
2+
3+
**Date**: 2025-11-13 14:31 GMT
4+
**Branch**: rossbugginsnhs/2025-11-13/schema-restrictions
5+
6+
## Objective
7+
8+
Enhance the CloudEvents validator at `src/cloudevents/tools/validator` to enforce consistency between `dataschema` const values and `data` $ref values across all event schemas.
9+
10+
### Current Pattern in Event Schemas
11+
12+
All event schemas follow this pattern:
13+
14+
```yaml
dataschema:
  type: string
  const: ../data/digital-letter-base-data.schema.yaml
  description: Canonical URI of the example event's data schema.
data:
  $ref: ../data/digital-letter-base-data.schema.yaml
  description: Example payload wrapper containing notify-payload.
```
23+
24+
### Challenge
25+
26+
- `dataschema.const` is a literal value that validates instance data
27+
- `data.$ref` is schema metadata that tells validators which schema to use
28+
- JSON Schema has no built-in way to cross-reference between literal values and schema keywords
29+
30+
### Proposed Solution
31+
32+
Add validation to the existing validator tool at `src/cloudevents/tools/validator` to:
33+
34+
1. Parse event schema files
35+
2. Extract the `dataschema.const` value
36+
3. Extract the `data.$ref` value
37+
4. Fail validation if they don't match
38+
39+
This would be integrated into the existing validation tooling and CI/CD pipeline to ensure consistency across all 22+ event schemas that follow this pattern.
40+
41+
## Next Steps
42+
43+
1. Create a validation function in `validator-lib.ts`
44+
2. Add a standalone validation script or extend existing validator
45+
3. Add tests for the new validation
46+
4. Integrate into CI/CD pipeline
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
1+
# Implementation Plan: Schema Consistency Validation
2+
3+
**Date**: 2025-11-13 14:38 GMT
4+
**Branch**: rossbugginsnhs/2025-11-13/schema-restrictions
5+
**Related Request**: [001-01-request-schema-dataschema-ref-consistency.md](./001-01-request-schema-dataschema-ref-consistency.md)

## Overview
6+
7+
Add validation to ensure that in CloudEvents event schemas, the `dataschema` const value matches the `data` $ref value.
8+
9+
## Implementation Steps
10+
11+
### 1. Create New Validation Library
12+
13+
Create a new library file, `dataschema-consistency-lib.ts`, containing the validation function:
14+
15+
- Export `validateDataschemaConsistency(schemaObject)` function
16+
- Export `DataschemaConsistencyResult` interface type
17+
- Checks if the schema has both `properties.dataschema.const` and `properties.data.$ref`
18+
- Returns validation result with details if they don't match
19+
- Returns success if they match or if the pattern doesn't apply
20+
21+
**Location**: `src/cloudevents/tools/validator/dataschema-consistency-lib.ts`
22+
23+
**Rationale**: Create a new file instead of modifying the existing `validator-lib.ts`, so that the change stays isolated and cannot affect existing validation functionality.
24+
25+
### 2. Create Standalone Validation Script
26+
27+
Create a script that:
28+
29+
- Scans all event schema files in specified directories
30+
- Validates each schema for dataschema/data consistency
31+
- Reports all inconsistencies
32+
- Exits with error code if any inconsistencies found
33+
- Imports from the new dataschema-consistency-lib.ts
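
The control flow of such a script can be sketched independently of file I/O. In the sketch below every name (`Validate`, `LoadSchema`, `RunSummary`, `runConsistencyCheck`) is hypothetical, and the loader and validator are passed in as parameters so the example stays self-contained. Counting a file that cannot be loaded as a failure is an assumption, not something the plan specifies.

```typescript
// All names here are hypothetical; the real script's structure may differ.
type CheckResult = { valid: boolean; errorMessage?: string };
type Validate = (schema: unknown) => CheckResult;
type LoadSchema = (path: string) => unknown;

interface RunSummary {
  checked: number;
  failures: { file: string; errorMessage: string }[];
  exitCode: 0 | 1;
}

// Validates every path, collects all failures, then derives the exit code.
function runConsistencyCheck(
  paths: string[],
  loadSchema: LoadSchema,
  validate: Validate,
): RunSummary {
  const failures: { file: string; errorMessage: string }[] = [];

  for (const file of paths) {
    let result: CheckResult;
    try {
      result = validate(loadSchema(file));
    } catch (error) {
      // Assumption: an unreadable or unparsable file counts as a failure
      // instead of aborting the whole run.
      result = { valid: false, errorMessage: `could not load schema: ${String(error)}` };
    }
    if (!result.valid) {
      failures.push({ file, errorMessage: result.errorMessage ?? 'validation failed' });
    }
  }

  // Report every inconsistency first, then signal failure via the exit code.
  return { checked: paths.length, failures, exitCode: failures.length > 0 ? 1 : 0 };
}
```

In a real script the loader would read and parse each YAML file, and the caller would set the process exit code from `exitCode` after printing the failures.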
34+
35+
**Location**: `src/cloudevents/tools/validator/validate-dataschema-consistency.ts`
36+
37+
### 3. Add Unit Tests
38+
39+
Create comprehensive tests for:
40+
41+
- Matching dataschema and data values (should pass)
42+
- Mismatched values (should fail with clear message)
43+
- Schemas without dataschema property (should skip)
44+
- Schemas without data property (should skip)
45+
- Edge cases (null, undefined, different path formats)
46+
47+
**Location**: `src/cloudevents/tools/validator/__tests__/validate-dataschema-consistency.test.ts`
48+
49+
**Note**: Tests will import from `dataschema-consistency-lib` (new file), not from existing validator-lib.
50+
51+
### 4. Update Makefile
52+
53+
Add a new make target to run the consistency validation:
54+
55+
```makefile
validate-dataschema-consistency:
	npm run validate:dataschema-consistency
```
59+
60+
**Location**: `src/cloudevents/Makefile`
61+
62+
### 5. Update package.json
63+
64+
Add script to run the consistency validator:
65+
66+
```json
"validate:dataschema-consistency": "tsx tools/validator/validate-dataschema-consistency.ts"
```
69+
70+
**Location**: `src/cloudevents/package.json`
71+
72+
### 6. Integrate into CI/CD Pipeline
73+
74+
Add validation step to the existing validation workflow or create new step.
75+
76+
**Location**: `.github/workflows/` or relevant CI/CD configuration
77+
78+
## Success Criteria
79+
80+
- [ ] Validation function correctly identifies matching dataschema/data pairs
81+
- [ ] Validation function correctly identifies mismatches with helpful error messages
82+
- [ ] All 22+ existing event schemas pass validation
83+
- [ ] Unit tests achieve 100% code coverage for new functions
84+
- [ ] Script can be run standalone via `make` or `npm run`
85+
- [ ] Integration into CI/CD prevents merging schemas with inconsistencies
86+
- [ ] Documentation updated if needed
87+
88+
## Testing Strategy
89+
90+
1. Run against all existing event schemas to ensure they currently pass
91+
2. Create test schemas with intentional mismatches to verify detection
92+
3. Test edge cases (missing properties, null values, etc.)
93+
4. Verify error messages are clear and actionable
94+
95+
## Rollout Plan
96+
97+
1. Implement and test locally
98+
2. Run against all existing schemas to verify current state
99+
3. Add to CI/CD pipeline as warning initially
100+
4. Monitor for false positives
101+
5. Convert to blocking validation once confident
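
One way to support the warning-then-blocking rollout is a mode switch that controls the exit code. The sketch below is illustrative only: the mode names `warn` and `block` and the functions `exitCodeFor` and `parseMode` are hypothetical, and how the mode reaches the script (flag, environment variable, or workflow input) is left open.

```typescript
// 'warn' and 'block' are hypothetical mode names for rollout steps 3 and 5.
type RolloutMode = 'warn' | 'block';

// Decides whether mismatches should fail the pipeline.
function exitCodeFor(mismatchCount: number, mode: RolloutMode): 0 | 1 {
  // In warn mode, mismatches are reported but never fail the pipeline.
  if (mode === 'warn') return 0;
  return mismatchCount > 0 ? 1 : 0;
}

// Interprets a raw setting value; anything other than 'warn' means 'block'.
function parseMode(value: string | undefined): RolloutMode {
  // Assumption: default to blocking so a missing or misspelled setting
  // cannot silently disable the check.
  return value === 'warn' ? 'warn' : 'block';
}
```

Under this scheme, moving from step 3 to step 5 is a configuration change and needs no code change.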
