|
| 1 | +# JSON Schema Library Utilization Analysis |
| 2 | + |
| 3 | +## Current State |
| 4 | + |
| 5 | +### Backend (Python) |
| 6 | +**Library Available:** `jsonschema` (Draft202012Validator) |
| 7 | +**Currently Used In:** |
| 8 | +- ✅ `src/lambda/update_configuration/index.py:88-135` - Validates extractionSchema on upload |
| 9 | +- ✅ `lib/idp_common_pkg/idp_common/extraction/agentic_idp.py` - Some validation |
| 10 | + |
| 11 | +**Schema Definitions Available:** |
| 12 | +- ✅ `EXTRACTION_CLASS_SCHEMA` in `schema_definition.py` |
| 13 | +- ⚠️ `EXTRACTION_SCHEMA_ARRAY` (referenced; its definition still needs to be verified) |
| 14 | +- ✅ Comprehensive schema with AWS extensions defined |
| 15 | + |
| 16 | +### Frontend (JavaScript) |
| 17 | +**Library Available:** `ajv` + `ajv-formats` |
| 18 | +**Currently Used In:** |
| 19 | +- ✅ `src/ui/src/hooks/useSchemaValidation.js` - Custom validation logic |
| 20 | +- ⚠️ Duplicates validation that could use AJV's built-in features |
| 21 | + |
| 22 | +## Opportunities for Improvement |
| 23 | + |
| 24 | +### HIGH PRIORITY: Backend Migration Validation |
| 25 | + |
| 26 | +**File:** `lib/idp_common_pkg/idp_common/config_schema/migration.py` |
| 27 | +**Issue:** NO validation after migration |
| 28 | +**Risk:** Migration can produce invalid schemas that break downstream consumers |
| 29 | + |
| 30 | +**Current:** |
| 31 | +```python |
| 32 | +def migrate_legacy_to_schema(legacy_classes): |
| 33 | +    # ... migration logic ... |
| 34 | +    # NO VALIDATION: the converted result is returned unchecked |
| 35 | +    return _convert_classes_to_json_schema(migrated_classes) |
| 36 | +``` |
| 37 | + |
| 38 | +**Proposed:** |
| 39 | +```python |
| 40 | +def migrate_legacy_to_schema(legacy_classes, validate=True): |
| 41 | +    # ... migration logic (elided) that builds migrated_classes ... |
| 42 | +    result = _convert_classes_to_json_schema(migrated_classes) |
| 43 | + |
| 44 | +    if validate: |
| 45 | +        from jsonschema import Draft202012Validator, ValidationError |
| 46 | +        from .schema_definition import EXTRACTION_CLASS_SCHEMA |
| 47 | +        validator = Draft202012Validator(EXTRACTION_CLASS_SCHEMA) |
| 48 | +        # Treat a single schema and a list of schemas uniformly |
| 49 | +        schemas = result if isinstance(result, list) else [result] |
| 50 | +        try: |
| 51 | +            for schema in schemas: |
| 52 | +                validator.validate(schema) |
| 53 | +        except ValidationError as e: |
| 54 | +            # Chain the original error so its path and context are preserved |
| 55 | +            raise ValueError(f"Migration produced invalid schema: {e.message}") from e |
| 56 | + |
| 57 | +    return result |
| 58 | +``` |
| 59 | + |
| 60 | +**Impact:** Prevents invalid schemas from being created |
| 61 | + |
| 62 | +--- |
| 63 | + |
| 64 | +### HIGH PRIORITY: Validate AWS Extensions |
| 65 | + |
| 66 | +**File:** `lib/idp_common_pkg/idp_common/config_schema/migration.py:56-67` |
| 67 | +**Issue:** No validation of AWS extension values |
| 68 | +**Risk:** Invalid `evaluation_method` or `confidence_threshold` values can be stored |
| 69 | + |
| 70 | +**Current:** |
| 71 | +```python |
| 72 | +if "evaluation_method" in attr: |
| 73 | +    schema_attr["x-aws-idp-evaluation-method"] = attr["evaluation_method"] |
| 74 | +    # No validation! |
| 75 | + |
| 76 | +if "confidence_threshold" in attr: |
| 77 | +    threshold = attr["confidence_threshold"] |
| 78 | +    # Weak string-to-float conversion |
| 79 | +``` |
| 80 | + |
| 81 | +**Proposed:** Use schema definition to validate |
| 82 | +```python |
| 83 | +from typing import Any, Dict |
| 84 | + |
| 85 | +def _validate_aws_extensions(extensions: Dict[str, Any]) -> None: |
| 86 | +    """Validate AWS IDP extensions against schema.""" |
| 87 | +    from jsonschema import Draft202012Validator, ValidationError |
| 88 | + |
| 89 | +    # Inline copy of the extension constraints; keep in sync with EXTRACTION_CLASS_SCHEMA |
| 90 | +    aws_extension_schema = { |
| 91 | +        "type": "object", |
| 92 | +        "properties": { |
| 93 | +            "x-aws-idp-evaluation-method": { |
| 94 | +                "type": "string", |
| 95 | +                "enum": ["EXACT", "NUMERIC_EXACT", "FUZZY", "SEMANTIC"] |
| 96 | +            }, |
| 97 | +            "x-aws-idp-confidence-threshold": { |
| 98 | +                "type": "number", "minimum": 0, "maximum": 1 |
| 99 | +            } |
| 100 | +        } |
| 101 | +    } |
| 102 | + |
| 103 | +    validator = Draft202012Validator(aws_extension_schema) |
| 104 | +    try: |
| 105 | +        validator.validate(extensions) |
| 106 | +    except ValidationError as e: |
| 107 | +        raise ValueError(f"Invalid AWS extension: {e.message}") from e |
| 108 | +``` |
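The proposed schema requires `x-aws-idp-confidence-threshold` to be a number in [0, 1], while the current migration code performs a weak string-to-float conversion. A stricter coercion step could run before schema validation so that legacy string values are either converted cleanly or rejected. The following is a minimal sketch using only the standard library; `coerce_confidence_threshold` is a hypothetical helper name, not something that exists in `migration.py`.

```python
import math
from typing import Any


def coerce_confidence_threshold(value: Any) -> float:
    """Convert a legacy threshold (number or numeric string) to a float in [0, 1].

    Hypothetical helper for illustration; not part of the existing module.
    """
    # bool is a subclass of int, so reject it explicitly
    if isinstance(value, bool):
        raise ValueError(f"confidence_threshold must be numeric, got {value!r}")
    try:
        threshold = float(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"confidence_threshold is not a number: {value!r}") from exc
    # float() accepts "nan" and "inf", which must not pass as thresholds
    if not math.isfinite(threshold) or not 0.0 <= threshold <= 1.0:
        raise ValueError(f"confidence_threshold must be within [0, 1], got {value!r}")
    return threshold
```

Raising `ValueError` matches the error type used in the proposed validation functions, so callers would only need to handle one exception type.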
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +### MEDIUM PRIORITY: Frontend - Use AJV for All Validation |
| 113 | + |
| 114 | +**File:** `src/ui/src/hooks/useSchemaValidation.js:67-133` |
| 115 | +**Issue:** Manual validation logic duplicates what AJV can do |
| 116 | + |
| 117 | +**Current:** Manual checks for minLength/maxLength, etc. |
| 118 | +```javascript |
| 119 | +if (attribute.type === 'string') { |
| 120 | +  if (attribute.minLength !== undefined && attribute.maxLength !== undefined) { |
| 121 | +    if (attribute.minLength > attribute.maxLength) { |
| 122 | +      errors.push({ path: '/minLength', message: 'minLength cannot be greater than maxLength' }); |
| 123 | +    } |
| 124 | +  } |
| 125 | +} |
| 126 | +``` |
| 127 | + |
| 128 | +**Proposed:** Define meta-schema and use AJV |
| 129 | +```javascript |
| 130 | +const ATTRIBUTE_META_SCHEMA = { |
| 131 | +  type: 'object', |
| 132 | +  properties: { |
| 133 | +    type: { enum: ['string', 'number', 'integer', 'boolean', 'object', 'array', 'null'] }, |
| 134 | +    // Constraints for other keywords and AWS extensions would be declared here |
| 135 | +  }, |
| 136 | +  // String-specific checks apply only when type is present and equals 'string' |
| 137 | +  if: { properties: { type: { const: 'string' } }, required: ['type'] }, |
| 138 | +  then: { |
| 139 | +    properties: { |
| 140 | +      minLength: { type: 'integer', minimum: 0 }, |
| 141 | +      maxLength: { type: 'integer', minimum: 0 } |
| 142 | +    }, |
| 143 | +    // Standard JSON Schema cannot compare two instance values directly: |
| 144 | +    if: { |
| 145 | +      required: ['minLength', 'maxLength'] |
| 146 | +    }, |
| 147 | +    then: { |
| 148 | +      // minLength <= maxLength needs a custom keyword, ajv-keywords, or AJV's $data option |
| 149 | +    } |
| 150 | +  } |
| 151 | +}; |
| 152 | + |
| 153 | +const validateAttribute = useCallback((attribute) => { |
| 154 | +  const validate = ajv.compile(ATTRIBUTE_META_SCHEMA); |
| 155 | +  const valid = validate(attribute); |
| 156 | + |
| 157 | +  if (!valid) { |
| 158 | +    return { |
| 159 | +      valid: false, |
| 160 | +      errors: validate.errors.map(err => ({ |
| 161 | +        path: err.instancePath, |
| 162 | +        message: err.message |
| 163 | +      })) |
| 164 | +    }; |
| 165 | +  } |
| 166 | + |
| 167 | +  return { valid: true, errors: [] }; |
| 168 | +}, [ajv]); |
| 169 | +``` |
| 170 | + |
| 171 | +**Benefits:** |
| 172 | +- Eliminate roughly 70 lines of manual validation |
| 173 | +- Leverage AJV's compiled, optimized validators |
| 174 | +- Pick up new JSON Schema keywords through AJV instead of hand-written checks |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +### MEDIUM PRIORITY: Backend - Validate Configuration on Read |
| 179 | + |
| 180 | +**File:** `lib/idp_common_pkg/idp_common/config/configuration_manager.py` |
| 181 | +**Issue:** No validation when reading from DynamoDB |
| 182 | + |
| 183 | +**Proposed:** |
| 184 | +```python |
| 185 | +def get_configuration(self, configuration_type: str, validate: bool = True) -> Dict[str, Any]: |
| 186 | +    """Get configuration with optional validation.""" |
| 187 | +    config = ...  # placeholder: fetch from DynamoDB (elided) |
| 188 | + |
| 189 | +    if validate and 'classes' in config: |
| 190 | +        from idp_common.config_schema import validate_extraction_schema |
| 191 | +        try: |
| 192 | +            validate_extraction_schema(config['classes']) |
| 193 | +        except Exception as e: |
| 194 | +            logger.error("Invalid config in DynamoDB: %s", e) |
| 195 | +            # Could return default or raise |
| 196 | + |
| 197 | +    return config |
| 198 | +``` |
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +### LOW PRIORITY: Frontend - Share Schema Definition |
| 203 | + |
| 204 | +**Issue:** Schema definition duplicated between frontend and backend |
| 205 | + |
| 206 | +**Current:** |
| 207 | +- Backend: `lib/idp_common_pkg/idp_common/config_schema/schema_definition.py` |
| 208 | +- Frontend: Partial schema in `useSchemaValidation.js` |
| 209 | + |
| 210 | +**Proposed:** |
| 211 | +- Generate JSON file from Python schema definition |
| 212 | +- Import in both frontend and backend |
| 213 | +- Single source of truth |
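The "generate JSON file from Python schema definition" step could be sketched roughly as follows. This is an illustration under stated assumptions: `EXAMPLE_CLASS_SCHEMA` is a stand-in for the real `EXTRACTION_CLASS_SCHEMA`, and `export_schema` and the output path are hypothetical names, not existing project code.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Stand-in for EXTRACTION_CLASS_SCHEMA; the real definition lives in
# schema_definition.py and is not reproduced here.
EXAMPLE_CLASS_SCHEMA: Dict[str, Any] = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "x-aws-idp-confidence-threshold": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
        },
    },
}


def export_schema(schema: Dict[str, Any], out_path: Path) -> Path:
    """Write the schema as deterministic JSON so both sides can load one file."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys plus a fixed indent keeps the generated file stable across
    # runs, so any drift shows up as a diff in code review
    text = json.dumps(schema, indent=2, sort_keys=True) + "\n"
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

The frontend could then import the generated file and compile it with AJV, while the backend keeps the Python dictionary as the authoritative definition. Running the export in a build step, and failing the build when the committed file differs, would be one way to keep the two copies from drifting apart.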
| 214 | + |
| 215 | +--- |
| 216 | + |
| 217 | +## Implementation Priority |
| 218 | + |
| 219 | +### Phase 1 (HIGH - Immediate) |
| 220 | +1. ✅ Add validation to `migrate_legacy_to_schema()` |
| 221 | +2. ✅ Add AWS extension validation in migration |
| 222 | +3. ✅ Validate after migration in configuration_resolver |
| 223 | + |
| 224 | +### Phase 2 (MEDIUM - This Sprint) |
| 225 | +4. ⚠️ Use AJV meta-schema in useSchemaValidation |
| 226 | +5. ⚠️ Add validation on config read in ConfigurationManager |
| 227 | + |
| 228 | +### Phase 3 (LOW - Future) |
| 229 | +6. ⬜ Share schema definition between frontend/backend |
| 230 | +7. ⬜ Add JSON Schema $ref resolution using library features |
| 231 | + |
| 232 | +## Code Savings Estimate |
| 233 | + |
| 234 | +- Backend validation: +50 lines (new code for safety) |
| 235 | +- Frontend AJV improvements: -70 lines (eliminate manual validation) |
| 236 | +- **Net:** about 20 fewer lines, with substantially better validation coverage |