Ingestion Validation #91

Merged
Mesh-ach merged 102 commits into staging from develop
Jun 3, 2025

Mesh-ach (Collaborator) commented Jun 3, 2025

changes

  • Refactored validation pipeline to support:
    • Unified schema validation across cohort, course, and finance datasets.
    • Column normalization and fuzzy matching using aliases and canonical names.
    • Optional vs. required field distinctions with soft vs. hard failure handling.
  • Replaced rigid schema checks with dynamic schema loading from JSON specs (base_schema.json and optional institution-specific extensions).
  • Introduced Pandera-based validation using a dynamically constructed, regex-based DataFrameSchema.
  • Added error reporting via a custom HardValidationError class.
  • Centralized logic for model inference, schema merging, and failure classification.
  • Updated validation.py and tests to support the new logic and structure.
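
The column normalization and alias matching listed above could look roughly like the following. This is a minimal sketch using only the standard library (`difflib`), not the code from this PR: the names `normalize`, `match_columns`, and `ALIASES`, the example column names, and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import re
from difflib import get_close_matches

# Hypothetical alias table: canonical name -> accepted aliases.
ALIASES = {
    "student_id": ["student id", "studentid"],
    "cohort_term": ["cohort term", "term"],
    "pell_status": ["pell", "pell recipient"],
}


def normalize(name):
    """Lowercase, trim, and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def match_columns(columns, aliases=ALIASES, cutoff=0.8):
    """Map each uploaded column to a canonical name, or None if nothing fits.

    Exact matches on the normalized name win; otherwise the closest
    known name above `cutoff` (difflib similarity ratio) is used.
    """
    # Build a lookup from every normalized canonical name and alias.
    lookup = {}
    for canonical, alts in aliases.items():
        lookup[normalize(canonical)] = canonical
        for alt in alts:
            lookup[normalize(alt)] = canonical

    mapping = {}
    for col in columns:
        key = normalize(col)
        if key in lookup:
            mapping[col] = lookup[key]
            continue
        close = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        mapping[col] = lookup[close[0]] if close else None
    return mapping
```

Calling `match_columns(["Student ID", "studnt_id", "zzz"])` would resolve the first two headers to `student_id` and leave the third unmapped (`None`), so the caller can decide whether an unmapped column is an error or simply ignored.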

context

Previously, schema validation was tightly coupled to static column definitions. This limited flexibility and introduced maintenance overhead as models evolved.

This PR introduces a scalable, extensible schema-driven approach for validating CSV uploads:

  • Schemas are centrally managed via JSON.
  • Optional finance fields are now treated as non-blocking, while a core subset remains required (e.g., Pell).
  • Designed to handle mixed-model files and future schema evolution with minimal code changes.
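
One way to picture the JSON-driven schemas and the required vs. optional split is sketched below. It is an illustration under stated assumptions, not the implementation in validation.py: the JSON layout (a `columns` object with a `required` flag), the helper names `merge_schemas` and `check_columns`, and the constructor of `HardValidationError` are all hypothetical, and the sketch checks column presence with plain Python rather than Pandera.

```python
import json
from typing import Optional


class HardValidationError(Exception):
    """Raised for blocking failures (assumed shape; the PR's class may differ)."""

    def __init__(self, missing_required):
        self.missing_required = sorted(missing_required)
        super().__init__(f"Missing required columns: {self.missing_required}")


def merge_schemas(base: dict, extension: Optional[dict] = None) -> dict:
    """Overlay an institution-specific extension on top of the base schema."""
    merged = {name: dict(spec) for name, spec in base["columns"].items()}
    for name, spec in (extension or {}).get("columns", {}).items():
        merged.setdefault(name, {}).update(spec)
    return {"columns": merged}


def check_columns(present, schema):
    """Raise on missing required columns; return missing optional ones."""
    present = set(present)
    missing_required, missing_optional = [], []
    for name, spec in schema["columns"].items():
        if name in present:
            continue
        if spec.get("required", False):
            missing_required.append(name)
        else:
            missing_optional.append(name)
    if missing_required:
        raise HardValidationError(missing_required)  # hard failure
    return sorted(missing_optional)  # soft failure: report, do not block


# Hypothetical schema contents, inlined here instead of read from disk.
BASE = json.loads("""
{"columns": {
    "student_id":  {"required": true},
    "pell_status": {"required": true},
    "loan_amount": {"required": false}
}}
""")
EXTENSION = json.loads('{"columns": {"campus": {"required": false}}}')
```

With these inputs, a file that has `student_id` and `pell_status` passes and yields `["campus", "loan_amount"]` as non-blocking warnings, while a file without `pell_status` raises `HardValidationError`.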

questions

No questions at this time


Mesh-ach and others added 29 commits June 2, 2025 17:39
@Mesh-ach Mesh-ach merged commit 3a50f17 into staging Jun 3, 2025
9 of 10 checks passed