Implement input validation in DataGenerator #32

@moshesham

Problem

DataGenerator accepts user input without validation:

  • num_records can be negative, zero, or extremely large
  • seed is accepted without any type or range check
  • The structure of the YAML config is not validated

Impact

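  • Potential denial of service (DoS) through memory exhaustion when num_records is very large
  • Unexpected behavior with invalid inputs
  • Poor error messages when invalid input is only caught later, if at all
  • Security risk from processing unchecked user input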

Tasks

  • Add _validate_num_records() method
  • Add _validate_config() method
  • Add _validate_seed() method if needed
  • Call validation methods in __init__()
  • Add tests for validation methods
  • Add tests for edge cases (negative, zero, very large)
  • Update docstrings with validation information
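The first, third, and fourth tasks could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the constructor signature, the `MAX_NUM_RECORDS` constant, the choice of exception types, and the decision to reject `bool` values are all assumptions. The numeric limits are taken from the Validation Rules in this issue.

```python
from typing import Optional

# Upper bound taken from the Validation Rules in this issue.
MAX_NUM_RECORDS = 10_000_000


class DataGenerator:
    """Hypothetical skeleton; the real constructor signature may differ."""

    def __init__(self, num_records: int, seed: Optional[int] = None) -> None:
        # Validate before storing anything, so a bad value never reaches
        # the generation code.
        self.num_records = self._validate_num_records(num_records)
        self.seed = self._validate_seed(seed)

    @staticmethod
    def _validate_num_records(num_records: int) -> int:
        # bool is a subclass of int, so it is rejected explicitly (assumption).
        if isinstance(num_records, bool) or not isinstance(num_records, int):
            raise TypeError(
                f"num_records must be an int, got {type(num_records).__name__}"
            )
        if not 1 <= num_records <= MAX_NUM_RECORDS:
            raise ValueError(
                f"num_records must be between 1 and {MAX_NUM_RECORDS}, "
                f"got {num_records}"
            )
        return num_records

    @staticmethod
    def _validate_seed(seed: Optional[int]) -> Optional[int]:
        # The seed is optional; None means "no seed provided".
        if seed is None:
            return None
        if isinstance(seed, bool) or not isinstance(seed, int):
            raise TypeError(f"seed must be an int or None, got {type(seed).__name__}")
        if seed < 0:
            raise ValueError(f"seed must be non-negative, got {seed}")
        return seed
```

With this shape, the edge cases named in the tasks (negative, zero, very large) all fail in `__init__()` with a `ValueError` that names the offending value, which also makes them straightforward to cover in tests.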

Validation Rules

  • num_records: Must be an integer with 1 <= num_records <= 10,000,000
  • seed: Must be a non-negative integer if provided
  • Config: Must have the required keys 'data_generation' and 'fields'
  • Each field must have the keys 'name', 'type', and 'values'
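The two config rules could be checked along these lines. This is a sketch under stated assumptions: it operates on the dictionary obtained after the YAML file has been parsed, it assumes 'data_generation' and 'fields' are top-level keys, and it assumes 'fields' is a list of mappings. The function name `validate_config` and the exception types are hypothetical.

```python
from collections.abc import Mapping, Sequence

# Required keys as listed in the Validation Rules of this issue.
REQUIRED_TOP_LEVEL_KEYS = ("data_generation", "fields")
REQUIRED_FIELD_KEYS = ("name", "type", "values")


def validate_config(config: object) -> None:
    """Raise TypeError or ValueError if the parsed config is malformed."""
    if not isinstance(config, Mapping):
        raise TypeError("config must be a mapping (parsed YAML document)")

    missing = [key for key in REQUIRED_TOP_LEVEL_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")

    fields = config["fields"]
    # Assumption: 'fields' is a list of field definitions. Strings are
    # sequences too, so they are excluded explicitly.
    if isinstance(fields, (str, bytes)) or not isinstance(fields, Sequence):
        raise TypeError("'fields' must be a list of field definitions")

    for index, field in enumerate(fields):
        if not isinstance(field, Mapping):
            raise TypeError(f"fields[{index}] must be a mapping")
        missing = [key for key in REQUIRED_FIELD_KEYS if key not in field]
        if missing:
            raise ValueError(f"fields[{index}] is missing required keys: {missing}")
```

Reporting the index of the offending field and the list of missing keys addresses the "poor error messages" point in the Impact section, since the user can see exactly which part of the config to fix.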
