…operations immediately upon first failure to prevent partial data imports.

- **WHEN** any Parquet file fails to convert to CSV
- **THEN** system SHALL abort all other conversion tasks immediately
- **AND** system SHALL return error indicating which file failed conversion

### Requirement: Arrow Schema Inference from Parquet Files

The system SHALL support inferring Arrow schemas from Parquet file metadata without reading any row data.

#### Scenario: Infer schema from single Parquet file

- **WHEN** user requests schema inference from a single Parquet file
- **THEN** system SHALL read only the Parquet metadata (not the row data)
- **AND** system SHALL return the Arrow schema with field names and types
- **AND** system SHALL include nullability information for each field

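Reading "only the metadata" is cheap because of how Parquet lays out a file. Every file begins and ends with the 4-byte magic `PAR1`. The Thrift-encoded file metadata, which carries the schema, sits just before a trailing 4-byte little-endian length and the closing magic. The minimal Python sketch below uses only the standard library. It seeks straight to that footer and never reads the column chunks. `read_parquet_footer` is a hypothetical helper. Decoding the returned bytes into an Arrow schema is left to a Parquet library such as `pyarrow.parquet.read_schema`.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def read_parquet_footer(path: str) -> bytes:
    """Return the raw Thrift-encoded file metadata bytes of a Parquet file.

    Only the 4-byte header and the footer are read; the column chunks
    (the row data) in between are never touched.
    """
    file_size = os.path.getsize(path)
    # Smallest possible layout: "PAR1" + 4-byte length + "PAR1".
    if file_size < 12:
        raise ValueError(f"{path}: too small to be a Parquet file")
    with open(path, "rb") as f:
        if f.read(4) != PARQUET_MAGIC:
            raise ValueError(f"{path}: missing leading PAR1 magic")
        # Last 8 bytes: little-endian uint32 metadata length, then "PAR1".
        f.seek(file_size - 8)
        tail = f.read(8)
        if tail[4:] != PARQUET_MAGIC:
            raise ValueError(f"{path}: missing trailing PAR1 magic")
        (meta_len,) = struct.unpack("<I", tail[:4])
        meta_start = file_size - 8 - meta_len
        if meta_start < 4:
            raise ValueError(f"{path}: footer length {meta_len} is out of range")
        f.seek(meta_start)
        return f.read(meta_len)
```
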
#### Scenario: Infer union schema from multiple Parquet files

- **WHEN** user requests schema inference from multiple Parquet files
- **THEN** system SHALL read metadata from all files
- **AND** system SHALL compute a union schema that accommodates all files
- **AND** system SHALL widen types when fields have different types across files
- **AND** type widening SHALL follow these rules:
  - Identical types remain unchanged
  - DECIMAL types widen to DECIMAL(max(precision), max(scale))
  - VARCHAR types widen to VARCHAR(max(size))
  - DECIMAL + DOUBLE widens to DOUBLE
  - Incompatible types fall back to VARCHAR(2000000)

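The widening rules can be expressed as a single pairwise function. This is a minimal Python sketch that assumes a simplified type model. `DecimalType`, `VarcharType`, `ScalarType` and `widen` are hypothetical names, not the system's actual types.

```python
from dataclasses import dataclass
from typing import Union

VARCHAR_FALLBACK_SIZE = 2_000_000  # VARCHAR(2000000), the fallback named in the rules


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int


@dataclass(frozen=True)
class VarcharType:
    size: int


@dataclass(frozen=True)
class ScalarType:
    name: str  # parameterless types such as "DOUBLE", "BOOLEAN", "DATE"


ExaType = Union[DecimalType, VarcharType, ScalarType]
DOUBLE = ScalarType("DOUBLE")


def widen(a: ExaType, b: ExaType) -> ExaType:
    """Return a type able to hold values of both a and b, per the widening rules."""
    if a == b:
        return a  # identical types remain unchanged
    if isinstance(a, DecimalType) and isinstance(b, DecimalType):
        return DecimalType(max(a.precision, b.precision), max(a.scale, b.scale))
    if isinstance(a, VarcharType) and isinstance(b, VarcharType):
        return VarcharType(max(a.size, b.size))
    if (isinstance(a, DecimalType) and b == DOUBLE) or (a == DOUBLE and isinstance(b, DecimalType)):
        return DOUBLE
    return VarcharType(VARCHAR_FALLBACK_SIZE)  # incompatible types
```

Applied literally, the DECIMAL rule can lose integer digits. Widening DECIMAL(10,0) with DECIMAL(10,5) yields DECIMAL(10,5), which leaves room for only 5 digits before the decimal point. If that matters, a variant keeps max(scale) and sets precision to max(precision - scale) + max(scale). That result would be capped at Exasol's maximum DECIMAL precision of 36.
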
#### Scenario: Schema inference error handling

- **WHEN** schema inference encounters an error
- **THEN** system SHALL return SchemaInferenceError with file path context
- **AND** system SHALL indicate whether the error occurred while reading metadata or during type conversion

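Here is a minimal Python sketch of an error that carries both pieces of context the scenario asks for. Python raises errors rather than returning them. Only the name `SchemaInferenceError` comes from the requirement. `InferenceStage`, the attribute names and `convert_field_type` are hypothetical.

```python
from enum import Enum
from typing import Mapping


class InferenceStage(Enum):
    READING_METADATA = "reading metadata"
    TYPE_CONVERSION = "type conversion"


class SchemaInferenceError(Exception):
    """Schema inference failure carrying the offending file and the failing stage."""

    def __init__(self, path: str, stage: InferenceStage, detail: str) -> None:
        self.path = path
        self.stage = stage
        self.detail = detail
        super().__init__(f"{path}: schema inference failed ({stage.value}): {detail}")


def convert_field_type(path: str, arrow_type: str, mapping: Mapping[str, str]) -> str:
    """Map an Arrow type name to an Exasol type, tagging failures with stage and path."""
    try:
        return mapping[arrow_type]
    except KeyError:
        raise SchemaInferenceError(
            path, InferenceStage.TYPE_CONVERSION, f"unsupported Arrow type {arrow_type!r}"
        ) from None
```
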
### Requirement: Arrow to Exasol DDL Generation

The system SHALL support generating Exasol CREATE TABLE DDL statements from inferred schemas.

#### Scenario: Column name handling with Quoted mode

- **WHEN** generating DDL with Quoted column name mode
- **THEN** column names SHALL be wrapped in double quotes
- **AND** internal double quotes in names SHALL be escaped by doubling
- **AND** original column names SHALL be preserved exactly

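Quoted mode reduces to a single escaping rule. Below is a minimal Python sketch; `quote_identifier` is a hypothetical name. Exasol treats delimited identifiers as case-sensitive, so the original spelling survives exactly.

```python
def quote_identifier(name: str) -> str:
    """Quoted mode: keep the name verbatim, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'
```
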
#### Scenario: Column name handling with Sanitize mode

- **WHEN** generating DDL with Sanitize column name mode
- **THEN** column names SHALL be converted to uppercase
- **AND** invalid identifier characters SHALL be replaced with underscore
- **AND** names starting with digits SHALL be prefixed with underscore
- **AND** Exasol reserved words SHALL be quoted

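Below is a minimal Python sketch of Sanitize mode. It assumes that "valid identifier characters" means ASCII `A`-`Z`, `0`-`9` and `_`. `sanitize_identifier` is hypothetical. `RESERVED_WORDS` is a small illustrative subset, not Exasol's full reserved-word list.

```python
import re

# Illustrative subset only; use Exasol's full reserved-word list in practice.
RESERVED_WORDS = frozenset({"FROM", "GROUP", "ORDER", "SELECT", "TABLE", "WHERE"})

# Anything outside uppercase ASCII letters, digits and underscore is invalid.
_INVALID_CHARS = re.compile(r"[^A-Z0-9_]")


def sanitize_identifier(name: str) -> str:
    """Sanitize mode: uppercase, replace invalid chars, fix digit prefix, quote reserved words."""
    ident = _INVALID_CHARS.sub("_", name.upper())
    if ident[:1].isdigit():
        ident = "_" + ident
    if ident in RESERVED_WORDS:
        return f'"{ident}"'
    return ident
```

Sanitizing is lossy: `order-date` and `order date` both become `ORDER_DATE`. A caller may therefore want to check the sanitized names for duplicates before emitting DDL.
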
#### Scenario: DDL type generation

- **WHEN** generating DDL column types
- **THEN** ExasolType SHALL be converted to valid DDL syntax
- **AND** BOOLEAN SHALL generate "BOOLEAN"
- **AND** VARCHAR(n) SHALL generate "VARCHAR(n)"
- **AND** DECIMAL(p,s) SHALL generate "DECIMAL(p,s)"
- **AND** DOUBLE SHALL generate "DOUBLE"
- **AND** DATE SHALL generate "DATE"
- **AND** TIMESTAMP SHALL generate "TIMESTAMP" or "TIMESTAMP WITH LOCAL TIME ZONE"

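Below is a minimal Python sketch of the type-to-DDL mapping. The variant classes are hypothetical stand-ins for ExasolType. The choice between the two TIMESTAMP spellings is assumed to come from a `with_local_time_zone` flag on the type, which is also hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ScalarType:
    name: str  # parameterless: "BOOLEAN", "DOUBLE" or "DATE"


@dataclass(frozen=True)
class VarcharType:
    size: int


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int


@dataclass(frozen=True)
class TimestampType:
    with_local_time_zone: bool = False


ExasolType = Union[ScalarType, VarcharType, DecimalType, TimestampType]


def to_ddl(t: ExasolType) -> str:
    """Render one column type as Exasol DDL text."""
    if isinstance(t, ScalarType):
        return t.name
    if isinstance(t, VarcharType):
        return f"VARCHAR({t.size})"
    if isinstance(t, DecimalType):
        return f"DECIMAL({t.precision},{t.scale})"
    if isinstance(t, TimestampType):
        return "TIMESTAMP WITH LOCAL TIME ZONE" if t.with_local_time_zone else "TIMESTAMP"
    raise TypeError(f"unsupported type: {t!r}")
```
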
#### Scenario: Complete DDL statement generation

- **WHEN** generating CREATE TABLE DDL
- **THEN** output SHALL begin with the prefix "CREATE TABLE schema.table ("
- **AND** output SHALL include column definitions separated by commas
- **AND** output SHALL end with the closing ");"
- **AND** schema prefix SHALL be optional: when no schema is provided, output SHALL begin with "CREATE TABLE table ("

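Below is a minimal Python sketch that assembles the full statement. Its inputs are column names and type strings that have already been rendered, meaning quoted or sanitized and converted to DDL types. `create_table_ddl` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def create_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    schema: Optional[str] = None,
) -> str:
    """Assemble CREATE TABLE from already-rendered (identifier, ddl_type) pairs."""
    if not columns:
        raise ValueError("CREATE TABLE needs at least one column")
    # The schema qualifier is optional: omit it entirely when not provided.
    target = f"{schema}.{table}" if schema else table
    column_defs = ",\n    ".join(f"{name} {ddl_type}" for name, ddl_type in columns)
    return f"CREATE TABLE {target} (\n    {column_defs}\n);"
```

Identifier handling and type rendering stay outside this function. As a result, the same assembler works for both Quoted and Sanitize modes.
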
### Requirement: Auto Table Creation for Parquet Import

The system SHALL support automatically creating target tables before Parquet import when enabled.

#### Scenario: Auto-create table option enabled

- **WHEN** importing Parquet with create_table_if_not_exists=true
- **AND** target table does not exist
- **THEN** system SHALL infer schema from Parquet file(s)
- **AND** system SHALL generate CREATE TABLE DDL
- **AND** system SHALL execute DDL before IMPORT statement
- **AND** import SHALL proceed normally after table creation

#### Scenario: Auto-create with existing table

- **WHEN** importing Parquet with create_table_if_not_exists=true
- **AND** target table already exists
- **THEN** system SHALL skip DDL execution
- **AND** import SHALL proceed normally using existing table schema

#### Scenario: Auto-create option disabled (default)

- **WHEN** importing Parquet with create_table_if_not_exists=false (default)
- **THEN** system SHALL NOT attempt schema inference
- **AND** system SHALL NOT execute any CREATE TABLE DDL
- **AND** import SHALL assume table already exists

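The three scenarios above reduce to a single branch. In this minimal Python sketch, the database and inference steps are passed in as callables: `table_exists`, `infer_ddl`, `execute` and `run_import`, all hypothetical. This lets the control flow be exercised without a live Exasol connection.

```python
from typing import Callable, Sequence


def import_parquet(
    files: Sequence[str],
    table: str,
    *,
    table_exists: Callable[[str], bool],
    infer_ddl: Callable[[Sequence[str], str], str],
    execute: Callable[[str], None],
    run_import: Callable[[Sequence[str], str], None],
    create_table_if_not_exists: bool = False,
) -> None:
    """Optionally create the target table, then run the IMPORT."""
    if create_table_if_not_exists and not table_exists(table):
        # Only reached when the option is on AND the table is missing:
        # infer the schema, build CREATE TABLE, and run it before IMPORT.
        execute(infer_ddl(files, table))
    # Enabled-but-existing and disabled (default) paths go straight to IMPORT.
    run_import(files, table)
```

Because `and` short-circuits, the disabled (default) path never checks whether the table exists. Check-then-create is not atomic, however. Another session could create the table between the existence check and the DDL, so a real implementation may need to tolerate an "already exists" error.
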
#### Scenario: Multi-file auto-create

- **WHEN** importing multiple Parquet files with create_table_if_not_exists=true
- **THEN** system SHALL compute union schema from all files
- **AND** system SHALL create table with widened types
- **AND** all files SHALL be importable into the created table
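
Computing the union across files can be sketched as a fold over per-file schemas. Each type conflict is delegated to a pairwise widening function. This minimal Python sketch makes three assumptions, since the requirement only says the union must accommodate all files:

- Fields are matched by name.
- Columns keep their first-seen order.
- A field is nullable when any file declares it nullable or omits it.

`union_schema` and the `(name, type, nullable)` tuple shape are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A field is (name, exasol_type, nullable); a schema is an ordered list of fields.
Field = Tuple[str, object, bool]


def union_schema(
    schemas: Sequence[Sequence[Field]],
    widen: Callable[[object, object], object],
) -> List[Field]:
    merged: Dict[str, Tuple[object, bool]] = {}  # insertion order = first-seen order
    present_in: Dict[str, Set[int]] = {}  # indices of the files containing each field
    for file_index, schema in enumerate(schemas):
        for name, typ, nullable in schema:
            if name in merged:
                prev_type, prev_nullable = merged[name]
                merged[name] = (widen(prev_type, typ), prev_nullable or nullable)
            else:
                merged[name] = (typ, nullable)
            present_in.setdefault(name, set()).add(file_index)
    # A field missing from any file must accept NULLs for rows from that file.
    return [
        (name, typ, nullable or len(present_in[name]) < len(schemas))
        for name, (typ, nullable) in merged.items()
    ]
```
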