
Commit 7be6307

feat: better handling of empty SQL files (#216)
Enhance the SQL file loader to gracefully skip files lacking named statements, preventing errors during loading. This change allows for seamless processing of directories containing both named queries and raw SQL scripts. Update tests to reflect the new behavior and ensure proper logging of skipped files.
1 parent c12cf82 commit 7be6307

File tree

8 files changed: +392 −51 lines changed


AGENTS.md

Lines changed: 197 additions & 0 deletions
@@ -379,12 +379,157 @@ def _raise_data_required(self):
- Abstract raise statements to inner functions in try blocks
- Remove unnecessary try/catch blocks that will be caught higher in the execution

#### Two-Tier Error Handling Pattern

When processing user input that may be incomplete or malformed, use a two-tier approach:

**Tier 1: Graceful Skip (Expected Incomplete Input)**

- Condition: Input lacks required markers but is otherwise valid
- Action: Return an empty result (empty dict, None, etc.)
- Log level: DEBUG
- Example: SQL file without `-- name:` markers

**Tier 2: Hard Error (Malformed Input)**

- Condition: Input has required markers but is malformed
- Action: Raise a specific exception with a clear message
- Log level: ERROR (via exception handler)
- Example: Duplicate statement names, empty names

**Implementation Pattern:**

```python
def parse_user_input(content: str, source: str) -> "dict[str, Result]":
    """Parse user input with two-tier error handling.

    Files without required markers are gracefully skipped by returning
    an empty dictionary. The caller is responsible for handling empty results.

    Args:
        content: Raw input content to parse.
        source: Source identifier for error reporting.

    Returns:
        Dictionary of parsed results. Empty dict if no required markers found.

    Raises:
        ParseError: If required markers are present but malformed.
    """
    # Check for required markers (MARKER_PATTERN defines a named group "name")
    markers = list(MARKER_PATTERN.finditer(content))
    if not markers:
        return {}  # Tier 1: Graceful skip

    results = {}
    for marker in markers:
        # Parse marker
        if malformed_marker(marker):
            raise ParseError(source, "Malformed marker")  # Tier 2: Hard error

        # Process and add result
        results[marker.group("name")] = process(marker)

    return results


def caller_function(file_path: str) -> None:
    """Caller handles empty results from graceful skip."""
    content = Path(file_path).read_text()
    results = parse_user_input(content, str(file_path))

    if not results:
        logger.debug(
            "Skipping file without required markers: %s",
            str(file_path),
            extra={"file_path": str(file_path), "correlation_id": get_correlation_id()},
        )
        return  # Early return - don't process empty results

    # Process non-empty results
    for name, result in results.items():
        store_result(name, result)
```

**Key Benefits:**

1. **Clear Intent**: Distinguishes "no markers" from "bad markers"
2. **User-Friendly**: Doesn't break batch processing for missing markers
3. **Debuggable**: DEBUG logs show what was skipped
4. **Fail-Fast**: Malformed input still raises clear errors

**When to Use:**

- File/directory loading with optional markers
- Batch processing where some inputs may lack required data
- Migration systems processing mixed file types
- Configuration loading with optional sections

**When NOT to Use:**

- Required input validation (use strict validation)
- Single-file processing where an empty result is an error
- API endpoints expecting a specific data format

**Real-World Example:**

See `sqlspec/loader.py:_parse_sql_content()` for the reference implementation in SQL file loading.
### Logging Standards

- Use `logging` module, NEVER `print()`
- NO f-strings in log messages - use lazy formatting
- Provide meaningful context in all log messages

#### DEBUG Level for Expected Skip Behavior

Use DEBUG level (not INFO or WARNING) when gracefully skipping expected conditions:

```python
# GOOD - DEBUG for expected skip
if not statements:
    logger.debug(
        "Skipping SQL file without named statements: %s",
        path_str,
        extra={
            "file_path": path_str,
            "correlation_id": CorrelationContext.get(),
        },
    )
    return

# BAD - INFO suggests important information
logger.info("Skipping file %s", path_str)  # Too high a level

# BAD - WARNING suggests potential problem
logger.warning("No statements found in %s", path_str)  # Misleading
```

**DEBUG vs INFO vs WARNING Guidelines:**

- **DEBUG**: Expected behavior that aids troubleshooting
  - Files gracefully skipped during batch processing
  - Optional features not enabled (dependencies missing)
  - Cache hits/misses
  - Internal state transitions

- **INFO**: Significant events during normal operation
  - Connection pool created
  - Migration applied successfully
  - Background task started

- **WARNING**: Unexpected but recoverable conditions
  - Retrying after transient failure
  - Falling back to alternative implementation
  - Configuration using deprecated options

**Context Requirements:**

Always include an `extra` dict with:

- Primary identifier (file_path, query_name, etc.)
- Correlation ID via `CorrelationContext.get()`
- Additional relevant context (size, duration, etc.)

### Documentation Standards

**Docstrings (Google Style - MANDATORY)**:
@@ -395,6 +540,43 @@ def _raise_data_required(self):
- Sphinx-compatible format
- Focus on WHY not WHAT

**Documenting Graceful Degradation:**

When functions implement graceful skip behavior, document it clearly in the docstring:

```python
def parse_content(content: str, source: str) -> "dict[str, Result]":
    """Parse content and extract structured data.

    Files without required markers are gracefully skipped by returning
    an empty dictionary. The caller is responsible for handling empty results
    appropriately.

    Args:
        content: Raw content to parse.
        source: Source identifier for error reporting.

    Returns:
        Dictionary mapping names to results.
        Empty dict if no required markers are found in the content.

    Raises:
        ParseError: If required markers are present but malformed
            (duplicate names, empty names, invalid content).
    """
```

**Key elements:**

1. **First paragraph after summary**: Explain the graceful skip behavior
2. **Caller responsibility**: Note that the caller must handle empty results
3. **Returns section**: Explicitly document the empty-dict case
4. **Raises section**: Only document hard errors, not graceful skips

**Example from codebase:**

See `sqlspec/loader.py:_parse_sql_content()` for the reference implementation.

**Project Documentation**:

- Update `docs/` for new features and API changes
@@ -995,17 +1177,20 @@ SQLSpec implements Apache Arrow support through a dual-path architecture: native
### When to Implement Arrow Support

**Implement select_to_arrow() when**:

- Adapter supports high-throughput analytical queries
- Users need integration with pandas, Polars, or data science tools
- Data interchange with Arrow ecosystem (Parquet, Spark, etc.) is common
- Large result sets are typical for the adapter's use cases

**Use native Arrow path when**:

- Database driver provides direct Arrow output (e.g., ADBC `fetch_arrow_table()`)
- Zero-copy data transfer available
- Performance is critical for large datasets
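For instance, the ADBC DBAPI layer can return Arrow directly. A minimal sketch, assuming the SQLite ADBC driver is installed (the driver choice is purely illustrative):

```python
from adbc_driver_sqlite import dbapi  # any ADBC driver works similarly

conn = dbapi.connect()  # in-memory SQLite database
cur = conn.cursor()
cur.execute("SELECT 1 AS answer")
table = cur.fetch_arrow_table()  # Arrow data straight from the driver, no per-row conversion
cur.close()
conn.close()
```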

**Use conversion path when**:

- Database driver returns dict/row results
- Native Arrow support not available
- Conversion overhead acceptable for use case
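The conversion path, reduced to its core, looks roughly like the following sketch in plain `pyarrow`; `dict_rows_to_arrow` is a hypothetical stand-in, not the project's actual `convert_dict_to_arrow()` helper:

```python
import pyarrow as pa


def dict_rows_to_arrow(rows: "list[dict]") -> "pa.Table":
    """Sketch: convert driver dict rows to an Arrow table (schema inferred)."""
    if not rows:
        return pa.table({})  # empty table keeps empty result sets uniform
    return pa.Table.from_pylist(rows)
```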
@@ -1057,6 +1242,7 @@ class NativeArrowDriver(AsyncDriverAdapterBase):
**Key principles**:

- Use `ensure_pyarrow()` for dependency validation
- Validate `native_only` flag if adapter doesn't support native path
- Preserve Arrow schema metadata from database
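A rough sketch of how the first two principles might open an implementation. The signature echoes the hunk header below; the `native_only` handling shown here is an assumption, not the project's actual code:

```python
async def select_to_arrow(self, statement, /, *parameters, **kwargs):
    ensure_pyarrow()  # dependency validation: clear error if pyarrow is missing
    if kwargs.get("native_only"):
        # Assumed flag handling: this adapter sketch has no native Arrow path
        raise NotImplementedError("Native Arrow path not available for this adapter")
    ...
```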
@@ -1087,6 +1273,7 @@ async def select_to_arrow(self, statement, /, *parameters, **kwargs):
**When to use**:

- Adapter has no native Arrow support
- Conversion overhead acceptable (<20% for most cases)
- Provides Arrow compatibility for all adapters
@@ -1109,6 +1296,7 @@ UUID → utf8 (converted to string)
**Complex type handling**:

- Arrays: Preserve as Arrow list types when possible
- JSON: Convert to utf8 (text) for portability
- UUIDs: Convert to strings for cross-platform compatibility
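A minimal sketch of these mapping rules in plain `pyarrow` (`normalize_value` is a hypothetical helper, not the project's API):

```python
import json
import uuid

import pyarrow as pa


def normalize_value(value: object) -> object:
    """Sketch: normalize complex values before Arrow conversion."""
    if isinstance(value, uuid.UUID):
        return str(value)  # UUID -> utf8 string
    if isinstance(value, dict):
        return json.dumps(value)  # JSON -> utf8 text for portability
    return value  # lists stay as-is; pyarrow infers Arrow list types


rows = [{"id": uuid.uuid4(), "tags": ["a", "b"], "payload": {"n": 1}}]
table = pa.Table.from_pylist(
    [{key: normalize_value(val) for key, val in row.items()} for row in rows]
)
```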
@@ -1130,6 +1318,7 @@ result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)
**Benefits**:

- Consistent API across all adapters
- Automatic to_pandas(), to_polars(), to_dict() support
- Iteration and length operations
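Assuming the wrapper call shown in the hunk header above, consumer code might read as follows (the method names come from the benefits list; treat the exact API surface as an assumption):

```python
result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)

df = result.to_pandas()      # pandas DataFrame
frame = result.to_polars()   # Polars DataFrame
records = result.to_dict()   # plain Python structures
print(len(result))           # length support
for row in result:           # iteration support
    ...
```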
@@ -1138,12 +1327,14 @@ result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)
### Testing Requirements

**Unit tests** for Arrow helpers:

- Test `convert_dict_to_arrow()` with various data types
- Test empty result handling
- Test NULL value preservation
- Test schema inference
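A sketch of one such test; the import path and exact signature of `convert_dict_to_arrow()` are assumptions here:

```python
import pyarrow as pa

# convert_dict_to_arrow is the project helper named above;
# its import path and signature (list[dict] -> pa.Table) are assumed.


def test_null_value_preservation() -> None:
    rows = [{"id": 1, "note": None}, {"id": 2, "note": "ok"}]
    table = convert_dict_to_arrow(rows)

    assert isinstance(table, pa.Table)
    assert table.num_rows == 2
    assert table.column("note").null_count == 1  # NULL survives the conversion
```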

**Integration tests** per adapter:

- Test native Arrow path (if supported)
- Test table and batch return formats
- Test pandas/Polars conversion
@@ -1153,6 +1344,7 @@ result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)
- Test empty results

**Performance benchmarks** (for native paths):

- Measure native vs conversion speedup
- Validate zero-copy behavior
- Benchmark memory usage
@@ -1233,13 +1425,15 @@ When implementing Arrow support:
### Common Pitfalls

**Avoid**:

- Returning raw Arrow objects instead of ArrowResult
- Missing `ensure_pyarrow()` dependency check
- Not supporting both "table" and "batch" return formats
- Ignoring `native_only` flag when adapter has no native support
- Breaking existing `execute()` behavior

**Do**:

- Use `create_arrow_result()` for consistent wrapping
- Support all standard type mappings
- Test with large datasets
@@ -1249,16 +1443,19 @@ When implementing Arrow support:
### Performance Guidelines

**Native path targets**:

- Overhead <5% vs direct driver Arrow fetch
- Zero-copy data transfer
- 5-10x faster than dict conversion for datasets >10K rows

**Conversion path targets**:

- Overhead <20% vs standard `execute()` for datasets <1K rows
- Overhead <15% for datasets 1K-100K rows
- Overhead <10% for datasets >100K rows (columnar efficiency)

**Memory targets**:

- Peak memory <2x dict representation
- Arrow columnar format more efficient for large datasets

docs/changelog.rst

Lines changed: 46 additions & 0 deletions
@@ -10,6 +10,52 @@ SQLSpec Changelog
Recent Updates
==============

SQL Loader Graceful Error Handling
-----------------------------------

**Breaking Change**: Files without named statements (``-- name:``) are now gracefully skipped instead of raising ``SQLFileParseError``.

This allows loading directories that contain both aiosql-style named queries and raw DDL/DML scripts without errors.

**What Changed:**

- Files without ``-- name:`` markers return an empty dict instead of raising an exception
- Directory loading continues when encountering such files
- Skipped files are logged at DEBUG level
- Malformed named statements (duplicate names, etc.) still raise exceptions

**Migration Guide:**

Code that explicitly catches ``SQLFileParseError`` for files without named statements will need updating:

.. code-block:: python

    # OLD (breaks):
    try:
        loader.load_sql("directory/")
    except SQLFileParseError as e:
        if "No named SQL statements found" in str(e):
            pass

    # NEW (recommended):
    loader.load_sql("directory/")  # Just works - DDL files skipped
    if not loader.list_queries():
        # No queries loaded
        pass

**Example Use Case:**

.. code-block:: python

    # Directory structure:
    # migrations/
    # ├── schema.sql     # Raw DDL (no -- name:) → SKIP
    # ├── queries.sql    # Named queries → LOAD
    # └── seed-data.sql  # Raw DML (no -- name:) → SKIP

    loader = SQLFileLoader()
    loader.load_sql("migrations/")  # Loads only the named queries, skips DDL

Hybrid Versioning with Fix Command
-----------------------------------
