@@ -379,12 +379,157 @@ def _raise_data_required(self):
 - Abstract raise statements to inner functions in try blocks
 - Remove unnecessary try/except blocks when the exception is already handled higher in the call stack
 
+#### Two-Tier Error Handling Pattern
+
+When processing user input that may be incomplete or malformed, use a two-tier approach:
+
+**Tier 1: Graceful Skip (Expected Incomplete Input)**
+
+- Condition: Input lacks required markers but is otherwise valid
+- Action: Return an empty result (empty dict, None, etc.)
+- Log level: DEBUG
+- Example: SQL file without `-- name:` markers
+
+**Tier 2: Hard Error (Malformed Input)**
+
+- Condition: Input has required markers, but they are malformed
+- Action: Raise a specific exception with a clear message
+- Log level: ERROR (via exception handler)
+- Example: Duplicate statement names, empty names
+
+**Implementation Pattern:**
+
+```python
+from pathlib import Path
+
+# MARKER_PATTERN, malformed_marker(), ParseError, Result, process(),
+# store_result(), logger, and get_correlation_id() are placeholders
+# standing in for the surrounding codebase.
+
+
+def parse_user_input(content: str, source: str) -> "dict[str, Result]":
+    """Parse user input with two-tier error handling.
+
+    Files without required markers are gracefully skipped by returning
+    an empty dictionary. The caller is responsible for handling empty results.
+
+    Args:
+        content: Raw input content to parse.
+        source: Source identifier for error reporting.
+
+    Returns:
+        Dictionary of parsed results. Empty dict if no required markers found.
+
+    Raises:
+        ParseError: If required markers are present but malformed.
+    """
+    # Check for required markers
+    markers = list(MARKER_PATTERN.finditer(content))
+    if not markers:
+        return {}  # Tier 1: Graceful skip
+
+    results = {}
+    for marker in markers:
+        # Parse marker
+        if malformed_marker(marker):
+            raise ParseError(source, "Malformed marker")  # Tier 2: Hard error
+
+        # Process and add result, keyed by the marker's captured name
+        results[marker.group("name")] = process(marker)
+
+    return results
+
+
+def caller_function(file_path: str) -> None:
+    """Caller handles empty results from graceful skip."""
+    content = Path(file_path).read_text(encoding="utf-8")
+    results = parse_user_input(content, str(file_path))
+
+    if not results:
+        logger.debug(
+            "Skipping file without required markers: %s",
+            str(file_path),
+            extra={"file_path": str(file_path), "correlation_id": get_correlation_id()},
+        )
+        return  # Early return - don't process empty results
+
+    # Process non-empty results
+    for name, result in results.items():
+        store_result(name, result)
+```
+
+**Key Benefits:**
+
+1. **Clear Intent**: Distinguishes "no markers" from "bad markers"
+2. **User-Friendly**: Doesn't break batch processing for missing markers
+3. **Debuggable**: DEBUG logs show what was skipped
+4. **Fail-Fast**: Malformed input still raises clear errors
+
+**When to Use:**
+
+- File/directory loading with optional markers
+- Batch processing where some inputs may lack required data
+- Migration systems processing mixed file types
+- Configuration loading with optional sections
+
+**When NOT to Use:**
+
+- Required input validation (use strict validation, as sketched below)
+- Single-file processing where empty result is an error
+- API endpoints expecting specific data format
+
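+For contrast, a minimal sketch of the strict-validation case, reusing the
+placeholder names from the implementation pattern above:
+
+```python
+def parse_required_input(content: str, source: str) -> "dict[str, Result]":
+    """Strict variant: an empty result is itself an error."""
+    results = parse_user_input(content, source)
+    if not results:
+        # No graceful skip here - required input must contain markers
+        raise ParseError(source, "No required markers found")
+    return results
+```
+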
+**Real-World Example:**
+
+See `sqlspec/loader.py:_parse_sql_content()` for the reference implementation in SQL file loading.
+
 ### Logging Standards
 
 - Use `logging` module, NEVER `print()`
 - NO f-strings in log messages - use lazy formatting (see the sketch below)
 - Provide meaningful context in all log messages
 
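+A minimal sketch of lazy formatting versus f-strings (values are illustrative):
+
+```python
+import logging
+
+logger = logging.getLogger(__name__)
+count, path_str = 3, "queries/users.sql"  # illustrative values
+
+# GOOD - lazy formatting: interpolation is deferred until the record is emitted
+logger.debug("Loaded %d statements from %s", count, path_str)
+
+# BAD - f-string: interpolation runs even when DEBUG is disabled
+logger.debug(f"Loaded {count} statements from {path_str}")
+```
+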
+#### DEBUG Level for Expected Skip Behavior
+
+Use DEBUG level (not INFO or WARNING) when gracefully skipping expected conditions:
+
+```python
+# Inside a loader method; statements, path_str, and CorrelationContext
+# come from the surrounding code.
+
+# GOOD - DEBUG for expected skip
+if not statements:
+    logger.debug(
+        "Skipping SQL file without named statements: %s",
+        path_str,
+        extra={
+            "file_path": path_str,
+            "correlation_id": CorrelationContext.get(),
+        },
+    )
+    return
+
+# BAD - INFO suggests important information
+logger.info("Skipping file %s", path_str)  # Too high level
+
+# BAD - WARNING suggests potential problem
+logger.warning("No statements found in %s", path_str)  # Misleading
+```
+
+**DEBUG vs INFO vs WARNING Guidelines:**
+
+- **DEBUG**: Expected behavior that aids troubleshooting
+  - Files gracefully skipped during batch processing
+  - Optional features not enabled (dependencies missing)
+  - Cache hits/misses
+  - Internal state transitions
+
+- **INFO**: Significant events during normal operation
+  - Connection pool created
+  - Migration applied successfully
+  - Background task started
+
+- **WARNING**: Unexpected but recoverable conditions
+  - Retrying after transient failure
+  - Falling back to alternative implementation
+  - Configuration using deprecated options
+
+**Context Requirements:**
+
+Always include an `extra` dict with:
+
+- Primary identifier (file_path, query_name, etc.)
+- Correlation ID via `CorrelationContext.get()`
+- Additional relevant context (size, duration, etc.)
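+
+A hedged sketch of conforming INFO and WARNING calls; `CorrelationContext` is
+the correlation helper used above, everything else is illustrative:
+
+```python
+pool_name, attempt = "main-pool", 2  # illustrative values
+
+logger.info(
+    "Connection pool created: %s",
+    pool_name,
+    extra={
+        "pool_name": pool_name,  # primary identifier
+        "correlation_id": CorrelationContext.get(),
+        "pool_size": 10,  # additional relevant context
+    },
+)
+
+logger.warning(
+    "Retrying after transient failure (attempt %d)",
+    attempt,
+    extra={"attempt": attempt, "correlation_id": CorrelationContext.get()},
+)
+```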
+
 ### Documentation Standards
 
 **Docstrings (Google Style - MANDATORY)**:
@@ -395,6 +540,43 @@ def _raise_data_required(self):
 - Sphinx-compatible format
 - Focus on WHY not WHAT
 
+**Documenting Graceful Degradation:**
+
+When functions implement graceful skip behavior, document it clearly in the docstring:
+
+```python
+def parse_content(content: str, source: str) -> "dict[str, Result]":
+    """Parse content and extract structured data.
+
+    Files without required markers are gracefully skipped by returning
+    an empty dictionary. The caller is responsible for handling empty results
+    appropriately.
+
+    Args:
+        content: Raw content to parse.
+        source: Source identifier for error reporting.
+
+    Returns:
+        Dictionary mapping names to results.
+        Empty dict if no required markers found in the content.
+
+    Raises:
+        ParseError: If required markers are present but malformed
+            (duplicate names, empty names, invalid content).
+    """
+```
+
+**Key elements:**
+
+1. **First paragraph after summary**: Explain graceful skip behavior
+2. **Caller responsibility**: Note that caller must handle empty results
+3. **Returns section**: Explicitly document empty dict case
+4. **Raises section**: Only document hard errors, not graceful skips
+
+**Example from codebase:**
+
+See `sqlspec/loader.py:_parse_sql_content()` for the reference implementation.
+
 **Project Documentation**:
 
 - Update `docs/` for new features and API changes
@@ -995,17 +1177,20 @@ SQLSpec implements Apache Arrow support through a dual-path architecture: native
 ### When to Implement Arrow Support
 
 **Implement select_to_arrow() when**:
+
 - Adapter supports high-throughput analytical queries
 - Users need integration with pandas, Polars, or data science tools
 - Data interchange with Arrow ecosystem (Parquet, Spark, etc.) is common
 - Large result sets are typical for the adapter's use cases
 
 **Use native Arrow path when**:
+
 - Database driver provides direct Arrow output (e.g., ADBC `fetch_arrow_table()`)
 - Zero-copy data transfer available
 - Performance is critical for large datasets
 
 **Use conversion path when**:
+
 - Database driver returns dict/row results
 - Native Arrow support not available
 - Conversion overhead acceptable for use case
@@ -1057,6 +1242,7 @@ class NativeArrowDriver(AsyncDriverAdapterBase):
 ```
 
 **Key principles**:
+
 - Use `ensure_pyarrow()` for dependency validation
 - Validate `native_only` flag if adapter doesn't support native path
 - Preserve Arrow schema metadata from database
@@ -1087,6 +1273,7 @@ async def select_to_arrow(self, statement, /, *parameters, **kwargs):
 ```
 
 **When to use**:
+
 - Adapter has no native Arrow support
 - Conversion overhead acceptable (<20% for most cases)
 - Provides Arrow compatibility for all adapters
@@ -1109,6 +1296,7 @@ UUID → utf8 (converted to string)
 ```
 
 **Complex type handling**:
+
 - Arrays: Preserve as Arrow list types when possible
 - JSON: Convert to utf8 (text) for portability
 - UUIDs: Convert to strings for cross-platform compatibility
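+
+A minimal sketch of this normalization step; `normalize_value` is an
+illustrative helper, not the sqlspec API:
+
+```python
+import json
+import uuid
+
+import pyarrow as pa
+
+
+def normalize_value(value: object) -> object:
+    """Convert driver-specific values to Arrow-friendly Python types."""
+    if isinstance(value, uuid.UUID):
+        return str(value)  # UUID -> utf8 string
+    if isinstance(value, dict):
+        return json.dumps(value)  # JSON -> utf8 text for portability
+    return value  # lists are preserved and inferred as Arrow list types
+
+
+rows = [{"id": uuid.uuid4(), "tags": ["a", "b"], "meta": {"k": 1}}]
+columns = {name: [normalize_value(row[name]) for row in rows] for name in rows[0]}
+table = pa.table(columns)  # inferred schema: id=utf8, tags=list<utf8>, meta=utf8
+```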
@@ -1130,6 +1318,7 @@ result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)
 ```
 
 **Benefits**:
+
 - Consistent API across all adapters
 - Automatic to_pandas(), to_polars(), to_dict() support
 - Iteration and length operations
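+
+A hedged usage sketch of the wrapped result; the method names are those listed
+above, while the session fixture and query are illustrative:
+
+```python
+async def load_users(session) -> None:
+    result = await session.select_to_arrow("SELECT id, name FROM users")
+
+    df = result.to_pandas()  # pandas DataFrame
+    pl_df = result.to_polars()  # Polars DataFrame
+    records = result.to_dict()  # plain Python structures
+
+    for row in result:  # iteration support
+        print(row)
+    print(len(result), len(df), len(pl_df), len(records))  # length support
+```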
@@ -1138,12 +1327,14 @@ result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)
 ### Testing Requirements
 
 **Unit tests** for Arrow helpers (see the sketch after this list):
+
 - Test `convert_dict_to_arrow()` with various data types
 - Test empty result handling
 - Test NULL value preservation
 - Test schema inference
 
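+A minimal pytest sketch of such tests. The import path is an assumption, and
+`convert_dict_to_arrow()` is assumed to take a list of row dicts and return a
+`pyarrow.Table`; adjust both to the actual helper signature:
+
+```python
+import pyarrow as pa
+
+from sqlspec.utils.arrow_helpers import convert_dict_to_arrow  # path is an assumption
+
+
+def test_null_values_preserved() -> None:
+    table = convert_dict_to_arrow([{"id": 1, "name": None}, {"id": 2, "name": "b"}])
+    assert table.column("name").null_count == 1
+
+
+def test_empty_result_handling() -> None:
+    table = convert_dict_to_arrow([])
+    assert table.num_rows == 0
+
+
+def test_schema_inference() -> None:
+    table = convert_dict_to_arrow([{"id": 1, "score": 1.5}])
+    assert table.schema.field("id").type == pa.int64()
+    assert table.schema.field("score").type == pa.float64()
+```
+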
 **Integration tests** per adapter:
+
 - Test native Arrow path (if supported)
 - Test table and batch return formats
 - Test pandas/Polars conversion
@@ -1153,6 +1344,7 @@ result = create_arrow_result(record_batch, rows_affected=record_batch.num_rows)
11531344- Test empty results
11541345
 **Performance benchmarks** (for native paths):
+
 - Measure native vs conversion speedup
 - Validate zero-copy behavior
 - Benchmark memory usage
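+
+A minimal harness sketch for such benchmarks; the `fetch` callables stand in
+for whatever native and conversion entry points the adapter exposes:
+
+```python
+import time
+import tracemalloc
+from typing import Any, Callable
+
+
+def measure(fetch: "Callable[[], Any]") -> "tuple[float, int]":
+    """Return (elapsed seconds, peak traced bytes) for one fetch call."""
+    tracemalloc.start()
+    start = time.perf_counter()
+    fetch()
+    elapsed = time.perf_counter() - start
+    _, peak = tracemalloc.get_traced_memory()
+    tracemalloc.stop()
+    return elapsed, peak
+```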
@@ -1233,13 +1425,15 @@ When implementing Arrow support:
 ### Common Pitfalls
 
 **Avoid**:
+
 - Returning raw Arrow objects instead of ArrowResult
 - Missing `ensure_pyarrow()` dependency check
 - Not supporting both "table" and "batch" return formats
 - Ignoring `native_only` flag when adapter has no native support
 - Breaking existing `execute()` behavior
 
 **Do**:
+
 - Use `create_arrow_result()` for consistent wrapping
 - Support all standard type mappings
 - Test with large datasets
@@ -1249,16 +1443,19 @@ When implementing Arrow support:
 ### Performance Guidelines
 
 **Native path targets**:
+
 - Overhead <5% vs direct driver Arrow fetch
 - Zero-copy data transfer
 - 5-10x faster than dict conversion for datasets >10K rows
 
 **Conversion path targets**:
+
 - Overhead <20% vs standard `execute()` for datasets <1K rows
 - Overhead <15% for datasets 1K-100K rows
 - Overhead <10% for datasets >100K rows (columnar efficiency)
 
 **Memory targets**:
+
 - Peak memory <2x dict representation
 - Arrow columnar format more efficient for large datasets