Add file filtering with glob patterns to bake_folder #59

Merged: schovi merged 8 commits into master from file-filters on Nov 10, 2025

@schovi schovi commented Nov 8, 2025

Problem

The bake_folder macro previously embedded all files from a directory into the binary, with little control over which ones. Users couldn't include or exclude files by pattern, which left three workarounds:

  • Bake entire directories (including unwanted test files, build artifacts, logs)
  • Manually structure directories to separate desired files
  • Use multiple bake_folder calls with complex directory layouts

This made it difficult to embed only production assets while excluding development/test files from the compiled binary.

Solution

Add glob pattern filtering to bake_folder via include_patterns and exclude_patterns parameters. Patterns support standard glob syntax (*, **, ?) and are applied at compile time to control which files get embedded.

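The implementation adds pattern matching logic to the loader process, which filters discovered files before embedding them. Include patterns act as a whitelist: a file is kept if it matches any include pattern. Exclude patterns act as a blacklist: a file is dropped if it matches any exclude pattern. Include patterns are applied first, then exclude patterns.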

Changes

Macro Interface (src/baked_file_system.cr:178):

  • Add include_patterns and exclude_patterns parameters to bake_folder macro
  • Serialize filter patterns as JSON for loader process communication
  • Maintain backward compatibility (nil patterns = no filtering)

Pattern Matching Engine (src/loader/loader.cr:14):

  • glob_to_regex(): Convert glob patterns to regex (* → [^/]*, ** → recursive paths, ? → single character except the path separator)
  • matches_pattern?(): Test if file path matches a glob pattern
  • filter_files(): Apply include/exclude filters to file list

Loader Integration (src/loader/loader.cr:83):

  • Accept filter patterns from macro via command-line arguments
  • Apply filtering after directory scanning but before embedding
  • Convert absolute paths to relative for pattern matching
  • Preserve existing file discovery behavior when no filters specified
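
The absolute-to-relative conversion step can be sketched in Python as below. This is only an illustration of the idea, not the loader's Crystal code; the function name `relative_paths` and the example root `/app/public` are hypothetical.

```python
from pathlib import PurePosixPath

def relative_paths(absolute_paths, root):
    """Return each path relative to `root`, using forward slashes.

    Hypothetical helper: patterns such as "src/*.cr" are meant to be
    matched against paths relative to the baked directory, so the
    absolute prefix has to be stripped first.
    """
    base = PurePosixPath(root)
    # relative_to raises ValueError if a path is not located under root.
    return [PurePosixPath(p).relative_to(base).as_posix() for p in absolute_paths]

scanned = ["/app/public/src/main.cr", "/app/public/config.yml"]
print(relative_paths(scanned, "/app/public"))
```

Matching against relative paths keeps patterns independent of where the project is checked out.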

Documentation (README.md:40):

  • Pattern syntax reference and examples
  • Include/exclude usage patterns
  • Combined filtering examples
  • Use cases and important notes

Tests:

  • Pattern matching unit tests (loader_spec.cr)
  • End-to-end filtering integration tests (baked_file_system_spec.cr)
  • Test fixtures for various directory structures (spec/storage/filters/)

Quality & Impact

✅ All tests passing (unit and integration)

Testing Coverage:

  • Pattern matching correctness (wildcards, recursive paths)
  • Include-only filtering (whitelist)
  • Exclude-only filtering (blacklist)
  • Combined filtering (include + exclude)
  • Edge cases (empty results, no filters, dotfiles interaction)

Backward Compatibility: Fully preserved - existing code continues to work without changes.

Performance: Minimal impact - filtering runs once at compile time, zero runtime overhead.

- Add matches_pattern? method supporting glob patterns (*, **, ?)
- Implement glob_to_regex converter with placeholder-based replacement
- Add comprehensive unit tests for pattern matching edge cases
- Support cross-platform path normalization (backslash to forward slash)

Pattern support:
- * matches any characters except path separator
- ** matches zero or more directory levels
- ? matches single character except path separator
- Handles leading slashes, backslashes, and edge cases
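
The pattern rules above can be sketched in Python as follows. This illustrates the described semantics and is not the loader's Crystal code: the commit mentions placeholder-based replacement, whereas this sketch swaps in a single left-to-right scan. Treating `**/` as "zero or more directories" and stripping leading slashes from both path and pattern are assumptions drawn from the list above.

```python
import re

def glob_to_regex(pattern):
    """Translate a glob pattern into a compiled regular expression."""
    # Normalize backslashes and drop leading slashes so patterns are
    # always compared against relative, forward-slash paths.
    pattern = pattern.replace("\\", "/").lstrip("/")
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directory levels
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything except a separator
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # one character except a separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def matches_pattern(path, pattern):
    """Return True if the whole path matches the glob pattern."""
    path = path.replace("\\", "/").lstrip("/")
    return glob_to_regex(pattern).fullmatch(path) is not None
```

Under these assumptions, `matches_pattern("src/main.cr", "*.cr")` is false because `*` stops at the separator, while `matches_pattern("src/main.cr", "**/*.cr")` is true.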

Files modified:
- src/loader/loader.cr: Pattern matching methods
- spec/loader_spec.cr: 15 new pattern matching tests

All tests passing (15/15)

- Add filter_files method to apply include/exclude patterns
- Integrate filtering into load method after Dir.glob
- Convert paths to relative for pattern matching
- Update load signature with include_patterns and exclude_patterns
- Add 9 comprehensive unit tests for filter_files method

Filtering logic:
- Apply include patterns first (OR logic across patterns)
- Then apply exclude patterns (OR logic across patterns)
- Patterns match against relative paths from baked directory
- Returns filtered array maintaining original order
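
A minimal Python sketch of this include-then-exclude logic, offered as an illustration rather than the actual `filter_files` implementation. Two assumptions are made: a missing or empty pattern list means "no filtering" for that stage, and `fnmatch.fnmatchcase` stands in as the matcher. Unlike the PR's matcher, its `*` also crosses `/`, so a different matcher can be passed through the hypothetical `matches` parameter.

```python
from fnmatch import fnmatchcase

def filter_files(paths, include_patterns=None, exclude_patterns=None,
                 matches=fnmatchcase):
    """Apply include patterns, then exclude patterns, preserving order."""
    result = list(paths)
    if include_patterns:
        # Keep a path if it matches ANY include pattern (OR logic).
        result = [p for p in result
                  if any(matches(p, pat) for pat in include_patterns)]
    if exclude_patterns:
        # Drop a path if it matches ANY exclude pattern (OR logic).
        result = [p for p in result
                  if not any(matches(p, pat) for pat in exclude_patterns)]
    return result

files = ["src/main.cr", "src/lib.cr", "test/spec.cr",
         "test/helper.cr", "docs/README.md", "config.yml"]
print(filter_files(files, include_patterns=["*.cr"], exclude_patterns=["test/*"]))
```

Because excludes run after includes, a file matched by both lists is dropped.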

Tests cover:
- Include-only filtering
- Exclude-only filtering
- Combined include+exclude
- Multiple patterns with OR logic
- Empty results and edge cases

All existing tests updated and passing (47/47)

- Add include_patterns and exclude_patterns to bake_folder macro
- Serialize patterns as JSON for loader process communication
- Update loader.cr to parse JSON filter parameters
- Maintain backward compatibility (parameters optional, default nil)

Macro changes:
- Extended inner macro in BakedFileSystem module
- Extended public bake_folder macro with new parameters
- Conditional JSON serialization only when filters provided
- Updated documentation with filter parameter descriptions

Loader changes:
- Parse ARGV[3] as JSON containing include/exclude arrays
- Extract pattern arrays from JSON with error handling
- Updated max_size to ARGV[4] (shifted by filter param)
- Graceful fallback if JSON parsing fails
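
The parsing and fallback behaviour described above might look like the following Python sketch. It is not the loader's Crystal code: the function name `parse_filters` and the JSON keys `"include"` and `"exclude"` are assumptions, and "graceful fallback" is interpreted here as "behave as if no filters were given".

```python
import json

def parse_filters(raw):
    """Parse a JSON filter argument into (include, exclude) lists.

    Returns (None, None) when the argument is missing or malformed, so
    the caller can proceed without filtering.
    """
    try:
        data = json.loads(raw) if raw else {}
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return None, None
    if not isinstance(data, dict):
        return None, None

    def patterns(key):
        value = data.get(key)
        # Accept only a list of strings; JSON null or any other shape
        # is treated as "no patterns for this side".
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            return value
        return None

    return patterns("include"), patterns("exclude")

print(parse_filters('{"include": ["**/*.cr"], "exclude": null}'))
```

Returning `None` for both sides on bad input keeps the no-filter path identical to the pre-existing behaviour.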

All existing tests passing (56/56)
Backward compatible - existing code requires no changes

- Create test fixture directory (spec/storage/filters) with sample files
- Add 4 filtered storage test classes with different pattern combinations
- Add 7 comprehensive integration tests for filtering scenarios
- Fix JSON null handling in loader.cr
- Fix macro compile-time JSON serialization
- Update existing test expectations for new fixture files

Test fixtures:
- src/main.cr, src/lib.cr (Crystal source files)
- test/spec.cr, test/helper.cr (test files)
- docs/README.md (markdown file)
- config.yml (YAML config)

Test coverage:
- Include-only patterns (filter to .cr files)
- Exclude-only patterns (exclude test directory)
- Combined include+exclude patterns
- Empty result with allow_empty flag
- Relative path matching verification
- Content reading from filtered files
- Compile-time error behavior documentation

All tests passing (117/117)
Cross-platform compatible via CI

- Add comprehensive "File Filtering" section to README
- Document glob pattern syntax (*, **, ?)
- Provide examples for include-only, exclude-only, and combined filtering
- Explain pattern matching rules and behavior
- List common use cases for filtering
- Update bake_folder signature in Options section
- Update Best Practices to mention file filtering

Documentation includes:
- Pattern syntax reference with examples
- Include patterns (whitelist) examples
- Exclude patterns (blacklist) examples
- Combined filtering examples
- Important notes on pattern matching behavior
- Practical use cases

All documentation clear, concise, and example-driven

Restores spec/storage/filters/docs/README.md that was accidentally
deleted in commit 36c74b8. This file is required by 3 test cases:
- BakedFileSystem load only files without hidden one
- file filtering excludes files matching exclude patterns
- file filtering applies both include and exclude patterns

Fixes all 3 CI test failures showing off-by-one file counts.
@schovi schovi marked this pull request as ready for review November 10, 2025 12:23
@schovi schovi merged commit 84ee1e1 into master Nov 10, 2025
8 checks passed
@schovi schovi deleted the file-filters branch November 10, 2025 12:23