| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +`bentools/etl` is a PHP library implementing the Extract/Transform/Load pattern for data processing workflows. It is designed to be flexible and event-driven, and it supports both synchronous and asynchronous (ReactPHP) processing. |
| 8 | + |
| 9 | +**Core concept:** Extract data from a source, apply transformations, and load results into a destination. |
| 10 | + |
| 11 | +## Commands |
| 12 | + |
| 13 | +### Testing & Quality |
| 14 | +```bash |
| 15 | +# Run all CI checks (PHP-CS-Fixer, PHPStan, Pest with coverage) |
| 16 | +composer ci:check |
| 17 | + |
| 18 | +# Run tests only |
| 19 | +vendor/bin/pest |
| 20 | + |
| 21 | +# Run tests with coverage |
| 22 | +vendor/bin/pest --coverage |
| 23 | + |
| 24 | +# Run a single test file |
| 25 | +vendor/bin/pest tests/Behavior/FlushTest.php |
| 26 | + |
| 27 | +# Run PHPStan type checking |
| 28 | +vendor/bin/phpstan analyse |
| 29 | + |
| 30 | +# Run code style fixer |
| 31 | +vendor/bin/php-cs-fixer fix |
| 32 | +``` |
| 33 | + |
| 34 | +### Requirements |
| 35 | +- PHP >=8.2 |
| 36 | +- Tests use Pest (not PHPUnit syntax) |
| 37 | +- 100% code coverage expected before PRs |
| 38 | + |
| 39 | +## Architecture |
| 40 | + |
| 41 | +### Core Components |
| 42 | + |
| 43 | +**EtlExecutor** (`src/EtlExecutor.php`) |
| 44 | +- Main entry point for building and executing ETL workflows |
| 45 | +- Uses builder pattern via `EtlBuilderTrait` to chain extractors, transformers, and loaders |
| 46 | +- Dispatches events at each lifecycle stage (init, extract, transform, load, flush, end) |
| 47 | +- Handles exceptions through dedicated event types (ExtractException, TransformException, etc.) |
| 48 | + |
| 49 | +**EtlState** (`src/EtlState.php`) |
| 50 | +- Immutable state object passed through the entire workflow |
| 51 | +- Tracks: current item, indices, flush timing, loaded items count, output |
| 52 | +- Contains context (arbitrary data), source, and destination |
| 53 | +- Versioned: state updates during processing produce new versions (see `getLastVersion()` under Key Patterns) |
| 54 | + |
| 55 | +**EtlConfiguration** (`src/EtlConfiguration.php`) |
| 56 | +- Configuration object for flush frequency, batch size, and other options |
| 57 | +- `flushEvery` - Controls how often the loader flushes (default: INF) |
| 58 | +- `batchSize` - Controls how many items are grouped for batch transformation (default: 1) |
| 59 | + |
| 60 | +### Four Main Interfaces |
| 61 | + |
| 62 | +1. **ExtractorInterface** (`src/Extractor/`) |
| 63 | + - `extract(EtlState $state): iterable` - Returns an iterable of items to process |
| 64 | + - Built-in: CSV, JSON, FileExtractor, STDINExtractor, IterableExtractor, ReactStreamExtractor |
| 65 | + |
| 66 | +2. **TransformerInterface** (`src/Transformer/`) |
| 67 | + - `transform(mixed $item, EtlState $state): mixed` - Transforms extracted items |
| 68 | + - Return value can be a single value, an array, or a generator (yield) |
| 69 | + - Yielded items generate multiple loads from a single extracted item |
| 70 | + - Built-in: CallableTransformer, ChainTransformer, NullTransformer |
| 71 | + |
| 72 | +3. **BatchTransformerInterface** (`src/Transformer/`) |
| 73 | + - `transform(array $items, EtlState $state): Generator` - Transforms a batch of items at once |
| 74 | + - Separate interface from `TransformerInterface` (does NOT extend it) |
| 75 | +   - Activated when `batchSize` is set in `EtlConfiguration` and the transformer implements this interface |
| 76 | + - Each yielded value becomes an individual item for the load phase |
| 77 | + - Built-in: CallableBatchTransformer |
| 78 | + |
| 79 | +4. **LoaderInterface** (`src/Loader/`) |
| 80 | + - `load(mixed $item, EtlState $state): void` - Loads transformed items |
| 81 | + - `flush(bool $isEarly, EtlState $state): mixed` - Called at flush frequency or end |
| 82 | + - Built-in: InMemoryLoader, CSV, JSON, DoctrineORM, STDOUTLoader |
| 83 | + |
| 84 | +### Event System |
| 85 | + |
| 86 | +**Event dispatching** (`src/EventDispatcher/`) |
| 87 | +- Custom PSR-14 implementation with priority support |
| 88 | +- Events: InitEvent, StartEvent, ExtractEvent, TransformEvent, BeforeLoadEvent, LoadEvent, FlushEvent, EndEvent |
| 89 | +- Exception events: ExtractExceptionEvent, TransformExceptionEvent, LoadExceptionEvent, FlushExceptionEvent |
| 90 | +- Use `->on{EventName}(callable $listener, int $priority = 0)` on EtlExecutor |
| 91 | + |
| 92 | +**Control flow exceptions:** |
| 93 | +- `SkipRequest` - Skip current item, continue processing |
| 94 | +- `StopRequest` - Stop entire workflow immediately |
| 95 | + |
| 96 | +### Processors |
| 97 | + |
| 98 | +**ProcessorInterface** (`src/Processor/`) |
| 99 | +- `IterableProcessor` - Default synchronous processing |
| 100 | +- `ReactStreamProcessor` - Async processing with ReactPHP streams (experimental) |
| 101 | + |
| 102 | +### Recipes |
| 103 | + |
| 104 | +**Recipe** (`src/Recipe/`) |
| 105 | +- Reusable workflow configurations (combine extractors, transformers, loaders, event listeners) |
| 106 | +- `FilterRecipe` - Skip/exclude items based on callable filter |
| 107 | +- `LoggerRecipe` - PSR-3 logging integration |
| 108 | + |
| 109 | +### Utility Functions |
| 110 | + |
| 111 | +`src/functions.php` provides helper functions: |
| 112 | +- `extractFrom()` - Create executor starting with extractor |
| 113 | +- `transformWith()` - Create executor starting with transformer |
| 114 | +- `loadInto()` - Create executor starting with loader |
| 115 | +- `withRecipe()` - Create executor with recipe |
| 116 | +- `chain()` - Chain multiple extractors/transformers/loaders |
| 117 | +- `stdIn()` / `stdOut()` - STDIN/STDOUT helpers |
| 118 | +- `skipWhen()` - Conditional skip recipe |
| 119 | + |
| 120 | +## Key Patterns |
| 121 | + |
| 122 | +### Immutability & Cloning |
| 123 | +- EtlExecutor uses `ClonableTrait` - all builder methods return clones |
| 124 | +- EtlState has version tracking - always get latest via `$state->getLastVersion()` |
| 125 | + |
| 126 | +### Fluent Building |
| 127 | +```php |
| 128 | +$state = (new EtlExecutor()) |
| 129 | +    ->extractFrom($extractor) |
| 130 | +    ->transformWith($transformer) |
| 131 | +    ->loadInto($loader) |
| 132 | +    ->onTransform(function ($event) { /* ... */ }) |
| 133 | +    ->process($source, $destination); // returns the resulting EtlState |
| 134 | +``` |
| 135 | + |
| 136 | +### NextTick Callbacks |
| 137 | +- `$state->nextTick(callable $callback)` - Schedule callback after current item |
| 138 | +- Useful for deferring operations or cleanup |
| 139 | +- Consumed between items and guaranteed to run even if the workflow stops |
| 140 | + |
| 141 | +### Batch Transform |
| 142 | +- Configure via `new EtlConfiguration(batchSize: N)` to group N items per batch |
| 143 | +- Requires a transformer implementing `BatchTransformerInterface` (separate from `TransformerInterface`) |
| 144 | +- Processing flow: items are chunked via `iterable_chunk()`, then for each chunk: |
| 145 | + 1. ExtractEvent fires per item (items can be skipped individually) |
| 146 | + 2. `transform(array $items, EtlState $state): Generator` is called once for the whole batch |
| 147 | + 3. Each yielded result goes through TransformEvent → Load individually |
| 148 | +- `nextTick` callbacks are consumed between batches, not between items within a batch |
| 149 | +- When `batchSize` is set but the transformer does not implement `BatchTransformerInterface`, batching is ignored |
| 150 | +- Note: `$state->currentItemKey` during Transform/Load events points to the last item of the batch |
| 151 | + |
| 152 | +### Flush Timing |
| 153 | +- Configurable via `new EtlConfiguration(flushEvery: N)` |
| 154 | +- `flush()` is called when the frequency threshold is reached (`$isEarly = true`) or at the end of the workflow (`$isEarly = false`) |
| 155 | +- Early flush = during processing, final flush = at termination |
| 156 | + |
| 157 | +## Testing Patterns |
| 158 | + |
| 159 | +- Tests are organized in `tests/Behavior/` and `tests/Unit/` |
| 160 | +- Use Pest syntax (`test()`, `expect()`, `it()`) |
| 161 | +- Mock with Mockery when needed |
| 162 | +- Coverage is tracked - don't reduce it |
| 163 | + |
| 164 | +## Development Notes |
| 165 | + |
| 166 | +- PHP 8.2+ features are welcome (readonly properties, enums, etc.) |
| 167 | +- Prefer immutability and value objects |
| 168 | +- Event listeners should be side-effect free when possible |
| 169 | +- Transformers returning generators (yield) allow 1-to-many transformations |
| 170 | +- Loaders can implement `ConditionalLoaderInterface` to skip certain items |
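
The Batch Transform and Flush Timing notes in the file above can be sketched in plain PHP. This is a minimal, self-contained illustration and not the library's implementation: `runBatchedEtl()` and all of its parameters are hypothetical names, no `bentools/etl` classes are used, events and skip handling are left out, and counting loaded items toward `flushEvery` is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the batch flow described in the notes.
 * NOT the library's code: names and flush accounting are assumptions.
 *
 * @param iterable<mixed> $items          extracted items
 * @param callable $batchTransform        fn(array $batch): Generator, called once per batch
 * @param callable $load                  fn(mixed $item): void, called once per yielded item
 * @param callable $flush                 fn(bool $isEarly): void
 */
function runBatchedEtl(
    iterable $items,
    int $batchSize,
    float $flushEvery,
    callable $batchTransform,
    callable $load,
    callable $flush,
): void {
    $batch = [];
    $loadedSinceFlush = 0;

    // Transform a whole batch once, then load each yielded item individually.
    $processBatch = function (array $batch) use (
        $batchTransform,
        $load,
        $flush,
        $flushEvery,
        &$loadedSinceFlush
    ): void {
        foreach ($batchTransform($batch) as $transformed) {
            $load($transformed);
            if (++$loadedSinceFlush >= $flushEvery) {
                $flush(true); // early flush: threshold reached during processing
                $loadedSinceFlush = 0;
            }
        }
    };

    // Chunk the extracted items into batches of $batchSize.
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) === $batchSize) {
            $processBatch($batch);
            $batch = [];
        }
    }
    if ($batch !== []) {
        $processBatch($batch); // trailing, partially filled batch
    }

    $flush(false); // final flush at termination
}

// Usage: 5 items, batches of 2, flush after every 3 loaded items.
$log = [];
runBatchedEtl(
    ['a', 'b', 'c', 'd', 'e'],
    batchSize: 2,
    flushEvery: 3,
    batchTransform: function (array $batch): Generator {
        foreach ($batch as $item) {
            yield strtoupper($item);
        }
    },
    load: function (mixed $item) use (&$log): void {
        $log[] = "load $item";
    },
    flush: function (bool $isEarly) use (&$log): void {
        $log[] = $isEarly ? 'early flush' : 'final flush';
    },
);
echo implode("\n", $log), "\n";
```

With these inputs the log reads: load A, load B, load C, early flush, load D, load E, final flush. The transformer runs three times (for `[a, b]`, `[c, d]`, and `[e]`), while loading and flush accounting happen per yielded item.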