|
1 | 1 | # Batch Processing Sample |
2 | 2 |
|
3 | | -This sample demonstrates how to process large batches of tasks with controlled concurrency using Cadence's `x.NewBatchFuture` functionality. |
| 3 | +This sample demonstrates how to process large batches of activities with controlled concurrency using Cadence's `workflow.NewBatchFuture`, while staying within the limit of 1024 pending activities per workflow. |
4 | 4 |
|
5 | | -## What it does |
| 5 | +## The problem it solves |
| 6 | + |
| 7 | +**The Problem**: When processing large datasets (thousands of records, files, or API calls), each obvious approach has a cost:
| 8 | +- **Sequential processing**: Too slow for large batches, which hurts user experience
| 9 | +- **Unlimited concurrency**: Overwhelms databases, APIs, or downstream services
| 10 | +- **Manual concurrency control**: Requires complex error handling and resource management
| 11 | +- **Cadence limits**: A workflow can have at most 1024 pending activities
| 12 | + |
| 13 | +**Real-world scenarios**: |
| 14 | +- Processing 10,000 user records for a migration |
| 15 | +- Sending emails to 50,000 subscribers |
| 16 | +- Generating reports for 1,000 customers |
| 17 | +- Processing files in a data pipeline |
| 18 | + |
| 19 | +### The solution
| 20 | +`workflow.NewBatchFuture` provides a robust solution: |
| 21 | + |
| 22 | +- **Controlled Concurrency**: Process items in parallel while respecting system limits
| 23 | +- **Automatic Error Handling**: Failed activities don't crash the entire batch
| 24 | +- **Resource Efficiency**: Optimal throughput without overwhelming downstream services
| 25 | +- **Built-in Observability**: Monitoring, retries, and failure tracking
| 26 | +- **Workflow Integration**: Seamless integration with Cadence's workflow engine
| 27 | + |
| 28 | +This eliminates the need to build custom concurrency control, error handling, and monitoring systems. |
| 29 | + |
| 30 | +## Sample behavior |
6 | 31 |
|
7 | 32 | - Creates a configurable number of activities (default: 10) |
8 | 33 | - Executes them with controlled concurrency (default: 3) |
9 | | -- Simulates work with random delays (900-999ms per task) |
| 34 | +- Simulates work with random delays (900-999ms per activity) |
10 | 35 | - Handles cancellation gracefully |
11 | 36 |
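The behavior listed above (a fixed number of tasks, a concurrency cap, simulated delays, per-task failures) can be sketched in plain Go. This is only an illustration of the idea, not the sample's code and not the Cadence API: `runBatch` and `makeTasks` are hypothetical names, the delay is shortened to 5ms, and the sketch uses native goroutines, which workflow code must not do. Inside a real workflow, `workflow.NewBatchFuture` plays this role.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// makeTasks builds n simulated tasks; the one at index failAt returns an error.
// (Hypothetical helper, used only for this illustration.)
func makeTasks(n, failAt int) []func() error {
	tasks := make([]func() error, n)
	for i := range tasks {
		i := i
		tasks[i] = func() error {
			time.Sleep(5 * time.Millisecond) // stand-in for the sample's 900-999ms delay
			if i == failAt {
				return fmt.Errorf("task %d failed", i)
			}
			return nil
		}
	}
	return tasks
}

// runBatch runs every task with at most `concurrency` in flight at once.
// It returns one error slot per task plus the peak number observed in flight.
func runBatch(tasks []func() error, concurrency int) ([]error, int) {
	errs := make([]error, len(tasks))
	sem := make(chan struct{}, concurrency) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64

	for i, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` tasks are already running
		go func(i int, task func() error) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot once the task is done
			n := atomic.AddInt64(&inFlight, 1)
			for { // record the highest in-flight count seen so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			errs[i] = task() // a failure is recorded; the other tasks keep running
			atomic.AddInt64(&inFlight, -1)
		}(i, task)
	}
	wg.Wait()
	return errs, int(peak)
}

func main() {
	errs, peak := runBatch(makeTasks(10, 4), 3)
	failed := 0
	for _, err := range errs {
		if err != nil {
			failed++
		}
	}
	fmt.Printf("tasks=%d failed=%d peak=%d\n", len(errs), failed, peak)
}
```

Collecting one error per task, rather than stopping at the first failure, is what lets a single failed item be reported without discarding the rest of the batch.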
|
12 | | -## Real-world use cases |
| 37 | +## Technical considerations |
13 | 38 |
|
14 | | -- Batch data processing |
15 | | -- Bulk operations |
16 | | -- ETL jobs |
17 | | -- Report generation |
18 | | -- File processing |
| 39 | +- **Cadence limit**: Maximum 1024 pending activities per workflow |
| 40 | +- **Resource management**: Controlled concurrency prevents system overload |
| 41 | +- **Error handling**: Failed activities don't crash the entire batch |
19 | 42 |
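One way to keep a configurable concurrency setting within the pending-activity limit is to clamp it before starting the batch. The sketch below is an assumption about how that could be done, not code from the sample: `effectiveConcurrency` and `maxPendingActivities` are hypothetical names, and the constant simply mirrors the 1024 figure quoted above.

```go
package main

import "fmt"

// maxPendingActivities mirrors the per-workflow limit quoted in this README.
const maxPendingActivities = 1024

// effectiveConcurrency clamps a requested concurrency to [1, maxPendingActivities].
// (Hypothetical helper, used only for this illustration.)
func effectiveConcurrency(requested int) int {
	if requested < 1 {
		return 1
	}
	if requested > maxPendingActivities {
		return maxPendingActivities
	}
	return requested
}

func main() {
	for _, r := range []int{0, 3, 5000} {
		fmt.Printf("requested=%d effective=%d\n", r, effectiveConcurrency(r))
	}
}
```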
|
20 | 43 | ## How to run |
21 | 44 |
|
22 | | -Start Worker: |
| 45 | +1. Build the sample: |
| 46 | +```bash |
| 47 | +make batch |
| 48 | +``` |
| 49 | + |
| 50 | +2. Start Worker: |
23 | 51 | ```bash |
24 | 52 | ./bin/batch -m worker |
25 | 53 | ``` |
26 | 54 |
|
27 | | -Start Workflow: |
| 55 | +3. Start Workflow: |
28 | 56 | ```bash |
29 | 57 | ./bin/batch -m trigger |
30 | 58 | ``` |
31 | | - |
32 | | -## Key concepts |
33 | | - |
34 | | -- **Batch processing**: Process multiple tasks efficiently |
35 | | -- **Concurrency control**: Limit simultaneous executions |
36 | | -- **Activity factories**: Lazy evaluation of activities |
37 | | -- **Future-based execution**: Asynchronous task management |
38 | | -- **Context cancellation**: Graceful handling of timeouts |
|