|
| 1 | +# Architecture Design |
| 2 | + |
| 3 | +## Overall Architecture |
| 4 | + |
| 5 | +GoBatch consists of three layers: |
| 6 | + |
| 7 | +1. **Interface Layer** |
| 8 | + - Provides APIs for upper-level applications |
| 9 | + - Includes job orchestration, job management, and start and pause operations |
| 10 | + |
| 11 | +2. **Core Layer** |
| 12 | + - Provides the job execution engine |
| 13 | + - Includes common components for data processing, file I/O, parallel processing, and error handling |
| 14 | + |
| 15 | +3. **Foundation Layer** |
| 16 | + - Goroutine pool management |
| 17 | + - Transaction management |
| 18 | + - Job execution state recording |
| 19 | + - Logging |
| 20 | + |
| 21 | +![GoBatch layers](../images/layer.png) |
| 22 | + |
| 23 | +GoBatch is a batch processing framework whose core capabilities are job orchestration and job execution. An application must first orchestrate (define) its jobs through the GoBatch interfaces before it can execute them. |
| 24 | + |
| 25 | +A Job consists of multiple Steps that execute in sequence, each containing a piece of business logic. Job orchestration means wrapping each piece of business logic in a Step and assembling the Steps into a Job in a specific order; the GoBatch runtime then manages the Job. GoBatch can manage multiple Jobs. |
| 26 | + |
| 27 | +When starting a job, an application can pass parameters to it. GoBatch generates a JobInstance based on the input parameters. A JobInstance may be executed multiple times, and for each execution GoBatch creates a JobExecution record to track the execution state. Similarly, each Step execution generates a StepExecution record. GoBatch persists JobInstance, JobExecution, and StepExecution records to the database through the Repository. |
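
To make the JobInstance and JobExecution distinction concrete, here is a small in-memory sketch. The `MemoryRepository` type, the `Launch` method, and the way the parameter key is built are hypothetical; GoBatch's real Repository is database-backed and its API may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// JobInstance identifies a job name plus one set of parameters.
type JobInstance struct {
	ID        int
	JobName   string
	ParamsKey string
}

// JobExecution records one run of a JobInstance.
type JobExecution struct {
	ID         int
	InstanceID int
}

// MemoryRepository is a hypothetical in-memory stand-in for the Repository.
type MemoryRepository struct {
	instances  map[string]*JobInstance
	executions []JobExecution
}

func NewMemoryRepository() *MemoryRepository {
	return &MemoryRepository{instances: map[string]*JobInstance{}}
}

// paramsKey builds a deterministic key by sorting parameter names,
// so the same parameters always produce the same key.
func paramsKey(jobName string, params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(jobName)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%s", k, params[k])
	}
	return b.String()
}

// Launch reuses the JobInstance for known parameters and always
// records a new JobExecution.
func (r *MemoryRepository) Launch(jobName string, params map[string]string) (JobInstance, JobExecution) {
	key := paramsKey(jobName, params)
	inst, ok := r.instances[key]
	if !ok {
		inst = &JobInstance{ID: len(r.instances) + 1, JobName: jobName, ParamsKey: key}
		r.instances[key] = inst
	}
	exec := JobExecution{ID: len(r.executions) + 1, InstanceID: inst.ID}
	r.executions = append(r.executions, exec)
	return *inst, exec
}

func main() {
	repo := NewMemoryRepository()
	params := map[string]string{"date": "2024-01-01"}
	inst1, exec1 := repo.Launch("settlement", params)
	inst2, exec2 := repo.Launch("settlement", params)
	// Same instance, different executions.
	fmt.Println(inst1.ID == inst2.ID, exec1.ID != exec2.ID)
}
```
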
| 28 | + |
| 29 | +GoBatch supports multiple ways to trigger job execution. Applications can trigger jobs through scheduled tasks, real-time events, or command-line interfaces. |
| 30 | + |
| 31 | +The execution flow of GoBatch batch processing applications is as follows: |
| 32 | + |
| 33 | +![GoBatch execution flow](../images/arch.png) |
| 34 | + |
| 35 | +## Core Components |
| 36 | + |
| 37 | +### Job |
| 38 | +Job is the highest-level concept in batch processing, representing a complete batch task. Each Job contains one or more Steps executed in a specific order. The main responsibility of a Job is to coordinate the execution of Steps. For detailed information about Jobs, see [Job](job.md). |
| 39 | + |
| 40 | +### Step |
| 41 | +Step is an independent processing unit within a Job. GoBatch supports three types of steps: |
| 42 | + |
| 43 | +1. **SimpleStep** |
| 44 | + - Executes a task in a single thread |
| 45 | + - Suitable for simple processing logic |
| 46 | + - Implements business logic through the Handler or Task interface |
| 47 | + |
| 48 | +2. **ChunkStep** |
| 49 | + - Processes data in chunks |
| 50 | + - Implements the "read-process-write" pattern |
| 51 | + - Supports transaction management |
| 52 | + - Main components: |
| 53 | + - ItemReader: Data reading |
| 54 | + - ItemProcessor: Data processing |
| 55 | + - ItemWriter: Data writing |
| 56 | + |
| 57 | +3. **PartitionStep** |
| 58 | + - Supports parallel processing |
| 59 | + - Splits large tasks into subtasks |
| 60 | + - Can aggregate subtask results |
| 61 | + - Main components: |
| 62 | + - Partitioner: Task partitioning |
| 63 | + - Aggregator: Result aggregation |
| 64 | + |
| 65 | +For detailed information about Steps, see [Step](step.md). |
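
The "read-process-write" loop of a ChunkStep can be sketched as follows. The interface names echo the components listed above, but the method signatures, the `RunChunks` function, and the string item type are assumptions chosen for a self-contained example; they are not GoBatch's actual interfaces.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified versions of the three chunk components.
type ItemReader interface {
	Read() (item string, ok bool) // ok is false when the input is exhausted
}
type ItemProcessor interface {
	Process(item string) (string, error)
}
type ItemWriter interface {
	Write(items []string) error
}

// RunChunks reads and processes up to chunkSize items, writes them as one
// chunk, and repeats until the reader is exhausted. It returns the number
// of chunks written.
func RunChunks(r ItemReader, p ItemProcessor, w ItemWriter, chunkSize int) (int, error) {
	chunks := 0
	for {
		batch := make([]string, 0, chunkSize)
		for len(batch) < chunkSize {
			item, ok := r.Read()
			if !ok {
				break
			}
			out, err := p.Process(item)
			if err != nil {
				return chunks, err
			}
			batch = append(batch, out)
		}
		if len(batch) == 0 {
			return chunks, nil
		}
		if err := w.Write(batch); err != nil {
			return chunks, err
		}
		chunks++
		if len(batch) < chunkSize {
			return chunks, nil // the last chunk was a partial one
		}
	}
}

type sliceReader struct {
	items []string
	pos   int
}

func (s *sliceReader) Read() (string, bool) {
	if s.pos >= len(s.items) {
		return "", false
	}
	item := s.items[s.pos]
	s.pos++
	return item, true
}

type upperProcessor struct{}

func (upperProcessor) Process(item string) (string, error) {
	return strings.ToUpper(item), nil
}

type collectWriter struct{ chunks [][]string }

func (c *collectWriter) Write(items []string) error {
	c.chunks = append(c.chunks, items)
	return nil
}

func main() {
	w := &collectWriter{}
	r := &sliceReader{items: []string{"a", "b", "c", "d", "e"}}
	n, err := RunChunks(r, upperProcessor{}, w, 2)
	fmt.Println(n, err, w.chunks)
}
```
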
| 66 | + |
| 67 | +### Builders |
| 68 | + |
| 69 | +1. **JobBuilder** |
| 70 | + - Used to build Job instances |
| 71 | + - Supports Steps and Listeners configuration |
| 72 | + - Provides a fluent API |
| 73 | + |
| 74 | +2. **StepBuilder** |
| 75 | + - Used to build Step instances |
| 76 | + - Supports Reader, Processor, Writer configuration |
| 77 | + - Supports partition and listener configuration |
| 78 | + - Provides a fluent API |
| 79 | + |
| 80 | +![GoBatch builders](../images/builder.png) |
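
A fluent builder is one where each configuration method returns the builder itself, so calls can be chained. The sketch below illustrates the pattern with a hypothetical `StepBuilder`; the method names and the validation in `Build` are assumptions for this example and may not match GoBatch's actual builder signatures.

```go
package main

import "fmt"

// Step is a hypothetical, simplified step definition.
type Step struct {
	Name      string
	Handler   func() error
	Listeners []string
}

// StepBuilder accumulates configuration and produces a Step.
type StepBuilder struct {
	step Step
}

func NewStepBuilder(name string) *StepBuilder {
	return &StepBuilder{step: Step{Name: name}}
}

// Handler sets the business logic and returns the builder for chaining.
func (b *StepBuilder) Handler(h func() error) *StepBuilder {
	b.step.Handler = h
	return b
}

// Listener registers a listener by name (a simplification) and
// returns the builder for chaining.
func (b *StepBuilder) Listener(name string) *StepBuilder {
	b.step.Listeners = append(b.step.Listeners, name)
	return b
}

// Build validates the configuration and returns the finished Step.
func (b *StepBuilder) Build() (Step, error) {
	if b.step.Handler == nil {
		return Step{}, fmt.Errorf("step %q has no handler", b.step.Name)
	}
	return b.step, nil
}

func main() {
	step, err := NewStepBuilder("import_orders").
		Handler(func() error { return nil }).
		Listener("audit").
		Build()
	fmt.Println(step.Name, step.Listeners, err)
}
```
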
| 81 | + |
| 82 | +## Execution Mechanism |
| 83 | + |
| 84 | +### Job Orchestration |
| 85 | +1. **Step Building** |
| 86 | + - Create Step instances using StepBuilder |
| 87 | + - Configure Step processing logic and behavior |
| 88 | + - Set listeners and other parameters |
| 89 | + |
| 90 | +![Step building](../images/step_builder.png) |
| 91 | + |
| 92 | +2. **Job Building** |
| 93 | + - Create Job instances using JobBuilder |
| 94 | + - Add Steps and configure execution order |
| 95 | + - Set Job-level listeners |
| 96 | + |
| 97 | +![Job assembly](../images/job_reassemble.png) |
| 98 | + |
| 99 | +3. **Registration** |
| 100 | + - Register the Job with the JobRegistry |
| 101 | + - Supports runtime Job lookup and management |
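
Registration can be pictured as a name-to-Job map that is safe for concurrent use. The following is only a sketch: the `JobRegistry` methods shown here and the rule that duplicate names are rejected are assumptions, not confirmed GoBatch behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is reduced to its name for this sketch.
type Job struct {
	Name string
}

// JobRegistry is a hypothetical registry keyed by job name.
type JobRegistry struct {
	mu   sync.RWMutex
	jobs map[string]Job
}

func NewJobRegistry() *JobRegistry {
	return &JobRegistry{jobs: map[string]Job{}}
}

// Register adds a job and rejects duplicate names (an assumption).
func (r *JobRegistry) Register(j Job) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.jobs[j.Name]; exists {
		return fmt.Errorf("job %q is already registered", j.Name)
	}
	r.jobs[j.Name] = j
	return nil
}

// Lookup finds a registered job by name at runtime.
func (r *JobRegistry) Lookup(name string) (Job, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	j, ok := r.jobs[name]
	return j, ok
}

func main() {
	reg := NewJobRegistry()
	fmt.Println(reg.Register(Job{Name: "settlement"}))
	_, found := reg.Lookup("settlement")
	fmt.Println(found)
}
```
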
| 102 | + |
| 103 | +### Job Execution |
| 104 | + |
| 105 | +1. **Job Execution Flow** |
| 106 | + - Parameter validation |
| 107 | + - Create JobInstance and JobExecution |
| 108 | + - Execute Steps in sequence |
| 109 | + - State management and context maintenance |
| 110 | + - Process execution results |
| 111 | + |
| 112 | +2. **Step Execution Flow** |
| 113 | + - Step initialization |
| 114 | + - Resource allocation |
| 115 | + - Execute business logic |
| 116 | + - SimpleStep: Direct Handler execution |
| 117 | + - ChunkStep: Iterative read-process-write |
| 118 | + - PartitionStep: Parallel subtask execution |
| 119 | + - Resource cleanup |
| 120 | + - State update |
| 121 | + |
| 122 | +![Starting a job](../images/start_job.png) |
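
The PartitionStep branch of the step execution flow (split, run subtasks in parallel, aggregate) can be sketched with goroutines and a `sync.WaitGroup`. The `partition` and `sumPartitions` functions are hypothetical helpers written for this example; GoBatch's Partitioner and Aggregator interfaces and its goroutine pool are not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// partition splits items into at most n contiguous sub-slices,
// playing the role of a Partitioner in this sketch.
func partition(items []int, n int) [][]int {
	if n <= 0 {
		n = 1
	}
	size := (len(items) + n - 1) / n // ceiling division
	var parts [][]int
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		parts = append(parts, items[start:end])
	}
	return parts
}

// sumPartitions processes every partition in its own goroutine and then
// aggregates the per-partition results, playing the role of an Aggregator.
func sumPartitions(parts [][]int) int {
	results := make([]int, len(parts))
	var wg sync.WaitGroup
	for i, part := range parts {
		wg.Add(1)
		go func(i int, part []int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			for _, v := range part {
				results[i] += v
			}
		}(i, part)
	}
	wg.Wait()

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	items := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	parts := partition(items, 3)
	fmt.Println(len(parts), sumPartitions(parts))
}
```
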
| 123 | + |
| 124 | +### Transaction Management |
| 125 | + |
| 126 | +1. **TransactionManager** |
| 127 | + - Manage database transactions |
| 128 | + - Provide transaction begin, commit, and rollback operations |
| 129 | + - Support custom transaction managers |
| 130 | + |
| 131 | +2. **Chunk Processing** |
| 132 | + - Each chunk is processed as one transaction unit |
| 133 | + - Support failure rollback |
| 134 | + - Provide retry mechanism |
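
The combination of per-chunk transactions, rollback on failure, and retry might look like the sketch below. The `Tx` and `TransactionManager` interfaces, `writeChunkWithRetry`, and the fixed attempt limit are assumptions for illustration; GoBatch's own TransactionManager interface and retry policy may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx and TransactionManager are hypothetical minimal interfaces.
type Tx interface {
	Commit() error
	Rollback() error
}

type TransactionManager interface {
	Begin() (Tx, error)
}

// writeChunkWithRetry runs write inside a fresh transaction per attempt.
// A failed attempt is rolled back and retried up to maxAttempts times.
func writeChunkWithRetry(tm TransactionManager, maxAttempts int, write func(Tx) error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		tx, err := tm.Begin()
		if err != nil {
			return err
		}
		if err := write(tx); err != nil {
			lastErr = err
			if rbErr := tx.Rollback(); rbErr != nil {
				return rbErr
			}
			continue
		}
		return tx.Commit()
	}
	return fmt.Errorf("chunk failed after %d attempts: %w", maxAttempts, lastErr)
}

// countingTM is an in-memory manager that only counts calls.
type countingTM struct{ begins, commits, rollbacks int }

type countingTx struct{ tm *countingTM }

func (m *countingTM) Begin() (Tx, error) {
	m.begins++
	return countingTx{tm: m}, nil
}
func (t countingTx) Commit() error   { t.tm.commits++; return nil }
func (t countingTx) Rollback() error { t.tm.rollbacks++; return nil }

// failNTimes returns a write function that fails on its first n calls.
func failNTimes(n int) func(Tx) error {
	calls := 0
	return func(Tx) error {
		calls++
		if calls <= n {
			return errors.New("simulated write failure")
		}
		return nil
	}
}

func main() {
	tm := &countingTM{}
	err := writeChunkWithRetry(tm, 3, failNTimes(2))
	fmt.Println(err, tm.begins, tm.rollbacks, tm.commits)
}
```
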
| 135 | + |
| 136 | +## Extension Mechanism |
| 137 | + |
| 138 | +### Listener Interfaces |
| 139 | + |
| 140 | +1. **JobListener** |
| 141 | + - BeforeJob: Callback before job execution |
| 142 | + - AfterJob: Callback after job execution |
| 143 | + |
| 144 | +2. **StepListener** |
| 145 | + - BeforeStep: Callback before step execution |
| 146 | + - AfterStep: Callback after step execution |
| 147 | + |
| 148 | +3. **ChunkListener** |
| 149 | + - BeforeChunk: Callback before chunk processing |
| 150 | + - AfterChunk: Callback after chunk processing |
| 151 | + - OnError: Error handling callback |
| 152 | + |
| 153 | +4. **PartitionListener** |
| 154 | + - BeforePartition: Callback before partitioning |
| 155 | + - AfterPartition: Callback after partitioning |
| 156 | + - OnError: Error handling callback |
| 157 | + |
| 158 | +## State Management |
| 159 | + |
| 160 | +### Execution State Recording |
| 161 | +GoBatch records runtime states through the following objects: |
| 162 | + |
| 163 | +1. **JobInstance** |
| 164 | + - Corresponds to a set of parameters for a Job |
| 165 | + - Same parameters map to the same JobInstance |
| 166 | + |
| 167 | +2. **JobExecution** |
| 168 | + - Corresponds to one execution of a JobInstance |
| 169 | + - A restart generates a new JobExecution |
| 170 | + |
| 171 | +3. **StepContext** |
| 172 | + - Corresponds to the context of a Step under a JobInstance |
| 173 | + - Independent of execution count |
| 174 | + |
| 175 | +4. **StepExecution** |
| 176 | + - Corresponds to one execution of a Step under a JobExecution |
| 177 | + - A restart generates a new StepExecution |
| 178 | + |
| 179 | +The database tables for these four objects are related as follows: |
| 180 | +![State record tables](../images/status_record.png) |
| 181 | + |
| 182 | +### State Transitions |
| 183 | +Job and Step execution states: |
| 184 | +- STARTING: Waiting for execution |
| 185 | +- STARTED: Currently executing |
| 186 | +- STOPPING: Stopping in progress |
| 187 | +- STOPPED: Stopped |
| 188 | +- COMPLETED: Successfully completed |
| 189 | +- FAILED: Execution failed |
| 190 | +- UNKNOWN: Unknown state |
| 191 | + |
| 192 | +![State transitions](../images/status_trans.png) |