Commit 74ee138

init gobatch doc
0 parents  commit 74ee138

47 files changed, +2991 -0 lines changed

.github/workflows/docs.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
name: Deploy Documentation

on:
  push:
    branches:
      - main
    paths:
      - '**'  # Watch for changes to all files

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '16'

      - name: Install dependencies
        run: |
          npm install -g gitbook-cli
          gitbook install

      - name: Build documentation
        run: |
          gitbook build

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./_book
          force_orphan: true

SUMMARY.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Summary

* 中文文档
  * [简介](zh/introduction.md)
  * [快速开始](zh/quickstart.md)
  * [基础知识](zh/basics.md)
  * [架构设计](zh/architecture.md)
  * [任务介绍](zh/job.md)
  * [Step介绍](zh/step.md)
  * [依赖](zh/dependencies.md)
  * [配置](zh/configuration.md)
  * [一个简单示例](zh/usage_examples.md)
  * [文件处理示例](zh/file_examples.md)
  * [贡献指南](zh/contribution_guide.md)
  * [问题反馈](zh/feedback.md)
* English Documentation
  * [Introduction](en/introduction.md)
  * [Quick Start](en/quickstart.md)
  * [Basics](en/basics.md)
  * [Architecture](en/architecture.md)
  * [Job](en/job.md)
  * [Step](en/step.md)
  * [Dependencies](en/dependencies.md)
  * [Configuration](en/configuration.md)
  * [A Simple Example](en/usage_examples.md)
  * [File Example](en/file_examples.md)
  * [Contribution Guide](en/contribution_guide.md)
  * [Feedback](en/feedback.md)

book.json

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
{
  "title": "GoBatch Documentation",
  "description": "GoBatch batch processing framework documentation",
  "language": "zh-hans",
  "plugins": [
    "-sharing",
    "expandable-chapters",
    "copy-code-button",
    "language-picker"
  ],
  "pluginsConfig": {
    "language-picker": {
      "grid-columns": 2,
      "languages": [
        {
          "lang": "zh-hans",
          "name": "简体中文",
          "link": "zh"
        },
        {
          "lang": "en",
          "name": "English",
          "link": "en"
        }
      ]
    }
  }
}

en/api_reference.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# API Reference

The complete API documentation is available at [GoDoc](http://godoc.org/github.com/chararch/gobatch).

en/architecture.md

Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
# Architecture Design

## Overall Architecture

GoBatch consists of three layers:

1. **Interface Layer**
   - Provides APIs for upper-level applications
   - Includes job orchestration, management, start and pause operations

2. **Core Layer**
   - Provides the job execution engine
   - Includes common components for data processing, file I/O, parallel processing, and error handling

3. **Foundation Layer**
   - Goroutine pool management
   - Transaction management
   - Job execution state recording
   - Logging

![](../images/layer.png)

As a batch processing framework, GoBatch's core capabilities are job orchestration and execution. An application must first orchestrate a job through the GoBatch interfaces before it can execute that job.

In terms of structure, a Job consists of multiple Steps, each containing business logic, executed in sequence. Job orchestration means wrapping each piece of business logic in a Step and assembling the Steps into a Job in a specific order; the GoBatch runtime then manages the Job. GoBatch can manage multiple jobs.

During job execution, applications can pass parameters to a specified job. GoBatch generates a JobInstance based on the input parameters. A JobInstance may be executed multiple times, and for each execution, GoBatch creates a JobExecution record to track the execution state. Similarly, each Step execution generates a StepExecution record. GoBatch persists JobInstance, JobExecution, and StepExecution to the database through the Repository.

GoBatch supports multiple ways to trigger job execution. Applications can trigger jobs through scheduled tasks, real-time events, or command-line interfaces.

The execution flow of a GoBatch batch processing application is as follows:

![](../images/arch.png)

## Core Components

### Job
Job is the highest-level concept in batch processing, representing a complete batch task. Each Job contains one or more Steps executed in a specific order. The main responsibility of a Job is to coordinate the execution of its Steps. For detailed information about Jobs, see [Job](job.md).

### Step
Step is an independent processing unit within a Job. GoBatch supports three types of steps:

1. **SimpleStep**
   - Executes a task in a single thread
   - Suitable for simple processing logic
   - Implements business logic through the Handler or Task interface

2. **ChunkStep**
   - Processes data in chunks
   - Implements the "read-process-write" pattern
   - Supports transaction management
   - Main components:
     - ItemReader: Data reading
     - ItemProcessor: Data processing
     - ItemWriter: Data writing

3. **PartitionStep**
   - Supports parallel processing
   - Splits large tasks into subtasks
   - Can aggregate subtask results
   - Main components:
     - Partitioner: Task partitioning
     - Aggregator: Result aggregation

For detailed information about Steps, see [Step](step.md).
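
The "read-process-write" pattern of a ChunkStep can be sketched in plain Go. This is a minimal illustration under stated assumptions, not GoBatch's implementation: `runChunks` and its function-typed `read`, `process`, and `write` parameters are hypothetical stand-ins for ItemReader, ItemProcessor, and ItemWriter.

```go
package main

import "fmt"

// runChunks is a hypothetical helper, not part of GoBatch. It pulls items
// from read until read reports that the input is exhausted, transforms each
// item with process, and hands the results to write one chunk at a time.
// It returns the number of chunks written.
func runChunks(
	chunkSize int,
	read func() (string, bool), // returns false when the input is exhausted
	process func(string) string,
	write func([]string),
) int {
	chunks := 0
	buf := make([]string, 0, chunkSize)
	for {
		item, ok := read()
		if ok {
			buf = append(buf, process(item))
		}
		// Flush when the chunk is full, or when input ends with items pending.
		if len(buf) == chunkSize || (!ok && len(buf) > 0) {
			write(buf)
			chunks++
			buf = make([]string, 0, chunkSize)
		}
		if !ok {
			return chunks
		}
	}
}

func main() {
	next := 0
	read := func() (string, bool) {
		if next >= 5 {
			return "", false
		}
		next++
		return fmt.Sprintf("item-%d", next), true
	}
	process := func(s string) string { return s + "!" }
	write := func(chunk []string) { fmt.Println("write:", chunk) }

	fmt.Println("chunks:", runChunks(2, read, process, write))
}
```

With five items and a chunk size of two, the writer is called three times; the last call receives the single leftover item.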

### Builders

1. **JobBuilder**
   - Used to build Job instances
   - Supports Steps and Listeners configuration
   - Provides a fluent API

2. **StepBuilder**
   - Used to build Step instances
   - Supports Reader, Processor, Writer configuration
   - Supports partition and listener configuration
   - Provides a fluent API

![](../images/builder.png)

## Execution Mechanism

### Job Orchestration
1. **Step Building**
   - Create Step instances using StepBuilder
   - Configure Step processing logic and behavior
   - Set listeners and other parameters

![](../images/step_builder.png)

2. **Job Building**
   - Create Job instances using JobBuilder
   - Add Steps and configure execution order
   - Set Job-level listeners

![](../images/job_reassemble.png)

3. **Registration**
   - Register the Job with JobRegistry
   - Supports runtime Job lookup and management
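
The three orchestration stages above (build steps, build a job, register it) can be sketched with a small fluent builder and an in-memory registry. Everything here is hypothetical and simplified for illustration: `newJob`, `step`, `build`, `register`, and `run` are assumed names, not the GoBatch API.

```go
package main

import (
	"errors"
	"fmt"
)

// step and job are simplified stand-ins for GoBatch's Step and Job.
type step struct {
	name    string
	handler func() error
}

type job struct {
	name  string
	steps []step
}

// jobBuilder is a hypothetical fluent builder: each method returns the
// builder so that calls can be chained.
type jobBuilder struct{ j job }

func newJob(name string) *jobBuilder { return &jobBuilder{j: job{name: name}} }

func (b *jobBuilder) step(name string, handler func() error) *jobBuilder {
	b.j.steps = append(b.j.steps, step{name: name, handler: handler})
	return b
}

func (b *jobBuilder) build() job { return b.j }

// registry maps job names to jobs so they can be looked up at run time.
var registry = map[string]job{}

func register(j job) error {
	if _, exists := registry[j.name]; exists {
		return errors.New("job already registered: " + j.name)
	}
	registry[j.name] = j
	return nil
}

// run executes the steps of a registered job in the order they were added
// and stops at the first error. It returns the names of completed steps.
func run(name string) ([]string, error) {
	j, ok := registry[name]
	if !ok {
		return nil, errors.New("unknown job: " + name)
	}
	var done []string
	for _, s := range j.steps {
		if err := s.handler(); err != nil {
			return done, fmt.Errorf("step %s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

func main() {
	j := newJob("daily_report").
		step("load", func() error { return nil }).
		step("export", func() error { return nil }).
		build()
	if err := register(j); err != nil {
		panic(err)
	}
	done, err := run("daily_report")
	fmt.Println(done, err)
}
```

Keeping registration separate from building is what allows a job to be looked up by name later, for example from a scheduler or a command-line trigger.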

### Job Execution

1. **Job Execution Flow**
   - Parameter validation
   - Create JobInstance and JobExecution
   - Execute Steps in sequence
   - State management and context maintenance
   - Process execution results

2. **Step Execution Flow**
   - Step initialization
   - Resource allocation
   - Execute business logic
     - SimpleStep: Direct Handler execution
     - ChunkStep: Iterative read-process-write
     - PartitionStep: Parallel subtask execution
   - Resource cleanup
   - State update

![](../images/start_job.png)
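
Parallel subtask execution, as performed by a PartitionStep, follows a split, run, and aggregate shape. The sketch below shows that shape with goroutines; `partition` and `sumPartitioned` are hypothetical helpers playing the roles of a Partitioner and an Aggregator, and they do not reflect GoBatch's internals. It assumes the partition count `n` is positive.

```go
package main

import (
	"fmt"
	"sync"
)

// partition splits the half-open range [0, total) into at most n contiguous
// sub-ranges of near-equal size. It stands in for a Partitioner.
// n must be greater than zero.
func partition(total, n int) [][2]int {
	var parts [][2]int
	size := (total + n - 1) / n // ceiling division
	for start := 0; start < total; start += size {
		end := start + size
		if end > total {
			end = total
		}
		parts = append(parts, [2]int{start, end})
	}
	return parts
}

// sumPartitioned processes every partition in its own goroutine and then
// combines the partial results, which is the role of an Aggregator.
func sumPartitioned(data []int, n int) int {
	parts := partition(len(data), n)
	partial := make([]int, len(parts)) // one slot per subtask, so no locking is needed
	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p [2]int) {
			defer wg.Done()
			for _, v := range data[p[0]:p[1]] {
				partial[i] += v
			}
		}(i, p)
	}
	wg.Wait() // all subtasks must finish before aggregation

	total := 0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	data := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(partition(len(data), 3))
	fmt.Println(sumPartitioned(data, 3))
}
```

Giving each subtask its own result slot avoids shared mutable state between goroutines; aggregation only starts once `wg.Wait()` returns.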

### Transaction Management

1. **TransactionManager**
   - Manages database transactions
   - Provides transaction begin, commit, and rollback operations
   - Supports custom transaction managers

2. **Chunk Processing**
   - Each chunk is one transaction unit
   - Supports rollback on failure
   - Provides a retry mechanism
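
Treating each chunk as one transaction means that a failure on any item discards the whole chunk. The following sketch illustrates that rule with an in-memory store. It is an assumption-laden illustration: `txManager` uses plain Go `error` values rather than GoBatch's own error type, and `memStore`, `memTx`, `writeChunk`, and `bufferItem` are hypothetical names. Retry is not shown.

```go
package main

import (
	"errors"
	"fmt"
)

// txManager is a simplified stand-in for a transaction manager with begin,
// commit, and rollback operations. It returns plain Go errors.
type txManager interface {
	BeginTx() (tx interface{}, err error)
	Commit(tx interface{}) error
	Rollback(tx interface{}) error
}

// memTx buffers writes until commit; memStore plays the role of the database.
type memTx struct{ pending []string }

type memStore struct{ rows []string }

func (s *memStore) BeginTx() (interface{}, error) { return &memTx{}, nil }

func (s *memStore) Commit(tx interface{}) error {
	s.rows = append(s.rows, tx.(*memTx).pending...)
	return nil
}

func (s *memStore) Rollback(tx interface{}) error {
	tx.(*memTx).pending = nil
	return nil
}

// writeChunk writes one chunk inside one transaction. If any item fails,
// the transaction is rolled back and nothing from the chunk becomes visible.
func writeChunk(tm txManager, chunk []string, write func(tx interface{}, item string) error) error {
	tx, err := tm.BeginTx()
	if err != nil {
		return err
	}
	for _, item := range chunk {
		if err := write(tx, item); err != nil {
			if rbErr := tm.Rollback(tx); rbErr != nil {
				return rbErr
			}
			return err
		}
	}
	return tm.Commit(tx)
}

// bufferItem is a sample write function that rejects empty items.
func bufferItem(tx interface{}, item string) error {
	if item == "" {
		return errors.New("empty item")
	}
	t := tx.(*memTx)
	t.pending = append(t.pending, item)
	return nil
}

func main() {
	store := &memStore{}
	fmt.Println(writeChunk(store, []string{"a", "b"}, bufferItem)) // this chunk commits
	fmt.Println(writeChunk(store, []string{"c", ""}, bufferItem))  // this chunk rolls back
	fmt.Println(store.rows)
}
```

After both calls the store holds only the items of the first chunk; the valid item `"c"` from the second chunk is discarded together with the failing one.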

## Extension Mechanism

### Listener Interfaces

1. **JobListener**
   - BeforeJob: Callback before job execution
   - AfterJob: Callback after job execution

2. **StepListener**
   - BeforeStep: Callback before step execution
   - AfterStep: Callback after step execution

3. **ChunkListener**
   - BeforeChunk: Callback before chunk processing
   - AfterChunk: Callback after chunk processing
   - OnError: Error handling callback

4. **PartitionListener**
   - BeforePartition: Callback before partitioning
   - AfterPartition: Callback after partitioning
   - OnError: Error handling callback
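
The before/after callback pattern shared by these listeners can be sketched as follows. The `jobListener` interface here is hypothetical and deliberately simplified: it keeps the callback names BeforeJob and AfterJob, but the parameter lists are assumptions and GoBatch's real signatures may differ.

```go
package main

import "fmt"

// jobListener is a hypothetical, simplified listener interface with the two
// callbacks named above. The parameters are assumptions for illustration.
type jobListener interface {
	BeforeJob(jobName string)
	AfterJob(jobName string, err error)
}

// recorder is a sample listener that records every callback it receives.
type recorder struct{ events []string }

func (r *recorder) BeforeJob(jobName string) {
	r.events = append(r.events, "before:"+jobName)
}

func (r *recorder) AfterJob(jobName string, err error) {
	status := "ok"
	if err != nil {
		status = "failed"
	}
	r.events = append(r.events, "after:"+jobName+":"+status)
}

// runWithListeners calls BeforeJob on every listener, runs the job body,
// then calls AfterJob with the outcome, even when the body fails.
func runWithListeners(jobName string, body func() error, listeners ...jobListener) error {
	for _, l := range listeners {
		l.BeforeJob(jobName)
	}
	err := body()
	for _, l := range listeners {
		l.AfterJob(jobName, err)
	}
	return err
}

func main() {
	rec := &recorder{}
	_ = runWithListeners("nightly", func() error { return nil }, rec)
	fmt.Println(rec.events)
}
```

Step, chunk, and partition listeners follow the same shape at a finer granularity, with an additional error callback for the latter two.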

## State Management

### Execution State Recording
GoBatch records runtime states through the following objects:

1. **JobInstance**
   - Corresponds to a set of parameters for a Job
   - The same parameters always map to the same JobInstance

2. **JobExecution**
   - Corresponds to one execution of a JobInstance
   - A restart generates a new JobExecution

3. **StepContext**
   - Corresponds to the context of a Step under a JobInstance
   - Independent of execution count

4. **StepExecution**
   - Corresponds to one execution of a Step under a JobExecution
   - A restart generates a new StepExecution

The database table relationships of these 4 objects are as follows:
![](../images/status_record.png)
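
The rule that the same parameters map to the same JobInstance, while every run adds a new JobExecution, can be illustrated with a small in-memory model. This is a sketch under assumptions: the types, the `start` method, and the SHA-256 key derived from job name and raw parameter string are all hypothetical and say nothing about how GoBatch computes or stores instance identity.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// jobExecution records one attempt to run a job instance.
type jobExecution struct {
	id     int
	status string
}

// jobInstance groups all executions that share a job name and parameters.
type jobInstance struct {
	key        string
	executions []jobExecution
}

// store keeps one jobInstance per (job name, parameters) pair.
type store struct {
	instances map[string]*jobInstance
	nextID    int
}

// instanceKey derives a stable key from the job name and the raw parameter
// string. Hashing is an assumption made for this sketch.
func instanceKey(jobName, params string) string {
	sum := sha256.Sum256([]byte(jobName + "\x00" + params))
	return hex.EncodeToString(sum[:])
}

// start finds or creates the instance for the given parameters and appends
// a new execution to it.
func (s *store) start(jobName, params string) (*jobInstance, jobExecution) {
	key := instanceKey(jobName, params)
	inst, ok := s.instances[key]
	if !ok {
		inst = &jobInstance{key: key}
		s.instances[key] = inst
	}
	s.nextID++
	exec := jobExecution{id: s.nextID, status: "STARTED"}
	inst.executions = append(inst.executions, exec)
	return inst, exec
}

func main() {
	s := &store{instances: map[string]*jobInstance{}}
	a, _ := s.start("my_job", `{"date":"2024-01-01"}`)
	b, _ := s.start("my_job", `{"date":"2024-01-01"}`) // restart with the same parameters
	c, _ := s.start("my_job", `{"date":"2024-01-02"}`) // different parameters
	fmt.Println(a == b, len(a.executions), a == c)
}
```

The first two calls return the same instance, which then holds two executions; the third call, with different parameters, yields a separate instance.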

### State Transitions
Job and Step execution states:
- STARTING: Waiting for execution
- STARTED: Currently executing
- STOPPING: Stopping in progress
- STOPPED: Stopped
- COMPLETED: Successfully completed
- FAILED: Execution failed
- UNKNOWN: Unknown state

![](../images/status_trans.png)

en/basics.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# Basics

## Core Concepts

### Job
A Job represents a complete batch processing task. It consists of one or more Steps that are executed in a specific sequence. Each Job has a unique name and can be configured with various parameters and listeners. For detailed information about Jobs, see [Job](job.md).

### Step
A Step is a single phase in a Job that encapsulates an independent unit of processing. GoBatch supports three types of steps:
- **Simple Step**: Executes a single task in one thread
- **Chunk Step**: Processes data in chunks (read-process-write pattern)
- **Partition Step**: Splits a large task into multiple sub-tasks for parallel processing

For detailed information about Steps, see [Step](step.md).

### JobInstance
A JobInstance represents a logical run of a Job, uniquely identified by the Job name and job parameters. Multiple JobExecutions may be created for a single JobInstance in case of failures.

### JobExecution
A JobExecution represents a single attempt to execute a JobInstance. Each execution tracks its status, start time, end time, and execution results.

### StepExecution
A StepExecution represents a single attempt to execute a Step within a JobExecution. It contains information about the step's execution status and results.

en/configuration.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Configuration Guide

## Global Settings

### Database Configuration

GoBatch needs a database to store job and step execution contexts, so you must set up the database connection before running any job.

```go
gobatch.SetDB(sqlDb)
```

### Transaction Manager

For chunk steps, you must register a TransactionManager instance with GoBatch. The transaction manager interface is defined as:

```go
type TransactionManager interface {
	BeginTx() (tx interface{}, err BatchError)
	Commit(tx interface{}) BatchError
	Rollback(tx interface{}) BatchError
}
```

If you have set up the database but haven't set a transaction manager, GoBatch will create a default transaction manager instance for you.

### Concurrency Control

GoBatch uses internal task pools to run jobs and steps. You can set the maximum concurrency using:

```go
// Set maximum running jobs (default: 10)
gobatch.SetMaxRunningJobs(100)

// Set maximum running steps (default: 1000)
gobatch.SetMaxRunningSteps(5000)
```
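
To make concrete what a maximum-concurrency limit means, the sketch below bounds the number of simultaneously running tasks with a buffered channel used as a semaphore. It illustrates the general technique only; `runLimited` is a hypothetical helper and says nothing about how GoBatch's internal task pools are built.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited runs every task in its own goroutine but lets at most limit
// of them execute at the same time. It returns the highest number of tasks
// that were observed running concurrently.
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, task := range tasks {
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while limit tasks are running
			defer func() { <-sem }() // release the slot when the task is done

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {}
	}
	fmt.Println("peak concurrency:", runLimited(3, tasks))
}
```

The reported peak never exceeds the limit, because a task can only start after it has acquired one of the `limit` slots.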

en/contribution_guide.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Contribution Guide

## Development Setup

1. **Fork Repository**
   - Fork [GoBatch repository](https://github.com/chararch/gobatch)
   - Clone locally

## Code Standards

- Use `gofmt` to format code
- Add comments for exported items
- Write unit tests

## Submission Process

1. **Create Branch**
   - Use meaningful branch names (e.g., `feature/new-reader`)
   - One task per branch

2. **Commit Code**
   - Clear commit messages
   - Reference related issues

3. **Update Documentation**
   - Update relevant docs
   - Ensure examples work

4. **Submit PR**
   - Describe changes
   - Ensure tests pass
