Commit 12e9a8d

[!] add YAML-based chain definitions (#720)
Introduces support for defining task chains in YAML format, enhancing the readability and maintainability of chain configurations. Users can now define chains and tasks in a structured YAML file and load them directly into the database. This simplifies chain configuration compared to SQL inserts, improves the user experience with a human-readable format, and facilitates better version control and collaboration on chain definitions.
1 parent 61518f6 commit 12e9a8d

28 files changed, +2762 -74 lines changed

README.md

Lines changed: 34 additions & 0 deletions
@@ -56,6 +56,40 @@ SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'bash',
 - Full support for database driven logging
 - Enhanced cron-style scheduling
 - Optional concurrency protection
+- **NEW**: YAML-based chain definitions for easy configuration
+
+## YAML Configuration
+
+You can now define chains using YAML files instead of SQL inserts, making configuration more readable and maintainable:
+
+```yaml
+chains:
+  - name: "Daily ETL Pipeline"
+    schedule: "0 2 * * *"  # 2 AM daily
+    live: true
+    max_instances: 1
+    timeout: 3600000  # 1 hour
+
+    tasks:
+      - name: "Extract data"
+        command: "SELECT extract_sales_data($1)"
+        parameters: ["yesterday"]
+
+      - name: "Transform data"
+        command: "CALL transform_sales_data()"
+        autonomous: true
+
+      - name: "Load to warehouse"
+        command: "CALL load_to_warehouse()"
+```
+
+Load YAML chains with:
+
+```bash
+pg_timetable --file chains.yaml --connstr "postgresql://user:pass@host/db"
+```
+
+See [`samples/yaml/`](samples/yaml/) for more examples and [`docs/yaml-format.md`](docs/yaml-format.md) for complete format specification.

 ## Installation

docs/components.md

Lines changed: 24 additions & 3 deletions
@@ -13,12 +13,15 @@ The scheduling in **pg_timetable** encompasses three different abstraction level
 Currently, there are three different kinds of commands:

 ### `SQL`
+
 SQL snippet. Starting a cleanup, refreshing a materialized view or processing data.

 ### `PROGRAM`
+
 External Command. Anything that can be called as an external binary, including shells, e.g. `bash`, `pwsh`, etc. The external command will be called using golang's [exec.CommandContext](https://pkg.go.dev/os/exec#CommandContext).

 ### `BUILTIN`
+
 Internal Command. A prebuilt functionality included in **pg_timetable**. These include:

 * *NoOp*
@@ -78,42 +81,52 @@ In most cases, they have to be brought to live by passing input parameters to th
 Depending on the **command** kind argument can be represented by different *JSON* values.

 #### `SQL`
+
 Schema: `array`

 Example:
+
 ```sql
 '[ "one", 2, 3.14, false ]'::jsonb
 ```

 #### `PROGRAM`
+
 Schema: `array of strings`

 Example:
+
 ```sql
 '["-x", "Latin-ASCII", "-o", "orte_ansi.txt", "orte.txt"]'::jsonb
 ```

 #### `BUILTIN: Sleep`
+
 Schema: `integer`

 Example:
+
 ```sql
 '5' :: jsonb
 ```

 #### `BUILTIN: Log`
+
 Schema: `any`

 Examples:
+
 ```sql
 '"WARNING"'::jsonb
 '{"Status": "WARNING"}'::jsonb
 ```

 #### `BUILTIN: SendMail`
+
 Schema: `object`

 Example:
+
 ```sql
 '{
     "username": "[email protected]",
@@ -133,9 +146,11 @@ Example:
 ```

 #### `BUILTIN: Download`
+
 Schema: `object`

 Example:
+
 ```sql
 '{
     "workersnum": 2,
@@ -145,9 +160,11 @@ Example:
 ```

 #### `BUILTIN: CopyFromFile`
+
 Schema: `object`

 Example:
+
 ```sql
 '{
     "sql": "COPY location FROM STDIN",
@@ -156,9 +173,11 @@ Example:
 ```

 #### `BUILTIN: CopyToFile`
+
 Schema: `object`

 Example:
+
 ```sql
 '{
     "sql": "COPY location TO STDOUT",
@@ -167,10 +186,12 @@ Example:
 ```

 #### `BUILTIN: Shutdown`
-*value ignored*
+
+value ignored

 #### `BUILTIN: NoOp`
-*value ignored*
+
+value ignored

 ## Chain

@@ -202,4 +223,4 @@ Once tasks have been arranged, they have to be scheduled as a **chain**. For thi

 -- Run VACUUM at 00:05 every day in August UTC
 SELECT timetable.add_job('execute-func', '5 0 * 8 *', 'VACUUM');
-```
+```

docs/yaml-format.md

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+# YAML Chain Definition Format for pg_timetable
+
+This document defines the YAML format for defining chains of scheduled tasks in pg_timetable.
+
+## YAML Schema
+
+```yaml
+# Top-level structure
+chains:
+  - name: "chain-name"  # Required: chain_name (TEXT, unique)
+    schedule: "* * * * *"  # Required: run_at (cron format)
+    live: true  # Optional: live (BOOLEAN), default: false
+    max_instances: 1  # Optional: max_instances (INTEGER)
+    timeout: 30000  # Optional: timeout in milliseconds (INTEGER)
+    self_destruct: false  # Optional: self_destruct (BOOLEAN), default: false
+    exclusive: false  # Optional: exclusive_execution (BOOLEAN), default: false
+    client_name: "worker-1"  # Optional: client_name (TEXT)
+    on_error: "SELECT log_error()"  # Optional: on_error SQL (TEXT)
+
+    tasks:  # Required: array of tasks
+      - name: "task-1"  # Optional: task_name (TEXT)
+        kind: "SQL"  # Optional: kind (SQL|PROGRAM|BUILTIN), default: SQL
+        command: "SELECT $1, $2"  # Required: command (TEXT)
+        parameters:  # Optional: parameters (array of execution parameters)
+          - ["value1", 42]  # First execution with these parameters
+          - ["value2", 99]  # Second execution with different parameters
+        run_as: "postgres"  # Optional: run_as (TEXT) - role for SET ROLE
+        connect_string: "postgresql://user@host/otherdb"  # Optional: database_connection (TEXT)
+        ignore_error: false  # Optional: ignore_error (BOOLEAN), default: false
+        autonomous: false  # Optional: autonomous (BOOLEAN), default: false
+        timeout: 5000  # Optional: timeout in milliseconds (INTEGER)
+
+      - name: "task-2"
+        kind: "PROGRAM"
+        command: "bash"
+        parameters: ["-c", "echo hello"]
+        ignore_error: true
+```
+
+## Field Mappings
+
+### Chain Level
+
+| YAML Field | DB Column | Type | Default | Description |
+|------------|-----------|------|---------|-------------|
+| `name` | `chain_name` | TEXT | **required** | Unique chain identifier |
+| `schedule` | `run_at` | cron | **required** | Cron-style schedule |
+| `live` | `live` | BOOLEAN | `false` | Whether chain is active |
+| `max_instances` | `max_instances` | INTEGER | `null` | Max parallel instances |
+| `timeout` | `timeout` | INTEGER | `0` | Chain timeout (ms) |
+| `self_destruct` | `self_destruct` | BOOLEAN | `false` | Delete after success |
+| `exclusive` | `exclusive_execution` | BOOLEAN | `false` | Pause other chains |
+| `client_name` | `client_name` | TEXT | `null` | Restrict to specific client |
+| `on_error` | `on_error` | TEXT | `null` | Error handling SQL |
+
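As a rough illustration of the chain-level mapping above, the following Python sketch renames YAML keys to their database columns and fills in the documented defaults. The helper name `chain_to_row` and the dict-based input are assumptions made for this example; pg_timetable itself is written in Go, and its actual loader is not shown in this diff.

```python
# Hypothetical helper: maps a parsed YAML chain (a plain dict) to column values.
# Column names and defaults are copied from the chain-level table above.
CHAIN_FIELDS = {
    # yaml_key: (db_column, default)
    "name": ("chain_name", None),
    "schedule": ("run_at", None),
    "live": ("live", False),
    "max_instances": ("max_instances", None),
    "timeout": ("timeout", 0),
    "self_destruct": ("self_destruct", False),
    "exclusive": ("exclusive_execution", False),
    "client_name": ("client_name", None),
    "on_error": ("on_error", None),
}
REQUIRED = ("name", "schedule")


def chain_to_row(chain: dict) -> dict:
    """Return {db_column: value}, applying defaults for omitted optional keys."""
    missing = [key for key in REQUIRED if key not in chain]
    if missing:
        raise ValueError(f"missing required chain field(s): {', '.join(missing)}")
    return {
        column: chain.get(key, default)
        for key, (column, default) in CHAIN_FIELDS.items()
    }


row = chain_to_row({"name": "daily-report", "schedule": "0 9 * * *", "live": True})
print(row["chain_name"], row["run_at"], row["live"], row["exclusive_execution"])
```

Keeping the mapping in one table-like dict mirrors the documentation, so a renamed column only needs to change in one place.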
+### Task Level
+
+| YAML Field | DB Column | Type | Default | Description |
+|------------|-----------|------|---------|-------------|
+| `name` | `task_name` | TEXT | `null` | Task description |
+| `kind` | `kind` | ENUM | `'SQL'` | Command type (SQL/PROGRAM/BUILTIN) |
+| `command` | `command` | TEXT | **required** | Command to execute |
+| `parameters` | via `timetable.parameter` | Array of any | `null` | Array of parameter values stored as individual JSONB rows with order_id |
+| `run_as` | `run_as` | TEXT | `null` | Role for SET ROLE |
+| `connect_string` | `database_connection` | TEXT | `null` | Connection string |
+| `ignore_error` | `ignore_error` | BOOLEAN | `false` | Continue on error |
+| `autonomous` | `autonomous` | BOOLEAN | `false` | Execute outside transaction |
+| `timeout` | `timeout` | INTEGER | `0` | Task timeout (ms) |
+
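The `parameters` row in the table above says values are stored as individual JSONB rows with an `order_id`. A minimal Python sketch of that expansion follows. It assumes that each top-level entry of `parameters` becomes one row and that `order_id` starts at 1; neither detail is confirmed by this diff, and `parameter_rows` is a hypothetical name rather than part of pg_timetable.

```python
import json


def parameter_rows(parameters):
    """Expand a task's `parameters` list into (order_id, jsonb_text) pairs.

    Assumption: one row per top-level entry, numbered from 1.
    """
    if parameters is None:  # the documented default: no parameter rows
        return []
    return [
        (order_id, json.dumps(value))
        for order_id, value in enumerate(parameters, start=1)
    ]


# Two executions of "SELECT $1, $2", as in the schema example.
for order_id, value in parameter_rows([["value1", 42], ["value2", 99]]):
    print(order_id, value)
```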
+## Task Ordering
+
+Tasks are ordered sequentially within a chain based on their array position. The system will automatically assign appropriate `task_order` values with spacing (e.g., 10, 20, 30) to allow future insertions.
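The spacing scheme described above can be sketched in a few lines of Python. The function name `assign_task_order` and the step of 10 are assumptions taken from the "10, 20, 30" example; the diff does not show how pg_timetable computes these values internally.

```python
def assign_task_order(tasks, step=10):
    """Pair each task with a spaced task_order based on its array position."""
    return [(position * step, task) for position, task in enumerate(tasks, start=1)]


tasks = [{"name": "extract-data"}, {"name": "transform-data"}, {"name": "load-data"}]
ordered = assign_task_order(tasks)
print([order for order, _ in ordered])  # [10, 20, 30]
```

With gaps of 10, a task added later can take an intermediate value such as 15 without renumbering its neighbours.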
+
+## Examples
+
+### Simple SQL Job
+
+```yaml
+chains:
+  - name: "daily-report"
+    schedule: "0 9 * * *"  # 9 AM daily
+    live: true
+    tasks:
+      - name: "generate-report"
+        command: "CALL generate_daily_report()"
+```
+
+### Multi-task Chain
+
+```yaml
+chains:
+  - name: "etl-pipeline"
+    schedule: "0 2 * * *"  # 2 AM daily
+    live: true
+    max_instances: 1
+    timeout: 3600000  # 1 hour
+
+    tasks:
+      - name: "extract-data"
+        command: "SELECT extract_sales_data($1)"
+        parameters: ["2023-01-01"]
+
+      - name: "transform-data"
+        command: "CALL transform_sales_data()"
+        autonomous: true
+
+      - name: "load-data"
+        command: "CALL load_to_warehouse()"
+        ignore_error: false
+```
+
+### Program Task
+
+```yaml
+chains:
+  - name: "backup-job"
+    schedule: "0 3 * * 0"  # Sunday 3 AM
+    live: true
+
+    tasks:
+      - name: "pg-dump"
+        kind: "PROGRAM"
+        command: "pg_dump"
+        parameters:
+          - ["-h", "localhost", "-U", "postgres", "-d", "mydb", "-f", "/backups/mydb.sql"]
+```
+
+## Validation Rules
+
+1. **Required Fields**: `name`, `schedule`, `tasks`, and `command` for each task
+2. **Unique Names**: Chain names must be unique across the database
+3. **Valid Cron**: Schedule must be valid cron format (5 fields)
+4. **Valid Kind**: Task kind must be one of: SQL, PROGRAM, BUILTIN
+5. **Parameter Types**: Parameters can be any JSON-compatible type (strings, numbers, booleans, arrays, objects) and are stored as individual JSONB values
+6. **Timeout Values**: Must be non-negative integers (milliseconds)
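To make the rules above concrete, here is a hedged Python sketch of a pre-flight check over an already-parsed document (a plain dict). It is not pg_timetable's validator: `validate_chains` and `_bad_timeout` are hypothetical names, uniqueness is only checked within the file rather than against the database, the cron check counts fields without validating their values, and rule 5 is not checked at all.

```python
VALID_KINDS = {"SQL", "PROGRAM", "BUILTIN"}


def _bad_timeout(value):
    """True when a supplied timeout is not a non-negative integer (rule 6)."""
    if value is None:  # field omitted
        return False
    return isinstance(value, bool) or not isinstance(value, int) or value < 0


def validate_chains(doc):
    """Return a list of problems found; an empty list means every check passed."""
    errors = []
    seen = set()
    for i, chain in enumerate(doc.get("chains") or []):
        name = chain.get("name")
        label = name or f"chains[{i}]"
        for field in ("name", "schedule", "tasks"):  # rule 1
            if not chain.get(field):
                errors.append(f"{label}: missing required field '{field}'")
        if name:  # rule 2, within this file only
            if name in seen:
                errors.append(f"{label}: duplicate chain name")
            seen.add(name)
        schedule = chain.get("schedule")
        if schedule and len(str(schedule).split()) != 5:  # rule 3, field count only
            errors.append(f"{label}: schedule must have 5 fields")
        if _bad_timeout(chain.get("timeout")):  # rule 6
            errors.append(f"{label}: timeout must be a non-negative integer")
        for j, task in enumerate(chain.get("tasks") or []):
            task_name = task.get("name") or f"tasks[{j}]"
            task_label = f"{label}/{task_name}"
            if not task.get("command"):  # rule 1
                errors.append(f"{task_label}: missing required field 'command'")
            kind = task.get("kind", "SQL")
            if kind not in VALID_KINDS:  # rule 4
                errors.append(f"{task_label}: invalid kind {kind!r}")
            if _bad_timeout(task.get("timeout")):  # rule 6
                errors.append(f"{task_label}: timeout must be a non-negative integer")
    return errors


doc = {"chains": [{
    "name": "backup-job",
    "schedule": "0 3 * * 0",
    "tasks": [{"kind": "SHELL", "command": "pg_dump", "timeout": -1}],
}]}
for problem in validate_chains(doc):
    print(problem)
```

Collecting every problem in a list, instead of stopping at the first one, lets a user fix a whole file in one pass.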
