[Docs] Add LLM contribution guide by corgy-w · Pull Request #10411 · apache/seatunnel

corgy-w · 2026-01-28T15:37:38Z

Purpose of this pull request

This PR introduces AGENTS.md to the repository root.

Motivation:
As AI coding assistants (like Cursor, Windsurf, Trae, etc.) are becoming more popular, having a dedicated context file helps them understand the project structure, coding standards, and testing procedures of Apache SeaTunnel. This file mirrors the AGENTS.md practice found in other mature Apache projects (e.g., Apache Superset) but adapts it specifically to SeaTunnel's engineering conventions.

Key contents of AGENTS.md:

Critical Requirements: Validation steps (Spotless, Verify, Test) before pushing.
Git Convention: Standardized commit message format.
Project Structure: Overview of key modules like seatunnel-connectors-v2 and seatunnel-engine.
Tech Stack: Rules for dependencies (e.g., using shaded packages) and code style.
Testing Strategy: Guidelines for Unit Tests and E2E Tests.
Documentation: Requirement for bilingual (EN/ZH) updates.

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

If any new Jar binary package adding in your PR, please add License Notice according
New License Guide
If necessary, please update the documentation to describe the new feature. https://github.com/apache/seatunnel/tree/dev/docs
If necessary, please update incompatible-changes.md to describe the incompatibility caused by this PR.
If you are contributing the connector code, please check that the following files are updated:
1. Update plugin-mapping.properties and add new connector information in it
2. Update the pom file of seatunnel-dist
3. Add ci label in label-scope-conf
4. Add e2e testcase in seatunnel-e2e
5. Update connector plugin_config

davidzollo · 2026-01-28T15:45:54Z

AGENTS.md

+Any incompatible change MUST:
+
+* Be explicitly documented


Incompatible changes should be documented in docs/en/introduction/concepts/incompatible-changes.md

davidzollo · 2026-01-28T15:46:30Z

Good practice for AI coding

davidzollo

+1
LGTM
It's better to add soft links for three LLMs

AGENTS.md

chl-wxp · 2026-01-29T01:36:29Z

Isn’t Chinese documentation in the plan?

corgy-w · 2026-01-29T06:22:12Z

Isn’t Chinese documentation in the plan?

@chl-wxp I took a look at the loading logic of commonly used tools. It's all unnecessary to load Chinese items like AGENT/GPT in the root directory

DanielCarter-stack · 2026-01-29T06:49:32Z

Issue 1: Missing Chinese version of documentation `AGENTS_ZH.md`

Location: Root directory

# Files promised but not created in PR
AGENTS_ZH.md

Related Context:

PR description explicitly states: "This PR introduces AGENTS.md (and its Chinese version AGENTS_ZH.md)"
Project documentation standards require: "Any user-visible change MUST update: docs/en AND docs/zh"
Existing Chinese documentation directory: docs/zh/ exists with complete structure

Issue Description:
The PR description explicitly promises to create a Chinese version AGENTS_ZH.md, but the actual changes only create an English version AGENTS.md and three symlinks. This violates:

The explicit promise in the PR description
The project's own bilingual documentation standards (also mentioned in AGENTS.md lines 170-177)

Potential Risks:

Risk 1: Chinese contributors using AI assistants may not receive optimal guidance
Risk 2: Violates the project's bilingual documentation convention, causing inconsistent documentation structure
Risk 3: May set a wrong example for other contributors to "ignore Chinese documentation"

Impact Scope:

Direct impact: Chinese contributors and developers using Chinese AI models
Indirect impact: Project internationalization strategy and documentation consistency
Impact level: Project-level (documentation standards)

Severity: MAJOR

Improvement Suggestions:

# Option 1: Create AGENTS_ZH.md (recommended)
# Need to translate AGENTS.md to Chinese, keeping structure consistent

# Option 2: Create symlink to English version (temporary solution)
ln -s AGENTS.md AGENTS_ZH.md
# But this is not a good long-term solution

# Option 3: Defer to follow-up PR
# Clearly state in PR description that Chinese version will be added in follow-up PR
# Need to create Issue to track

Rationale: The project has clear bilingual documentation standards, and the PR description promised a Chinese version. Suggest supplementing before merge, or explicitly stating the plan for the Chinese version in the PR description.

Issue 2: Startup script path error

Location: AGENTS.md:233-237

### Run Job (Zeta)
```bash
sh bin/seatunnel.sh --config config/v2.batch.config.template -e local


** **Related Context:
- 实际 seatunnel.sh 位置：`seatunnel-core/seatunnel-starter/src/main/bin/seatunnel.sh`
- config 模板位置：`config/v2.batch.config.template` ✓ 存在
- 文档声称的路径：`bin/seatunnel.sh` ✗ 不存在

** **Problem Description:
文档中提供的运行命令使用了错误的路径。`bin/` 目录下只有 `install-plugin.sh`，没有 `seatunnel.sh`。实际的 `seatunnel.sh` 位于 `seatunnel-core/seatunnel-starter/src/main/bin/`。

这会导致：
1. 新手直接复制命令会失败
2. AI 助手可能生成错误的运行脚本
3. 降低文档的可信度

** **Potential Risks:
- 风险 1：开发者尝试运行命令时会报错 "No such file or directory"
- 风险 2：浪费开发者时间排查路径问题
- 风险 3：AI 助手可能学习到错误的路径模式

** **Impact Scope:
- 直接影响：尝试运行 SeaTunnel 的开发者
- 间接影响：文档可信度
- 影响面：使用 Zeta 引擎的用户

** **Severity:** MAJOR

** **Improvement Suggestions:
```markdown
### Run Job (Zeta)

**After building from source**, you can run a job using:

```bash

# ## Issue 3: Incomplete directory structure description

** **Location:** `AGENTS.md:60-74`
```markdown
## Repository Structure

```text
seatunnel/
├── seatunnel-api/              # Core API definitions
├── seatunnel-connectors-v2/    # Source & Sink connectors (main contribution area)
├── seatunnel-transforms-v2/    # Transform plugins (including LLM)
├── seatunnel-engine/           # Zeta engine & Web UI
├── seatunnel-core/             # Job submission & CLI entry points
├── seatunnel-translation/      # Flink & Spark adapters
├── seatunnel-formats/          # Data formats (JSON, Avro, etc.)
├── seatunnel-e2e/              # End-to-End integration tests
├── docs/                       # Documentation (en & zh)
└── config/                     # Default configurations


** **Related Context:
- 实际存在的模块（通过 `ls -d seatunnel-*` 确认）：
  - `seatunnel-api` ✓
  - `seatunnel-connectors-v2` ✓
  - `seatunnel-transforms-v2` ✓
  - `seatunnel-engine` ✓
  - `seatunnel-core` ✓
  - `seatunnel-translation` ✓
  - `seatunnel-formats` ✓
  - `seatunnel-e2e` ✓
  - `seatunnel-common` ✗（未列出）
  - `seatunnel-config` ✗（未列出）
  - `seatunnel-dist` ✗（未列出）
  - `seatunnel-examples` ✗（未列出）
  - `seatunnel-plugin-discovery` ✗（未列出）
  - `seatunnel-shade` ✗（未列出）
  - `seatunnel-ci-tools` ✗（未列出）

** **Problem Description:
文档中列出的目录结构不完整，遗漏了 7 个子模块。虽然文档说明是"Overview of key modules"，但：
1. 没有明确说明这是"非完整列表"
2. `seatunnel-common` 是一个核心模块（公共工具类），被遗漏不太合理
3. `seatunnel-dist` 与打包分发相关，也可能被贡献者接触到

** **Potential Risks:
- 风险 1：AI 助手可能认为只有这 9 个模块，在实际修改时遗漏其他模块
- 风险 2：贡献者在需要修改 `seatunnel-common` 时，可能不知道这个模块的存在
- 风险 3：可能误导 AI 助手进行模块推荐

** **Impact Scope:
- 直接影响：需要修改被遗漏模块的贡献者
- 间接影响：AI 助手的模块推荐准确性
- 影响面：多模块

** **Severity:** MINOR

** **Improvement Suggestions:
```markdown
## Repository Structure

```text
seatunnel/
├── seatunnel-api/              # Core API definitions
├── seatunnel-common/           # Common utilities and shared code
├── seatunnel-config/           # Configuration module
├── seatunnel-connectors-v2/    # Source & Sink connectors (main contribution area)
├── seatunnel-transforms-v2/    # Transform plugins (including LLM)
├── seatunnel-engine/           # Zeta engine & Web UI
├── seatunnel-core/             # Job submission & CLI entry points
├── seatunnel-translation/      # Flink & Spark adapters
├── seatunnel-formats/          # Data formats (JSON, Avro, etc.)
├── seatunnel-e2e/              # End-to-End integration tests
├── seatunnel-dist/             # Distribution packaging
├── seatunnel-shade/            # Shaded dependencies
├── seatunnel-plugin-discovery/ # Plugin discovery mechanism
├── seatunnel-ci-tools/         # CI/CD tools
├── seatunnel-examples/         # Example configurations
├── docs/                       # Documentation (en & zh)
└── config/                     # Default configurations

Key modules for contributors:

Connector-V2: Most contributions happen here
Transform-V2: Data transformation plugins
Zeta (seatunnel-engine): Engine development
E2E: Integration tests


** **Reasoning:
1. 列出所有模块，提供完整的视图
2. 添加简短描述说明每个模块的用途
3. 强调"Key modules for contributors"，帮助贡献者聚焦
4. 保持简洁，但提供完整信息

---

# ## Issue 4: Git commit format differs from actual history

** **Location:** `AGENTS.md:22-58`

** **Related Context:
- 文档声称的格式：`[Type][Module] Description`
- 实际历史中的变体：
  - `[Docs] Add LLM contribution guide`（无模块）✓ 符合
  - `[Fix][Connector-V2][Hive] ...`（三级标签）⚠️ 变体
  - `[Fix] [connector-hbase] ...`（小写+空格）⚠️ 变体
  - `[Feature] [Connector-file] ...`（空格）⚠️ 变体
  - `[Bug][Connector-V2] ...`（使用 Bug）⚠️ 类型不一致

** **Problem Description:
文档中定义的提交格式是"理想格式"，但项目实际历史中存在多种变体：
1. 模块标签有时使用小写（如 `connector-hbase` vs `Connector-V2`）
2. 有时使用三级标签（如 `[Fix][Connector-V2][Hive]`）
3. 有时标签之间有空格（`[Fix] [connector-hbase]`）
4. 类型有时使用 `Bug` 而非 `Fix`
5. 有些提交没有模块标签（如 `[Docs] ...`）

文档没有说明这些变体是否可接受，可能导致 AI 助手或贡献者困惑。

** **Potential Risks:
- 风险 1：AI 助手可能拒绝接受历史中的合法提交格式
- 风险 2：贡献者可能认为历史提交格式"错误"
- 风险 3：缺乏明确的规范说明，导致新贡献者不知道该遵循哪个格式

** **Impact Scope:
- 直接影响：Git 提交信息格式
- 间接影响：提交历史的一致性
- 影响面：项目级

** **Severity:** MINOR

** **Improvement Suggestions:
```markdown
## Git Commit Message Convention

SeaTunnel follows a **standardized commit message format** to maintain a clean and searchable history.

**Recommended Format**:

[Type][Module] Description


**Allowed Variations**:
- `[Type] Description` (without module tag, for general changes)
- `[Type][Module][Sub-module] Description` (for specific sub-components)

**Type Values** (case-sensitive, prefer these):
* `Feature`  – New features
* `Fix`      – Bug fixes (also `Bug` is accepted but discouraged)
* `Improve`  – Improvements to existing behavior
* `Docs`     – Documentation-only changes
* `Test`     – Test cases or test framework changes
* `Chore`    – Build, dependency, or maintenance tasks

**Module Values** (use consistent casing):
* `Connector-V2`  – seatunnel-connectors-v2
* `Zeta`          – seatunnel-engine (Zeta engine)
* `Core`          – seatunnel-core
* `API`           – seatunnel-api
* `Transform-V2`  – seatunnel-transforms-v2
* `Format`        – seatunnel-formats
* `Translation`   – seatunnel-translation
* `E2E`           – seatunnel-e2e

**Examples**:
* `[Fix][Connector-V2] Fix MySQL source split enumeration bug`
* `[Feature][Transform-V2] Add LLM transform plugin`
* `[Docs] Update quick start guide`
* `[Improve][Core] Optimize jar package loading speed`

**Historical Note**: Some historical commits use variations like lowercase module names (`connector-hbase`), extra spaces (`[Fix] [module]`), or `Bug` instead of `Fix`. New commits should follow the recommended format above.

** **Reasoning:

明确区分"推荐格式"和"允许变体"
说明历史差异，避免混淆
提供清晰的指导，告诉新贡献者应该怎么做
保持向后兼容性，承认历史格式的合理性

## Issue 5: Documentation location may not be prominent enough

** Location: Root directory AGENTS.md

** **Related Context:

文件放置在项目根目录
与 README.md 同级
没有在 README.md 中引用或提及
没有在 docs/ 目录中建立索引

** **Problem Description:
虽然 AGENTS.md 放在根目录有助于 LLM 工具发现，但对于人类贡献者来说：

没有在 README 中提及，可能不容易被发现
不在 docs/ 目录下，可能不被视为"正式文档"
没有在文档导航中集成

** **Potential Risks:

风险 1：人类贡献者可能不知道这个文档的存在
风险 2：文档可能长期不更新，因为没人知道要更新它
风险 3：与 docs/ 目录的隔离可能导致内容重复或冲突

** **Impact Scope:

直接影响：文档的可发现性
间接影响：文档的维护和更新
影响面：项目级

** Severity: MINOR

** **Improvement Suggestions:

** **Option 1: Add reference in README.md (recommended)

# Add to the "Contributing" section in README.md

## Contributing

We welcome contributions! Please check out:
- [Contribution Guide](docs/en/contribution/...) (for humans)
- [LLM Context Guide](AGENTS.md) (for AI assistants like Claude, GPT, etc.)

** **Option 2: Add link in docs/ as well

# Create docs/en/AGENTS.md (symlink or redirect)
ln -s ../../AGENTS.md docs/en/AGENTS.md
# Create docs/zh/AGENTS_ZH.md
ln -s ../../AGENTS_ZH.md docs/zh/AGENTS_ZH.md

** **Option 3: Add to documentation sidebar
在 docs/sidebars.js 中添加：

{
  type: 'link',
  label: 'For AI Assistants',
  href: '/en/AGENTS',
}

** **Reasoning:

提高文档的可发现性
让人类贡献者也知道这个资源
建立与正式文档体系的连接
便于未来维护和更新

## Issue 6: Missing CI/CD workflow documentation

** Location: AGENTS.md (overall)

** **Related Context:

文档提到了本地验证命令（./mvnw spotless:apply, ./mvnw verify, ./mvnw test）
但没有提及 CI/CD 流程
项目有 .github/workflows/ 目录（GitHub Actions）

** **Problem Description:
现代开源项目通常有 CI/CD 流程自动检查 PR。文档中：

没有说明 PR 提交后 CI 会自动运行哪些检查
没有说明如何查看 CI 结果
没有说明 CI 失败后如何处理
没有说明如何在本地模拟 CI 检查

** **Potential Risks:

风险 1：贡献者可能不知道 CI 会自动检查
风险 2：贡献者可能不知道如何理解 CI 错误
风险 3：AI 助手可能生成无法通过 CI 的代码

** **Impact Scope:

直接影响：PR 提交流程
间接影响：PR 合并效率
影响面：所有贡献者

** Severity: MINOR

** **Improvement Suggestions:

## CI/CD Process

**Automated Checks**:
When you open a PR, GitHub Actions will automatically run:
- Spotless format check
- Compilation (verify)
- Unit tests
- E2E tests (for some modules)

**Viewing CI Results**:
1. Go to your PR page
2. Scroll to the "Checks" section at the bottom
3. Click on the failed job to see detailed logs

**Common CI Failures**:
- **Spotless check failed**: Run `./mvnw spotless:apply` and push again
- **Test failed**: Check the test logs, fix the issue, and push
- **Compilation failed**: Check the compiler error logs

**Running CI Locally** (optional):
```bash

# ## Issue 7: E2E test command may be inaccurate

** **Location:** `AGENTS.md:193-203`

** **Related Context:
- 文档声称：
  ```bash
  ./mvnw -DskipUT -DskipIT=false verify

这个命令会跳过单元测试（skipUT），但运行集成测试

** **Problem Description:

命令名称叫 "E2E Tests"，但实际使用 -DskipIT=false，这是运行集成测试，不一定是 E2E 测试
E2E 测试可能位于 seatunnel-e2e 模块，需要特定的命令运行
文档没有说明如何运行特定模块的 E2E 测试

** **Potential Risks:

风险 1：AI 助手可能无法正确区分 E2E 测试和集成测试
风险 2：贡献者可能无法运行特定的 E2E 测试
风险 3：测试运行失败时不知道如何排查

** **Impact Scope:

直接影响：测试运行
间接影响：PR 质量保证
影响面：Connector-V2 和其他需要 E2E 测试的模块

** Severity: MINOR

** **Improvement Suggestions:

### E2E Tests

* Located in `seatunnel-e2e`
* Uses Testcontainers for real environment testing
* Extend `TestSuiteBase` for consistent test setup

**Run all E2E tests**:
```bash
./mvnw -DskipUT -DskipIT=false verify

Run specific E2E test:

davidzollo · 2026-01-29T12:32:35Z

Can you remind the AI to add comments for important methods? I find out that the comments in the current code are rare.

yzeng1618 · 2026-01-30T02:34:28Z

AGENTS.md

+* `[Fix][Zeta] Fix checkpoint timeout under heavy backpressure`
+* `[Feature][Transform-V2] Add LLM transform plugin`
+* `[Improve][Core] Optimize jar package loading speed`
+* `[Docs] Update quick start guide`


A small detail: the document states the strict format is [Type][Module], but the example has [Docs] ... (no Module). Let's standardize this part.

The DOC doesn't have [Module], so I think this is fine. I feel there's no need to add anything here

I think we could add DOC doesn't have [Module] to this description, but it’s totally fine to leave it out—this isn’t something we need to overthink.

Then I think it’s better to use Doc as [Module] as before.

yzeng1618 · 2026-01-30T02:36:11Z

Isn’t Chinese documentation in the plan?

@chl-wxp I took a look at the loading logic of commonly used tools. It's all unnecessary to load Chinese items like AGENT/GPT in the root directory

This can be removed.

yzeng1618

LGTM

davidzollo · 2026-02-03T12:51:03Z

AGENTS.md

+```bash
+./mvnw -DskipUT -DskipIT=false verify
+```


# Run specific connector E2E tests ./mvnw test -pl seatunnel-e2e/seatunnel-connector-v2-e2e/connector-xxx-e2e # Run all E2E tests (warning: time-consuming) ./mvnw -DskipUT -DskipIT=false verify

davidzollo · 2026-02-03T12:52:59Z

AGENTS.md

+# Unit tests (strongly recommended)
+./mvnw test
+```


Suggested change

# Unit tests (strongly recommended)

./mvnw test

```

# Unit tests (strongly recommended)

./mvnw test

# Check for compilation errors in affected modules

./mvnw clean package -pl <affected-module> -am

davidzollo · 2026-02-03T13:09:07Z

Suggested addition

## Adding a New Connector

When contributing a new connector, you MUST update:

1. `plugin-mapping.properties` - Add connector mapping
2. `seatunnel-dist/pom.xml` - Include in distribution
3. `seatunnel-e2e/` - Add E2E test cases
4. `config/plugin_config` - Add plugin configuration
5. `docs/en/connector-v2/` - Add English documentation
6. `docs/zh/connector-v2/` - Add Chinese documentation

Suggested new section:

## Common Pitfalls to Avoid

1. **Don't use wildcard imports** - Spotless will fail
2. **Don't forget license headers** - All new files need ASF headers
3. **Don't skip `-nsu` flag** - Ensures build reproducibility
4. **Don't modify existing Option names** - Breaking change for users
5. **Don't commit without running spotless:apply** - CI will fail
6. **Don't use `System.out.println`** - Use proper logging framework

What Works Well

Clear structure - Well-organized sections
Good examples - Commit message format examples are helpful
Backward compatibility emphasis - Critical for production systems
Repository structure - Accurate and helpful
License header template - Essential for Apache projects

AGENTS.md

zhangshenghang

+1

davidzollo

+1

corgy-w added 2 commits January 28, 2026 23:09

[Docs] Add LLM contribution guide (AGENTS.md)

b9ee3bc

[Docs] Add LLM contribution guide

e0255d2

davidzollo reviewed Jan 28, 2026

View reviewed changes

davidzollo previously approved these changes Jan 28, 2026

View reviewed changes

github-actions bot added approved reviewed labels Jan 28, 2026

corgy-w added 2 commits January 29, 2026 02:25

[Docs] Add LLM contribution guide

5dcdf60

[Docs] Add LLM contribution guide

625e41b

corgy-w dismissed davidzollo’s stale review via 625e41b January 28, 2026 18:25

github-actions bot removed approved reviewed labels Jan 28, 2026

corgy-w added 2 commits January 29, 2026 02:29

[Docs] Add LLM contribution guide

bcfdd86

[Docs] Add LLM contribution guide

aea828d

zhangshenghang reviewed Jan 29, 2026

View reviewed changes

AGENTS.md Outdated Show resolved Hide resolved

Use variable for plugin version in install command

6bebc5b

[Docs] Update

4880376

yzeng1618 reviewed Jan 30, 2026

View reviewed changes

yzeng1618 approved these changes Jan 30, 2026

View reviewed changes

github-actions bot added the reviewed label Jan 30, 2026

davidzollo reviewed Feb 3, 2026

View reviewed changes

AGENTS.md Outdated Show resolved Hide resolved

Delete GPT.md

ec34c5e

update

5d643f9

zhangshenghang approved these changes Feb 9, 2026

View reviewed changes

github-actions bot added the approved label Feb 9, 2026

davidzollo approved these changes Feb 9, 2026

View reviewed changes

davidzollo merged commit 2df5480 into apache:dev Feb 9, 2026
5 checks passed

Conversation

corgy-w commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose of this pull request

Does this PR introduce any user-facing change?

How was this patch tested?

Check list

Uh oh!

davidzollo Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

davidzollo commented Jan 28, 2026

Uh oh!

davidzollo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chl-wxp commented Jan 29, 2026

Uh oh!

corgy-w commented Jan 29, 2026

Uh oh!

DanielCarter-stack commented Jan 29, 2026

Issue 1: Missing Chinese version of documentation AGENTS_ZH.md

Issue 2: Startup script path error

## Issue 5: Documentation location may not be prominent enough

## Issue 6: Missing CI/CD workflow documentation

Uh oh!

davidzollo commented Jan 29, 2026

Uh oh!

yzeng1618 Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

corgy-w Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

yzeng1618 Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

corgy-w Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

corgy-w Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

yzeng1618 commented Jan 30, 2026

Uh oh!

yzeng1618 left a comment

Choose a reason for hiding this comment

Uh oh!

davidzollo Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

davidzollo Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidzollo commented Feb 3, 2026

What Works Well

Uh oh!

Uh oh!

zhangshenghang left a comment

Choose a reason for hiding this comment

Uh oh!

davidzollo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

corgy-w commented Jan 28, 2026 •

edited

Loading

Issue 1: Missing Chinese version of documentation `AGENTS_ZH.md`

yzeng1618 Jan 30, 2026 •

edited

Loading

davidzollo Feb 3, 2026 •

edited

Loading