Commit d1d7eed

bvt123 and claude committed
chore: remove stale YDB, YTSaurus, Greenplum, OpenSearch, S3 provider references
- Remove YDB debezium emitter/receiver and tests
- Remove YTSaurus logging, KV wrapper, and recipe helpers
- Remove Greenplum and OpenSearch connection code
- Remove S3 example (s3sqs2ch) and docs references
- Remove Elasticsearch and Delta docs references
- Clean up error codes for removed providers
- Update .mapping.json to remove deleted file entries
- Add .claude/ and reports/ to .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent b316038 commit d1d7eed

54 files changed: 4781 additions & 9821 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ report
 docs-html
 _docs-lint
 trcli
+.claude/
+reports/

.mapping.json

Lines changed: 3968 additions & 4240 deletions
Large diffs are not rendered by default.

AGENTS.md

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
# AGENTS.md - AI Agent & Contributor Guidelines

This document provides essential context for AI agents (Claude, Copilot, etc.) and human contributors working on the Transferia codebase.

## Project Overview

**Transferia** is an open-source, cloud-native ELT (Extract, Load, Transform) ingestion engine built in Go. It enables seamless, high-performance data movement between diverse database systems at scale.

### Key Capabilities
- **Snapshot**: One-time bulk data transfer with table-level consistency
- **Replication**: Continuous CDC (Change Data Capture) streaming
- **Transformations**: Row-level transformations during transfer (rename, mask, filter, SQL)
- **Multi-source**: PostgreSQL, MySQL, MongoDB, Kafka, S3, and more
- **Multi-destination**: ClickHouse, PostgreSQL, Kafka, S3, and more

## Repository Structure

```
transferia/
├── cmd/trcli/          # CLI entry point (replicate, upload, check, validate)
├── pkg/
│   ├── abstract/       # Core interfaces (Source, Sink, Storage, Transfer)
│   ├── providers/      # Database adapters (postgres, mysql, mongo, clickhouse, kafka)
│   ├── dataplane/      # Runtime execution engine
│   ├── middlewares/    # Cross-cutting concerns (retry, filter, metrics)
│   ├── transformer/    # Pluggable transformers
│   ├── parsers/        # Data format parsers (JSON, Avro, Parquet)
│   ├── coordinator/    # Multi-node coordination (memory, S3)
│   └── connection/     # Connection management
├── internal/           # Internal packages (logger, config, metrics)
├── tests/
│   ├── e2e-core/       # End-to-end tests (pg2ch, mysql2ch, mongo2ch)
│   ├── helpers/        # Test utilities and helpers
│   ├── canon/          # Type/schema validation tests
│   └── storage/        # Provider storage tests
├── recipe/             # Test container recipes
├── examples/           # Configuration examples
└── docs/               # Documentation
```

## Core Abstractions

### Key Interfaces (pkg/abstract/)

1. **Storage** - One-time data reader for snapshots
2. **Source** - Streaming data reader for CDC replication
3. **Sink** - Data writer with async push semantics
4. **Transformer** - Row-level data transformation
5. **ChangeItem** - Core unit of data transfer (represents a row operation)

### Provider Pattern

All providers in `pkg/providers/` follow this structure:
```go
type Provider struct {
	logger   log.Logger
	registry metrics.Registry
	cp       coordinator.Coordinator
	transfer *model.Transfer
}
```

Providers register via `init()` with:
- `providers.Register(ProviderType, New)`
- `model.RegisterSource/RegisterDestination`
- `abstract.RegisterProviderName`

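The registration flow above can be illustrated with a minimal, self-contained sketch. Everything in it (`ProviderType`, `Factory`, `Register`, `Lookup`, `demoProvider`) is a simplified, hypothetical stand-in; the real `providers.Register` signature and constructor arguments may differ.

```go
package main

import "fmt"

// ProviderType and Provider are hypothetical stand-ins for the real types
// in pkg/abstract and pkg/providers.
type ProviderType string

type Provider interface {
	Type() ProviderType
}

// Factory builds a provider. The real constructor presumably also receives
// a logger, a metrics registry, a coordinator and the transfer model.
type Factory func() Provider

// registry is filled from init() functions, so every provider is known
// at compile time and no runtime plugin loading is needed.
var registry = map[ProviderType]Factory{}

// Register adds a factory and panics on duplicates, which surfaces
// conflicting registrations at program start.
func Register(t ProviderType, f Factory) {
	if _, exists := registry[t]; exists {
		panic(fmt.Sprintf("provider %q registered twice", t))
	}
	registry[t] = f
}

// Lookup returns a new provider instance for the given type.
func Lookup(t ProviderType) (Provider, error) {
	f, ok := registry[t]
	if !ok {
		return nil, fmt.Errorf("unknown provider type %q", t)
	}
	return f(), nil
}

const demoType ProviderType = "demo"

type demoProvider struct{}

func (demoProvider) Type() ProviderType { return demoType }

// init runs before main, mirroring how each provider package registers itself.
func init() {
	Register(demoType, func() Provider { return demoProvider{} })
}

func main() {
	p, err := Lookup(demoType)
	if err != nil {
		panic(err)
	}
	fmt.Println("resolved provider:", p.Type())
}
```

Panicking on a duplicate registration is one possible design choice; it turns a wiring mistake into an immediate startup failure instead of a silent override.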
## Coding Guidelines

### Error Handling

- Use `xerrors.Errorf("context: %w", err)` for wrapping
- Domain-specific error types exist:
  - `FatalError` - Stops transfer, forbids restart
  - `RetriablePartUploadError` - Transient, eligible for retry
  - `TableUploadError` - Specific upload failures
- Never ignore errors silently; log any error you do not return

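A minimal sketch of wrapping and classifying errors follows. It uses the standard library's `fmt.Errorf` with `%w` as a stand-in for `xerrors.Errorf`, and the `FatalError` type, `errNoTable`, and the helper functions are hypothetical illustrations, not the project's definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// FatalError is a hypothetical stand-in for the project's fatal error type.
type FatalError struct{ Err error }

func (e *FatalError) Error() string { return "fatal: " + e.Err.Error() }

// Unwrap keeps the wrapped cause reachable for errors.Is and errors.As.
func (e *FatalError) Unwrap() error { return e.Err }

var errNoTable = errors.New("table not found")

// loadTable always fails here, to show how a cause gets classified as fatal.
func loadTable(name string) error {
	return &FatalError{Err: fmt.Errorf("load %q: %w", name, errNoTable)}
}

// runSnapshot adds context with %w so the chain stays inspectable.
func runSnapshot(table string) error {
	if err := loadTable(table); err != nil {
		return fmt.Errorf("snapshot failed: %w", err)
	}
	return nil
}

// IsFatal reports whether any error in the chain is a *FatalError.
func IsFatal(err error) bool {
	var fatal *FatalError
	return errors.As(err, &fatal)
}

// IsMissingTable reports whether the chain contains errNoTable.
func IsMissingTable(err error) bool { return errors.Is(err, errNoTable) }

func main() {
	err := runSnapshot("users")
	fmt.Println(err)
	fmt.Println("fatal:", IsFatal(err), "missing table:", IsMissingTable(err))
}
```

Because every layer wraps with `%w`, callers can decide on restart behaviour from the error chain rather than by matching message strings.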
### Logging

- Use structured logging via the `logger` package (Zap-based)
- Prefer `logger.Log.Info("message", log.String("key", value))` over `Infof`
- Log levels: DEBUG, INFO, WARNING, ERROR, FATAL
- Never log credentials or sensitive data

### Testing

- Place tests in the `tests/` directory, not alongside code
- Use testcontainers via the `recipe/` package for integration tests
- Follow the pattern: `TestSnapshotAndIncrement`, `TestReplication`
- Use helpers: `helpers.Activate()`, `helpers.CompareStorages()`
- Wait helpers for async operations: `WaitEqualRowsCount()`, `WaitCond()`

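To show what a wait helper for asynchronous operations does, here is a small polling sketch. `waitCond` and the `demo*` constants are hypothetical; this is not the signature of `helpers.WaitCond()`, only an illustration of the idea.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Hypothetical defaults used by the demo below.
const (
	demoTimeout  = 500 * time.Millisecond
	demoInterval = time.Millisecond
)

// waitCond polls cond every interval until it returns true or the timeout
// elapses. The condition is always checked at least once.
func waitCond(timeout, interval time.Duration, cond func() bool) error {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s", timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	var rows int64

	// Simulate a sink that receives rows asynchronously.
	go func() {
		for i := 0; i < 3; i++ {
			time.Sleep(5 * time.Millisecond)
			atomic.AddInt64(&rows, 1)
		}
	}()

	err := waitCond(demoTimeout, demoInterval, func() bool {
		return atomic.LoadInt64(&rows) == 3
	})
	fmt.Println("rows:", atomic.LoadInt64(&rows), "err:", err)
}
```

Polling with a deadline keeps tests deterministic in outcome without relying on fixed sleeps sized for the slowest CI machine.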
### Concurrency

- Always use context for cancellation
- Use `sync.WaitGroup` for goroutine lifecycle
- Prefer buffered channels to avoid deadlocks
- Use `sync.Once` for one-time cleanup operations
- Avoid mutexes in hot paths; consider atomics

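The guidelines above can be combined in one small sketch: a worker pool that honours context cancellation, tracks goroutines with `sync.WaitGroup`, uses buffered channels, and guards cleanup with `sync.Once`. All names (`resource`, `sumSquares`, `runDemo`, `runCancelled`) are hypothetical and exist only for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// resource stands in for anything that must be released exactly once.
type resource struct {
	once   sync.Once
	closed int
}

// Close is safe to call from several shutdown paths; the body runs once.
func (r *resource) Close() { r.once.Do(func() { r.closed++ }) }

// sumSquares fans inputs out to a fixed worker pool and sums the results.
func sumSquares(ctx context.Context, inputs []int, workers int) (int, error) {
	// Both channels are buffered to len(inputs), so neither the producer
	// nor the workers can block on a send.
	jobs := make(chan int, len(inputs))
	results := make(chan int, len(inputs))
	for _, v := range inputs {
		jobs <- v
	}
	close(jobs)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range jobs {
				select {
				case <-ctx.Done():
					return // stop early on cancellation
				default:
					results <- v * v
				}
			}
		}()
	}
	wg.Wait()
	close(results)

	if err := ctx.Err(); err != nil {
		return 0, err
	}
	sum := 0
	for r := range results {
		sum += r
	}
	return sum, nil
}

// runDemo processes a fixed input with a live context.
func runDemo() (int, error) {
	return sumSquares(context.Background(), []int{1, 2, 3, 4}, 2)
}

// runCancelled shows that an already cancelled context yields an error.
func runCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := sumSquares(ctx, []int{1, 2, 3}, 2)
	return err
}

func main() {
	r := &resource{}
	defer r.Close()

	sum, err := runDemo()
	fmt.Println("sum:", sum, "err:", err)
	fmt.Println("cancelled:", runCancelled())
}
```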
## Security Considerations

### Critical Rules

1. **Never hardcode credentials** - Use environment variables
2. **Always validate TLS certificates** - Don't set `InsecureSkipVerify: true` in production
3. **Sanitize SQL inputs** - Use parameterized queries, never string concatenation
4. **Redact secrets in logs** - Never log passwords, tokens, or keys
5. **Validate database filters** - User-provided filters can be injection vectors

### Known Security Debt

- Some TLS configurations default to `InsecureSkipVerify` when no certificate is provided
- `SecretString` type alias provides no actual protection
- Test credentials exist in recipe files (acceptable for tests only)

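One way a secret type could redact itself, in contrast to a plain string alias, is sketched below. The `Secret` type, `dbConfig`, and the render helpers are hypothetical and are not part of the codebase; the sketch only shows the technique of overriding formatting and JSON marshalling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const redacted = "***"

// Secret is a hypothetical wrapper type. Unlike a type alias of string,
// it controls how the value is formatted and serialized.
type Secret struct{ value string }

func NewSecret(v string) Secret { return Secret{value: v} }

// Reveal returns the raw value; call it only where the secret is used.
func (s Secret) Reveal() string { return s.value }

// String and GoString cover the %s, %v, %+v and %#v verbs.
func (s Secret) String() string   { return redacted }
func (s Secret) GoString() string { return redacted }

// MarshalJSON keeps the secret out of serialized configs and JSON logs.
func (s Secret) MarshalJSON() ([]byte, error) { return json.Marshal(redacted) }

// dbConfig keeps Password exported: fmt only calls String on struct
// fields it can access through reflection, which excludes unexported ones.
type dbConfig struct {
	Host     string
	Password Secret
}

// render formats the config the way a careless log statement might.
func render(cfg dbConfig) string { return fmt.Sprintf("%+v", cfg) }

// renderJSON serializes the config, for example for a debug dump.
func renderJSON(cfg dbConfig) (string, error) {
	b, err := json.Marshal(cfg)
	return string(b), err
}

func main() {
	cfg := dbConfig{Host: "db", Password: NewSecret("hunter2")}
	fmt.Println(render(cfg))
	js, err := renderJSON(cfg)
	fmt.Println(js, err)
}
```

This does not protect the value in memory; it only makes accidental disclosure through logging and serialization less likely.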
## Provider-Specific Notes

### PostgreSQL
- Most complex provider with full replication support
- Uses pgx library with custom type mapping
- Has DBLog support for alternative loading
- System tables: `__consumer_keeper`, `__data_transfer_lsn`

### MySQL
- Supports both file-based and GTID position tracking
- Character set handling (`utf8mb3` vs `utf8mb4`)
- System tables: `__table_transfer_progress`, `__tm_keeper`

### MongoDB
- Simpler, document-oriented provider
- Uses cluster time instead of LSN for position
- System collection: `__dt_cluster_time`

### ClickHouse
- **Snapshot-only** - No replication source support
- Implements different interfaces (`Abstract2Provider`, `AsyncSinker`)
- HTTP and Native protocol support

## Common Tasks

### Adding a New Provider

1. Create package in `pkg/providers/newprovider/`
2. Implement required interfaces (Storage, Source, Sink as needed)
3. Register in `init()` function
4. Add test recipes in `recipe/`
5. Create e2e tests in `tests/e2e-core/`

### Adding a Transformer

1. Add implementation in `pkg/transformer/registry/`
2. Register with transformer registry
3. Update configuration model if needed
4. Add tests

### Running Tests

```bash
# Quick core tests
make test-core

# Full CDC test suite
make test-cdc-full

# Specific wave
make test-cdc-wave WAVE=providers

# Specific layer
make test-layer LAYER=e2e-core DB=pg2ch
```

## Build Commands

```bash
make build    # Build trcli binary
make docker   # Build Docker image
make clean    # Remove artifacts
make lint     # Run linters
```

## Important Files

- `pkg/abstract/model/transfer.go` - Transfer model definition
- `pkg/abstract/endpoint.go` - Provider interface definitions
- `pkg/providers/provider.go` - Provider registration
- `cmd/trcli/config/model.go` - CLI configuration model
- `.golangci.yml` - Linter configuration

## Code Style

- Follow Go idioms and Effective Go guidelines
- Use `gofmt` and linters (`.golangci.yml` configured)
- Interface names: clear role names (Source, Sink, Transformer)
- File names: lowercase with underscores
- Keep functions focused; extract helpers when a function exceeds 50 lines
- Comment non-obvious logic; skip obvious comments

## Architecture Decisions

1. **Compile-time plugins** - No runtime plugin loading; all providers compiled in
2. **Middleware pattern** - Cross-cutting concerns via composable middlewares
3. **Marker interfaces** - Capability detection via `Is*()` marker methods
4. **Coordinator abstraction** - Memory (single-node) or S3 (distributed) coordination
5. **Wave-based testing** - Tests organized in dependency waves for efficient CI

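The middleware and marker-interface decisions can be sketched together as follows. This is an assumption-laden illustration: `Sink`, `Middleware`, `Chain`, `Retry`, `retriableError`, and `flakySink` are hypothetical names, and the real interfaces in `pkg/abstract/` and `pkg/middlewares/` may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// Sink is a simplified, hypothetical stand-in for the real sink interface.
type Sink interface {
	Push(items []string) error
}

// Middleware wraps a Sink with extra behaviour.
type Middleware func(Sink) Sink

// Chain applies middlewares so that the first one listed is outermost.
func Chain(s Sink, mws ...Middleware) Sink {
	for i := len(mws) - 1; i >= 0; i-- {
		s = mws[i](s)
	}
	return s
}

// retriableError is a marker interface: IsRetriable carries no logic,
// its presence alone signals the capability.
type retriableError interface {
	error
	IsRetriable()
}

type transientError struct{ msg string }

func (e transientError) Error() string { return e.msg }
func (transientError) IsRetriable()    {}

// sinkFunc adapts a plain function to the Sink interface.
type sinkFunc func([]string) error

func (f sinkFunc) Push(items []string) error { return f(items) }

// Retry re-pushes only when the error carries the IsRetriable marker.
func Retry(attempts int) Middleware {
	return func(next Sink) Sink {
		return sinkFunc(func(items []string) error {
			var err error
			for i := 0; i < attempts; i++ {
				if err = next.Push(items); err == nil {
					return nil
				}
				var marker retriableError
				if !errors.As(err, &marker) {
					return err // not retriable: fail fast
				}
			}
			return err
		})
	}
}

// flakySink fails a fixed number of times before succeeding.
type flakySink struct {
	failures int
	pushed   []string
}

func (f *flakySink) Push(items []string) error {
	if f.failures > 0 {
		f.failures--
		return transientError{"temporary outage"}
	}
	f.pushed = append(f.pushed, items...)
	return nil
}

func main() {
	sink := &flakySink{failures: 2}
	err := Chain(sink, Retry(3)).Push([]string{"a", "b"})
	fmt.Println("pushed:", sink.pushed, "err:", err)
}
```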
## Getting Help

- See `/docs/` for detailed documentation
- Check `/examples/` for configuration patterns
- Review existing provider implementations for patterns
- Test recipes in `/recipe/` show infrastructure setup
