go-etl


English | 简体中文

go-etl is a toolset for extracting, transforming, and loading data sources, providing powerful data synchronization capabilities.

go-etl provides the following ETL capabilities:

  • Extracting data from and loading data into mainstream databases, implemented in the storage package
  • Extracting data from and loading data into two-dimensional, table-like data streams, implemented in the stream package
  • Data synchronization capabilities similar to DataX, implemented in the datax package

As my time is limited, everyone is welcome to submit issues to discuss go-etl; let's make progress together!

Data Synchronization Tool

This data synchronization tool supports the following data sources.

| Type | Data Source | Reader | Writer | Document |
| --- | --- | --- | --- | --- |
| Relational Database | MySQL/Mariadb/Tidb/TDSQL | √ | √ | Read, Write |
| Relational Database | Postgres/Greenplum | √ | √ | Read, Write |
| Relational Database | DB2 LUW | √ | √ | Read, Write |
| Relational Database | SQL Server | √ | √ | Read, Write |
| Relational Database | Oracle | √ | √ | Read, Write |
| Relational Database | Sqlite3 | √ | √ | Read, Write |
| Relational Database | Dameng | √ | √ | Read, Write |
| Unstructured Data Stream | CSV | √ | √ | Read, Write |
| Unstructured Data Stream | XLSX (excel) | √ | √ | Read, Write |

Getting Started

Quick Start (3 Minutes)

Start from Binary Program

You can download the 64-bit binary executable for Windows or Linux operating systems from the latest releases.

Start data synchronization with the go-etl Data Synchronization User Manual.

Start from Docker Image

Pull Docker Image

docker pull go-etl:v0.2.3

Start Container

docker run -d -p 6080:6080 --name etl -v /data:/usr/local/go-etl/data go-etl:v0.2.3

Enter Container

docker exec -it etl bash

Execute Sync in Container

docker exec -it etl release/bin/go-etl -c data/config.json

Start from Performance Testing

If you want to directly obtain performance-related information, you can deploy according to the Prometheus Monitoring Deployment Manual to acquire relevant performance metrics and even performance visualization charts.

Start from Source Code

Linux
Compilation Dependencies
  1. golang 1.20 and later versions
  2. gcc 4.8 and later versions
Build
cd ${GOPATH}/src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
make dependencies
make release
Removing DB2 Dependency

Before compiling, set the IGNORE_PACKAGES environment variable:

export IGNORE_PACKAGES=db2
cd ${GOPATH}/src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
make dependencies
make release
Windows
Compilation Dependencies
  1. A MinGW-w64 environment with GCC 7.2.0 or higher is required for compilation.
  2. golang 1.20 and later versions
  3. The minimum compilation environment is Windows 7.
Build
cd %GOPATH%\src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
release.bat
Removing DB2 Dependency

Before compiling, set the IGNORE_PACKAGES environment variable:

cd %GOPATH%\src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
set IGNORE_PACKAGES=db2
release.bat
Compilation Output
    +---datax---|---plugin---+---reader--mysql---|--README.md
    |                        | .......
    |                        |
    |                        |---writer--mysql---|--README.md
    |                        | .......
    |
    +---bin----go-etl
    +---examples----+---csvpostgres----config.json
    |               |---db2------------config.json
    |               | .......
    |
    +---README_USER.md
  • The datax/plugin directory contains the documentation for various plugins.
  • The bin directory houses the data synchronization program, named go-etl.
  • The examples directory includes configuration files for data synchronization in different scenarios.
  • README_USER.md is the user manual in English.

Start from Compiled Docker Image

Use the following commands to fetch the go-etl project and check its latest tag (v0.2.3):

git clone https://github.com/Breeze0806/go-etl.git
cd go-etl
git describe --abbrev=0 --tags

Build the Docker image with the following command:

docker build . -t go-etl:v0.2.3

Start the container:

docker run -d -p 6080:6080 --name etl -v /data:/usr/local/go-etl/data go-etl:v0.2.3

Enter the container:

docker exec -it etl bash

Note that sqlite3, DB2, and Oracle are not currently supported out of the box; they require downloading the corresponding ODBC drivers and configuring environment variables.

Batch Sync

Use a wizard CSV file to batch sync multiple tables.

1. Create data source config config.json - same as single sync

2. Create wizard file wizard.csv - each row defines a source-target table pair:

source_table,target_table
table1,table1_copy
table2,table2_copy

3. Generate batch configs and run script:

Linux:

./go-etl -c config.json -w wizard.csv; ./run.sh

Windows:

.\go-etl.exe -c config.json -w wizard.csv; run.bat

Docker:

docker exec -it etl release/bin/go-etl -c data/config.json -w data/wizard.csv; docker exec -it etl bash run.sh

Data Synchronization Development Guide

Refer to the go-etl Data Synchronization Developer Documentation to assist with your development.

Module Introduction

datax

This package provides an interface similar to Alibaba's DataX to implement an offline data synchronization framework in Go.

readerPlugin(reader) --> Framework(Exchanger + Transformer) --> writerPlugin(writer)

Built using a Framework + plugin architecture. Data source reading and writing are abstracted into Reader/Writer plugins and integrated into the overall synchronization framework.

  • Reader: The Reader is the data acquisition module, responsible for collecting data from the data source and sending it to the Framework.
  • Writer: The Writer is the data writing module, responsible for continuously fetching data from the Framework and writing it to the destination.
  • Framework: The Framework connects the reader and writer, serving as a data transmission channel, and handles core technical aspects such as buffering, flow control, concurrency, and data transformation.

For detailed information, please refer to the go-etl Data Synchronization Developer Documentation.

element

The data types and data type conversions in go-etl have been implemented. For more information, please refer to the go-etl Data Type Descriptions.

storage

database

Basic integration for databases has been implemented, abstracting the database dialect (Dialect) interface. For specific implementation details, please refer to the Database Storage Developer Guide.

stream

Primarily used for parsing byte streams, such as files, message queues, Elasticsearch, etc. The byte stream format can be CSV, JSON, XML, etc.

file

Focused on file parsing, including CSV, Excel, etc. It abstracts the InputStream and OutputStream interfaces. For specific implementation details, refer to the Developer Guide for Tabular File Storage.

tools

A collection of utilities for compilation, adding licenses, etc.

datax

build
go generate ./...

This command registers developer-created reader and writer plugins into the program's code.

It also embeds build information, such as the software version, Git revision, Go compiler version, and build time, into the command-line binary.

plugin

A plugin template creation tool for data sources. It's used to create a new reader or writer template, in conjunction with the release command, to reduce the developer's workload.

release

A packaging tool for the data synchronization program and user documentation.

license

Automatically adds a license header to Go source files and formats the code with gofmt -s -w.

go run tools/license/main.go

Contributing

Contributions are welcome! Please read our Contributing Guide for details on how to contribute to this project.

Ways to Contribute

  • Report bugs and issues
  • Suggest new features
  • Submit pull requests
  • Improve documentation
  • Share your use cases

Getting Help