go-etl is a toolset for extracting, transforming, and loading data sources, providing powerful data synchronization capabilities.
go-etl provides the following ETL capabilities:
- The ability to extract and load data from mainstream databases is implemented in the storage package
- The ability to extract and load data from data streams in a two-dimensional table-like format is implemented in the stream package
- Similar data synchronization capabilities to datax, implemented in the datax package
Since my time is limited, everyone is welcome to submit issues to discuss go-etl. Let's make progress together!
This data synchronization tool supports the following data sources.
| Type | Data Source | Reader | Writer | Document |
|---|---|---|---|---|
| Relational Database | MySQL/Mariadb/Tidb/TDSQL MySQL | √ | √ | Read, Write |
| | Postgres/Greenplum | √ | √ | Read, Write |
| | DB2 LUW | √ | √ | Read, Write |
| | SQL Server | √ | √ | Read, Write |
| | Oracle | √ | √ | Read, Write |
| | Sqlite3 | √ | √ | Read, Write |
| | Dameng | √ | √ | Read, Write |
| Unstructured Data Stream | CSV | √ | √ | Read, Write |
| | XLSX(excel) | √ | √ | Read, Write |
You can download the 64-bit binary executable for Windows or Linux operating systems from the latest releases.
Start data synchronization with the go-etl Data Synchronization User Manual.
Pull Docker Image
docker pull go-etl:v0.2.3
Start Container
docker run -d -p 6080:6080 --name etl -v /data:/usr/local/go-etl/data go-etl:v0.2.3
Enter Container
docker exec -it etl bash
Execute Sync in Container
docker exec -it etl release/bin/go-etl -c data/config.json
If you want performance-related information, you can deploy according to the Prometheus Monitoring Deployment Manual to collect performance metrics and even view performance visualization charts.
- golang 1.20 and later versions
- gcc 4.8 and later versions
cd ${GO_PATH}/src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
make dependencies
make release
To build without the db2 package, run export IGNORE_PACKAGES=db2 before compilation:
export IGNORE_PACKAGES=db2
cd ${GO_PATH}/src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
make dependencies
make release
- A MinGW-w64 environment with GCC 7.2.0 or higher is required for compilation.
- golang 1.20 and later versions
- The minimum compilation environment is Windows 7.
cd ${GO_PATH}\src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
release.bat
To build without the db2 package, run set IGNORE_PACKAGES=db2 before compilation:
cd ${GO_PATH}\src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
set IGNORE_PACKAGES=db2
release.bat
+---datax---|---plugin---+---reader--mysql---|--README.md
|                        | .......
|                        |
|                        |---writer--mysql---|--README.md
|                        | .......
|
+---bin----go-etl
+---examples---+---csvpostgres----config.json
|              |---db2------------config.json
|              | .......
|
+---README_USER.md
- The datax/plugin directory contains the documentation for various plugins.
- The bin directory houses the data synchronization program, named go-etl.
- The examples directory includes configuration files for data synchronization in different scenarios.
- README_USER.md is the user manual or guide in English.
Use the following commands to get the go-etl project (version v0.2.3):
git clone https://github.com/Breeze0806/go-etl.git
cd go-etl
git describe --abbrev=0 --tags
Build the Docker image with the following command:
docker build . -t go-etl:v0.2.3
Start the container:
docker run -d -p 6080:6080 --name etl -v /data:/usr/local/go-etl/data go-etl:v0.2.3
Enter the container:
docker exec -it etl bash
Note that sqlite3, DB2, and Oracle are currently not directly supported; they require downloading the corresponding ODBC and configuring environment variables.
Use a wizard CSV file to batch sync multiple tables.
1. Create data source config config.json - same as single sync
2. Create wizard file wizard.csv - each row defines a source-target table pair:
source_table,target_table
table1,table1_copy
table2,table2_copy
3. Generate batch configs and run script:
Linux:
./go-etl -c config.json -w wizard.csv; ./run.sh
Windows:
.\go-etl.exe -c config.json -w wizard.csv; run.bat
Docker:
docker exec -it etl release/bin/go-etl -c data/config.json -w data/wizard.csv; docker exec -it etl bash run.sh
Refer to the go-etl Data Synchronization Developer Documentation to assist with your development.
This package provides an interface similar to Alibaba's DataX to implement an offline data synchronization framework in Go.
readerPlugin(reader) -> Framework(Exchanger+Transformer) -> writerPlugin(writer)
It is built using a Framework + plugin architecture. Data source reading and writing are abstracted into Reader/Writer plugins and integrated into the overall synchronization framework.
- Reader: The Reader is the data acquisition module, responsible for collecting data from the data source and sending it to the Framework.
- Writer: The Writer is the data writing module, responsible for continuously fetching data from the Framework and writing it to the destination.
- Framework: The Framework connects the reader and writer, serving as a data transmission channel, and handles core technical aspects such as buffering, flow control, concurrency, and data transformation.
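The Reader -> Framework -> Writer flow described above can be sketched in a few lines of Go. This is only an illustrative sketch, not go-etl's actual API: the `Record`, `Reader`, `Writer`, and `Run` names are assumptions made for this example, and buffered channels stand in for the Framework's buffering and flow control.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Record is a hypothetical row type used only in this sketch.
type Record []string

// Reader collects records from a source and sends them to the framework.
type Reader interface {
	Read(out chan<- Record) error
}

// Writer fetches records from the framework and writes them to a destination.
type Writer interface {
	Write(in <-chan Record) error
}

// sliceReader is a toy Reader backed by an in-memory slice.
type sliceReader struct{ rows []Record }

func (r sliceReader) Read(out chan<- Record) error {
	for _, row := range r.rows {
		out <- row
	}
	return nil
}

// sliceWriter is a toy Writer that collects records in memory.
type sliceWriter struct{ rows []Record }

func (w *sliceWriter) Write(in <-chan Record) error {
	for row := range in {
		w.rows = append(w.rows, row)
	}
	return nil
}

// Run plays the Framework role: buffered channels provide buffering and
// back-pressure, and transform is applied between the reader and the writer.
// A real framework would also need cancellation when either side fails early.
func Run(r Reader, w Writer, transform func(Record) Record, buffer int) error {
	src := make(chan Record, buffer)
	dst := make(chan Record, buffer)
	var wg sync.WaitGroup
	var readErr, writeErr error

	wg.Add(2)
	go func() { // reader side
		defer wg.Done()
		defer close(src)
		readErr = r.Read(src)
	}()
	go func() { // writer side
		defer wg.Done()
		writeErr = w.Write(dst)
	}()

	// Exchange step: move every record from the reader to the writer.
	for rec := range src {
		dst <- transform(rec)
	}
	close(dst)
	wg.Wait()

	if readErr != nil {
		return readErr
	}
	return writeErr
}

// upper is a sample transformer that upper-cases every field.
func upper(rec Record) Record {
	out := make(Record, len(rec))
	for i, v := range rec {
		out[i] = strings.ToUpper(v)
	}
	return out
}

func main() {
	r := sliceReader{rows: []Record{{"1", "alice"}, {"2", "bob"}}}
	w := &sliceWriter{}
	if err := Run(r, w, upper, 4); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println(w.rows)
}
```

Keeping the Reader and Writer behind small interfaces is what lets new data sources be added as plugins without touching the exchange loop.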
For detailed information, please refer to the go-etl Data Synchronization Developer Documentation.
The data types and data type conversions in go-etl have been implemented. For more information, please refer to the go-etl Data Type Descriptions.
Basic integration for databases has been implemented, abstracting the database dialect (Dialect) interface. For specific implementation details, please refer to the Database Storage Developer Guide.
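To illustrate why a dialect abstraction is useful, here is a minimal sketch. The `Dialect` interface below is hypothetical and much smaller than whatever go-etl actually defines (see the Database Storage Developer Guide for the real one); the method names `Name`, `Quote`, and `Placeholder` are assumptions chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Dialect is a hypothetical, simplified interface. It is NOT go-etl's API;
// it only captures the parts of SQL that differ between databases here.
type Dialect interface {
	Name() string
	Quote(identifier string) string // quote a table or column name
	Placeholder(n int) string       // n-th bind parameter, 1-based
}

// mysqlDialect quotes with backticks and uses "?" placeholders.
type mysqlDialect struct{}

func (mysqlDialect) Name() string { return "mysql" }
func (mysqlDialect) Quote(id string) string {
	return "`" + strings.ReplaceAll(id, "`", "``") + "`"
}
func (mysqlDialect) Placeholder(int) string { return "?" }

// postgresDialect quotes with double quotes and uses "$n" placeholders.
type postgresDialect struct{}

func (postgresDialect) Name() string { return "postgres" }
func (postgresDialect) Quote(id string) string {
	return `"` + strings.ReplaceAll(id, `"`, `""`) + `"`
}
func (postgresDialect) Placeholder(n int) string { return fmt.Sprintf("$%d", n) }

// InsertSQL builds a parameterized INSERT using only the Dialect, so the
// calling code stays the same for every database.
func InsertSQL(d Dialect, table string, columns []string) string {
	cols := make([]string, len(columns))
	args := make([]string, len(columns))
	for i, c := range columns {
		cols[i] = d.Quote(c)
		args[i] = d.Placeholder(i + 1)
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		d.Quote(table), strings.Join(cols, ", "), strings.Join(args, ", "))
}

func main() {
	for _, d := range []Dialect{mysqlDialect{}, postgresDialect{}} {
		fmt.Println(d.Name()+":", InsertSQL(d, "users", []string{"id", "name"}))
	}
}
```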
Primarily used for parsing byte streams, such as files, message queues, Elasticsearch, etc. The byte stream format can be CSV, JSON, XML, etc.
Focused on file parsing, including CSV, Excel, etc. It abstracts the InputStream and OutputStream interfaces. For specific implementation details, refer to the Developer Guide for Tabular File Storage.
A collection of utilities for compilation, adding licenses, etc.
go generate ./...
This is the release command, used to register developer-created reader and writer plugins into the program's code.
Additionally, this command inserts compilation information such as software version, Git version, Go compiler version, and compilation time into the command line.
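One common way to embed this kind of build information in a Go binary is to set package-level variables at link time. The sketch below shows that general technique; it is not necessarily how go-etl's release command does it, and the variable names `version`, `gitCommit`, and `buildTime` are assumptions for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// Hypothetical build-time variables. They could be overridden at link time
// with the Go linker's -X flag, for example:
//
//	go build -ldflags "-X main.version=v0.2.3 -X main.gitCommit=<commit> -X main.buildTime=<time>"
var (
	version   = "dev"
	gitCommit = "unknown"
	buildTime = "unknown"
)

// versionString combines the injected values with the Go compiler version,
// which is available at run time without any injection.
func versionString() string {
	return fmt.Sprintf("version: %s, git: %s, go: %s, built: %s",
		version, gitCommit, runtime.Version(), buildTime)
}

func main() {
	fmt.Println(versionString())
}
```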
A plugin template creation tool for data sources. It's used to create a new reader or writer template, in conjunction with the release command, to reduce the developer's workload.
A packaging tool for the data synchronization program and user documentation.
Automatically adds a license to Go code files and formats the code using gofmt -s -w.
go run tools/license/main.go
Contributions are welcome! Please read our Contributing Guide for details on how to contribute to this project.
- Report bugs and issues
- Suggest new features
- Submit pull requests
- Improve documentation
- Share your use cases
- Check the Documentation
- Check the User Manual
- Submit a GitHub Issue for discussion