go-etl


English | 简体中文

go-etl is a toolset for extracting, transforming, and loading data sources, providing powerful data synchronization capabilities.

go-etl provides the following ETL capabilities:

  • Extracting data from and loading data into mainstream databases, implemented in the storage package
  • Extracting data from and loading data into two-dimensional, table-like data streams, implemented in the stream package
  • Data synchronization capabilities similar to DataX, implemented in the datax package

As my time is limited, everyone is welcome to submit issues to discuss go-etl; let's make progress together!

Data Synchronization Tool

This data synchronization tool supports the following data sources.

| Type | Data Source | Reader | Writer | Document |
| --- | --- | --- | --- | --- |
| Relational Database | MySQL/Mariadb/Tidb/TDSQL | √ | √ | Read, Write |
| Relational Database | Postgres/Greenplum | √ | √ | Read, Write |
| Relational Database | DB2 LUW | √ | √ | Read, Write |
| Relational Database | SQL Server | √ | √ | Read, Write |
| Relational Database | Oracle | √ | √ | Read, Write |
| Relational Database | Sqlite3 | √ | √ | Read, Write |
| Relational Database | Dameng | √ | √ | Read, Write |
| Unstructured Data Stream | CSV | √ | √ | Read, Write |
| Unstructured Data Stream | XLSX (excel) | √ | √ | Read, Write |

Getting Started

Quick Start (3 Minutes)

Start from Binary Program

You can download the 64-bit binary executable for Windows or Linux operating systems from the latest releases.

Start data synchronization with the go-etl Data Synchronization User Manual.

Start from Docker Image

Pull Docker Image

docker pull go-etl:v0.2.3

Start Container

docker run -d -p 6080:6080 --name etl -v /data:/usr/local/go-etl/data go-etl:v0.2.3

Enter Container

docker exec -it etl bash

Execute Sync in Container

docker exec -it etl release/bin/go-etl -c data/config.json

Start from Performance Testing

If you want to directly obtain performance-related information, you can deploy according to the Prometheus Monitoring Deployment Manual to acquire relevant performance metrics and even performance visualization charts.

Start from Source Code

Linux
Compilation Dependencies
  1. golang 1.20 and later versions
  2. gcc 4.8 and later versions
Build
cd ${GOPATH}/src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
make dependencies
make release
Removing DB2 Dependency

Before compiling, set the IGNORE_PACKAGES environment variable:

export IGNORE_PACKAGES=db2
cd ${GOPATH}/src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
make dependencies
make release
Windows
Compilation Dependencies
  1. A MinGW-w64 environment with GCC 7.2.0 or higher is required for compilation.
  2. golang 1.20 and later versions
  3. The minimum compilation environment is Windows 7.
Build
cd %GOPATH%\src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
release.bat
Removing DB2 Dependency

Before compiling, set the IGNORE_PACKAGES environment variable:

cd %GOPATH%\src
git clone https://github.com/Breeze0806/go-etl.git "github.com/Breeze0806/go-etl"
cd github.com/Breeze0806/go-etl
set IGNORE_PACKAGES=db2
release.bat
Compilation Output
    +---datax---|---plugin---+---reader--mysql---|--README.md
    |                        | .......
    |                        |
    |                        |---writer--mysql---|--README.md
    |                        | .......
    |
    +---bin----go-etl
    +---examples----+---csvpostgres----config.json
    |               |---db2------------config.json
    |               | .......
    |
    +---README_USER.md
  • The datax/plugin directory contains the documentation for various plugins.
  • The bin directory houses the data synchronization program, named go-etl.
  • The examples directory includes configuration files for data synchronization in different scenarios.
  • README_USER.md is the user manual in English.

Start from Compiled Docker Image

Use the following commands to fetch the go-etl project and check its latest tag (v0.2.3):

git clone https://github.com/Breeze0806/go-etl.git
cd go-etl
git describe --abbrev=0 --tags

Build the Docker image with the following command:

docker build . -t go-etl:v0.2.3

Start the container:

docker run -d -p 6080:6080 --name etl -v /data:/usr/local/go-etl/data go-etl:v0.2.3

Enter the container:

docker exec -it etl bash

Note that sqlite3, DB2, and Oracle are not currently supported out of the box; they require downloading the corresponding ODBC drivers and configuring environment variables.

Batch Sync

Use a wizard CSV file to batch sync multiple tables.

1. Create data source config config.json - same as single sync

2. Create wizard file wizard.csv - each row defines a source-target table pair:

source_table,target_table
table1,table1_copy
table2,table2_copy

3. Generate batch configs and run script:

Linux:

./go-etl -c config.json -w wizard.csv; ./run.sh

Windows:

.\go-etl.exe -c config.json -w wizard.csv; run.bat

Docker:

docker exec -it etl release/bin/go-etl -c data/config.json -w data/wizard.csv; docker exec -it etl bash run.sh

Data Synchronization Development Guide

Refer to the go-etl Data Synchronization Developer Documentation to assist with your development.

Module Introduction

datax

This package provides an interface similar to Alibaba's DataX to implement an offline data synchronization framework in Go.

readerPlugin(reader) --> Framework(Exchanger + Transformer) --> writerPlugin(writer)

Built using a Framework + plugin architecture. Data source reading and writing are abstracted into Reader/Writer plugins and integrated into the overall synchronization framework.

  • Reader: The Reader is the data acquisition module, responsible for collecting data from the data source and sending it to the Framework.
  • Writer: The Writer is the data writing module, responsible for continuously fetching data from the Framework and writing it to the destination.
  • Framework: The Framework connects the reader and writer, serving as a data transmission channel, and handles core technical aspects such as buffering, flow control, concurrency, and data transformation.

For detailed information, please refer to the go-etl Data Synchronization Developer Documentation.

element

The data types and data type conversions in go-etl have been implemented. For more information, please refer to the go-etl Data Type Descriptions.

storage

database

Basic integration for databases has been implemented, abstracting the database dialect (Dialect) interface. For specific implementation details, please refer to the Database Storage Developer Guide.

stream

Primarily used for parsing byte streams, such as files, message queues, Elasticsearch, etc. The byte stream format can be CSV, JSON, XML, etc.

file

Focused on file parsing, including CSV, Excel, etc. It abstracts the InputStream and OutputStream interfaces. For specific implementation details, refer to the Developer Guide for Tabular File Storage.

tools

A collection of utilities for compilation, adding licenses, etc.

datax

build
go generate ./...

This command registers developer-created reader and writer plugins into the program's code.

It also embeds build information, such as the software version, Git revision, Go compiler version, and build time, into the command-line binary.

plugin

A plugin template creation tool for data sources. It's used to create a new reader or writer template, in conjunction with the release command, to reduce the developer's workload.

release

A packaging tool for the data synchronization program and user documentation.

license

Automatically adds a license header to Go source files and formats the code with gofmt -s -w.

go run tools/license/main.go

Contributing

Contributions are welcome! Please read our Contributing Guide for details on how to contribute to this project.

Ways to Contribute

  • Report bugs and issues
  • Suggest new features
  • Submit pull requests
  • Improve documentation
  • Share your use cases

Getting Help