
Introduction

DataYoga is a framework for building and running streaming or batch data pipelines. It takes a low-code approach: pipelines are defined declaratively in YAML files.


DataYoga overview

Concepts

Job - A Job is composed of a series of Steps that read data from a source, perform transformations, and write to a target. Many sources and targets are supported, including relational databases, non-relational databases, file formats, cloud storage, and HTTP servers.

Step - Each Step runs a Block that implements specific business logic. The output of each Step is fed into the next, creating a chain of transformations.

Block - A Block defines the business logic of a Step. Blocks can:

  • Read from and write to relational and non-relational databases
  • Read, write, and parse data from local and cloud storage
  • Transform data: modify structure, add computed fields, rename fields, or remove fields
  • Enrich data from external sources and APIs
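
Putting these concepts together, a Job is a YAML file that chains Blocks into Steps. The sketch below is illustrative only: the block names (files.read_csv, add_field, relational.write) and their properties are assumptions meant to show the shape of a job, not the exact schema.

# Illustrative job definition; block names and properties are assumptions
steps:
  - uses: files.read_csv        # read records from a local CSV file
    with:
      file: employees.csv
  - uses: add_field             # add a computed field to each record
    with:
      field: full_name
      language: jmespath
      expression: concat([fname, ' ', lname])
  - uses: relational.write      # write the records to a database table
    with:
      connection: hr_db
      table: employees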

DataYoga Runtime

DataYoga provides a standalone stream processing engine, the DataYoga Runtime, which validates and runs transformation Jobs. The Runtime provides:

  • Validation
  • Error handling
  • Metrics and observability
  • Credentials management

The Runtime supports multiple stream processing strategies, including buffering and rate limiting. It supports async processing, multi-threading, and multi-processing to maximize throughput with a low footprint.
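
Credentials, for example, are kept out of job definitions: a block refers to a named connection (like hr_db in the sketch above), and the Runtime supplies the credentials at run time. The snippet below is a hypothetical illustration of such a connections file; the layout, property names, and environment-variable interpolation are assumptions, not the documented schema.

# connections file (hypothetical illustration)
hr_db:
  type: postgresql
  host: localhost
  port: 5432
  database: hr
  user: ${HR_DB_USER}          # resolved by the Runtime's credentials management
  password: ${HR_DB_PASSWORD}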

Quickstart

Install DataYoga using pip:

pip install datayoga

Verify that the installation completed successfully by running this command:

datayoga --version

Create a New DataYoga Project

To create a new DataYoga project, use the init command:

datayoga init hello_world
cd hello_world

Directory structure
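
The exact contents may differ between versions, but the scaffolded project roughly follows a layout like the one below (an illustrative sketch, not authoritative); the sample.hello job name suggests jobs are addressed by their path using dot notation.

hello_world/
├── jobs/
│   └── sample/
│       └── hello.yaml        # the sample job run below as sample.hello
└── data/
    └── sample.csv            # sample input data (name is illustrative)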

Run Your First Job

Let's run our first job. It was pre-defined in the samples folder by the init command:

datayoga run sample.hello

If all goes well, you should see some startup logs, and eventually:

{"id": "1", "fname": "john", "lname": "doe", "credit_card": "1234-1234-1234-1234", "country_code": "972", "country_name": "israel", "gender": "M", "full_name": "John Doe", "greeting": "Hello Mr. John Doe"}
{"id": "2", "fname": "jane", "lname": "doe", "credit_card": "1000-2000-3000-4000", "country_code": "972", "country_name": "israel", "gender": "F", "full_name": "Jane Doe", "greeting": "Hello Ms. Jane Doe"}
{"id": "3", "fname": "bill", "lname": "adams", "credit_card": "9999-8888-7777-666", "country_code": "1", "country_name": "usa", "gender": "M", "full_name": "Bill Adams", "greeting": "Hello Mr. Bill Adams"}

That's it! You've run your first job: it loads data from a CSV file, passes it through a series of transformation steps, and writes the results to standard output. A good start. Read on for a more detailed tutorial, or check out the reference to see the different block types currently available.
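
For reference, a job producing output like the above would chain steps along these lines. This is a hypothetical reconstruction, not the actual contents of the shipped sample; the block names, functions, and expressions are assumptions, and the country-name lookup step is omitted.

steps:
  - uses: files.read_csv             # load the sample records
    with:
      file: sample.csv
  - uses: add_field                  # full_name from the capitalized first and last names
    with:
      field: full_name
      language: jmespath
      expression: concat([capitalize(fname), ' ', capitalize(lname)])
  - uses: add_field                  # greeting composed from gender and full_name
    with:
      field: greeting
      language: jmespath
      expression: join('', ['Hello ', gender == 'M' && 'Mr. ' || 'Ms. ', full_name])
  - uses: std.write                  # print each record as JSON to standard output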
