|
| 1 | +<div align="center"> |
| 2 | +<br/> |
| 3 | +<p align="center"> |
| 4 | + <i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i> |
| 5 | +</p> |
| 6 | + |
| 7 | +[](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) |
| 8 | +[](https://pypi.python.org/pypi/deepecho) |
| 9 | +[](https://github.com/sdv-dev/DeepEcho/actions?query=workflow%3A%22Run+Tests%22+branch%3Amain) |
| 10 | +[](https://pepy.tech/project/deepecho) |
| 11 | +[](https://codecov.io/gh/sdv-dev/DeepEcho) |
| 12 | +[](https://mybinder.org/v2/gh/sdv-dev/DeepEcho/main?filepath=tutorials/timeseries_data) |
| 13 | +[](https://bit.ly/sdv-slack-invite) |
| 14 | + |
| 15 | +<div align="left"> |
| 16 | +<br/> |
| 17 | +<p align="center"> |
| 18 | +<a href="https://github.com/sdv-dev/DeepEcho"> |
| 19 | +<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/DeepEcho-DataCebo.png"></img> |
| 20 | +</a> |
| 21 | +</p> |
| 22 | +</div> |
| 23 | + |
| 24 | +</div> |
| 25 | + |
| 26 | +# Overview |
| 27 | + |
| 28 | +**DeepEcho** is a **Synthetic Data Generation** Python library for **mixed-type**, **multivariate |
| 29 | +time series**. It provides: |
| 30 | + |
| 31 | +1. Multiple models based both on **classical statistical modeling** of time series and the latest |
| 32 | + in **Deep Learning** techniques. |
| 33 | +2. A robust [benchmarking framework](https://github.com/sdv-dev/SDGym) for evaluating these methods |
| 34 | + on multiple datasets and with multiple metrics. |
| 35 | +3. The ability for **Machine Learning researchers** to submit new methods that follow our `model` and |
| 36 | + `sample` API and have them evaluated. |
| 37 | + |
| 38 | +| Important Links | | |
| 39 | +| --------------------------------------------- | -------------------------------------------------------------------- | |
| 40 | +| :computer: **[Website]** | Check out the SDV Website for more information about the project. | |
| 41 | +| :orange_book: **[SDV Blog]** | Regular publishing of useful content about Synthetic Data Generation. |
| 42 | +| :book: **[Documentation]** | Quickstarts, User and Development Guides, and API Reference. | |
| 43 | +| :octocat: **[Repository]** | The link to the GitHub Repository of this library. |
| 44 | +| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage. | |
| 45 | +| [![][Slack Logo] **Community**][Community] | Join our Slack Workspace for announcements and discussions. | |
| 46 | +| [![][MyBinder Logo] **Tutorials**][Tutorials] | Run the SDV Tutorials in a Binder environment. | |
| 47 | + |
| 48 | +[Website]: https://sdv.dev |
| 49 | +[SDV Blog]: https://sdv.dev/blog |
| 50 | +[Documentation]: https://sdv.dev/SDV |
| 51 | +[Repository]: https://github.com/sdv-dev/DeepEcho |
| 52 | +[License]: https://github.com/sdv-dev/DeepEcho/blob/main/LICENSE |
| 53 | +[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha |
| 54 | +[Slack Logo]: https://github.com/sdv-dev/SDV/blob/stable/docs/images/slack.png |
| 55 | +[Community]: https://bit.ly/sdv-slack-invite |
| 56 | +[MyBinder Logo]: https://github.com/sdv-dev/SDV/blob/stable/docs/images/mybinder.png |
| 57 | +[Tutorials]: https://mybinder.org/v2/gh/sdv-dev/DeepEcho/main?filepath=tutorials |
| 58 | + |
| 59 | +# Install |
| 60 | + |
| 61 | +**DeepEcho** is part of the **SDV** project and is automatically installed alongside it. For |
| 62 | +details about this process, please visit the [SDV Installation Guide]( |
| 63 | +https://sdv.dev/SDV/getting_started/install.html). |
| 64 | + |
| 65 | +Optionally, **DeepEcho** can also be installed as a standalone library using the following commands: |
| 66 | + |
| 67 | +**Using `pip`:** |
| 68 | + |
| 69 | +```bash |
| 70 | +pip install deepecho |
| 71 | +``` |
| 72 | + |
| 73 | +**Using `conda`:** |
| 74 | + |
| 75 | +```bash |
| 76 | +conda install -c pytorch -c conda-forge deepecho |
| 77 | +``` |
| 78 | + |
| 79 | +For more installation options, please visit the [DeepEcho Installation Guide](INSTALL.md). |
| 80 | + |
| 81 | +# Quickstart |
| 82 | + |
| 83 | +**DeepEcho** is included as part of [SDV](https://sdv.dev/SDV) to model and sample synthetic |
| 84 | +time series. In most cases, usage through SDV is recommended, since it provides additional |
| 85 | +functionality that is not available here. For more details about how to use DeepEcho |
| 86 | +within SDV, please visit the corresponding User Guide: |
| 87 | + |
| 88 | +* [SDV TimeSeries User Guide](https://sdv.dev/SDV/user_guides/timeseries/par.html) |
| 89 | + |
| 90 | +## Standalone usage |
| 91 | + |
| 92 | +**DeepEcho** can also be used as a standalone library. |
| 93 | + |
| 94 | +In this short quickstart, we show how to learn a mixed-type multivariate time series |
| 95 | +dataset and then generate synthetic data that resembles it. |
| 96 | + |
| 97 | +We will start by loading the data and preparing the instance of our model. |
| 98 | + |
| 99 | +```python3 |
| 100 | +from deepecho import PARModel |
| 101 | +from deepecho.demo import load_demo |
| 102 | + |
| 103 | +# Load demo data |
| 104 | +data = load_demo() |
| 105 | + |
| 106 | +# Define data types for all the columns |
| 107 | +data_types = { |
| 108 | + 'region': 'categorical', |
| 109 | + 'day_of_week': 'categorical', |
| 110 | + 'total_sales': 'continuous', |
| 111 | + 'nb_customers': 'count', |
| 112 | +} |
| 113 | + |
| 114 | +model = PARModel(cuda=False) |
| 115 | +``` |
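
Before fitting, it can help to confirm that every modeled column has an entry in `data_types`. The helper below is a minimal sketch and is not part of DeepEcho: `missing_data_types` is a hypothetical name, and the column list is an assumption based on the names used in this quickstart. It also assumes that entity columns and the sequence index do not need a data type, since the quickstart gives none for `store_id` or `date`.

```python
def missing_data_types(columns, data_types, entity_columns, sequence_index=None):
    """Return the columns that have no entry in data_types.

    Hypothetical helper, not part of DeepEcho. Entity columns and the
    sequence index are skipped, mirroring the quickstart, where
    'store_id' and 'date' are not given a type.
    """
    skipped = set(entity_columns)
    if sequence_index is not None:
        skipped.add(sequence_index)
    return [c for c in columns if c not in skipped and c not in data_types]


# Assumed column names, taken from this quickstart; in practice, use list(data.columns).
columns = ['store_id', 'date', 'region', 'day_of_week', 'total_sales', 'nb_customers']
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

missing = missing_data_types(columns, data_types, ['store_id'], 'date')
print(missing)  # → []
```

An empty list means every remaining column is covered; any names it returns are columns that still need a type.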
| 116 | + |
| 117 | +If we want to use different settings for our model, like increasing the number |
| 118 | +of epochs or enabling CUDA, we can pass the arguments when creating the model: |
| 119 | + |
| 120 | +```python # keep this as python (without the 3) to avoid using it in test-readme |
| 121 | +model = PARModel(epochs=1024, cuda=True) |
| 122 | +``` |
| 123 | + |
| 124 | +Note that for small datasets like the one used in this demo, CUDA introduces more |
| 125 | +overhead than it gains from parallelization, so in this case the process is more |
| 126 | +efficient without CUDA, even if it is available. |
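
One way to apply this trade-off is to enable CUDA only when a GPU is present and the dataset is large enough to benefit. The snippet below is a minimal sketch: `should_use_cuda` and its `min_rows` cutoff are hypothetical, not values recommended by DeepEcho, and the right threshold is best found by timing both settings on your own data.

```python
def should_use_cuda(num_rows, min_rows=100_000):
    """Return True only if CUDA is usable and the dataset is large enough.

    Hypothetical helper: min_rows is an illustrative cutoff, not a value
    recommended by DeepEcho.
    """
    try:
        import torch  # PyTorch reports whether a CUDA device is usable
    except ImportError:
        return False
    return bool(torch.cuda.is_available()) and num_rows >= min_rows


use_cuda = should_use_cuda(num_rows=500)
print(use_cuda)  # False: 500 rows is below the cutoff

# The result can then be passed to the model:
# model = PARModel(cuda=use_cuda)
```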
| 127 | + |
| 128 | +Once we have created our instance, we are ready to learn the data and generate |
| 129 | +new synthetic data that resembles it: |
| 130 | + |
| 131 | +```python3 |
| 132 | +# Learn a model from the data |
| 133 | +model.fit( |
| 134 | + data=data, |
| 135 | + entity_columns=['store_id'], |
| 136 | + context_columns=['region'], |
| 137 | + data_types=data_types, |
| 138 | + sequence_index='date' |
| 139 | +) |
| 140 | + |
| 141 | +# Sample new data |
| 142 | +model.sample(num_entities=5) |
| 143 | +``` |
| 144 | + |
| 145 | +The output will be a table of synthetic time series data with properties similar to those of |
| 146 | +the demo data that we used as input. |
| 147 | + |
| 148 | +# What's next? |
| 149 | + |
| 150 | +For more details about **DeepEcho** and all its possibilities and features, please check and |
| 151 | +run the [tutorials](tutorials). |
| 152 | + |
| 153 | +If you want to see how we evaluate the performance and quality of our models, please have a |
| 154 | +look at the [SDGym Benchmarking framework](https://github.com/sdv-dev/SDGym). |
| 155 | + |
| 156 | +Also, please feel welcome to visit [our contributing guide](CONTRIBUTING.rst) in order to help |
| 157 | +us develop new features or cool ideas! |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | + |
| 162 | +<div align="center"> |
| 163 | +<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/stable/docs/images/DataCebo.png"></img></a> |
| 164 | +</div> |
| 165 | +<br/> |
| 166 | +<br/> |
| 167 | + |
| 168 | +[The Synthetic Data Vault Project](https://sdv.dev) was first created at MIT's [Data to AI Lab]( |
| 169 | +https://dai.lids.mit.edu/) in 2016. After 4 years of research and traction with enterprise, we |
| 170 | +created [DataCebo](https://datacebo.com) in 2020 with the goal of growing the project. |
| 171 | +Today, DataCebo is the proud developer of SDV, the largest ecosystem for |
| 172 | +synthetic data generation & evaluation. It is home to multiple libraries that support synthetic |
| 173 | +data, including: |
| 174 | + |
| 175 | +* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data. |
| 176 | +* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, |
| 177 | + multi table and time series data. |
| 178 | +* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data |
| 179 | + generation models. |
| 180 | + |
| 181 | +[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully |
| 182 | +integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries |
| 183 | +for specific needs. |