This repository was archived by the owner on Jul 16, 2024. It is now read-only.
Commit ff6aa37
feat: Add batch replayer (#227)
* Add BatchReplayer and PartitionedDataset classes
BatchReplayer serves the same purpose as the current data-generator. However, it adds the missing file header and uses Step Functions + Lambda instead of Athena. This is more cost-effective, but it requires the data to be pre-partitioned with a manifest file.
* Restore projen tasks
* feat: Split Lambda functions into 2 steps and write output files in parallel
* fix: Fix bug where the Map index isn't passed to Lambda correctly
* feat: Write dataframe into multiple files that don't exceed a given max size
* fix: remove custom workspace setting
* fix: main merge
Co-authored-by: Vincent Gromakowski <gromav@amazon.com>

1 parent 38dd20b
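The commit message describes a two-step flow: one Lambda finds the files to replay from the pre-partitioned dataset's manifest, and a Step Functions Map state fans out to a second Lambda that writes output in parallel, with each iteration receiving its Map index. The commit does not include that code, so the following is only a stdlib sketch of the idea. The manifest layout (`ManifestEntry` with `path`, `start`, `end`), the round-robin assignment, and all function names are hypothetical assumptions, not the construct's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ManifestEntry:
    # Hypothetical manifest row: one pre-partitioned file and the time range it covers.
    path: str
    start: datetime
    end: datetime


def find_file_paths(manifest: List[ManifestEntry],
                    window_start: datetime,
                    window_end: datetime) -> List[str]:
    """Step 1 (cf. find-file-paths): keep files whose time range overlaps
    the half-open replay window [window_start, window_end)."""
    return [e.path for e in manifest
            if e.start < window_end and e.end > window_start]


def split_into_batches(paths: List[str], num_batches: int) -> List[List[str]]:
    """Assumed strategy: assign paths round-robin, one group per Map iteration."""
    if num_batches < 1:
        raise ValueError("num_batches must be >= 1")
    batches: List[List[str]] = [[] for _ in range(num_batches)]
    for i, path in enumerate(paths):
        batches[i % num_batches].append(path)
    return batches


def files_for_map_index(paths: List[str], num_batches: int,
                        map_index: int) -> List[str]:
    """Step 2 (cf. write-in-batch): each parallel Map iteration receives its
    index and processes only its own share of the selected files."""
    if not 0 <= map_index < num_batches:
        raise ValueError(f"map_index {map_index} out of range for {num_batches} batches")
    return split_into_batches(paths, num_batches)[map_index]


if __name__ == "__main__":
    manifest = [
        ManifestEntry("part-0.csv", datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 1)),
        ManifestEntry("part-1.csv", datetime(2021, 1, 1, 1), datetime(2021, 1, 1, 2)),
        ManifestEntry("part-2.csv", datetime(2021, 1, 1, 2), datetime(2021, 1, 1, 3)),
    ]
    selected = find_file_paths(manifest, datetime(2021, 1, 1, 0, 30),
                               datetime(2021, 1, 1, 2, 30))
    for index in range(2):
        print(index, files_for_map_index(selected, 2, index))
```

Passing the index into each iteration lets every Lambda invocation derive its share deterministically from the same file list, which is why a wrong or missing Map index (the bug fixed in this commit) would cause iterations to process the wrong files.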
File tree
17 files changed (+3382, -2164 lines)
- core
  - .projen
  - src
    - data-generator
      - resources/lambdas
        - find-file-paths
        - write-in-batch
    - datasets
  - test/unit/data-generator
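The commit message says the `write-in-batch` step writes a dataframe into multiple files that don't exceed a given max size. The Lambda source is not shown here, so the following is a minimal sketch of one way to do that using Python's `csv` module on plain row lists instead of a dataframe library. It assumes CSV output, a size limit measured on the UTF-8 encoding, and (reading the commit's mention of a file header) a header row repeated at the top of every output file. `split_csv_by_size` is a hypothetical name.

```python
import csv
import io
from typing import List, Sequence


def _encode_csv_row(row: Sequence[str]) -> str:
    # Serialize one row exactly as it will appear in the output file.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def split_csv_by_size(header: Sequence[str], rows: List[Sequence[str]],
                      max_bytes: int) -> List[str]:
    """Greedily pack rows into CSV file bodies, each starting with the header
    and each at most max_bytes long in UTF-8."""
    header_line = _encode_csv_row(header)
    header_size = len(header_line.encode("utf-8"))
    files: List[str] = []
    current: List[str] = []
    current_size = header_size
    for row in rows:
        line = _encode_csv_row(row)
        line_size = len(line.encode("utf-8"))
        if header_size + line_size > max_bytes:
            # A single row that cannot fit even in an otherwise empty file.
            raise ValueError("header plus one row exceeds max_bytes")
        if current and current_size + line_size > max_bytes:
            # Adding this row would overflow: close the current file first.
            files.append(header_line + "".join(current))
            current, current_size = [], header_size
        current.append(line)
        current_size += line_size
    if current:
        files.append(header_line + "".join(current))
    return files


if __name__ == "__main__":
    chunks = split_csv_by_size(["id", "value"],
                               [[str(i), "x" * 10] for i in range(10)], 60)
    for chunk in chunks:
        print(len(chunk.encode("utf-8")), repr(chunk))
```

Each output file can then be uploaded independently, which fits the parallel write described in the commit message.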