## Eclipse Deeplearning4j: Data pipeline, DataVec Examples

This project contains a set of examples that demonstrate how raw data in various formats can be loaded, split and preprocessed to build serializable (and hence reproducible) ETL pipelines using the DataVec library.

[Go back](../README.md) to the main repository page to explore other features/functionality of the **Eclipse Deeplearning4j** ecosystem. File an issue [here](https://github.com/eclipse/deeplearning4j-examples/issues) to request new features.

The examples in this project and what they demonstrate are briefly described below, in the recommended order for exploring them.

### Loading Data
InputSplit and its implementations are utility classes that define and manage an in-memory catalog of loadable locations (paths/files), which can later be exposed through an Iterator. Put simply, when you build a data pipeline with DataVec, a split defines where your data comes from or where it should be saved.
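
As a library-independent illustration of the idea only, the sketch below builds such a catalog with plain `java.nio` and does not use DataVec's own InputSplit classes. It lists matching files as `URI`s without opening any of them. It also supports the extension filtering and recursive loading that the FileSplit example description mentions. `LocationCatalogSketch` and `listLocations` are hypothetical names, and the sketch assumes a local filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a split's job: catalog locations as URIs without reading their contents. */
class LocationCatalogSketch {

    /**
     * Lists regular files under root whose names end with one of the given extensions
     * (all files if none are given). Non-recursive means direct children of root only.
     */
    static List<URI> listLocations(Path root, boolean recursive, String... extensions) {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> matchesExtension(p, extensions))
                    .sorted() // deterministic order keeps repeated runs identical
                    .map(Path::toUri)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean matchesExtension(Path p, String[] extensions) {
        if (extensions.length == 0) return true;
        String name = p.getFileName().toString();
        for (String ext : extensions) {
            if (name.endsWith("." + ext)) return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("split-demo");
        Files.createFile(root.resolve("a.csv"));
        Files.createFile(root.resolve("notes.txt"));
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.createFile(sub.resolve("b.csv"));

        System.out.println(listLocations(root, false, "csv")); // only a.csv
        System.out.println(listLocations(root, true, "csv"));  // a.csv and sub/b.csv
    }
}
```

Sorting the URIs is a choice made in this sketch. `Files.walk` does not guarantee an order, and a stable catalog makes later sampling or shuffling steps repeatable.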

* [Ex01_FileSplitExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/loading/Ex01_FileSplitExample.java)
Uses FileSplit to load files from a given location. Overloaded constructors support options such as filtering which files to load and loading recursively.
* [Ex02_CollectionSplitExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/loading/Ex02_CollectionSplitExample.java)
Creates a split from a collection of URIs.
* [Ex03_NumberedFileInputSplitExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/loading/Ex03_NumberedFileInputSplitExample.java)
Creates a split from numbered files that follow a common naming pattern, such as file1.txt, file2.txt, ..., file100.txt.
* [Ex04_TransformSplitExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/loading/Ex04_TransformSplitExample.java)
Maps the URIs of a given split to new URIs. This is useful when features and labels are stored in separate files that share a common naming scheme, so that the name of the output file can be derived from the name of the input file (e.g. a-in.csv and a-out.csv).
* [Ex05_SamplingBaseInputSplitExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/loading/Ex05_SamplingBaseInputSplitExample.java)
Generates several splits from a main split, for example for training, validation and testing.
* [Ex06_KFoldIteratorFromDataSet.java](./src/main/java/org/deeplearning4j/datapipelineexamples/loading/Ex06_KFoldIteratorFromDataSet.java)
Generates a K-fold cross-validation iterator from a dataset.
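
The split ideas in the list above can be illustrated without any library. The following plain-Java sketch is not DataVec code, and every name in it is hypothetical. It covers four ideas:

- It expands a `%d` file-name pattern over an inclusive index range.
- It derives a label file name from a feature file name using the a-in.csv/a-out.csv convention.
- It cuts a seeded shuffle into train/validation/test parts.
- It computes the held-out indices of one fold in K-fold cross-validation.

The fold logic assumes contiguous folds. The actual examples may shuffle or partition differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical plain-Java sketches of the split ideas above; this is not the DataVec API. */
class SplitIdeasSketch {

    /** Expands a pattern such as "file%d.txt" over an inclusive index range (numbered-files idea). */
    static List<String> numberedPaths(String pattern, int minInclusive, int maxInclusive) {
        List<String> paths = new ArrayList<>();
        for (int i = minInclusive; i <= maxInclusive; i++) {
            paths.add(String.format(pattern, i));
        }
        return paths;
    }

    /** Derives a label file name from a feature file name, e.g. "a-in.csv" -> "a-out.csv". */
    static String labelPathFor(String featurePath) {
        String suffix = "-in.csv";
        if (!featurePath.endsWith(suffix)) {
            throw new IllegalArgumentException("Expected a name ending in " + suffix + ": " + featurePath);
        }
        return featurePath.substring(0, featurePath.length() - suffix.length()) + "-out.csv";
    }

    /** Shuffles with a fixed seed, then cuts the list into train/validation/test parts. */
    static List<List<String>> trainValTest(List<String> items, double trainFrac, double valFrac, long seed) {
        if (trainFrac < 0 || valFrac < 0 || trainFrac + valFrac > 1) {
            throw new IllegalArgumentException("Fractions must be non-negative and sum to at most 1");
        }
        List<String> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed)); // same seed -> same split
        int nTrain = (int) Math.round(shuffled.size() * trainFrac);
        int nVal = Math.min((int) Math.round(shuffled.size() * valFrac), shuffled.size() - nTrain);
        List<List<String>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, nTrain)));
        parts.add(new ArrayList<>(shuffled.subList(nTrain, nTrain + nVal)));
        parts.add(new ArrayList<>(shuffled.subList(nTrain + nVal, shuffled.size())));
        return parts;
    }

    /** Indices of held-out fold k when numExamples are cut into numFolds contiguous folds. */
    static int[] testFoldIndices(int numExamples, int numFolds, int k) {
        int base = numExamples / numFolds;
        int remainder = numExamples % numFolds;
        // The first `remainder` folds each receive one extra example.
        int start = k * base + Math.min(k, remainder);
        int size = base + (k < remainder ? 1 : 0);
        int[] indices = new int[size];
        for (int i = 0; i < size; i++) {
            indices[i] = start + i;
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(numberedPaths("file%d.txt", 1, 3));            // [file1.txt, file2.txt, file3.txt]
        System.out.println(labelPathFor("a-in.csv"));                    // a-out.csv
        System.out.println(Arrays.toString(testFoldIndices(10, 3, 1)));  // [4, 5, 6]
    }
}
```

Fixing the random seed means the same inputs always produce the same split. This fits the goal of reproducible pipelines stated at the top of this page.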

### Cleaning, Transforming and Analysing Data
* [IrisCSVTransform.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/IrisCSVTransform.java)
A basic example that introduces key concepts such as Schema and TransformProcess, using categoricalToInteger.
* [CSVMixedDataTypesLocal.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/CSVMixedDataTypesLocal.java)
Common preprocessing steps such as removing unnecessary columns, filtering on column values, replacing invalid values and parsing date/time values.
* [CSVMixedDataTypes.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/CSVMixedDataTypes.java)
The same as above, but using Apache Spark.
* [PrintSchemasAtEachStep.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/debugging/PrintSchemasAtEachStep.java)
Shows how to print the schema after each step, which is useful for debugging transform scripts in a complex pipeline.
* [IrisAnalysis.java](./src/main/java/org/deeplearning4j/datapipelineexamples/analysis/IrisAnalysis.java)
A basic analysis of the dataset, saved and presented as an HTML file.
* [IrisNormalizer.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/IrisNormalizer.java)
Proper usage of preprocessors, demonstrated with a min-max scaler.
* [JoinExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/JoinExample.java)
Performs joins on datasets.
* [PivotExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/PivotExample.java)
Combines multiple independent records by key.
* [WebLogDataExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/basic/WebLogDataExample.java)
Preprocessing and aggregation operations on web log data.
* [CustomReduceExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/custom/CustomReduceExample.java)
A reduction example on simple CSV data that uses a custom reduction operation.
* [MultiOpReduceExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/transform/custom/MultiOpReduceExample.java)
A reduce example that applies multiple operations to a single column.
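
Three of the transform ideas above are made concrete in the library-free sketch below. It is not the DataVec TransformProcess API, and the class and method names are hypothetical. It covers:

- mapping category labels to integer indices in a declared state order, roughly what a categorical-to-integer step does;
- min-max scaling of a numeric column to [0, 1];
- a reduction that computes several statistics (min, max, mean) of one column per key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java sketches of transform ideas; not the DataVec TransformProcess API. */
class TransformIdeasSketch {

    /** Maps each category label to its index in a fixed, declared list of states. */
    static int[] categoricalToInteger(List<String> values, List<String> states) {
        int[] out = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            int idx = states.indexOf(values.get(i));
            if (idx < 0) throw new IllegalArgumentException("Unknown category: " + values.get(i));
            out[i] = idx;
        }
        return out;
    }

    /** Rescales values to [0, 1] using the min and max of the given array. */
    static double[] minMaxScale(double[] x) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // A constant column has no spread; map it to 0 instead of dividing by zero.
            out[i] = range == 0 ? 0.0 : (x[i] - min) / range;
        }
        return out;
    }

    /** Groups rows by key and returns {min, max, mean} of one numeric column per key. */
    static Map<String, double[]> minMaxMeanByKey(List<String> keys, double[] values) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // key -> {min, max, sum, count}
        for (int i = 0; i < keys.size(); i++) {
            double v = values[i];
            double[] a = acc.computeIfAbsent(keys.get(i),
                    k -> new double[]{Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, 0, 0});
            a[0] = Math.min(a[0], v);
            a[1] = Math.max(a[1], v);
            a[2] += v;
            a[3] += 1;
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            result.put(e.getKey(), new double[]{a[0], a[1], a[2] / a[3]});
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minMaxScale(new double[]{2, 4, 6}))); // [0.0, 0.5, 1.0]
    }
}
```

In a real pipeline the min and max would normally be computed on the training data only and then reused for validation and test data. This sketch simply uses whatever array it is given.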

### Formats
* [CSVtoMapFileConversion.java](./src/main/java/org/deeplearning4j/datapipelineexamples/formats/hdfs/conversion/CSVtoMapFileConversion.java)
A simple example of converting a CSV text file to the Hadoop MapFile format, for better performance and the convenient randomization supported by MapFileRecordReader.
* [SVMLightExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/formats/svmlight/SVMLightExample.java)
An example using MNIST data in SVMLight format.
* [ImagePipelineExample.java](./src/main/java/org/deeplearning4j/datapipelineexamples/formats/image/ImagePipelineExample.java)
An image pipeline that also demonstrates using transforms to augment a small dataset.
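
For readers new to the SVMLight format used in the MNIST example, each line holds a label followed by sparse `index:value` pairs, optionally ending in a `# comment`. The sketch below parses one such line into a dense array using only the JDK. It is not DataVec's record reader. It assumes 1-based feature indices (the common convention for this format) and a known number of features. `SvmLightLineSketch`, `ParsedLine` and `parse` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical sketch: parse one SVMLight-format line into a label and a dense feature vector. */
class SvmLightLineSketch {

    /** One parsed example: the label plus a dense copy of its sparse features. */
    static final class ParsedLine {
        final double label;
        final double[] features;

        ParsedLine(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /**
     * Parses "label index:value index:value ..." into a dense vector of length numFeatures.
     * Assumes 1-based feature indices; unlisted features stay 0. A trailing "# comment"
     * and any "qid:" token are ignored.
     */
    static ParsedLine parse(String line, int numFeatures) {
        int hash = line.indexOf('#');
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        String[] tokens = body.split("\\s+");
        double label = Double.parseDouble(tokens[0]);
        double[] features = new double[numFeatures];
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].startsWith("qid:")) continue; // query ids are not features
            String[] kv = tokens[i].split(":", 2);
            int index = Integer.parseInt(kv[0]);
            if (index < 1 || index > numFeatures) {
                throw new IllegalArgumentException("Feature index out of range: " + index);
            }
            features[index - 1] = Double.parseDouble(kv[1]); // convert 1-based to 0-based
        }
        return new ParsedLine(label, features);
    }

    public static void main(String[] args) {
        ParsedLine p = parse("7 1:0.5 4:2.0 # sparse example", 5);
        System.out.println(p.label + " " + Arrays.toString(p.features)); // prints 7.0 [0.5, 0.0, 0.0, 2.0, 0.0]
    }
}
```

Densifying is a simplification for illustration. For high-dimensional sparse data, keeping only the listed index/value pairs uses far less memory.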