Replace sparkdl's ImageSchema with Spark2.3's version (#85)
Use Spark 2.3's ImageSchema as image interface.
The biggest change is the opposite ordering of color channels: BGR instead of RGB, which requires extra reordering in various places. The change mostly affects the transformers and the UDF creation functionality. Some noteworthy decisions:
- For DeepImageFeaturizer & DeepImagePredictor, we preserved the ability to read and resize images in Python using PIL to match Keras. Those image read & resize utilities are not recommended for external use, as they are likely to cause confusion.
- For KerasImageFileTransformer and the Keras UDF creator, we assume the preprocessing function & model inputs work on RGB images, since Keras works with RGB images.
- For TFImageTransformer, we added a param to specify the channel ordering expected by the TensorFlow graph's input layer. Making this param explicit raises awareness that you could be feeding the wrong channel order, and makes the code easier to reason about.
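The reordering mentioned above amounts to reversing the channel axis. A minimal NumPy sketch, assuming images are held as (height, width, 3) arrays; `reverse_channel_order` is a hypothetical helper for illustration, not a function from this PR:

```python
import numpy as np


def reverse_channel_order(image):
    """Swap BGR <-> RGB by reversing the last axis of a (height, width, 3) array.

    Hypothetical helper for illustration only; not part of sparkdl or Spark.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # The reversed slice is a view; copy it so the result is contiguous in memory.
    return np.ascontiguousarray(image[:, :, ::-1])


# A 1x2 image in BGR order: the first pixel is pure blue.
bgr = np.array([[[255, 0, 0], [0, 128, 64]]], dtype=np.uint8)
rgb = reverse_channel_order(bgr)
```

Applying the function twice returns the original array, so the same helper covers both directions.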
We also needed a few tweaks to run with Spark 2.3; notably, UDFs are now referenced by SQL identifier and cannot have a dash as part of the name.
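One way to cope with the naming restriction is to normalize UDF names before registering them. The sketch below is an assumption for illustration: `to_sql_identifier` is a hypothetical helper, and the rule it applies (keep only ASCII letters, digits, and underscores) is deliberately conservative rather than a statement of Spark's exact identifier grammar.

```python
import re


def to_sql_identifier(name):
    """Replace characters that are unsafe in a plain SQL identifier with '_'.

    Hypothetical helper; the allowed character set is a conservative assumption.
    """
    if not name:
        raise ValueError("UDF name must be non-empty")
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


udf_name = to_sql_identifier("my-keras-udf")
```

Names that are already valid pass through unchanged, so the helper can be applied unconditionally.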
[TODO]
- In order to run on Spark < 2.3, the image schema files have been copied here; they need to be removed once Spark 2.3 is released.
- During this work we discovered that ImageSchema-related utilities in Spark 2.3 should support float32 types and a bit more info about the modes. Once that is done we can remove some code from this PR and use the Spark 2.3 version instead.
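To illustrate what "more info about the modes" could look like, here is a sketch that derives the element type and channel count from an OpenCV-style type code (`depth + ((channels - 1) << 3)`, with depth 0 for uint8 and 5 for float32) and decodes raw image bytes accordingly. The function names are hypothetical, and the choice to handle only those two depths is an assumption; this is not the Spark or sparkdl implementation.

```python
import numpy as np

# OpenCV depth codes handled by this sketch (assumption: only these two).
_DEPTH_TO_DTYPE = {0: np.uint8, 5: np.float32}


def decode_mode(mode):
    """Split an OpenCV-style type code into (numpy dtype, number of channels)."""
    if mode < 0:
        raise ValueError("undefined image mode: %d" % mode)
    depth = mode & 7            # low 3 bits hold the depth code
    channels = (mode >> 3) + 1  # remaining bits hold channels - 1
    if depth not in _DEPTH_TO_DTYPE:
        raise ValueError("unsupported depth code: %d" % depth)
    return np.dtype(_DEPTH_TO_DTYPE[depth]), channels


def image_bytes_to_array(height, width, mode, data):
    """Interpret raw bytes as a (height, width, channels) array for the given mode."""
    dtype, channels = decode_mode(mode)
    expected = height * width * channels * dtype.itemsize
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    return np.frombuffer(data, dtype=dtype).reshape(height, width, channels)


# Round-trip a small float32, 3-channel image (type code 21 in OpenCV's numbering).
original = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
restored = image_bytes_to_array(2, 2, 21, original.tobytes())
```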
README.md (21 additions & 9 deletions)
@@ -80,14 +80,23 @@ To try running the examples below, check out the Databricks notebook [DeepLearni
 ### Working with images in Spark

-The first step to applying deep learning on images is the ability to load the images. Deep Learning Pipelines includes utility functions that can load millions of images into a Spark DataFrame and decode them automatically in a distributed fashion, allowing manipulation at scale.
+The first step to applying deep learning on images is the ability to load the images. Spark and Deep Learning Pipelines include utility functions that can load millions of images into a Spark DataFrame and decode them automatically in a distributed fashion, allowing manipulation at scale.
image_df =imageIO.readImagesWithCustomFn("/data/myimages",decode_f=<your image library, see imageIO.PIL_decode>)
 ```

-The resulting DataFrame contains a string column named "filePath" containing the path to each image file, and a image struct ("`SpImage`") column named "image" containing the decoded image data.
+The resulting DataFrame contains a string column named "image" containing an image struct with schema == ImageSchema.
@@ -127,11 +136,13 @@ Spark DataFrames are a natural construct for applying deep learning models to a
 There are many well-known deep learning models for images. If the task at hand is very similar to what the models provide (e.g. object recognition with ImageNet classes), or for pure exploration, one can use the Transformer `DeepImagePredictor` by simply specifying the model name.

 ```python
-from sparkdl import readImages, DeepImagePredictor
@@ -140,7 +151,8 @@ Spark DataFrames are a natural construct for applying deep learning models to a
 Deep Learning Pipelines provides a Transformer that will apply the given TensorFlow Graph to a DataFrame containing a column of images (e.g. loaded using the utilities described in the previous section). Here is a very simple example of how a TensorFlow Graph can be used with the Transformer. In practice, the TensorFlow Graph will likely be restored from files before calling `TFImageTransformer`.

 ```python
-from sparkdl import readImages, TFImageTransformer
+from sparkdl.image.image import ImageSchema
+from sparkdl import TFImageTransformer
 import sparkdl.graph.utils as tfx
 from sparkdl.transformers import utils
 import tensorflow as tf
@@ -155,7 +167,7 @@ Spark DataFrames are a natural construct for applying deep learning models to a