Now that we have prepared our catalog, we simply pass the DataFrame or CSV string to the `raster` DataSource to load the imagery. The `catalog_col_names` parameter gives the names of the columns that contain the URIs to be read.
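As a rough illustration of the CSV form, the sketch below builds a two-column catalog string with the standard library. The column names `red` and `nir` and the URIs are placeholders, and the commented-out reader call assumes a RasterFrames-enabled Spark session named `spark`.

```python
import csv
import io

# Placeholder URIs; substitute real raster locations.
rows = [
    ("https://example.com/scene1_red.tif", "https://example.com/scene1_nir.tif"),
    ("https://example.com/scene2_red.tif", "https://example.com/scene2_nir.tif"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["red", "nir"])  # header row naming the URI columns
writer.writerows(rows)
catalog_csv = buffer.getvalue()

print(catalog_csv)

# Assumed usage, given a RasterFrames-enabled Spark session named `spark`:
# df = spark.read.raster(catalog_csv, catalog_col_names=["red", "nir"])
```

The header row matters here: the names passed in `catalog_col_names` have to match columns in the catalog.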
If `band_indexes` contains a band index that exceeds the number of bands in the raster, a projected raster column is still generated in the schema, but the column is filled with `null` values.

-You can also pass a `catalog` and `band_indexes` together into the `raster` reader. This will create a projected raster column for the combination of all items passed into `catalog_col_names` and `band_indexes`. Again if a band in `band_indexes` exceeds the number of bands in a raster, it will have a `null` value for the corresponding column.
+You can also pass a _catalog_ and `band_indexes` together into the `raster` reader. This will create a projected raster column for the combination of all items in `catalog_col_names` and `band_indexes`. Again if a band in `band_indexes` exceeds the number of bands in a raster, it will have a `null` value for the corresponding column.

Here is a trivial example with a _catalog_ over multiband rasters. We specify two columns containing URIs and two bands, resulting in four projected raster columns.
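
To make the column count concrete, the following sketch enumerates the combinations of catalog columns and band indexes in plain Python. The column names `red` and `nir` and the `{column}_b{band}` label pattern are illustrative assumptions, not output captured from the `raster` reader.

```python
from itertools import product

# Hypothetical catalog column names and band indexes, chosen for illustration.
catalog_col_names = ["red", "nir"]
band_indexes = [0, 1]

# One projected raster column is produced per (catalog column, band) pair.
# The "{column}_b{band}" label is an assumed naming pattern, not confirmed by the source.
projected_columns = [
    "{}_b{}".format(col, band)
    for col, band in product(catalog_col_names, band_indexes)
]

print(projected_columns)
```

Two URI columns and two bands give 2 × 2 = 4 combinations, which matches the four projected raster columns described above.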
Returns a Spark DataFrame from raster data files specified by URIs.
+    Each row in the returned DataFrame will contain a column with struct of (CRS, Extent, Tile) for each item in
+    `catalog_col_names`.
+    Multiple bands from the same raster file are spread across rows of the DataFrame. See `band_indexes` param.
+    If bands from a scene are stored in separate files, provide a DataFrame to the `source` parameter.
+
+    For more details and example usage, consult https://rasterframes.io/raster-read.html
+
+    :param source: a string, list of strings, list of lists of strings, a Pandas DataFrame or a Spark DataFrame giving URIs to the raster data to read.
+    :param catalog_col_names: required if `source` is a DataFrame or CSV string. It is a list of strings giving the names of columns containing URIs to read.
+    :param band_indexes: list of integers indicating which bands, zero-based, to read from the raster files specified; default is to read only the first band.
+    :param tile_dimensions: tuple or list of two indicating the default tile dimension as (columns, rows).
+    :param lazy_tiles: If true (default) only generate minimal references to tile contents; if false, fetch tile cell values.
+    :param options: Additional keyword arguments to pass to the Spark DataSource.
+    """

     from pandas import DataFrame as PdDataFrame

+    if 'catalog' in options:
+        source = options['catalog']  # maintain back compatibility with 0.8.0
+
     def to_csv(comp):
         if isinstance(comp, str):
             return comp
@@ -135,37 +153,75 @@ def temp_name():
         band_indexes = [0]

     options.update({
-        "bandIndexes": to_csv(band_indexes),
-        "tileDimensions": to_csv(tile_dimensions),
-        "lazyTiles": lazy_tiles
+        "band_indexes": to_csv(band_indexes),
+        "tile_dimensions": to_csv(tile_dimensions),
+        "lazy_tiles": lazy_tiles
     })

+    # Parse the `source` argument
+    path = None  # to pass into `path` param
+    if isinstance(source, list):
+        if all([isinstance(i, str) for i in source]):
+            path = None
+            catalog = None
+            options.update(dict(paths='\n'.join([str(i) for i in source])))  # pass in "uri1\nuri2\nuri3\n..."
+        if all([isinstance(i, list) for i in source]):
+            # list of lists; we will rely on pandas to:
+            #   - coerce all data to str (possibly using objects' __str__ or __repr__)
+            #   - ensure data is not "ragged": all sublists are same len
+            path = None
+            catalog_col_names = ['proj_raster_{}'.format(i) for i in range(len(source[0]))]  # assign these names
+            catalog = PdDataFrame(source,
+                                  columns=catalog_col_names,
+                                  dtype=str,
+                                  )
+    elif isinstance(source, str):
+        if '\n' in source or '\r' in source:
+            # then the `source` string is a catalog as a CSV (header is required)
+            path = None
+            catalog = source
+        else:
+            # interpret source as a single URI string
+            path = source
+            catalog = None
+    else:
+        # user has passed in some other type, we will try to interpret as a catalog
+        catalog = source
+
     if catalog is not None:
         if catalog_col_names is None:
             raise Exception("'catalog_col_names' required when DataFrame 'catalog' specified")
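
The way this hunk interprets the `source` argument can be summarized by a small standalone function. This is a simplified sketch rather than the code from the change itself: `classify_source` and its return labels are hypothetical names, and the function only reports which interpretation a given `source` value would receive.

```python
def classify_source(source):
    """Report how a `source` value would be interpreted (illustrative labels only)."""
    if isinstance(source, list):
        if source and all(isinstance(i, str) for i in source):
            return "paths"        # list of URIs, joined with newlines
        if source and all(isinstance(i, list) for i in source):
            return "catalog"      # list of lists, converted to a table of URIs
        return "unsupported"      # empty or mixed list; this sketch flags it explicitly
    if isinstance(source, str):
        if "\n" in source or "\r" in source:
            return "catalog"      # multi-line string, read as CSV with a header
        return "path"             # single URI
    return "catalog"              # anything else, e.g. a DataFrame


print(classify_source("s3://bucket/a.tif"))
print(classify_source(["a.tif", "b.tif"]))
print(classify_source([["a.tif", "b.tif"]]))
print(classify_source("red,nir\na.tif,b.tif\n"))
```

One deliberate difference from the hunk: the sketch returns an explicit label for empty or mixed lists instead of letting them pass through without a dedicated branch.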