
Commit ada9d89

gadomski and Tom Augspurger authored
New dataset: FWS National Wetlands Inventory (#102)
* dataset: add fws-nwi

Co-authored-by: Tom Augspurger <[email protected]>
1 parent 38352af commit ada9d89


8 files changed (+425 -0 lines)

datasets/fws-nwi/Dockerfile

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
FROM ubuntu:20.04

# Setup timezone info
ENV TZ=UTC

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt-get update && apt-get install -y software-properties-common

RUN add-apt-repository ppa:ubuntugis/ppa && \
    apt-get update && \
    apt-get install -y build-essential python3-dev python3-pip \
    jq unzip ca-certificates wget curl git && \
    apt-get autoremove && apt-get autoclean && apt-get clean

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 10

# See https://github.com/mapbox/rasterio/issues/1289
ENV CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

# Install Mambaforge into /opt/conda (Python 3.8 itself is pinned by the mamba step below)
RUN curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh" \
    && bash "Mambaforge-$(uname)-$(uname -m).sh" -b -p /opt/conda \
    && rm -rf "Mambaforge-$(uname)-$(uname -m).sh"

ENV PATH /opt/conda/bin:$PATH
ENV LD_LIBRARY_PATH /opt/conda/lib/:$LD_LIBRARY_PATH

RUN mamba install -y -c conda-forge python=3.8 gdal=3.3.3 pip setuptools cython numpy==1.21.5

RUN python -m pip install --upgrade pip

# Install common packages
COPY requirements-task-base.txt /tmp/requirements.txt
RUN python -m pip install --no-build-isolation -r /tmp/requirements.txt

#
# Copy and install packages
#

COPY pctasks/core /opt/src/pctasks/core
RUN cd /opt/src/pctasks/core && \
    pip install .

COPY pctasks/cli /opt/src/pctasks/cli
RUN cd /opt/src/pctasks/cli && \
    pip install .

COPY pctasks/task /opt/src/pctasks/task
RUN cd /opt/src/pctasks/task && \
    pip install .

COPY pctasks/client /opt/src/pctasks/client
RUN cd /opt/src/pctasks/client && \
    pip install .

COPY pctasks/ingest /opt/src/pctasks/ingest
RUN cd /opt/src/pctasks/ingest && \
    pip install .

COPY pctasks/dataset /opt/src/pctasks/dataset
RUN cd /opt/src/pctasks/dataset && \
    pip install .

COPY ./datasets/fws-nwi/requirements.txt /opt/src/datasets/fws-nwi/requirements.txt
RUN python3 -m pip install -r /opt/src/datasets/fws-nwi/requirements.txt

# Setup Python Path to allow import of test modules
ENV PYTHONPATH=/opt/src:$PYTHONPATH

WORKDIR /opt/src

datasets/fws-nwi/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# planetary-computer-tasks dataset: fws-nwi

## Building the Docker image

To build the Docker image and push it to our container registry, run the following from the repository root:

```shell
az acr build -r {the registry} --subscription {the subscription} -t pctasks-fws-nwi:latest -f datasets/fws-nwi/Dockerfile .
```
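
If you prefer to script the build, the minimal sketch below assembles the same `az acr build` arguments as a list that could be passed to `subprocess.run`. The helper name `acr_build_command` and the registry and subscription values are hypothetical placeholders. The trailing `.` is the build context, which is why the command is run from the repository root: that is where the Dockerfile's `COPY pctasks/...` and `COPY ./datasets/fws-nwi/...` paths resolve.

```python
import shlex
from typing import List


def acr_build_command(
    registry: str, subscription: str, tag: str = "pctasks-fws-nwi:latest"
) -> List[str]:
    """Build the README's `az acr build` invocation as an argument list (hypothetical helper)."""
    return [
        "az", "acr", "build",
        "-r", registry,
        "--subscription", subscription,
        "-t", tag,
        "-f", "datasets/fws-nwi/Dockerfile",
        ".",  # build context: the repository root
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own registry and subscription.
    cmd = acr_build_command("myregistry", "my-subscription-id")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string avoids shell-quoting issues if the registry or subscription ever contains unusual characters.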
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
The Wetlands Data Layer is the product of over 45 years of work by the National Wetlands Inventory (NWI) and its collaborators and currently contains more than 35 million wetland and deepwater features. This dataset, covering the conterminous United States, Hawaii, Puerto Rico, the Virgin Islands, Guam, the major Northern Mariana Islands and Alaska, continues to grow at a rate of 50 to 100 million acres annually as data are updated. The data layer is updated twice a year and these changes are reflected on the mapper and in the data downloads.

**NOTE:** Due to the variation in use and analysis of this data by the end user, each state's wetlands data extends beyond the state boundary. Each state includes wetlands data that intersect the 1:24,000 quadrangles that contain part of that state (1:2,000,000 source data). This allows the user to clip the data to their specific analysis datasets. Beware that two adjacent states will contain some of the same data along their borders.
Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
{
  "type": "Collection",
  "id": "fws-nwi",
  "stac_version": "1.0.0",
  "description": "{{ collection.description }}",
  "links": [
    {
      "rel": "describedby",
      "href": "https://www.fws.gov/wetlands/Data/metadata/FWS_Wetlands.xml",
      "type": "application/xml",
      "title": "Wetlands metadata"
    },
    {
      "rel": "about",
      "href": "https://www.fws.gov/sites/default/files/documents/national-wetlands-inventory-fact-sheet.pdf",
      "type": "application/pdf",
      "title": "Project Fact Sheet"
    },
    {
      "rel": "about",
      "href": "https://www.fws.gov/program/national-wetlands-inventory",
      "type": "text/html",
      "title": "Project Landing Page"
    },
    {
      "rel": "license",
      "href": "http://www.usa.gov/publicdomain/label/1.0/",
      "type": "text/html",
      "title": "US Public Domain"
    }
  ],
  "stac_extensions": [
    "https://stac-extensions.github.io/item-assets/v1.0.0/schema.json"
  ],
  "item_assets": {
    "zip": {
      "type": "application/zip",
      "roles": [
        "data",
        "archive",
        "source"
      ]
    }
  },
  "msft:short_description": "Vector dataset containing wetlands boundaries and identification across the United States.",
  "msft:storage_account": "ai4edataeuwest",
  "msft:container": "fws-nwi",
  "title": "FWS National Wetlands Inventory",
  "extent": {
    "spatial": {
      "bbox": [
        [
          144.6,
          13.16667,
          -64.54958,
          71.99633
        ],
        [
          144.6,
          13.16667,
          180.0,
          71.99633
        ],
        [
          -180.0,
          13.16667,
          -64.54958,
          71.99633
        ]
      ]
    },
    "temporal": {
      "interval": [
        [
          "2022-10-01T00:00:00Z",
          "2022-10-01T00:00:00Z"
        ]
      ]
    }
  },
  "license": "proprietary",
  "keywords": [
    "USFWS",
    "Wetlands",
    "United States"
  ],
  "providers": [
    {
      "name": "U.S. Fish and Wildlife Service",
      "description": "The U.S. Fish and Wildlife Service is the principal federal agency tasked with providing information to the public on the extent and status of the nation's wetland and deepwater habitats, as well as changes to these habitats over time.",
      "roles": [
        "producer",
        "licensor"
      ],
      "url": "https://www.fws.gov",
      "email": "[email protected]"
    },
    {
      "name": "Microsoft",
      "roles": [
        "host"
      ],
      "url": "https://planetarycomputer.microsoft.com"
    }
  ],
  "summaries": {
    "fws_nwi:state": [
      "Alabama",
      "Alaska",
      "Arizona",
      "Arkansas",
      "California",
      "Colorado",
      "Connecticut",
      "Delaware",
      "District of Columbia",
      "Florida",
      "Georgia",
      "Hawaii",
      "Idaho",
      "Illinois",
      "Indiana",
      "Iowa",
      "Kansas",
      "Kentucky",
      "Louisiana",
      "Maine",
      "Maryland",
      "Massachusetts",
      "Michigan",
      "Minnesota",
      "Mississippi",
      "Missouri",
      "Montana",
      "Nebraska",
      "Nevada",
      "New Hampshire",
      "New Jersey",
      "New Mexico",
      "New York",
      "North Carolina",
      "North Dakota",
      "Ohio",
      "Oklahoma",
      "Oregon",
      "Pacific Trust Islands",
      "Pennsylvania",
      "Puerto Rico and Virgin Islands",
      "Rhode Island",
      "South Carolina",
      "South Dakota",
      "Tennessee",
      "Texas",
      "Utah",
      "Vermont",
      "Virginia",
      "Washington",
      "West Virginia",
      "Wisconsin",
      "Wyoming"
    ],
    "fws_nwi:state_code": [
      "AL",
      "AK",
      "AZ",
      "AR",
      "CA",
      "CO",
      "CT",
      "DE",
      "DC",
      "FL",
      "GA",
      "HI",
      "ID",
      "IL",
      "IN",
      "IA",
      "KS",
      "KY",
      "LA",
      "ME",
      "MD",
      "MA",
      "MI",
      "MN",
      "MS",
      "MO",
      "MT",
      "NE",
      "NV",
      "NH",
      "NJ",
      "NM",
      "NY",
      "NC",
      "ND",
      "OH",
      "OK",
      "OR",
      "PacTrust",
      "PA",
      "PRVI",
      "RI",
      "SC",
      "SD",
      "TN",
      "TX",
      "UT",
      "VT",
      "VA",
      "WA",
      "WV",
      "WI",
      "WY"
    ],
    "fws_nwi:content": [
      "riparian",
      "historic_wetlands",
      "wetlands"
    ]
  },
  "assets": {
    "thumbnail": {
      "href": "https://ai4edatasetspublicassets.blob.core.windows.net/assets/pc_thumbnails/fws-nwi-thumb.png",
      "type": "image/png",
      "title": "FWS National Wetlands Inventory thumbnail",
      "roles": [
        "thumbnail"
      ]
    }
  }
}
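
The collection carries several bounding boxes because its coverage wraps across the antimeridian, from Guam and the Northern Mariana Islands in the west to the Virgin Islands in the east, which one ordinary west-to-east box cannot describe. The sketch below is a minimal illustration, not a pctasks or pystac API: it tests whether a point falls inside the two sub-extents listed in `extent.spatial.bbox`, and treats any box whose west edge exceeds its east edge as crossing the antimeridian, following RFC 7946. The function names and sample coordinates are hypothetical.

```python
from typing import List, Sequence

# The two sub-extents from the collection's "extent.spatial.bbox",
# split at the antimeridian (longitude +/-180).
SUB_BBOXES: List[List[float]] = [
    [144.6, 13.16667, 180.0, 71.99633],
    [-180.0, 13.16667, -64.54958, 71.99633],
]


def in_bbox(lon: float, lat: float, bbox: Sequence[float]) -> bool:
    """Return True if (lon, lat) lies in a [west, south, east, north] box.

    When west > east the box crosses the antimeridian, so the longitude
    test wraps around instead of requiring west <= lon <= east.
    """
    west, south, east, north = bbox
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east


def in_collection_extent(lon: float, lat: float) -> bool:
    return any(in_bbox(lon, lat, b) for b in SUB_BBOXES)


# Approximate coordinates, for illustration only.
print(in_collection_extent(-90.0, 35.0))    # True  (conterminous US)
print(in_collection_extent(144.75, 13.45))  # True  (Guam)
print(in_collection_extent(10.0, 50.0))     # False (central Europe)
```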

datasets/fws-nwi/dataset.yaml

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
id: fws-nwi
image: ${{ args.registry }}/pctasks-fws-nwi:latest
args:
  - registry
code:
  src: ${{ local.path(./fws_nwi.py) }}
  requirements: ${{ local.path(./requirements.txt) }}
environment:
  AZURE_TENANT_ID: ${{ secrets.task-tenant-id }}
  AZURE_CLIENT_ID: ${{ secrets.task-client-id }}
  AZURE_CLIENT_SECRET: ${{ secrets.task-client-secret }}
collections:
  - id: fws-nwi
    template: ${{ local.path(./collection) }}
    class: fws_nwi:FwsNwiCollection
    asset_storage:
      - uri: blob://ai4edataeuwest/fws-nwi/
        chunks:
          options:
            extensions: [.zip]
            chunk_length: 1
    chunk_storage:
      uri: blob://ai4edataeuwest/fws-nwi-etl-data/
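
The `chunks.options` block restricts the asset listing to `.zip` files and groups them with `chunk_length: 1`, so each archive becomes its own unit of work, which matches `create_item` handling one `asset_uri` at a time. The sketch below only illustrates that filtering and grouping intent; it is an assumption about the behavior, not pctasks' actual chunking code, and the helper name and blob names are hypothetical.

```python
from typing import Iterable, Iterator, List, Sequence


def chunk_asset_uris(
    uris: Iterable[str],
    extensions: Sequence[str] = (".zip",),
    chunk_length: int = 1,
) -> Iterator[List[str]]:
    """Yield lists of at most `chunk_length` URIs ending in one of `extensions` (hypothetical helper)."""
    if chunk_length < 1:
        raise ValueError("chunk_length must be >= 1")
    chunk: List[str] = []
    for uri in uris:
        if not uri.endswith(tuple(extensions)):
            continue  # skip files that are not source archives
        chunk.append(uri)
        if len(chunk) == chunk_length:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # trailing partial chunk


# Hypothetical blob listing for the asset_storage container.
listing = [
    "blob://ai4edataeuwest/fws-nwi/AL_wetlands.zip",
    "blob://ai4edataeuwest/fws-nwi/readme.txt",
    "blob://ai4edataeuwest/fws-nwi/AK_wetlands.zip",
]
for uris in chunk_asset_uris(listing):
    print(uris)
```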

datasets/fws-nwi/fws_nwi.py

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import List

from pystac import Item
from stactools.fws_nwi import stac
from stactools.fws_nwi.constants import ZIPFILE_ASSET_KEY

from pctasks.core.storage import StorageFactory
from pctasks.dataset.collection import Collection

GEOPARQUET_CONTAINER = "blob://ai4edataeuwest/fws-nwi/geoparquet"


class FwsNwiCollection(Collection):
    @classmethod
    def create_item(cls, asset_uri: str, storage_factory: StorageFactory) -> List[Item]:
        with TemporaryDirectory() as temporary_directory:
            # Download the source zip and build the STAC item from it locally.
            storage, path = storage_factory.get_storage_for_file(asset_uri)
            local_path = Path(temporary_directory, Path(path).name)
            storage.download_file(path, str(local_path))
            item = stac.create_item(Path(local_path), Path(temporary_directory))

            geoparquet_storage = storage_factory.get_storage(
                f"{GEOPARQUET_CONTAINER}/{item.id}"
            )

            for key, asset in item.assets.items():
                if key == ZIPFILE_ASSET_KEY:
                    # Point the zip asset back at the original blob.
                    item.assets[key].href = storage.get_url(path)
                else:
                    # Remaining assets are local files under the temporary
                    # directory; upload them before it is cleaned up.
                    geoparquet_path = Path(asset.href)
                    geoparquet_storage.upload_file(asset.href, geoparquet_path.name)
                    item.assets[key].href = geoparquet_storage.get_url(
                        geoparquet_path.name
                    )
            return [item]
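
The href-rewriting loop in `create_item` can be exercised without Azure by swapping in a stand-in storage object. In this minimal sketch, `FakeStorage`, `rewrite_hrefs`, the example URLs, and the file names are all hypothetical, and the zip asset key is assumed to be `"zip"` (the key used in the collection's `item_assets`). It shows the same decision as the task code: the zip asset points back at its source blob, while every other asset is uploaded to the GeoParquet location and its href replaced with the remote URL.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict

# Assumption: matches the "zip" key in the collection's item_assets.
ZIPFILE_ASSET_KEY = "zip"


@dataclass
class FakeStorage:
    """Hypothetical in-memory stand-in for a pctasks storage object."""

    base_url: str
    uploaded: Dict[str, str] = field(default_factory=dict)

    def upload_file(self, input_path: str, target_path: str) -> None:
        self.uploaded[target_path] = input_path  # record instead of uploading

    def get_url(self, path: str) -> str:
        return f"{self.base_url.rstrip('/')}/{path}"


def rewrite_hrefs(
    assets: Dict[str, str],
    source_storage: FakeStorage,
    source_path: str,
    geoparquet_storage: FakeStorage,
) -> Dict[str, str]:
    """Return new hrefs: zip -> source blob URL; everything else -> uploaded copy."""
    new_hrefs: Dict[str, str] = {}
    for key, href in assets.items():
        if key == ZIPFILE_ASSET_KEY:
            new_hrefs[key] = source_storage.get_url(source_path)
        else:
            name = Path(href).name
            geoparquet_storage.upload_file(href, name)
            new_hrefs[key] = geoparquet_storage.get_url(name)
    return new_hrefs


source = FakeStorage("https://example.blob.core.windows.net/fws-nwi")
parquet = FakeStorage("https://example.blob.core.windows.net/fws-nwi/geoparquet/DC")
hrefs = rewrite_hrefs(
    {"zip": "/tmp/x/DC_wetlands.zip", "wetlands": "/tmp/x/DC_wetlands.parquet"},
    source,
    "DC_wetlands.zip",
    parquet,
)
print(hrefs["wetlands"])
```

In the real task the uploads sit inside the `TemporaryDirectory` block, because the GeoParquet files only exist until that context exits.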

datasets/fws-nwi/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
stactools-fws-nwi == 0.2.0
