
Commit f1b94af

Authored by Marc Lichtman (777arc), Gustavo Hidalgo (ghidalgo3), and Tanya Marton (tanyamarton)
Add hls2 dataset (#320)
* first cut
* storage account name
* switch to using two collections
* rename to hls2
* fix depth
* working!
* switched blob:// to href://
* filled in item_assets for both
* add msft:group_id
* readme
* readme
* verify all assets exist in blob
* rename
* default render configs for hls2
* Tanya's render configs
* Document publishing HLS2 collection configs
* readme
* readme
* disable integration tests for now
* don't use pydantic json serialization
* format template and add thumbnails
* fix CLI issues
* all sentinel-2 render_options added to hls2-s30
* remove no_data=0
* convert hls2-l30 render_options to match hls2-s30
* Add sentinel-2c and make all platforms arrays
* test as unit tests with pytest
* set collection ground sampling distance
* description update
* Code change to support python 3.12
* format code
* GOES-19 CMI
* mosaics
* Add ESA as a producer
* Undo goes changes
* delete goes19 only workflow
* clean up readme and dockerfile

---------

Co-authored-by: Marc Lichtman <marclichtman@microsoft.com>
Co-authored-by: Gustavo Hidalgo <zambrano.hidalgo@gmail.com>
Co-authored-by: Tanya Marton <tanyamarton@microsoft.com>
Co-authored-by: Gustavo Hidalgo <guhidalgo@microsoft.com>
1 parent 4fcf776 commit f1b94af

File tree

12 files changed

+4863
-0
lines changed


Dockerfile.task_base

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,5 +1,8 @@
 FROM mcr.microsoft.com/azurelinux/base/python:3.12
 
+RUN tdnf install ca-certificates git azure-cli -y \
+    && tdnf clean all
+
 # Setup timezone info
 ENV TZ=UTC
```

datasets/hls2/README.md

Lines changed: 91 additions & 0 deletions
# HLS v2

The Sentinel and Landsat portions of this dataset are split across two collections: `hls2-l30` and `hls2-s30`.

All management operations must be repeated for both collections to maintain parity.

## Collection Publish
```bash
cd datasets/hls2
pctasks dataset ingest-collection -d dataset.yaml -c hls2-l30 -a registry pccomponents --submit
pctasks dataset ingest-collection -d dataset.yaml -c hls2-s30 -a registry pccomponents --submit
pctasks runs status <workflow id from output>
# wait for it to succeed
curl https://planetarycomputer.microsoft.com/api/stac/v1/collections/hls2-l30
curl https://planetarycomputer.microsoft.com/api/stac/v1/collections/hls2-s30
```
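To sanity-check the `curl` responses above, the collection `id` can be pulled out of the returned JSON. A minimal sketch using a canned response (in real use, pipe `curl -s <collection url>` instead of the `echo`; `python3` is assumed to be available):

```shell
# Extract the "id" field from a STAC Collection document.
# The echo stands in for:
#   curl -s https://planetarycomputer.microsoft.com/api/stac/v1/collections/hls2-l30
collection_id=$(echo '{"type":"Collection","id":"hls2-l30"}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])')
echo "$collection_id"
```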
Notes:

- For any future updates, rerun the command with the `-u` flag ("upsert").
- `-c` is needed in this case because there are two different collections in a single `dataset.yaml`.
- `--limit` limits the number of STAC Items being processed.
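Since every step must be run for both collections, a small shell loop can cut the duplication. A sketch, where `run_for` is a hypothetical wrapper that only prints the command here; in real use its body would invoke `pctasks` directly:

```shell
# Run the same pctasks invocation for both collections.
# run_for is a hypothetical helper; here it echoes instead of executing.
run_for() {
  echo "pctasks dataset ingest-collection -d dataset.yaml -c $1 -a registry pccomponents --submit"
}

for collection in hls2-l30 hls2-s30; do
  run_for "$collection"
done
```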
## Ingestion Workflow Publish

Publish the ingestion update workflows with:

```bash
pctasks dataset process-items -d dataset.yaml -c hls2-s30 initial-ingest -a registry pccomponents.azurecr.io --is-update-workflow --workflow-id hls2-s30-update --upsert
pctasks dataset process-items -d dataset.yaml -c hls2-l30 initial-ingest -a registry pccomponents.azurecr.io --is-update-workflow --workflow-id hls2-l30-update --upsert
```

It's important to match the `--workflow-id` with the ID used by the cron jobs!

Once this is done, you only need to run it again if you change code in `hls2.py` or in PCTasks itself. Under normal operations, the cron jobs take care of submitting the update workflow.
## Manually Process Items

```bash
pctasks dataset process-items -d dataset.yaml -c hls2-l30 test-ingest -a registry pccomponents.azurecr.io --limit 100 --submit

# or do the whole thing with
pctasks dataset process-items -d dataset.yaml -c hls2-l30 initial-ingest -a registry pccomponents.azurecr.io --upsert --submit
pctasks dataset process-items -d dataset.yaml -c hls2-s30 initial-ingest -a registry pccomponents.azurecr.io --upsert --submit
```

The `initial-ingest` commands above were left running overnight to ingest all items onboarded as of around April 8, 2025.
Or, for the test environment:

```bash
pctasks profile list
pctasks profile set openpctest
pctasks dataset process-items -d dataset.yaml -c hls2-l30 test-ingest -a registry pccomponentstest.azurecr.io --submit
pctasks dataset process-items -d dataset.yaml -c hls2-s30 test-ingest -a registry pccomponentstest.azurecr.io --submit
pctasks runs status <workflow id from output>
pctasks runs get run-log <workflow id from output>
pctasks runs get task-log <workflow id from output> create-splits create-splits -p 0
curl https://planetarycomputer.microsoft.com/api/stac/v1/collections/hls2-l30/items
```
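The `pctasks runs status` check can be polled until the run reaches a terminal state. A minimal sketch with a stub in place of the real CLI call (the exact status strings printed by `pctasks` are an assumption and may differ in practice):

```shell
# Poll a workflow until it finishes.
# get_status is a stub standing in for: pctasks runs status "$workflow_id"
# The stub pretends the run completes on the third poll.
polls=0
get_status() {
  if [ "$polls" -ge 3 ]; then echo "completed"; else echo "running"; fi
}

status=running
while [ "$status" = "running" ]; do
  polls=$((polls + 1))
  status=$(get_status)
  # In real use, sleep between polls, e.g.: sleep 60
done
echo "finished after $polls polls with status: $status"
```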
To get the workflow to work with cron, use:

```bash
pctasks workflow update datasets/workflows/stac-geoparquet.yaml
```

Within the cron job definitions, the workflow IDs must match:

- `"workflow_id": "hls2-s30-update"`
- `"workflow_id": "hls2-l30-update"`
## Publishing Collection Configurations

You will need access to the `mspc` CLI; then you may run:

```shell
repo_root=$(git rev-parse --show-toplevel)
environment=test # or green/blue depending on what is active

mspc collections render-configs publish $environment $repo_root hls2-l30
mspc collections render-configs publish $environment $repo_root hls2-s30
```

You can also update the monthly mosaics in the collection configuration with:

```shell
mspc collections render-configs update-monthly-mosaics $repo_root hls2-l30 2023-03-01 2025-04-01
mspc collections render-configs update-monthly-mosaics $repo_root hls2-s30 2023-03-01 2025-04-01
```
0 commit comments
