Commit 38949ad

💥 Let subglacial lake integration tests use dvc tracked parq files
Tracking the df_dhdt_*.parquet files via DVC and storing them on DAGsHub! No more of that one step behind GitHub release nonsense like in 22eb2d1. Only tracking four files (<1GB each) because DAGsHub has a 10GB limit (I think). Default dvc remote has been set to https://dagshub.com/weiji14/deepicedrain. Also updated integration tests in deepicedrain/features/subglacial_lakes.feature to run on 1 extra cycle (cycle 9), which was the whole point of this exercise.
1 parent ca79a32 commit 38949ad

10 files changed, 67 additions and 39 deletions

.dvc/config

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
+[core]
+    remote = origin
 ['remote "origin"']
     url = https://dagshub.com/weiji14/deepicedrain.dvc
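The two added lines make `origin` the default remote, so a bare `dvc pull` resolves to the DAGsHub URL. As a minimal sketch of that lookup, the snippet below reads the same text with Python's standard `configparser`. DVC has its own config loader, so this is only an illustration, and `default_remote_url` is a hypothetical helper name.

```python
import configparser

# Contents of .dvc/config after this commit, as shown in the diff.
DVC_CONFIG = """\
[core]
    remote = origin
['remote "origin"']
    url = https://dagshub.com/weiji14/deepicedrain.dvc
"""


def default_remote_url(config_text: str) -> str:
    """Return the URL of the remote named by `[core] remote` (illustrative only)."""
    # Only "=" is treated as a delimiter so the ":" in the URL is left alone.
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read_string(config_text)
    name = parser["core"]["remote"]
    # The section header keeps its quotes, i.e. 'remote "origin"'.
    return parser[f"'remote \"{name}\"'"]["url"]


print(default_remote_url(DVC_CONFIG))
```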

.github/workflows/python-app.yml

Lines changed: 8 additions & 0 deletions
@@ -58,6 +58,14 @@ jobs:
       - name: Install deepicedrain package
         run: poetry install

+      # Pull test data from dvc remote (DAGsHub)
+      - name: Pull test data from dvc remote
+        run: |
+          dvc pull ATLXI/df_dhdt_slessor_downstream.parquet \
+                   ATLXI/df_dhdt_whillans_upstream.parquet \
+                   ATLXI/df_dhdt_whillans_downstream.parquet
+          ls -lhR ATLXI/
+
       - name: Display virtualenv and installed package information
         run: |
           conda info

.gitignore

Lines changed: 3 additions & 2 deletions
@@ -29,10 +29,11 @@ MANIFEST

 # Data files and folders
 **/*.h5
-ATL06.003
+ATL06.00?
 ATL11.00?
 ATL11.00?z123
-ATLXI
+ATLXI/df_*.parquet
+ATLXI/ds_*.zarr
 Quantarctica3

 # Subglacial Lake grid files and figures
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+outs:
+- md5: 666db6ba87ac356a2a56721ac8ce1cb1
+  size: 920013073
+  path: df_dhdt_amundsen_sea_embayment.parquet

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+outs:
+- md5: 806ee203c238f9265c6c731db132eaa2
+  size: 286877799
+  path: df_dhdt_slessor_downstream.parquet

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+outs:
+- md5: 1acbe196854251c1b0cf7b0679419193
+  size: 483136682
+  path: df_dhdt_whillans_downstream.parquet

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+outs:
+- md5: ad02a0f68976f62de35195647f0d3e8e
+  size: 369369447
+  path: df_dhdt_whillans_upstream.parquet
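Each pointer file above records an `md5` and a `size` for one Parquet file. A rough sketch of checking a pulled file against those two values follows. It assumes the recorded hash is a plain MD5 of the file bytes, which I believe holds for binary files such as Parquet but is not confirmed by this commit. `matches_dvc_entry` is a hypothetical helper, not part of DVC or deepicedrain.

```python
import hashlib
from pathlib import Path


def matches_dvc_entry(path: Path, md5: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """Compare a local file with the md5 and size recorded in a .dvc pointer file."""
    # Cheap check first: avoids hashing a file of several hundred MB needlessly.
    if path.stat().st_size != size:
        return False
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so the whole file never has to sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5


# Example call, using the values recorded for the slessor_downstream file:
# matches_dvc_entry(
#     Path("ATLXI/df_dhdt_slessor_downstream.parquet"),
#     md5="806ee203c238f9265c6c731db132eaa2",
#     size=286877799,
# )
```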

deepicedrain/features/subglacial_lakes.feature

Lines changed: 20 additions & 19 deletions
@@ -11,7 +11,8 @@ Feature: Mapping Antarctic subglacial lakes

     Examples:
       | location | this_many |
-      | slessor_downstream | 15 |
+      | whillans_downstream | 16 |
+      | slessor_downstream | 31 |


   Scenario Outline: Subglacial Lake Animation
@@ -22,21 +23,21 @@ Feature: Mapping Antarctic subglacial lakes

     Examples:
       | lake_name | location | cycles | azimuth | elevation |
-    # | Mercer XV | whillans_downstream | 3-8 | 157.5 | 45 |
-    # | Whillans 7 | whillans_upstream | 3-8 | 157.5 | 45 |
-      | Whillans 6 | whillans_upstream | 3-8 | 157.5 | 45 |
-    # | Whillans X | whillans_upstream | 3-8 | 157.5 | 45 |
-    # | Whillans XI | whillans_upstream | 3-8 | 157.5 | 45 |
-    # | Whillans IX | whillans_upstream | 3-8 | 157.5 | 45 |
-    # | Lake 12 | whillans_downstream | 3-8 | 157.5 | 45 |
-    # | Kamb 8 | whillans_upstream | 3-8 | 157.5 | 45 |
-    # | Kamb 1 | whillans_upstream | 3-8 | 157.5 | 45 |
-      | Kamb 34 | whillans_upstream | 4-7 | 157.5 | 45 |
+    # | Whillans 7 | whillans_upstream | 3-9 | 157.5 | 45 |
+      | Whillans 6 | whillans_upstream | 3-9 | 157.5 | 45 |
+    # | Whillans IX | whillans_upstream | 3-9 | 157.5 | 45 |
+    # | Whillans X | whillans_upstream | 3-9 | 157.5 | 45 |
+    # | Whillans XI | whillans_downstream | 3-9 | 157.5 | 45 |
+    # | Subglacial Lake Engelhardt | whillans_downstream | 3-9 | 157.5 | 45 |
+    # | Lake 12 | whillans_downstream | 3-9 | 157.5 | 45 |
+    # | Kamb 8 | whillans_upstream | 3-9 | 157.5 | 45 |
+    # | Kamb 1 | whillans_upstream | 3-9 | 157.5 | 45 |
+      | Kamb 34 | whillans_upstream | 4-9 | 157.5 | 45 |
     # | Kamb 12 | siple_coast | 3-8 | 157.5 | 45 |
     # | MacAyeal 1 | siple_coast | 3-8 | 157.5 | 60 |
-    # | Slessor 45 | slessor_downstream | 3-8 | 202.5 | 60 |
-    # | Slessor 23 | slessor_downstream | 3-8 | 202.5 | 60 |
-      | Recovery IV | slessor_downstream | 3-8 | 247.5 | 45 |
+    # | Slessor 45 | slessor_downstream | 3-9 | 202.5 | 60 |
+    # | Slessor 23 | slessor_downstream | 3-9 | 202.5 | 60 |
+      | Recovery IV | slessor_downstream | 3-9 | 247.5 | 45 |


   Scenario Outline: Subglacial Lake Mega-Cluster Animation
@@ -47,11 +48,11 @@ Feature: Mapping Antarctic subglacial lakes

     Examples:
       | lake_name | location | cycles | azimuth | elevation |
-    # | Lake 78 | whillans_downstream | 3-8 | 157.5 | 45 |
-    # | Subglacial Lake Conway | whillans_downstream | 3-8 | 157.5 | 45 |
-      | Subglacial Lake Mercer | whillans_downstream | 3-8 | 157.5 | 45 |
-    # | Subglacial Lake Whillans | whillans_downstream | 3-8 | 157.5 | 45 |
-    # | Recovery 2 | slessor_downstream | 3-8 | 202.5 | 45 |
+    # | Lake 78 | whillans_downstream | 3-9 | 157.5 | 45 |
+    # | Subglacial Lake Conway | whillans_downstream | 3-9 | 157.5 | 45 |
+      | Subglacial Lake Mercer | whillans_downstream | 3-9 | 157.5 | 45 |
+    # | Subglacial Lake Whillans | whillans_downstream | 3-9 | 157.5 | 45 |
+    # | Recovery 2 | slessor_downstream | 3-9 | 202.5 | 45 |

   Scenario Outline: Subglacial Lake Crossover Anomalies
     Given some altimetry data over <lake_name> at <location>
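The `cycles` column holds an inclusive range such as `3-9`, now ending at cycle 9 instead of cycle 8. The step definitions that consume this column are not part of this diff, so the snippet below is only a sketch of one way such a value could be expanded. `parse_cycles` is a hypothetical helper, not a function from deepicedrain.

```python
def parse_cycles(cycles: str) -> list[int]:
    """Expand an inclusive range string like "3-9" into [3, 4, ..., 9]."""
    first, last = (int(part) for part in cycles.split("-"))
    if first > last:
        raise ValueError(f"cycle range {cycles!r} runs backwards")
    # range() excludes its stop value, hence the + 1 to keep the last cycle.
    return list(range(first, last + 1))


print(parse_cycles("3-9"))
```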

deepicedrain/tests/conftest.py

Lines changed: 6 additions & 6 deletions
@@ -49,12 +49,11 @@ def lake_altimetry_data(lake_name: str, location: str, context) -> pd.DataFrame:
     file and subset it to a specific lake region.
     """
     context.lake_name: str = lake_name
-    # TODO use intake_parquet after https://github.com/intake/intake-parquet/issues/18
-    with fsspec.open(
-        f"simplecache::https://github.com/weiji14/deepicedrain/releases/download/v0.4.0/df_dhdt_{location}.parquet",
-        simplecache=dict(cache_storage="ATLXI", same_names=True),
-    ) as openfile:
-        dataframe: pd.DataFrame = pd.read_parquet(openfile)
+    # Data files are version controlled using DVC and stored on
+    # https://dagshub.com/weiji14/deepicedrain/src/main/ATLXI
+    # They will also be uploaded as assets every release at e.g.
+    # https://github.com/weiji14/deepicedrain/releases
+    dataframe: pd.DataFrame = pd.read_parquet(path=f"ATLXI/df_dhdt_{location}.parquet")

     # Get lake outline from intake catalog
     lake_catalog = deepicedrain.catalog.subglacial_lakes()
@@ -63,6 +62,7 @@ def lake_altimetry_data(lake_name: str, location: str, context) -> pd.DataFrame:
         .query("lakename == @lake_name")[["ids", "transect"]]
         .iloc[0]
     )
+    context.transect_id: str = transect_id
     context.lake: pd.Series = (
         lake_catalog.read()
         .loc[lake_ids]
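The fixture now reads straight from `ATLXI/`, so the test only works once the file has been pulled from the DVC remote. One possible guard, sketched below, fails early with a hint about the `dvc pull` command to run. `require_dvc_file` is a hypothetical helper that does not exist in this commit.

```python
from pathlib import Path


def require_dvc_file(location: str, data_dir: str = "ATLXI") -> Path:
    """Return the path to df_dhdt_<location>.parquet, or fail with a dvc hint."""
    path = Path(data_dir) / f"df_dhdt_{location}.parquet"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path.as_posix()} not found; try `dvc pull {path.as_posix()}` first"
        )
    return path
```

A fixture could then pass the returned path to `pd.read_parquet`, so that a missing download surfaces as a clear message.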

deepicedrain/tests/test_subglacial_lake_finder.py

Lines changed: 12 additions & 12 deletions
@@ -26,20 +26,20 @@ def basin_altimetry_data(location):
     Load up some pre-processed ICESat-2 ATL11 altimetry data with x, y,
     dhdt_slope and referencegroundtrack columns from a Parquet file.
     """
-    # TODO use intake_parquet after https://github.com/intake/intake-parquet/issues/18
-    with fsspec.open(
-        f"simplecache::https://github.com/weiji14/deepicedrain/releases/download/v0.4.0/df_dhdt_{location}.parquet",
-        simplecache=dict(cache_storage="ATLXI", same_names=True),
-    ) as openfile:
-        _dataframe: xpd.DataFrame = xpd.read_parquet(
-            openfile, columns=["x", "y", "dhdt_slope", "referencegroundtrack"]
-        )
-        # Take only 1/4 of the data for speed
-        _dataframe: xpd.DataFrame = _dataframe.loc[: len(_dataframe) / 4]
+    # Data files are version controlled using DVC and stored on
+    # https://dagshub.com/weiji14/deepicedrain/src/main/ATLXI
+    # They will also be uploaded as assets every release at e.g.
+    # https://github.com/weiji14/deepicedrain/releases
+    _dataframe: xpd.DataFrame = xpd.read_parquet(
+        f"ATLXI/df_dhdt_{location}.parquet",
+        columns=["x", "y", "dhdt_slope", "referencegroundtrack"],
+    )
+    # Take only 1/4 of the data for speed
+    _dataframe: xpd.DataFrame = _dataframe.loc[: len(_dataframe) / 4]

-    # Filter to points > 2 * Median(dhdt)
+    # Filter to points > 3 * Median(dhdt)
     abs_dhdt: xpd.Series = _dataframe.dhdt_slope.abs()
-    dataframe: xpd.DataFrame = _dataframe.loc[abs_dhdt > 2 * abs_dhdt.median()]
+    dataframe: xpd.DataFrame = _dataframe.loc[abs_dhdt > 3 * abs_dhdt.median()]

     return dataframe

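The last change in this hunk raises the filter threshold from 2 to 3 times the median absolute `dhdt_slope`. To show what that does to the number of points kept, here is a dependency-free sketch of the same thresholding on a plain list. The sample values are made up, and `filter_by_median` is a hypothetical stand-in for the DataFrame expression in the diff.

```python
from statistics import median


def filter_by_median(dhdt_slopes: list[float], factor: float = 3.0) -> list[float]:
    """Keep values whose magnitude exceeds factor * median(|value|)."""
    abs_values = [abs(value) for value in dhdt_slopes]
    threshold = factor * median(abs_values)
    return [v for v, a in zip(dhdt_slopes, abs_values) if a > threshold]


# Made-up dhdt_slope values; the median of their magnitudes is 0.2.
sample = [0.1, -0.2, 0.1, 0.3, -2.5, 0.2, 1.0, 0.5, 0.2]
print(filter_by_median(sample, factor=2.0))  # old threshold
print(filter_by_median(sample, factor=3.0))  # new threshold
```

On this sample the stricter factor drops the borderline 0.5 value while keeping the two clear outliers.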
