Commit 31625a6

committed
example notebook to publish to earthCODE
1 parent 89fc79d commit 31625a6

1 file changed

+377
-0
lines changed

@@ -0,0 +1,377 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "1a10328e-1686-4ba4-9c16-4e47ebeb023e",
6+
"metadata": {
7+
"tags": []
8+
},
9+
"source": [
10+
"## A demo notebook for publishing datacubes and workflows to the EarthCODE catalog\n",
11+
"### A DeepESDL example notebook\n",
12+
"\n",
13+
"Please also refer to the [DeepESDL documentation](https://deepesdl.readthedocs.io/en/latest/guide/jupyterlab/) and visit the platform's [website](https://www.earthsystemdatalab.net/) for further information!\n",
14+
"\n",
15+
"Brockmann Consult, 2025\n",
16+
"\n",
17+
"-----------------\n",
18+
"\n",
19+
"**This notebook runs with the Python environment `users-deep-code-test`. Please check the documentation for [help on changing the environment](https://deepesdl.readthedocs.io/en/latest/guide/jupyterlab/#python-environment-selection-of-the-jupyter-kerne).**"
20+
]
21+
},
22+
{
23+
"cell_type": "markdown",
24+
"id": "2f7e3da7-1c7f-4aaf-87ec-409dccb339aa",
25+
"metadata": {},
26+
"source": [
27+
"### 📘 Pre-requisite:\n",
28+
"Before using the deep-code CLI or API to publish metadata, users must configure GitHub access by creating a .gitaccess file in the working directory from which deep-code is executed.\n",
29+
"\n",
30+
"1. Generate a Personal Access Token (PAT) from your GitHub account:\n",
31+
" 1. Navigate to GitHub → Settings → Developer settings → Personal access tokens.\n",
32+
" 2. Click “Generate new token”.\n",
33+
" 3. Select the following scope to ensure full access:\n",
34+
" - repo (Full control of repositories — includes fork, pull, push, and read)\n",
35+
" 4. Generate the token and copy it immediately — GitHub won’t show it again.\n",
36+
"\n",
37+
"2. Create a .gitaccess file:\n",
38+
"\n",
39+
"In the same directory where you run the deep-code commands, create a file named .gitaccess with the following content:\n",
40+
"```\n",
41+
"github-username: your-git-user\n",
42+
"github-token: your-personal-access-token\n",
43+
"```\n",
44+
"Replace your-git-user and your-personal-access-token with your actual GitHub username and token.\n",
45+
"\n",
46+
"This file is required to allow deep-code to fork the Open Science Metadata repository, commit metadata changes, and open a pull request to the EarthCODE Catalog."
47+
]
48+
},
49+
{
50+
"cell_type": "code",
51+
"execution_count": null,
52+
"id": "dc774651-bef8-4483-9a83-7fcacdc88797",
53+
"metadata": {
54+
"tags": []
55+
},
56+
"outputs": [],
57+
"source": [
58+
"import os\n",
59+
"import xcube\n",
60+
"import warnings\n",
61+
"import deep_code\n",
62+
"\n",
63+
"from xcube.webapi.viewer import Viewer\n",
64+
"from xcube.core.store import new_data_store\n",
65+
"from deep_code.tools.lint import LintDataset\n",
66+
"from deep_code.tools.publish import Publisher"
67+
]
68+
},
69+
{
70+
"cell_type": "code",
71+
"execution_count": null,
72+
"id": "626af94e-9397-4e1f-83e8-9fb1de5d9e0b",
73+
"metadata": {
74+
"tags": []
75+
},
76+
"outputs": [],
77+
"source": [
78+
"warnings.filterwarnings('ignore')"
79+
]
80+
},
81+
{
82+
"cell_type": "markdown",
83+
"id": "deb30310-feb2-4422-8f0a-27ca1dca4a9f",
84+
"metadata": {},
85+
"source": [
86+
"## Generate starter configuration templates for publishing to the EarthCODE Open Science Catalog"
87+
]
88+
},
89+
{
90+
"cell_type": "code",
91+
"execution_count": null,
92+
"id": "bcd6d1aa-f1d6-4008-94ee-820c2553db37",
93+
"metadata": {
94+
"tags": []
95+
},
96+
"outputs": [],
97+
"source": [
98+
"!deep-code generate-config"
99+
]
100+
},
101+
{
102+
"cell_type": "markdown",
103+
"id": "da3294a9-f7e7-40ed-8080-f32296239c25",
104+
"metadata": {},
105+
"source": [
106+
"## Create a small dataset from the xcube-cmems store"
107+
]
108+
},
109+
{
110+
"cell_type": "code",
111+
"execution_count": null,
112+
"id": "dcc6255f-b680-419c-acf1-38968c104790",
113+
"metadata": {
114+
"tags": []
115+
},
116+
"outputs": [],
117+
"source": [
118+
"store = new_data_store(\"cmems\")\n",
119+
"store"
120+
]
121+
},
122+
{
123+
"cell_type": "code",
124+
"execution_count": null,
125+
"id": "8003fe01-2186-4a60-9ec6-b6ce51d2a185",
126+
"metadata": {
127+
"tags": []
128+
},
129+
"outputs": [],
130+
"source": [
131+
"ds = store.open_data(\n",
132+
" \"DMI-BALTIC-SST-L3S-NRT-OBS_FULL_TIME_SERIE\",\n",
133+
" variable_names=[\"sea_surface_temperature\"],\n",
134+
" bbox=[9, 53, 20, 62],\n",
135+
" time_range=(\"2022-01-01\", \"2022-01-05\"),\n",
136+
")\n",
137+
"ds"
138+
]
139+
},
140+
{
141+
"cell_type": "markdown",
142+
"id": "b8b699a2-e4e4-4616-be82-d9b5381a6b08",
143+
"metadata": {
144+
"tags": []
145+
},
146+
"source": [
147+
"## Lint your in-memory dataset for metadata correctness and completeness before publishing to the EarthCODE Open Science Catalog"
148+
]
149+
},
150+
{
151+
"cell_type": "code",
152+
"execution_count": null,
153+
"id": "16598aff-6967-4690-85b9-efb672369cf1",
154+
"metadata": {
155+
"tags": []
156+
},
157+
"outputs": [],
158+
"source": [
159+
"linter = LintDataset(dataset=ds)\n",
160+
"linter.lint_dataset()"
161+
]
162+
},
163+
{
164+
"cell_type": "markdown",
165+
"id": "e631c4c0-14bb-4602-ae41-cc93c8cf835e",
166+
"metadata": {},
167+
"source": [
168+
"## Fix the errors from the linter"
169+
]
170+
},
171+
{
172+
"cell_type": "markdown",
173+
"id": "836122b8-f1c7-4690-8b8e-b463859765ef",
174+
"metadata": {
175+
"tags": []
176+
},
177+
"source": [
178+
"Adding gcmd_keyword_url connects your data to a semantic network of Earth science concepts, enabling:\n",
179+
"\n",
180+
"- Better automated discovery\n",
181+
"\n",
182+
"- Stronger metadata interoperability\n",
183+
"\n",
184+
"- Alignment with international FAIR standards\n",
185+
"\n",
186+
"To find the GCMD URL for your variable, use the [GCMD Keyword Viewer](https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/all?gtm_scheme=all)."
187+
]
188+
},
189+
{
190+
"cell_type": "code",
191+
"execution_count": null,
192+
"id": "f7b34a8b-447b-4be5-b0f2-0c70cc9352e3",
193+
"metadata": {
194+
"tags": []
195+
},
196+
"outputs": [],
197+
"source": [
198+
"ds.attrs[\"description\"] = (\n",
199+
" \"This is a dataset extracted from the Copernicus Marine Data Store\"\n",
200+
")\n",
201+
"\n",
202+
"ds[\"sea_surface_temperature\"].attrs[\"gcmd_keyword_url\"] = \"https://gcmd.earthdata.nasa.gov/KeywordViewer/scheme/all/e4d58a7f-7eaa-4f75-996a-18238c698063?gtm_keyword=SEA%20SURFACE%20FOUNDATION%20TEMPERATURE&gtm_scheme=Earth%20Science\""
203+
]
204+
},
205+
{
206+
"cell_type": "markdown",
207+
"id": "8993c3c2-1a08-4ec8-a462-f5d7d1eac9e4",
208+
"metadata": {},
209+
"source": [
210+
"## Write the dataset to the team S3 bucket"
211+
]
212+
},
213+
{
214+
"cell_type": "code",
215+
"execution_count": null,
216+
"id": "2504fc83-416e-469e-b98f-9612061283fa",
217+
"metadata": {
218+
"tags": []
219+
},
220+
"outputs": [],
221+
"source": [
222+
"S3_USER_STORAGE_KEY = os.environ[\"S3_USER_STORAGE_KEY\"]\n",
223+
"S3_USER_STORAGE_SECRET = os.environ[\"S3_USER_STORAGE_SECRET\"]\n",
224+
"S3_USER_STORAGE_BUCKET = os.environ[\"S3_USER_STORAGE_BUCKET\"]"
225+
]
226+
},
227+
{
228+
"cell_type": "code",
229+
"execution_count": null,
230+
"id": "06a2a119-9158-46eb-8aab-8618d9293023",
231+
"metadata": {
232+
"tags": []
233+
},
234+
"outputs": [],
235+
"source": [
236+
"team_store = new_data_store(\n",
237+
" \"s3\", \n",
238+
" root=S3_USER_STORAGE_BUCKET, \n",
239+
" storage_options=dict(\n",
240+
" anon=False, \n",
241+
" key=S3_USER_STORAGE_KEY, \n",
242+
" secret=S3_USER_STORAGE_SECRET\n",
243+
" )\n",
244+
")"
245+
]
246+
},
247+
{
248+
"cell_type": "code",
249+
"execution_count": null,
250+
"id": "f1da860d-6e03-42fc-a098-5f8e95d70399",
251+
"metadata": {
252+
"tags": []
253+
},
254+
"outputs": [],
255+
"source": [
256+
"team_store.write_data(ds, \"cmems_sst_v2.zarr\", replace=True)"
257+
]
258+
},
259+
{
260+
"cell_type": "markdown",
261+
"id": "ba56daec-42f3-41b9-b2a0-8510cb0e73f5",
262+
"metadata": {},
263+
"source": [
264+
"The user workflow, i.e. the Jupyter notebook, has to be pushed to a Git repository: https://github.com/deepesdl/cube-gen/blob/main/Permafrost/Create-CCI-Permafrost-cube-EarthCODE.ipynb"
265+
]
266+
},
267+
{
268+
"cell_type": "markdown",
269+
"id": "628dac5f-84ca-4205-9947-449a7e409be7",
270+
"metadata": {
271+
"tags": []
272+
},
273+
"source": [
274+
"# 📘 Publishing Metadata to the EarthCODE Catalog"
275+
]
276+
},
277+
{
278+
"cell_type": "markdown",
279+
"id": "b7aa7bc9-83d5-4e43-9f7f-a4da18e8cacd",
280+
"metadata": {},
281+
"source": [
282+
"Once the dataset and workflow metadata are prepared and validated, users can initiate the publishing process using the deep-code CLI. The following command automates the entire workflow:"
283+
]
284+
},
285+
{
286+
"cell_type": "markdown",
287+
"id": "41fcdb38-3e17-4042-b4e5-25a1be46b581",
288+
"metadata": {},
289+
"source": [
290+
"## 🔹 The command below performs the following steps:\n",
291+
"\n",
292+
"1. Generates valid STAC and OGC API Records based on the provided configuration files\n",
293+
"\n",
294+
"2. Forks the open-science-catalog-metadata repository on GitHub\n",
295+
"\n",
296+
"3. Inserts the generated records into the correct directory structure\n",
297+
"\n",
298+
"4. Creates a Pull Request (PR) for review by the Open Science Catalog steward"
299+
]
300+
},
301+
{
302+
"cell_type": "markdown",
303+
"id": "e763ef5b-c4c2-4991-a612-7f2bff9a676d",
304+
"metadata": {},
305+
"source": [
306+
"## Publish using the Python function"
307+
]
308+
},
309+
{
310+
"cell_type": "code",
311+
"execution_count": null,
312+
"id": "c7e3a5a8-d74d-4bd4-9b76-45ace4b2d341",
313+
"metadata": {
314+
"tags": []
315+
},
316+
"outputs": [],
317+
"source": [
318+
"# publish using the python function\n",
319+
"publisher = Publisher(\n",
320+
" dataset_config_path=\"dataset-config.yaml\",\n",
321+
" workflow_config_path=\"workflow-config.yaml\",\n",
322+
" environment=\"staging\",\n",
323+
")\n",
324+
"publisher.publish_all()"
325+
]
326+
},
327+
{
328+
"cell_type": "markdown",
329+
"id": "09bcacf5-f435-4af3-8984-5ddf421504b8",
330+
"metadata": {},
331+
"source": [
332+
"## Publish using the CLI"
333+
]
334+
},
335+
{
336+
"cell_type": "code",
337+
"execution_count": null,
338+
"id": "1aa9e003-661c-40ba-b2ad-72d1eee48a83",
339+
"metadata": {
340+
"tags": []
341+
},
342+
"outputs": [],
343+
"source": [
344+
"!deep-code publish dataset-config.yaml workflow-config.yaml -e staging"
345+
]
346+
},
347+
{
348+
"cell_type": "code",
349+
"execution_count": null,
350+
"id": "406f5d35-03eb-4a6c-ba99-f70ee1101038",
351+
"metadata": {},
352+
"outputs": [],
353+
"source": []
354+
}
355+
],
356+
"metadata": {
357+
"kernelspec": {
358+
"display_name": "users-deepesdl-deep-code",
359+
"language": "python",
360+
"name": "conda-env-users-deepesdl-deep-code-py"
361+
},
362+
"language_info": {
363+
"codemirror_mode": {
364+
"name": "ipython",
365+
"version": 3
366+
},
367+
"file_extension": ".py",
368+
"mimetype": "text/x-python",
369+
"name": "python",
370+
"nbconvert_exporter": "python",
371+
"pygments_lexer": "ipython3",
372+
"version": "3.11.13"
373+
}
374+
},
375+
"nbformat": 4,
376+
"nbformat_minor": 5
377+
}
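
The `.gitaccess` file described in the notebook's pre-requisite cell could also be written from Python rather than by hand. The following is a minimal sketch using only the standard library. The helper name `write_gitaccess` is hypothetical and not part of deep-code; only the two-line file format is taken from the notebook.

```python
from pathlib import Path


def write_gitaccess(username: str, token: str, directory: str = ".") -> Path:
    """Write a .gitaccess file in the two-line format shown in the notebook.

    Hypothetical helper for illustration; it is not provided by deep-code.
    """
    path = Path(directory) / ".gitaccess"
    path.write_text(f"github-username: {username}\ngithub-token: {token}\n")
    # The file holds a secret, so restrict it to the current user.
    # (On Windows, chmod only toggles the read-only flag.)
    path.chmod(0o600)
    return path
```

A call such as `write_gitaccess("your-git-user", "your-personal-access-token")` would create the file in the current working directory, which is where the notebook says deep-code expects to find it. Keeping the token out of the notebook itself, for instance by reading it from an environment variable, avoids committing it by accident.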

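The notebook reads the S3 credentials with `os.environ[...]`, which raises a `KeyError` on the first missing variable. One possible alternative, sketched below under the assumption that a single combined error message is preferable, checks all three names up front. The helper `require_env` is hypothetical; the variable names are the ones used in the notebook.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

# Environment variable names as used in the notebook.
S3_VARS = (
    "S3_USER_STORAGE_KEY",
    "S3_USER_STORAGE_SECRET",
    "S3_USER_STORAGE_BUCKET",
)


def require_env(
    names: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return the values for `names`, or raise listing every missing one.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Treat unset and empty variables alike as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in names}
```

In the notebook, `creds = require_env(S3_VARS)` could stand in for the three separate lookups, with `creds["S3_USER_STORAGE_BUCKET"]` and the other entries then passed to `new_data_store` as before.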