Commit 693bf0b

Add template tutorial including custom template registry. (TGSAI#726)
* update dimension handling and variable naming in templates
* standardize variable naming in template_registry docs
* Remove PyCharm metadata from `quickstart.ipynb` for cleanup
* add tutorial on creating and registering custom templates in MDIO
* add primary foreground color styling to HTML table elements
* update custom template tutorial with revised class name, dimensions, coords, and dataset attributes
* simplify logical coordinate names in custom template tutorial
* refine dataset description in custom template tutorial
* update metadata model_dump mode to JSON in xarray_builder
* update metadata model_dump mode to JSON in xarray_builder
* refine and enhance explanations in custom template tutorial for improved clarity and detail
* refine custom template tutorial: update dataset modeling details and fix minor typo
* clean up custom template tutorial: adjust markdown formatting and remove obsolete metadata

---------

Co-authored-by: Altay Sansal <[email protected]>
Co-authored-by: Brian Michell <[email protected]>
1 parent 482650b commit 693bf0b

File tree

7 files changed

+349
-574
lines changed

docs/template_registry.md

Lines changed: 4 additions & 4 deletions
@@ -24,10 +24,10 @@ print(list_templates())
2424
# e.g. ["Seismic2DPostStackTime", "Seismic3DPostStackDepth", ...]
2525

2626
# Grab a template by name
27-
tpl = get_template("Seismic3DPostStackTime")
27+
template = get_template("Seismic3DPostStackTime")
2828

2929
# Customize your copy (safe)
30-
tpl.add_units({"amplitude": "unitless"})
30+
template.add_units({"amplitude": "unitless"})
3131
```
3232

3333
## Common tasks
@@ -37,8 +37,8 @@ tpl.add_units({"amplitude": "unitless"})
3737
```python
3838
from mdio.builder.template_registry import get_template
3939

40-
tpl = get_template("Seismic2DPostStackDepth")
41-
# Use/modify tpl freely — it’s your copy
40+
template = get_template("Seismic2DPostStackDepth")
41+
# Use/modify template freely — it’s your copy
4242
```
4343

4444
### List available templates
Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "85114119ae7a4db0",
6+
"metadata": {},
7+
"source": [
8+
"# Create and Register a Custom Template\n",
9+
"\n",
10+
"```{article-info}\n",
11+
":author: Altay Sansal\n",
12+
":date: \"{sub-ref}`today`\"\n",
13+
":read-time: \"{sub-ref}`wordcount-minutes` min read\"\n",
14+
":class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light\n",
15+
"```\n",
16+
"\n",
17+
"```{warning}\n",
18+
"Most SEG-Y files correspond to standard seismic data types or field configurations. We recommend using\n",
19+
"the built-in templates from the registry whenever possible. Create a custom template only when your file\n",
20+
"is unusual and cannot be represented by existing templates. In many cases, you can simply customize the\n",
21+
"SEG-Y header byte mapping during ingestion without defining a new template.\n",
22+
"```\n",
23+
"\n",
24+
"In this tutorial we will walk through the Template Registry and show how to:\n",
25+
"\n",
26+
"- Discover available templates in the registry\n",
27+
"- Define and register your own template\n",
28+
"- Build a dataset model and convert it to an Xarray Dataset using your custom template\n",
29+
"\n",
30+
"If this is your first time with MDIO, you may want to skim the Quickstart first."
31+
]
32+
},
33+
{
34+
"cell_type": "markdown",
35+
"id": "a793f2cfb58f09cc",
36+
"metadata": {},
37+
"source": [
38+
"## What is a Template and a Template Registry?\n",
39+
"\n",
40+
"A template defines how an MDIO dataset is structured: names of dimensions and coordinates, the default variable name, chunking hints, and attributes to be stored. Since many seismic datasets share common structures (e.g., 3D post-stack, 2D post-stack, pre-stack CDP/shot, etc.), MDIO ships with a pre-populated template registry and APIs to fetch or register templates.\n",
41+
"\n",
42+
"Fetching a template from the registry returns a copy that you can freely customize without affecting other users."
43+
]
44+
},
45+
{
46+
"cell_type": "code",
47+
"execution_count": null,
48+
"id": "c7a760a019930d4e",
49+
"metadata": {},
50+
"outputs": [],
51+
"source": [
52+
"from mdio.builder.template_registry import get_template\n",
53+
"from mdio.builder.template_registry import get_template_registry\n",
54+
"from mdio.builder.template_registry import list_templates\n",
55+
"\n",
56+
"registry = get_template_registry()\n",
57+
"registry # pretty HTML in notebooks"
58+
]
59+
},
60+
{
61+
"cell_type": "markdown",
62+
"id": "810dbba2b6dba787",
63+
"metadata": {},
64+
"source": [
65+
"We can also list the names of all registered templates."
66+
]
67+
},
68+
{
69+
"cell_type": "code",
70+
"execution_count": null,
71+
"id": "38eb1da635c7be0f",
72+
"metadata": {},
73+
"outputs": [],
74+
"source": [
75+
"list_templates()"
76+
]
77+
},
78+
{
79+
"cell_type": "markdown",
80+
"id": "d87bd9ec781a8a8e",
81+
"metadata": {},
82+
"source": [
83+
"## Defining a Minimal Custom Template\n",
84+
"\n",
85+
"To define a custom template, subclass `AbstractDatasetTemplate` and set:\n",
86+
"\n",
87+
"- `_name`: a public name for the template\n",
88+
"- `_dim_names`: names for each axis of your data variable (the last axis is the trace/time or trace/depth axis)\n",
89+
"- `_physical_coord_names` and `_logical_coord_names`: optional additional coordinate variables to store along the spatial grid\n",
90+
"- `_load_dataset_attributes()`: optional attributes stored at the dataset level\n",
91+
"\n",
92+
"Below we create a special template that can hold an interval velocity field with multiple anisotropy parameters for a depth seismic volume.\n",
93+
"\n",
94+
"The dimensions, dimension coordinates, and non-dimension coordinates are created automatically by the method\n",
95+
"from the base class. However, since we want more variables, we override `_add_variables` to add them."
96+
]
97+
},
98+
{
99+
"cell_type": "code",
100+
"execution_count": null,
101+
"id": "cfc9d9b0e1b67a76",
102+
"metadata": {},
103+
"outputs": [],
104+
"source": [
105+
"from mdio.builder.schemas import compressors\n",
106+
"from mdio.builder.schemas.chunk_grid import RegularChunkGrid\n",
107+
"from mdio.builder.schemas.chunk_grid import RegularChunkShape\n",
108+
"from mdio.builder.schemas.dtype import ScalarType\n",
109+
"from mdio.builder.schemas.v1.variable import VariableMetadata\n",
110+
"from mdio.builder.templates.base import AbstractDatasetTemplate\n",
111+
"\n",
112+
"\n",
113+
"class AnisotropicVelocityTemplate(AbstractDatasetTemplate):\n",
114+
"    \"\"\"A custom template that has unusual dimensions and coordinates.\"\"\"\n",
115+
"\n",
116+
"    def __init__(self, data_domain: str = \"depth\") -> None:\n",
117+
"        super().__init__(data_domain)\n",
118+
"        # Dimension order matters; the last dimension is the depth\n",
119+
"        self._dim_names = (\"inline\", \"crossline\", self.trace_domain)\n",
120+
"        # Additional coordinates: these are added on top of dimension coordinates\n",
121+
"        self._physical_coord_names = (\"cdp_x\", \"cdp_y\")\n",
122+
"        self._var_chunk_shape = (128, 128, 128)\n",
123+
"        self._units = {}\n",
124+
"\n",
125+
"    @property\n",
126+
"    def _name(self) -> str:  # public name for the registry\n",
127+
"        return \"AnisotropicVelocity3DDepth\"\n",
128+
"\n",
129+
"    @property\n",
130+
"    def _default_variable_name(self) -> str:  # name of the default data variable\n",
131+
"        return \"velocity\"\n",
132+
"\n",
133+
"    def _load_dataset_attributes(self) -> dict:\n",
134+
"        return {\"surveyType\": \"3D\", \"gatherType\": \"line\"}\n",
135+
"\n",
136+
"    def _add_variables(self) -> None:\n",
137+
"        \"\"\"Add the variables including default and extra.\"\"\"\n",
138+
"        for name in [\"velocity\", \"epsilon\", \"delta\"]:\n",
139+
"            chunk_grid = RegularChunkGrid(configuration=RegularChunkShape(chunk_shape=self.full_chunk_shape))\n",
140+
"            unit = self.get_unit_by_key(name)\n",
141+
"            self._builder.add_variable(\n",
142+
"                name=name,\n",
143+
"                dimensions=self._dim_names,\n",
144+
"                data_type=ScalarType.FLOAT32,\n",
145+
"                compressor=compressors.Blosc(cname=compressors.BloscCname.zstd),\n",
146+
"                coordinates=self.physical_coordinate_names,\n",
147+
"                metadata=VariableMetadata(chunk_grid=chunk_grid, units_v1=unit),\n",
148+
"            )\n",
149+
"\n",
150+
"\n",
151+
"AnisotropicVelocityTemplate()"
152+
]
153+
},
154+
{
155+
"cell_type": "markdown",
156+
"id": "15e61310ed0ffd97",
157+
"metadata": {},
158+
"source": [
159+
"## Registering the Custom Template\n",
160+
"\n",
161+
"The registry returns a deep copy of the template on every fetch. To make the template discoverable by name, register it first, then retrieve it with `get_template`."
162+
]
163+
},
164+
{
165+
"cell_type": "code",
166+
"execution_count": null,
167+
"id": "a4e1847b20da6768",
168+
"metadata": {},
169+
"outputs": [],
170+
"source": [
171+
"from mdio.builder.template_registry import register_template\n",
172+
"\n",
173+
"register_template(AnisotropicVelocityTemplate())\n",
174+
"print(\"Registered:\", \"AnisotropicVelocity3DDepth\" in list_templates())\n",
175+
"\n",
176+
"custom_template = get_template(\"AnisotropicVelocity3DDepth\")\n",
177+
"custom_template"
178+
]
179+
},
180+
{
181+
"cell_type": "markdown",
182+
"id": "83b0772f1913c652",
183+
"metadata": {},
184+
"source": [
185+
"You can also set units at any time. For this demo we’ll set metric units. The spatial units will be inferred from the SEG-Y binary header during ingestion, but we can override them here. Ingestion will honor what is in the template."
186+
]
187+
},
188+
{
189+
"cell_type": "code",
190+
"execution_count": null,
191+
"id": "d7dca50d72d2f93",
192+
"metadata": {},
193+
"outputs": [],
194+
"source": [
195+
"from mdio.builder.schemas.v1.units import LengthUnitModel\n",
196+
"from mdio.builder.schemas.v1.units import SpeedUnitModel\n",
197+
"\n",
198+
"custom_template.add_units(\n",
199+
"    {\n",
200+
"        \"depth\": LengthUnitModel(length=\"m\"),\n",
201+
"        \"cdp_x\": LengthUnitModel(length=\"m\"),\n",
202+
"        \"cdp_y\": LengthUnitModel(length=\"m\"),\n",
203+
"        \"velocity\": SpeedUnitModel(speed=\"m/s\"),\n",
204+
"    }\n",
205+
")\n",
206+
"custom_template"
207+
]
208+
},
209+
{
210+
"cell_type": "markdown",
211+
"id": "367ade9824e72bc3",
212+
"metadata": {},
213+
"source": [
214+
"## Changing Chunk Size on an Existing Template\n",
215+
"\n",
216+
"Often you will want to tweak the chunking strategy for performance. You can do this in two ways:\n",
217+
"\n",
218+
"- When defining a subclass, set a default in the constructor (e.g., `self._var_chunk_shape = (...)`).\n",
219+
"- On an existing template instance, assign to the `full_chunk_shape` property once you know your final\n",
220+
" dataset sizes (the tuple length must match the number of data dimensions).\n",
221+
"\n",
222+
"Below is a tiny demo showing how to modify the chunk shape on a fetched template. We first build the\n",
223+
"dataset from the template with known sizes to satisfy validation, then update `full_chunk_shape`.\n",
224+
"\n",
225+
"```{note}\n",
226+
"In the SEG-Y to MDIO conversion workflow, MDIO infers the final grid shape from the SEG-Y headers. It’s\n",
227+
"common to set or adjust `full_chunk_shape` right before calling `segy_to_mdio`, using the same sizes\n",
228+
"you expect for the final array.\n",
229+
"```"
230+
]
231+
},
232+
{
233+
"cell_type": "code",
234+
"execution_count": null,
235+
"id": "75939231b58c204a",
236+
"metadata": {},
237+
"outputs": [],
238+
"source": [
239+
"mdio_ds = custom_template.build_dataset(name=\"demo-only\", sizes=(300, 500, 1001))\n",
240+
"# pick smaller chunks than the full array for better parallelism and IO\n",
241+
"custom_template.full_chunk_shape = (64, 64, 64)\n",
242+
"print(\"Chunk shape set to:\", custom_template.full_chunk_shape)\n",
243+
"\n",
244+
"custom_template"
245+
]
246+
},
247+
{
248+
"cell_type": "markdown",
249+
"id": "a76f17cdf235de13",
250+
"metadata": {},
251+
"source": [
252+
"## Making a Dummy Xarray Dataset\n",
253+
"\n",
254+
"We can now take the MDIO Dataset model and convert it to Xarray with our configuration. If ingesting from SEG-Y, this step\n",
255+
"gets executed automatically by the converter before populating the data.\n",
256+
"\n",
257+
"Note that the whole dataset will be populated with the fill values."
258+
]
259+
},
260+
{
261+
"cell_type": "code",
262+
"execution_count": null,
263+
"id": "ce3dcf9c7946ea07",
264+
"metadata": {},
265+
"outputs": [],
266+
"source": [
267+
"from mdio.builder.xarray_builder import to_xarray_dataset\n",
268+
"\n",
269+
"to_xarray_dataset(mdio_ds)"
270+
]
271+
},
272+
{
273+
"cell_type": "markdown",
274+
"id": "fc05aa3c81f8465c",
275+
"metadata": {},
276+
"source": [
277+
"## Recap: Key APIs Used\n",
278+
"\n",
279+
"- Template registry helpers: `get_template_registry`, `list_templates`, `register_template`, `get_template`\n",
280+
"- Base template to subclass: `AbstractDatasetTemplate`\n",
281+
"- Make an Xarray Dataset from the MDIO data model: `to_xarray_dataset`\n",
282+
"\n",
283+
"With these pieces, you can standardize how your seismic data is represented in MDIO and keep ingestion code concise and repeatable.\n"
284+
]
285+
},
286+
{
287+
"cell_type": "code",
288+
"execution_count": null,
289+
"id": "a15848ab5c0811d6",
290+
"metadata": {},
291+
"outputs": [],
292+
"source": []
293+
}
294+
],
295+
"metadata": {
296+
"mystnb": {
297+
"execution_mode": "force"
298+
}
299+
},
300+
"nbformat": 4,
301+
"nbformat_minor": 5
302+
}
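
Note on the copy-on-fetch behavior the tutorial describes: the notebook states that the registry returns a deep copy of the template on every fetch, so customizing a fetched template does not affect anyone else. The sketch below illustrates that idea with the standard library only. `ToyTemplate` and `ToyRegistry` are hypothetical names made up for this note; they are not MDIO classes, and this is not how MDIO implements its registry.

```python
"""Toy illustration of a copy-on-fetch registry; NOT MDIO's implementation."""
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Hypothetical stand-in for a dataset template."""

    name: str
    chunk_shape: tuple[int, ...]
    units: dict[str, str] = field(default_factory=dict)


class ToyRegistry:
    """Hypothetical registry that stores templates by name."""

    def __init__(self) -> None:
        self._templates: dict[str, ToyTemplate] = {}

    def register(self, template: ToyTemplate) -> str:
        if template.name in self._templates:
            raise ValueError(f"Template {template.name!r} is already registered")
        # Keep a private copy so later edits to the caller's object do not leak in.
        self._templates[template.name] = copy.deepcopy(template)
        return template.name

    def get(self, name: str) -> ToyTemplate:
        # Hand out a fresh deep copy on every fetch.
        return copy.deepcopy(self._templates[name])

    def list_names(self) -> list[str]:
        return sorted(self._templates)


registry = ToyRegistry()
registry.register(ToyTemplate(name="AnisotropicVelocity3DDepth", chunk_shape=(128, 128, 128)))

# Customize a fetched copy: new chunk shape and a unit entry.
mine = registry.get("AnisotropicVelocity3DDepth")
mine.chunk_shape = (64, 64, 64)
mine.units["velocity"] = "m/s"

# A second fetch still sees the values that were registered.
fresh = registry.get("AnisotropicVelocity3DDepth")
print(fresh.chunk_shape, fresh.units)
```

In this toy version, changes made to `mine` never reach `fresh`, because both the stored object and every returned object are independent deep copies. By the same logic, a customized copy has to be passed to whatever consumes it, since fetching by name again yields the registered defaults.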

docs/tutorials/index.md

Lines changed: 1 addition & 0 deletions
@@ -13,4 +13,5 @@ creation
1313
compression
1414
rechunking
1515
corrupt_files
16+
custom_template
1617
```
