| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "85114119ae7a4db0", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Create and Register a Custom Template\n", |
| 9 | + "\n", |
| 10 | + "```{article-info}\n", |
| 11 | + ":author: Altay Sansal\n", |
| 12 | + ":date: \"{sub-ref}`today`\"\n", |
| 13 | + ":read-time: \"{sub-ref}`wordcount-minutes` min read\"\n", |
| 14 | + ":class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light\n", |
| 15 | + "```\n", |
| 16 | + "\n", |
| 17 | + "```{warning}\n", |
| 18 | + "Most SEG-Y files correspond to standard seismic data types or field configurations. We recommend using\n", |
| 19 | + "the built-in templates from the registry whenever possible. Create a custom template only when your file\n", |
| 20 | + "is unusual and cannot be represented by existing templates. In many cases, you can simply customize the\n", |
| 21 | + "SEG-Y header byte mapping during ingestion without defining a new template.\n", |
| 22 | + "```\n", |
| 23 | + "\n", |
| 24 | + "In this tutorial we will walk through the Template Registry and show how to:\n", |
| 25 | + "\n", |
| 26 | + "- Discover available templates in the registry\n", |
| 27 | + "- Define and register your own template\n", |
| 28 | + "- Build a dataset model and convert it to an Xarray Dataset using your custom template\n", |
| 29 | + "\n", |
| 30 | + "If this is your first time with MDIO, you may want to skim the Quickstart first." |
| 31 | + ] |
| 32 | + }, |
| 33 | + { |
| 34 | + "cell_type": "markdown", |
| 35 | + "id": "a793f2cfb58f09cc", |
| 36 | + "metadata": {}, |
| 37 | + "source": [ |
| 38 | + "## What is a Template and a Template Registry?\n", |
| 39 | + "\n", |
| 40 | + "A template defines how an MDIO dataset is structured: names of dimensions and coordinates, the default variable name, chunking hints, and attributes to be stored. Since many seismic datasets share common structures (e.g., 3D post-stack, 2D post-stack, pre-stack CDP/shot, etc.), MDIO ships with a pre-populated template registry and APIs to fetch or register templates.\n", |
| 41 | + "\n", |
| 42 | + "Fetching a template from the registry returns a copy, so you can customize it freely without affecting the registered original." |
| 43 | + ] |
| 44 | + }, |
| 45 | + { |
| 46 | + "cell_type": "code", |
| 47 | + "execution_count": null, |
| 48 | + "id": "c7a760a019930d4e", |
| 49 | + "metadata": {}, |
| 50 | + "outputs": [], |
| 51 | + "source": [ |
| 52 | + "from mdio.builder.template_registry import get_template\n", |
| 53 | + "from mdio.builder.template_registry import get_template_registry\n", |
| 54 | + "from mdio.builder.template_registry import list_templates\n", |
| 55 | + "\n", |
| 56 | + "registry = get_template_registry()\n", |
| 57 | + "registry # pretty HTML in notebooks" |
| 58 | + ] |
| 59 | + }, |
| 60 | + { |
| 61 | + "cell_type": "markdown", |
| 62 | + "id": "810dbba2b6dba787", |
| 63 | + "metadata": {}, |
| 64 | + "source": [ |
| 65 | + "We can also list the names of all registered templates." |
| 66 | + ] |
| 67 | + }, |
| 68 | + { |
| 69 | + "cell_type": "code", |
| 70 | + "execution_count": null, |
| 71 | + "id": "38eb1da635c7be0f", |
| 72 | + "metadata": {}, |
| 73 | + "outputs": [], |
| 74 | + "source": [ |
| 75 | + "list_templates()" |
| 76 | + ] |
| 77 | + }, |
| 78 | + { |
| 79 | + "cell_type": "markdown", |
| 80 | + "id": "d87bd9ec781a8a8e", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "## Defining a Minimal Custom Template\n", |
| 84 | + "\n", |
| 85 | + "To define a custom template, subclass `AbstractDatasetTemplate` and set:\n", |
| 86 | + "\n", |
| 87 | + "- `_name`: a public name for the template\n", |
| 88 | + "- `_dim_names`: names for each axis of your data variable (the last axis is the trace axis: time or depth)\n", |
| 89 | + "- `_physical_coord_names` and `_logical_coord_names`: optional additional coordinate variables to store along the spatial grid\n", |
| 90 | + "- `_load_dataset_attributes()`: optional attributes stored at the dataset level\n", |
| 91 | + "\n", |
| 92 | + "Below we create a special template that can hold an interval velocity field along with multiple anisotropy parameters for a depth seismic volume.\n", |
| 93 | + "\n", |
| 94 | + "The dimensions, dimension coordinates, and non-dimension coordinates are created automatically by the\n", |
| 95 | + "base class. Because we want more than one data variable, we override `_add_variables` to add them." |
| 96 | + ] |
| 97 | + }, |
| 98 | + { |
| 99 | + "cell_type": "code", |
| 100 | + "execution_count": null, |
| 101 | + "id": "cfc9d9b0e1b67a76", |
| 102 | + "metadata": {}, |
| 103 | + "outputs": [], |
| 104 | + "source": [ |
| 105 | + "from mdio.builder.schemas import compressors\n", |
| 106 | + "from mdio.builder.schemas.chunk_grid import RegularChunkGrid\n", |
| 107 | + "from mdio.builder.schemas.chunk_grid import RegularChunkShape\n", |
| 108 | + "from mdio.builder.schemas.dtype import ScalarType\n", |
| 109 | + "from mdio.builder.schemas.v1.variable import VariableMetadata\n", |
| 110 | + "from mdio.builder.templates.base import AbstractDatasetTemplate\n", |
| 111 | + "\n", |
| 112 | + "\n", |
| 113 | + "class AnisotropicVelocityTemplate(AbstractDatasetTemplate):\n", |
| 114 | + "    \"\"\"A custom template that has unusual dimensions and coordinates.\"\"\"\n", |
| 115 | + "\n", |
| 116 | + "    def __init__(self, data_domain: str = \"depth\") -> None:\n", |
| 117 | + "        super().__init__(data_domain)\n", |
| 118 | + "        # Dimension order matters; the last dimension is the depth\n", |
| 119 | + "        self._dim_names = (\"inline\", \"crossline\", self.trace_domain)\n", |
| 120 | + "        # Additional coordinates: these are added on top of dimension coordinates\n", |
| 121 | + "        self._physical_coord_names = (\"cdp_x\", \"cdp_y\")\n", |
| 122 | + "        self._var_chunk_shape = (128, 128, 128)\n", |
| 123 | + "        self._units = {}\n", |
| 124 | + "\n", |
| 125 | + "    @property\n", |
| 126 | + "    def _name(self) -> str:  # public name for the registry\n", |
| 127 | + "        return \"AnisotropicVelocity3DDepth\"\n", |
| 128 | + "\n", |
| 129 | + "    @property\n", |
| 130 | + "    def _default_variable_name(self) -> str:  # name of the default data variable\n", |
| 131 | + "        return \"velocity\"\n", |
| 132 | + "\n", |
| 133 | + "    def _load_dataset_attributes(self) -> dict:\n", |
| 134 | + "        return {\"surveyType\": \"3D\", \"gatherType\": \"line\"}\n", |
| 135 | + "\n", |
| 136 | + "    def _add_variables(self) -> None:\n", |
| 137 | + "        \"\"\"Add the default variable and the extra anisotropy variables.\"\"\"\n", |
| 138 | + "        for name in [\"velocity\", \"epsilon\", \"delta\"]:\n", |
| 139 | + "            chunk_grid = RegularChunkGrid(configuration=RegularChunkShape(chunk_shape=self.full_chunk_shape))\n", |
| 140 | + "            unit = self.get_unit_by_key(name)\n", |
| 141 | + "            self._builder.add_variable(\n", |
| 142 | + "                name=name,\n", |
| 143 | + "                dimensions=self._dim_names,\n", |
| 144 | + "                data_type=ScalarType.FLOAT32,\n", |
| 145 | + "                compressor=compressors.Blosc(cname=compressors.BloscCname.zstd),\n", |
| 146 | + "                coordinates=self.physical_coordinate_names,\n", |
| 147 | + "                metadata=VariableMetadata(chunk_grid=chunk_grid, units_v1=unit),\n", |
| 148 | + "            )\n", |
| 149 | + "\n", |
| 150 | + "\n", |
| 151 | + "AnisotropicVelocityTemplate()" |
| 152 | + ] |
| 153 | + }, |
| 154 | + { |
| 155 | + "cell_type": "markdown", |
| 156 | + "id": "15e61310ed0ffd97", |
| 157 | + "metadata": {}, |
| 158 | + "source": [ |
| 159 | + "## Registering the Custom Template\n", |
| 160 | + "\n", |
| 161 | + "The registry returns a deep copy of the template on every fetch. To make the template discoverable by name, register it first, then retrieve it with `get_template`." |
| 162 | + ] |
| 163 | + }, |
| 164 | + { |
| 165 | + "cell_type": "code", |
| 166 | + "execution_count": null, |
| 167 | + "id": "a4e1847b20da6768", |
| 168 | + "metadata": {}, |
| 169 | + "outputs": [], |
| 170 | + "source": [ |
| 171 | + "from mdio.builder.template_registry import register_template\n", |
| 172 | + "\n", |
| 173 | + "register_template(AnisotropicVelocityTemplate())\n", |
| 174 | + "print(\"Registered:\", \"AnisotropicVelocity3DDepth\" in list_templates())\n", |
| 175 | + "\n", |
| 176 | + "custom_template = get_template(\"AnisotropicVelocity3DDepth\")\n", |
| 177 | + "custom_template" |
| 178 | + ] |
| 179 | + }, |
| 180 | + { |
| 181 | + "cell_type": "markdown", |
| 182 | + "id": "83b0772f1913c652", |
| 183 | + "metadata": {}, |
| 184 | + "source": [ |
| 185 | + "You can also set units at any time. For this demo we’ll set metric units. The spatial units will be inferred from the SEG-Y binary header during ingestion, but we can override them here. Ingestion will honor what is in the template." |
| 186 | + ] |
| 187 | + }, |
| 188 | + { |
| 189 | + "cell_type": "code", |
| 190 | + "execution_count": null, |
| 191 | + "id": "d7dca50d72d2f93", |
| 192 | + "metadata": {}, |
| 193 | + "outputs": [], |
| 194 | + "source": [ |
| 195 | + "from mdio.builder.schemas.v1.units import LengthUnitModel\n", |
| 196 | + "from mdio.builder.schemas.v1.units import SpeedUnitModel\n", |
| 197 | + "\n", |
| 198 | + "custom_template.add_units(\n", |
| 199 | + "    {\n", |
| 200 | + "        \"depth\": LengthUnitModel(length=\"m\"),\n", |
| 201 | + "        \"cdp_x\": LengthUnitModel(length=\"m\"),\n", |
| 202 | + "        \"cdp_y\": LengthUnitModel(length=\"m\"),\n", |
| 203 | + "        \"velocity\": SpeedUnitModel(speed=\"m/s\"),\n", |
| 204 | + "    }\n", |
| 205 | + ")\n", |
| 206 | + "custom_template" |
| 207 | + ] |
| 208 | + }, |
| 209 | + { |
| 210 | + "cell_type": "markdown", |
| 211 | + "id": "367ade9824e72bc3", |
| 212 | + "metadata": {}, |
| 213 | + "source": [ |
| 214 | + "## Changing the Chunk Shape on an Existing Template\n", |
| 215 | + "\n", |
| 216 | + "Often you will want to tweak the chunking strategy for performance. You can do this in two ways:\n", |
| 217 | + "\n", |
| 218 | + "- When defining a subclass, set a default in the constructor (e.g., `self._var_chunk_shape = (...)`).\n", |
| 219 | + "- On an existing template instance, assign to the `full_chunk_shape` property once you know your final\n", |
| 220 | + " dataset sizes (the tuple length must match the number of data dimensions).\n", |
| 221 | + "\n", |
| 222 | + "Below is a tiny demo showing how to modify the chunk shape on a fetched template. We first build the\n", |
| 223 | + "template with known sizes to satisfy validation, then update `full_chunk_shape`.\n", |
| 224 | + "\n", |
| 225 | + "```{note}\n", |
| 226 | + "In the SEG-Y to MDIO conversion workflow, MDIO infers the final grid shape from the SEG-Y headers. It’s\n", |
| 227 | + "common to set or adjust `full_chunk_shape` right before calling `segy_to_mdio`, using the same sizes\n", |
| 228 | + "you expect for the final array.\n", |
| 229 | + "```" |
| 230 | + ] |
| 231 | + }, |
| 232 | + { |
| 233 | + "cell_type": "code", |
| 234 | + "execution_count": null, |
| 235 | + "id": "75939231b58c204a", |
| 236 | + "metadata": {}, |
| 237 | + "outputs": [], |
| 238 | + "source": [ |
| 239 | + "mdio_ds = custom_template.build_dataset(name=\"demo-only\", sizes=(300, 500, 1001))\n", |
| 240 | + "# pick smaller chunks than the full array for better parallelism and IO\n", |
| 241 | + "custom_template.full_chunk_shape = (64, 64, 64)\n", |
| 242 | + "print(\"Chunk shape set to:\", custom_template.full_chunk_shape)\n", |
| 243 | + "\n", |
| 244 | + "custom_template" |
| 245 | + ] |
| 246 | + }, |
| 247 | + { |
| 248 | + "cell_type": "markdown", |
| 249 | + "id": "a76f17cdf235de13", |
| 250 | + "metadata": {}, |
| 251 | + "source": [ |
| 252 | + "## Making a Dummy Xarray Dataset\n", |
| 253 | + "\n", |
| 254 | + "We can now take the MDIO Dataset model and convert it to Xarray with our configuration. When ingesting from SEG-Y, the\n", |
| 255 | + "converter runs this step automatically before populating the data.\n", |
| 256 | + "\n", |
| 257 | + "Note that the whole dataset will be populated with the fill values." |
| 258 | + ] |
| 259 | + }, |
| 260 | + { |
| 261 | + "cell_type": "code", |
| 262 | + "execution_count": null, |
| 263 | + "id": "ce3dcf9c7946ea07", |
| 264 | + "metadata": {}, |
| 265 | + "outputs": [], |
| 266 | + "source": [ |
| 267 | + "from mdio.builder.xarray_builder import to_xarray_dataset\n", |
| 268 | + "\n", |
| 269 | + "to_xarray_dataset(mdio_ds)" |
| 270 | + ] |
| 271 | + }, |
| 272 | + { |
| 273 | + "cell_type": "markdown", |
| 274 | + "id": "fc05aa3c81f8465c", |
| 275 | + "metadata": {}, |
| 276 | + "source": [ |
| 277 | + "## Recap: Key APIs Used\n", |
| 278 | + "\n", |
| 279 | + "- Template registry helpers: `get_template_registry`, `list_templates`, `register_template`, `get_template`\n", |
| 280 | + "- Base template to subclass: `AbstractDatasetTemplate`\n", |
| 281 | + "- Convert an MDIO dataset model to an Xarray Dataset: `to_xarray_dataset`\n", |
| 282 | + "\n", |
| 283 | + "With these pieces, you can standardize how your seismic data is represented in MDIO and keep ingestion code concise and repeatable.\n" |
| 284 | + ] |
| 285 | + }, |
| 286 | + { |
| 287 | + "cell_type": "code", |
| 288 | + "execution_count": null, |
| 289 | + "id": "a15848ab5c0811d6", |
| 290 | + "metadata": {}, |
| 291 | + "outputs": [], |
| 292 | + "source": [] |
| 293 | + } |
| 294 | + ], |
| 295 | + "metadata": { |
| 296 | + "mystnb": { |
| 297 | + "execution_mode": "force" |
| 298 | + } |
| 299 | + }, |
| 300 | + "nbformat": 4, |
| 301 | + "nbformat_minor": 5 |
| 302 | +} |
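
To make the chunking step in the notebook concrete, the sketch below works out how many chunks the demo array of size `(300, 500, 1001)` splits into under the subclass default `(128, 128, 128)` and under the updated `(64, 64, 64)` chunk shape, and how large one uncompressed `float32` chunk is in each case. It is plain Python arithmetic and does not call MDIO. `chunk_grid` is a hypothetical helper written for this illustration, and it assumes a regular chunk grid in which partial edge chunks count as whole chunks.

```python
import math


def chunk_grid(sizes: tuple[int, ...], chunk_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Return the number of chunks along each axis (hypothetical helper, not an MDIO API)."""
    if len(sizes) != len(chunk_shape):
        msg = "chunk_shape must have one entry per data dimension"
        raise ValueError(msg)
    # Integer ceiling division: a partial chunk at the edge still counts as one chunk.
    return tuple(-(-size // chunk) for size, chunk in zip(sizes, chunk_shape))


sizes = (300, 500, 1001)  # inline, crossline, depth sizes used in the demo
float32_bytes = 4

for shape in [(128, 128, 128), (64, 64, 64)]:
    grid = chunk_grid(sizes, shape)
    n_chunks = math.prod(grid)
    chunk_mib = math.prod(shape) * float32_bytes / 2**20  # uncompressed size of one chunk
    print(f"{shape}: grid={grid}, chunks={n_chunks}, {chunk_mib:.0f} MiB per chunk")

# (128, 128, 128): grid=(3, 4, 8), chunks=96, 8 MiB per chunk
# (64, 64, 64): grid=(5, 8, 16), chunks=640, 1 MiB per chunk
```

Smaller chunks give more units of parallel work at the cost of more objects to store and manage. The sizes above are uncompressed, so chunks written through a compressor such as the Blosc one in the template will generally be smaller on disk.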