|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "48cb2534", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Datatype Handling\n", |
| 9 | + "\n", |
| 10 | + "GeoKit uses GDAL internally to perform many of the provided functions and methods. The values of rasters and fields in vector datasets are stored as C data types with specific bit lengths. Using smaller data types with fewer bits can improve performance and reduce memory usage. However, choosing a data type with an insufficient bit length, or the wrong data type, can result in so-called overflow errors for both [integer](https://en.wikipedia.org/wiki/Integer_overflow) and [float](https://en.wikipedia.org/wiki/Floating-point_arithmetic#Range_of_floating-point_numbers) data types.\n", |
| 11 | + "\n", |
| 12 | + "In Python, conversion is usually done automatically. GeoKit also has internal logic to select a safe data type, which prevents these overflow errors. To inspect the logic, use the methods shown in this example.\n", |
| 13 | + "\n", |
| 14 | + "## Get the datatype for individual numbers\n", |
| 15 | + "\n", |
| 16 | + "To obtain the minimum required data type for an individual number, use the `get_valid_gdal_data_type_as_string` method." |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "code", |
| 21 | + "execution_count": null, |
| 22 | + "id": "4acec25d", |
| 23 | + "metadata": {}, |
| 24 | + "outputs": [], |
| 25 | + "source": [ |
| 26 | + "from geokit.c_data_type_handler import MinimumCDataTypeHandler\n", |
| 27 | + "import numpy as np\n", |
| 28 | + "\n", |
| 29 | + "# To showcase the data type detection based on single numbers\n", |
| 30 | + "for current_number in [\n", |
| 31 | + " 5,\n", |
| 32 | + " 0 - 5,\n", |
| 33 | + " 230,\n", |
| 34 | + " 5000,\n", |
| 35 | + " 60000,\n", |
| 36 | + " 9223372036854775807,\n", |
| 37 | + " 18446744073709551615,\n", |
| 38 | + " True,\n", |
| 39 | + " False,\n", |
| 40 | + " 1 / 3,\n", |
| 41 | + " -1 / 3,\n", |
| 42 | + " np.nan,\n", |
| 43 | + " np.inf,\n", |
| 44 | + " -np.inf,\n", |
| 45 | + " 1.0,\n", |
| 46 | + " 3 * 10**37,\n", |
| 47 | + " 3 * 10**40,\n", |
| 48 | + "]:\n", |
| 49 | + " data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(list_of_numbers=[current_number])\n", |
| 50 | + " print(f\"Number: {current_number} -> Data type: {data_type}\")" |
| 51 | + ] |
| 52 | + }, |
| 53 | + { |
| 54 | + "cell_type": "markdown", |
| 55 | + "id": "849e7ee3", |
| 56 | + "metadata": {}, |
| 57 | + "source": [ |
| 58 | + "## Use multiple numbers\n", |
| 59 | + "\n", |
| 60 | + "If you want to store several numbers together, pass all of them in the `list_of_numbers` argument. The method returns a single data type that can hold every number in the list." |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | + "cell_type": "code", |
| 65 | + "execution_count": null, |
| 66 | + "id": "304db3f3", |
| 67 | + "metadata": {}, |
| 68 | + "outputs": [], |
| 69 | + "source": [ |
| 70 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(list_of_numbers=[9, 300])\n", |
| 71 | + "print(f\"common datatype: {[9, 300]} -> Data type: {data_type}\")" |
| 72 | + ] |
| 73 | + }, |
| 74 | + { |
| 75 | + "cell_type": "markdown", |
| 76 | + "id": "2f168f52", |
| 77 | + "metadata": {}, |
| 78 | + "source": [ |
| 79 | + "## All Supported Datatypes\n", |
| 80 | + "\n", |
| 81 | + "GeoKit does not support all C data types, nor all C data types supported by GDAL, because some are rarely found in real-world geodata. The supported data types are listed below:" |
| 82 | + ] |
| 83 | + }, |
| 84 | + { |
| 85 | + "cell_type": "code", |
| 86 | + "execution_count": null, |
| 87 | + "id": "f0f70034", |
| 88 | + "metadata": {}, |
| 89 | + "outputs": [], |
| 90 | + "source": [ |
| 91 | + "from geokit.data_types import _gdal_c_raster_data_types_list, _gdal_c_raster_data_types_with_abbreviations_list\n", |
| 92 | + "\n", |
| 93 | + "print(\"GDAL C data types supported by GeoKit:\\n\", _gdal_c_raster_data_types_list)\n", |
| 94 | + "print(\"Aliases for the supported GDAL C data types:\\n\", _gdal_c_raster_data_types_with_abbreviations_list)" |
| 95 | + ] |
| 96 | + }, |
| 97 | + { |
| 98 | + "cell_type": "markdown", |
| 99 | + "id": "43b5c193", |
| 100 | + "metadata": {}, |
| 101 | + "source": [ |
| 102 | + "## Set Minimum Datatype\n", |
| 103 | + "\n", |
| 104 | + "In some cases, you may need a particular data type, for example to ensure compatibility with other software. The `get_valid_gdal_data_type_as_string` method lets you specify this minimum data type through the `minimum_gdal_type_list` argument. If the numbers cannot be stored in this data type, the smallest suitable data type is returned instead. For automation purposes, multiple minimum data types can also be passed." |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "code", |
| 109 | + "execution_count": null, |
| 110 | + "id": "7b56a704", |
| 111 | + "metadata": {}, |
| 112 | + "outputs": [], |
| 113 | + "source": [ |
| 114 | + "# Show the minimum datatype to store a 7\n", |
| 115 | + "numbers_to_inspect = [7]\n", |
| 116 | + "minimum_gdal_type_list = None # no minimum data type should be considered\n", |
| 117 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 118 | + " list_of_numbers=numbers_to_inspect, minimum_gdal_type_list=minimum_gdal_type_list\n", |
| 119 | + ")\n", |
| 120 | + "print(\n", |
| 121 | + " f\"Numbers inspected: {numbers_to_inspect}, minimum data type defined: {minimum_gdal_type_list}, -> Data type: {data_type}\"\n", |
| 122 | + ")\n", |
| 123 | + "\n", |
| 124 | + "# Show the minimum datatype to store a 7 with a minimum data type of GDT_Int16\n", |
| 125 | + "numbers_to_inspect = [7]\n", |
| 126 | + "minimum_gdal_type_list = [\"GDT_Int16\"]\n", |
| 127 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 128 | + " list_of_numbers=numbers_to_inspect, minimum_gdal_type_list=minimum_gdal_type_list\n", |
| 129 | + ")\n", |
| 130 | + "print(\n", |
| 131 | + " f\"Numbers inspected: {numbers_to_inspect}, minimum data type defined: {minimum_gdal_type_list}, -> Data type: {data_type}\"\n", |
| 132 | + ")\n", |
| 133 | + "\n", |
| 134 | + "# Show the minimum datatype to store a 300 with a minimum data type of GDT_Int8\n", |
| 135 | + "numbers_to_inspect = [300]\n", |
| 136 | + "minimum_gdal_type_list = [\"GDT_Int8\"]\n", |
| 137 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 138 | + " list_of_numbers=numbers_to_inspect, minimum_gdal_type_list=minimum_gdal_type_list\n", |
| 139 | + ")\n", |
| 140 | + "print(\n", |
| 141 | + " f\"Numbers inspected: {numbers_to_inspect}, minimum data type defined: {minimum_gdal_type_list}, -> Data type: {data_type}, GDT_Int8 cannot store a 300 thus GDT_Int16 is returned\"\n", |
| 142 | + ")" |
| 143 | + ] |
| 144 | + }, |
| 145 | + { |
| 146 | + "cell_type": "markdown", |
| 147 | + "id": "ff5019c5", |
| 148 | + "metadata": {}, |
| 149 | + "source": [ |
| 150 | + "## Ambiguity due to signed and unsigned integer\n", |
| 151 | + "\n", |
| 152 | + "Some values can be stored as either a signed or an unsigned integer. For example, 30,000 fits in both an Int16 and a UInt16. In case of ambiguity, GeoKit chooses the signed integer." |
| 153 | + ] |
| 154 | + }, |
| 155 | + { |
| 156 | + "cell_type": "code", |
| 157 | + "execution_count": null, |
| 158 | + "id": "88b277fa", |
| 159 | + "metadata": {}, |
| 160 | + "outputs": [], |
| 161 | + "source": [ |
| 162 | + "numbers_to_inspect = [30000]\n", |
| 163 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 164 | + " list_of_numbers=numbers_to_inspect,\n", |
| 165 | + ")\n", |
| 166 | + "print(f\"Numbers inspected: {numbers_to_inspect}, -> Data type: {data_type}\")\n", |
| 167 | + "numbers_to_inspect = [30000]\n", |
| 168 | + "minimum_gdal_type_list = [\"GDT_UInt16\"]\n", |
| 169 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 170 | + " list_of_numbers=numbers_to_inspect, minimum_gdal_type_list=minimum_gdal_type_list\n", |
| 171 | + ")\n", |
| 172 | + "print(\n", |
| 173 | + " f\"Numbers inspected: {numbers_to_inspect}, minimum data type defined: {minimum_gdal_type_list}, -> Data type: {data_type}\"\n", |
| 174 | + ")" |
| 175 | + ] |
| 176 | + }, |
| 177 | + { |
| 178 | + "cell_type": "markdown", |
| 179 | + "id": "d7f06284", |
| 180 | + "metadata": {}, |
| 181 | + "source": [ |
| 182 | + "## Choose Unsigned Integer over Signed Integer in Ambiguous Cases\n", |
| 183 | + "\n", |
| 184 | + "To choose an unsigned integer over a signed one, pass the desired data type as the `user_defined_minimum_gdal_type` argument." |
| 185 | + ] |
| 186 | + }, |
| 187 | + { |
| 188 | + "cell_type": "code", |
| 189 | + "execution_count": null, |
| 190 | + "id": "1a6e5835", |
| 191 | + "metadata": {}, |
| 192 | + "outputs": [], |
| 193 | + "source": [ |
| 194 | + "numbers_to_inspect = [300]\n", |
| 195 | + "user_defined_minimum_gdal_type = \"GDT_UInt16\"\n", |
| 196 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 197 | + " list_of_numbers=numbers_to_inspect, user_defined_minimum_gdal_type=user_defined_minimum_gdal_type\n", |
| 198 | + ")\n", |
| 199 | + "print(\n", |
| 200 | + " f\"Numbers inspected: {numbers_to_inspect}, user defined: {user_defined_minimum_gdal_type}, -> Data type: {data_type}\"\n", |
| 201 | + ")" |
| 202 | + ] |
| 203 | + }, |
| 204 | + { |
| 205 | + "cell_type": "markdown", |
| 206 | + "id": "65e41a3b", |
| 207 | + "metadata": {}, |
| 208 | + "source": [ |
| 209 | + "## Detect Unintended Data Conversions\n", |
| 210 | + "\n", |
| 211 | + "If you require a specific output data type, pass it as the `user_defined_minimum_gdal_type` argument. A warning is issued if a different data type is deemed necessary, for example when a negative number is requested as an unsigned integer." |
| 212 | + ] |
| 213 | + }, |
| 214 | + { |
| 215 | + "cell_type": "code", |
| 216 | + "execution_count": null, |
| 217 | + "id": "aa6b7f7e", |
| 218 | + "metadata": {}, |
| 219 | + "outputs": [], |
| 220 | + "source": [ |
| 221 | + "numbers_to_inspect = [-60]\n", |
| 222 | + "user_defined_minimum_gdal_type = \"GDT_UInt16\"\n", |
| 223 | + "data_type = MinimumCDataTypeHandler.get_valid_gdal_data_type_as_string(\n", |
| 224 | + " list_of_numbers=numbers_to_inspect, user_defined_minimum_gdal_type=user_defined_minimum_gdal_type\n", |
| 225 | + ")\n", |
| 226 | + "print(\n", |
| 227 | + " f\"Numbers inspected: {numbers_to_inspect}, user defined: {user_defined_minimum_gdal_type}, -> Data type: {data_type}\"\n", |
| 228 | + ")" |
| 229 | + ] |
| 230 | + } |
| 231 | + ], |
| 232 | + "metadata": { |
| 233 | + "kernelspec": { |
| 234 | + "display_name": "geokit_env", |
| 235 | + "language": "python", |
| 236 | + "name": "python3" |
| 237 | + }, |
| 238 | + "language_info": { |
| 239 | + "codemirror_mode": { |
| 240 | + "name": "ipython", |
| 241 | + "version": 3 |
| 242 | + }, |
| 243 | + "file_extension": ".py", |
| 244 | + "mimetype": "text/x-python", |
| 245 | + "name": "python", |
| 246 | + "nbconvert_exporter": "python", |
| 247 | + "pygments_lexer": "ipython3", |
| 248 | + "version": "3.13.5" |
| 249 | + } |
| 250 | + }, |
| 251 | + "nbformat": 4, |
| 252 | + "nbformat_minor": 5 |
| 253 | +} |
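
The selection rule the notebook describes (smallest sufficient bit length, signed preferred when both fit) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GeoKit's implementation: the function name `smallest_integer_type`, the `CANDIDATES` table, and the `Int8`/`UInt8` style labels are hypothetical, and the results need not match what `get_valid_gdal_data_type_as_string` returns.

```python
# Hypothetical sketch: pick the smallest integer type that can hold all values.
# Candidates are ordered by bit width; at equal width the signed type comes
# first, mirroring the "signed wins on ambiguity" rule described in the notebook.
CANDIDATES = []
for bits in (8, 16, 32, 64):
    CANDIDATES.append((f"Int{bits}", -(2 ** (bits - 1)), 2 ** (bits - 1) - 1))
    CANDIDATES.append((f"UInt{bits}", 0, 2**bits - 1))


def smallest_integer_type(values):
    """Return the label of the first candidate whose range covers all values."""
    lowest, highest = min(values), max(values)
    for label, type_min, type_max in CANDIDATES:
        if type_min <= lowest and highest <= type_max:
            return label
    raise ValueError("no integer type up to 64 bits can hold these values")


for sample in ([5], [-5], [30000], [60000], [9, 300]):
    print(sample, "->", smallest_integer_type(sample))
```

Because the candidates are checked in order, 30000 resolves to the signed 16-bit label while 60000, which exceeds the signed 16-bit maximum of 32767, falls through to the unsigned one. A list such as `[9, 300]` is judged by its extremes, so a single out-of-range value widens the type for the whole list.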