|
| 1 | +--- |
| 2 | +title: Convert data using dataflow conversions |
| 3 | +description: Learn about dataflow conversions for transforming data in Azure IoT Operations. |
| 4 | +author: PatAltimore |
| 5 | +ms.author: patricka |
| 6 | +ms.subservice: azure-data-flows |
| 7 | +ms.topic: concept-article |
| 8 | +ms.date: 08/01/2024 |
| 9 | + |
| 10 | +#CustomerIntent: As an operator, I want to understand how to use dataflow conversions to transform data. |
| 11 | +--- |
| 12 | + |
| 13 | +# Convert data using dataflow conversions |
| 14 | + |
| 15 | +You can use dataflow conversions to transform data in Azure IoT Operations. The *conversion* element in a dataflow is used to compute values for output fields. You can use input fields, available operations, data types, and type conversions in dataflow conversions. |
| 16 | + |
| 17 | +The following example shows a *conversion* element that computes the average of the `Max` and `Min` input fields: |
| 18 | + |
| 19 | +```yaml |
| 20 | +- inputs: |
| 21 | +  - '*.Max' # - $1 |
| 22 | +  - '*.Min' # - $2 |
| 23 | +  output: ColorProperties.* |
| 24 | +  expression: ($1 + $2) / 2 |
| 25 | +``` |
| 26 | + |
| 27 | +There are several aspects to understand about conversions: |
| 28 | + |
| 29 | +* Reference to input fields: How to reference values from input fields in the conversion formula. |
| 30 | +* Available operations: Operations that can be used in conversions, such as addition, subtraction, multiplication, and division. |
| 31 | +* Data types: Types of data that a formula can process and manipulate, such as integer, floating-point, and string. |
| 32 | +* Type conversions: How data types are converted between the input field values, the formula evaluation, and the output fields. |
| 33 | + |
| 34 | +## Input fields |
| 35 | + |
| 36 | +In conversions, formulas can operate on static values, such as the number *25*, or on parameters derived from input fields. A mapping defines the input fields that the formula can access. Each field is referenced according to its order in the input list: |
| 37 | + |
| 38 | +```yaml |
| 39 | +- inputs: |
| 40 | +  - '*.Max' # - $1 |
| 41 | +  - '*.Min' # - $2 |
| 42 | +  - '*.Mid.Avg' # - $3 |
| 43 | +  - '*.Mid.Mean' # - $4 |
| 44 | +  output: ColorProperties.* |
| 45 | +  expression: ($1, $2, $3, $4) |
| 46 | +``` |
| 47 | + |
| 48 | +In this example, the conversion results in an array containing the values of `[Max, Min, Mid.Avg, Mid.Mean]`. The comments in the YAML file (`# - $1`, `# - $2`) are optional but help clarify the connection between each field property and its role in the conversion formula. |
| 49 | + |
| 50 | +## Data types |
| 51 | + |
| 52 | +Different serialization formats support various data types. For instance, JSON offers a few primitive types (string, number, boolean, and null) as well as arrays of these primitive types. In contrast, other serialization formats like Avro have a more complex type system, including integers with multiple bit field lengths and timestamps with different resolutions, such as milliseconds and microseconds. |
| 53 | + |
| 54 | +When the mapper reads an input property, it converts it into an internal type. This conversion is necessary for holding the data in memory until it's written out into an output field. The conversion to an internal type happens regardless of whether the input and output serialization formats are the same. |
| 55 | + |
| 56 | +The internal representation uses the following data types: |
| 57 | + |
| 58 | +| Type | Description | |
| 59 | +|-------------|-------------------------------------------| |
| 60 | +| bool | Logical true/false | |
| 61 | +| integer | Stored as 128-bit signed integer | |
| 62 | +| float | Stored as 64-bit floating point number | |
| 63 | +| string | A UTF-8 string | |
| 64 | +| bytes | Binary data, a string of 8-bit unsigned values | |
| 65 | +| date time | UTC or local time with nanosecond resolution | |
| 66 | +| time | Time of day with nanosecond resolution | |
| 67 | +| duration | A duration with nanosecond resolution | |
| 68 | +| array | An array of any types listed previously | |
| 69 | +| map | A vector of (key, value) pairs of any types listed previously | |
| 70 | + |
| 71 | +### Input record fields |
| 72 | + |
| 73 | +When an input record field is read, its underlying type is converted into one of these internal type variants. The internal representation is versatile enough to handle most input types with minimal or no conversion. However, some input types require conversion or are unsupported. Some examples: |
| 74 | + |
| 75 | +* *Avro's UUID type* is converted to a *string*, as there's no specific *UUID* type in the internal representation. |
| 76 | +* *Avro's Decimal type* isn't supported by the mapper, thus fields of this type can't be included in mappings. |
| 77 | +* *Avro's Duration type* conversion can vary. If the *months* field is set, it's unsupported. If only *days* and *milliseconds* are set, it's converted to the internal *duration* representation. |
| 78 | + |
| 79 | +For some formats, surrogate types are used. For example, JSON doesn't have a *datetime* type and instead stores *datetime* values as strings formatted according to ISO8601. When the mapper reads such a field, the internal representation remains a string. |
| 80 | + |
| 81 | +### Output record fields |
| 82 | + |
| 83 | +The mapper is designed to be flexible by converting internal types into output types to accommodate scenarios where data comes from a serialization format with a limited type system. The following are some examples of how conversions are handled: |
| 84 | + |
| 85 | +* *Numeric types*: These can be converted to other representations, even if it means losing precision. For example, a 64-bit floating-point number (*f64*) can be converted into a 32-bit integer (*i32*). |
| 86 | +* *Strings to numbers*: If the incoming record contains a string like "123" and the output field is a 32-bit integer, the mapper converts and writes the value as a number. |
| 87 | +* *Strings to other types*: |
| 88 | + * If the output field is a *datetime*, the mapper attempts to parse the string as an ISO8601 formatted *datetime*. |
| 89 | + * If the output field is *binary/bytes*, the mapper tries to deserialize the string from a base64 encoded string. |
| 90 | +* *Boolean values*: |
| 91 | + * Converted to 0/1 if the output field is numerical. |
| 92 | + * Converted to "true"/"false" if the output field is string. |
| 93 | + |
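The implicit conversions in the preceding list can be pictured with a short Python sketch. It's only an analogy built on Python's standard library, not the mapper's implementation, and the helper names (`string_to_int`, `string_to_datetime`, `string_to_bytes`, `bool_to_number`, `bool_to_string`) are hypothetical.

```python
import base64
from datetime import datetime


def string_to_int(value: str) -> int:
    """A string such as "123" written to an integer output field."""
    return int(value)


def string_to_datetime(value: str) -> datetime:
    """An ISO 8601 string written to a datetime output field."""
    return datetime.fromisoformat(value)


def string_to_bytes(value: str) -> bytes:
    """A base64-encoded string written to a binary/bytes output field."""
    return base64.b64decode(value)


def bool_to_number(value: bool) -> int:
    """A boolean written to a numeric output field: 0 or 1."""
    return 1 if value else 0


def bool_to_string(value: bool) -> str:
    """A boolean written to a string output field: "true" or "false"."""
    return "true" if value else "false"
```

For example, `string_to_int("123")` gives the number `123`, and `bool_to_string(False)` gives the string `"false"`.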
| 94 | +### Explicit type conversions |
| 95 | + |
| 96 | +While the automatic conversions operate as one might expect based on common implementation practices, there are instances where the right conversion can't be determined automatically and the result is an *unsupported* error. To address these situations, several conversion functions are available to explicitly define how data should be transformed. These functions provide more control over how data is converted and help maintain data integrity when automatic methods fall short. |
| 97 | + |
| 98 | +### Using conversion formula with types |
| 99 | + |
| 100 | +In mappings, an optional formula can specify how data from the input is processed before being written to the output field. If no formula is specified, the mapper copies the input field to the output using the internal type and conversion rules. |
| 101 | + |
| 102 | +If a formula is specified, the data types available for use in formulas are limited to: |
| 103 | + |
| 104 | +* Integers |
| 105 | +* Floating-point numbers |
| 106 | +* Strings |
| 107 | +* Booleans |
| 108 | +* Arrays of the above types |
| 109 | +* Missing value |
| 110 | + |
| 111 | +*Map* and *bytes* can't participate in formulas. |
| 112 | + |
| 113 | +Types related to time (*date time*, *time*, and *duration*) are converted into integer values representing time in seconds. After formula evaluation, results are stored in the internal representation and not converted back. For example, a *datetime* converted to seconds remains an integer. If the value is to be used in date-time fields, an explicit conversion method must be applied, for example by converting the value into an ISO8601 string, which is then automatically converted to the date-time type of the output serialization format. |
| 114 | + |
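A small Python sketch can make the round trip concrete. It assumes, purely for illustration, that "time in seconds" means seconds since the Unix epoch; the source doesn't state the reference point, and the variable names are hypothetical.

```python
from datetime import datetime, timezone

# Assumption for illustration: seconds are counted from the Unix epoch.
reading_time = datetime(2024, 8, 1, 12, 0, 0, tzinfo=timezone.utc)

# Inside a formula, the date-time behaves like an integer number of seconds.
seconds = int(reading_time.timestamp())

# The formula result is still a plain integer, here one hour later.
shifted = seconds + 3600

# An explicit conversion back to an ISO 8601 string is needed
# before the value can be written to a date-time field.
iso_text = datetime.fromtimestamp(shifted, tz=timezone.utc).isoformat()
```

Here `shifted` stays an integer, and only the explicit last step produces a date-time string again.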
| 115 | +### Using irregular types |
| 116 | + |
| 117 | +Special considerations apply to types like arrays and *missing value*: |
| 118 | + |
| 119 | +### Arrays |
| 120 | + |
| 121 | +Arrays can be processed using aggregation functions to compute a single value from multiple elements. For example, using the input record: |
| 122 | + |
| 123 | +```json |
| 124 | +{ |
| 125 | + "Measurements": [2.34, 12.3, 32.4] |
| 126 | +} |
| 127 | +``` |
| 128 | + |
| 129 | +With the mapping: |
| 130 | + |
| 131 | +```yaml |
| 132 | +- inputs: |
| 133 | +  - Measurements # - $1 |
| 134 | +  output: Measurement |
| 135 | +  expression: min($1) |
| 136 | +``` |
| 137 | + |
| 138 | +This configuration selects the smallest value from the *Measurements* array for the output field. |
| 139 | + |
| 140 | +It's also possible to use functions that return a new array: |
| 141 | + |
| 142 | +```yaml |
| 143 | +- inputs: |
| 144 | +  - Measurements # - $1 |
| 145 | +  output: Measurements |
| 146 | +  expression: take($1, 10) # take at most 10 items |
| 147 | +``` |
| 148 | + |
| 149 | +Arrays can also be created from multiple single values: |
| 150 | + |
| 151 | +```yaml |
| 152 | +- inputs: |
| 153 | +  - minimum # - - $1 |
| 154 | +  - maximum # - - $2 |
| 155 | +  - average # - - $3 |
| 156 | +  - mean # - - $4 |
| 157 | +  output: stats |
| 158 | +  expression: ($1, $2, $3, $4) |
| 159 | +``` |
| 160 | + |
| 161 | +This mapping creates an array containing the minimum, maximum, average, and mean. |
| 162 | + |
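As a rough analogy, the three array mappings correspond to the following Python operations. This sketch assumes that `take` keeps the first items of the array, and the sample values for the four statistics are hypothetical.

```python
measurements = [2.34, 12.3, 32.4]

# min($1): aggregate the array into a single value.
smallest = min(measurements)

# take($1, 10): keep at most 10 items (modeled here with a slice).
first_ten = measurements[:10]

# ($1, $2, $3, $4): build one array from separate single values.
minimum, maximum, average, mean = 2.34, 32.4, 15.68, 15.68
stats = [minimum, maximum, average, mean]
```

Because the sample array has only three items, `first_ten` contains all of them.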
| 163 | +### Missing value |
| 164 | + |
| 165 | +*Missing value* is a special type used in scenarios such as: |
| 166 | + |
| 167 | +* Handling missing fields in the input by providing an alternative value. |
| 168 | +* Conditionally removing a field based on its presence. |
| 169 | + |
| 170 | +The following example shows how to use *missing value*. Consider this input record: |
| 171 | + |
| 172 | +```json |
| 173 | +{ |
| 174 | + "Employment": { |
| 175 | + "Position": "Analyst", |
| 176 | + "BaseSalary": 75000, |
| 177 | + "WorkingHours": "Regular" |
| 178 | + } |
| 179 | +} |
| 180 | +``` |
| 181 | + |
| 182 | +The input record contains the `BaseSalary` field, but the field might be optional. Suppose that if the field is missing, a value must be added from a contextualization dataset: |
| 183 | + |
| 184 | +```json |
| 185 | +{ |
| 186 | + "Position": "Analyst", |
| 187 | + "BaseSalary": 70000, |
| 188 | + "WorkingHours": "Regular" |
| 189 | +} |
| 190 | +``` |
| 191 | + |
| 192 | +A mapping can check if the field is present in the input record. If found, the output receives that existing value. Otherwise, the output receives the value from the context dataset. For example: |
| 193 | + |
| 194 | +```yaml |
| 195 | +- inputs: |
| 196 | +  - BaseSalary # - - - - - - - - - - $1 |
| 197 | +  - $context(position).BaseSalary # - $2 |
| 198 | +  output: BaseSalary |
| 199 | +  expression: if($1 == (), $2, $1) |
| 200 | +``` |
| 201 | + |
| 202 | +The `conversion` uses the `if` function that has three parameters: |
| 203 | + |
| 204 | +* The first parameter is a condition. In the example, it checks if the `BaseSalary` field of the input record (aliased as `$1`) is the *missing value*. |
| 205 | +* The second parameter is the result of the function if the condition in the first parameter is true. In this example, it's the `BaseSalary` field of the contextualization dataset (aliased as `$2`). |
| 206 | +* The third parameter is the result of the function if the condition in the first parameter is false. In this example, it's the `BaseSalary` field of the input record (aliased as `$1`). |
| 207 | + |
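The selection logic of `if($1 == (), $2, $1)` can be sketched in Python. In this illustration, `None` stands in for the *missing value* (written as `()` in the formula), and the function name `select_base_salary` is hypothetical.

```python
from typing import Optional


def select_base_salary(input_salary: Optional[int], context_salary: int) -> int:
    """Return the input value when present, otherwise the contextual value."""
    # None models the *missing value*, written as () in the conversion formula.
    if input_salary is None:
        return context_salary
    return input_salary
```

With the records shown earlier, `select_base_salary(75000, 70000)` keeps the input value `75000`, while `select_base_salary(None, 70000)` falls back to `70000` from the contextualization dataset.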
| 208 | +## Available functions |
| 209 | + |
| 210 | +Functions can be used in the conversion formula to perform various operations. |
| 211 | + |
| 212 | +* `min` to select a single item from an array |
| 213 | +* `if` to select between values |
| 214 | +* string manipulation (for example, `uppercase()`) |
| 215 | +* explicit conversion (for example, `ISO8601_datetime`) |
| 216 | +* aggregation (for example, `avg()`) |
| 217 | + |
| 218 | +## Available operations |
| 219 | + |
| 220 | +Dataflows offer out-of-the-box (OOTB) conversion functions for common unit conversions, so you don't need to write the formulas yourself. These predefined functions cover temperature, pressure, length, weight, and volume. The following table lists the available conversion functions, along with their formulas and function names: |
| 221 | + |
| 222 | +| Conversion | Formula | Function Name | |
| 223 | +| --- | --- | --- | |
| 224 | +| Celsius to Fahrenheit | F = (C * 9/5) + 32 | cToF | |
| 225 | +| PSI to Bar | Bar = PSI * 0.0689476 | psiToBar | |
| 226 | +| Inch to CM | CM = Inch * 2.54 | inToCm | |
| 227 | +| Foot to Meter | Meter = Foot * 0.3048 | ftToM | |
| 228 | +| Lbs to KG | KG = Lbs * 0.453592 | lbToKg | |
| 229 | +| Gallons to Liters | Liters = Gallons * 3.78541 | galToL | |
| 230 | + |
| 231 | +In addition to these unidirectional conversions, we also support the reverse calculations: |
| 232 | + |
| 233 | +| Conversion | Formula | Function Name | |
| 234 | +| --- | --- | --- | |
| 235 | +| Fahrenheit to Celsius | C = (F - 32) * 5/9 | fToC | |
| 236 | +| Bar to PSI | PSI = Bar / 0.0689476 | barToPsi | |
| 237 | +| CM to Inch | Inch = CM / 2.54 | cmToIn | |
| 238 | +| Meter to Foot | Foot = Meter / 0.3048 | mToFt | |
| 239 | +| KG to Lbs | Lbs = KG / 0.453592 | kgToLb | |
| 240 | +| Liters to Gallons | Gallons = Liters / 3.78541 | lToGal | |
| 241 | + |
| 242 | +These functions are designed to simplify the conversion process, allowing users to input values in one unit and receive the corresponding value in another unit effortlessly. |
| 243 | + |
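To check the arithmetic behind the tables, the following Python sketch reimplements two of the conversion pairs using the formulas exactly as listed. The snake_case function names are hypothetical stand-ins for `cToF`, `fToC`, `psiToBar`, and `barToPsi`; this isn't the dataflow implementation.

```python
import math


def c_to_f(celsius: float) -> float:
    """F = (C * 9/5) + 32"""
    return (celsius * 9 / 5) + 32


def f_to_c(fahrenheit: float) -> float:
    """C = (F - 32) * 5/9"""
    return (fahrenheit - 32) * 5 / 9


def psi_to_bar(psi: float) -> float:
    """Bar = PSI * 0.0689476"""
    return psi * 0.0689476


def bar_to_psi(bar: float) -> float:
    """PSI = Bar / 0.0689476"""
    return bar / 0.0689476


boiling_f = c_to_f(100)                           # 212.0
round_trip_psi = bar_to_psi(psi_to_bar(14.7))     # approximately 14.7
assert math.isclose(round_trip_psi, 14.7)
```

Each reverse function undoes its forward counterpart, up to floating-point rounding.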
| 244 | +Additionally, a scaling function scales a value from one range to a user-defined range. For example, `scale($1,0,10,0,100)` scales the input value from the range 0 to 10 to the range 0 to 100. |
| 245 | + |
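A minimal Python sketch of the scaling arithmetic, assuming that `scale` performs a linear mapping and takes its arguments in the order (value, input low, input high, output low, output high), as the example suggests. The parameter names are hypothetical.

```python
def scale(value: float, in_low: float, in_high: float,
          out_low: float, out_high: float) -> float:
    """Linearly map value from [in_low, in_high] to [out_low, out_high]."""
    # Position of the value within the input range, as a fraction.
    fraction = (value - in_low) / (in_high - in_low)
    # The same fraction applied to the output range.
    return out_low + fraction * (out_high - out_low)


midpoint = scale(5, 0, 10, 0, 100)  # 50.0
```

Under this assumption, an input of 5 on the 0 to 10 range lands at 50 on the 0 to 100 range.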
| 246 | +You can also define your own conversions using simple mathematical formulas. The system supports basic operators such as addition (`+`), subtraction (`-`), multiplication (`*`), and division (`/`). These operators follow standard rules of precedence (for example, multiplication and division are performed before addition and subtraction), which can be overridden using parentheses. |
| 247 | + |
| 248 | + |
| 249 | +For more complex calculations, functions like `sqrt` (which finds the square root of a number) are also available. |
| 250 | + |
| 251 | +### Available arithmetic, comparison, and boolean operators grouped by precedence |
| 252 | + |
| 253 | +| Operator | Description | |
| 254 | +|----------|-------------| |
| 255 | +| ^ | Exponentiation: $1 ^ 3 | |
| 256 | + |
| 257 | +Since `Exponentiation` has the highest precedence, it's executed first unless parentheses override this order: |
| 258 | + |
| 259 | +* `$1 * 2 ^ 3` is interpreted as `$1 * 8` because the `2 ^ 3` part is executed first, before multiplication. |
| 260 | +* `($1 * 2) ^ 3` processes the multiplication before exponentiation. |
| 261 | + |
| 262 | +| Operator | Description | |
| 263 | +|----------|-------------| |
| 264 | +| - | Negation | |
| 265 | +| ! | Logical not | |
| 266 | + |
| 267 | +`Negation` and `Logical not` have high precedence, so they always stick to their immediate neighbor, except when exponentiation is involved: |
| 268 | + |
| 269 | +* `-$1 * 2` negates `$1` first, then multiplies. |
| 270 | +* `-($1 * 2)` multiplies, then negates the result. |
| 271 | + |
| 272 | +| Operator | Description | |
| 273 | +|----------|-------------| |
| 274 | +| * | Multiplication: $1 * 10 | |
| 275 | +| / | Division: $1 / 25 (Result is an integer if both arguments are integers, otherwise float) | |
| 276 | +| % | Modulo: $1 % 25 | |
| 277 | + |
| 278 | +`Multiplication`, `Division`, and `Modulo`, having the same precedence, are executed from left to right, unless the order is altered by parentheses. |
| 279 | + |
| 280 | +| Operator | Description | |
| 281 | +|----------|-------------| |
| 282 | +| + | Addition for numeric values, concatenation for strings | |
| 283 | +| - | Subtraction | |
| 284 | + |
| 285 | +`Addition` and `Subtraction` are considered weaker operations compared to those in the previous group: |
| 286 | + |
| 287 | +* `$1 + 2 * 3` results in `$1 + 6`, as `2 * 3` is executed first due to the higher precedence of `Multiplication`. |
| 288 | +* `($1 + 2) * 3` performs the addition before the multiplication. |
| 289 | + |
| 290 | +| Operator | Description | |
| 291 | +|----------|-------------| |
| 292 | +| < | Less than | |
| 293 | +| > | Greater than | |
| 294 | +| <= | Less than or equal to | |
| 295 | +| >= | Greater than or equal to | |
| 296 | +| == | Equal to | |
| 297 | +| != | Not equal to | |
| 298 | + |
| 299 | +`Comparisons` operate on numeric, boolean, and string values. Since they have lower precedence than arithmetic operators, no parentheses are needed to compare results effectively: |
| 300 | + |
| 301 | +* `$1 * 2 <= $2` is equivalent to `($1 * 2) <= $2`. |
| 302 | + |
| 303 | +| Operator | Description | |
| 304 | +|----------|-------------| |
| 305 | +| \|\| | Logical OR | |
| 306 | +| && | Logical AND | |
| 307 | + |
| 308 | +Logical operators are used to chain conditions: |
| 309 | + |
| 310 | +* `$1 > 100 && $2 > 200` |
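The precedence groups can be checked with ordinary arithmetic. The following Python sketch works as an analogy because Python orders these operators the same way; `**` stands in for `^`, `and` stands in for `&&`, and the sample values for `$1` and `$2` are hypothetical.

```python
x, y = 5, 12  # hypothetical values for $1 and $2

exp_first = x * 2 ** 3       # 2 ** 3 runs first, so this is x * 8
mul_first = (x * 2) ** 3     # parentheses force the multiplication first
neg_after_exp = -x ** 2      # exponentiation runs before negation
add_last = x + 2 * 3         # 2 * 3 runs first, so this is x + 6
add_first = (x + 2) * 3      # parentheses force the addition first
compare = x * 2 <= y         # same as (x * 2) <= y
both = x > 100 and y > 200   # logical AND of two comparisons
```

With these values, `exp_first` is 40 while `mul_first` is 1000, which shows how much the grouping matters.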