[//]: # (title: Number Unification)

Unifying numbers means converting them to a common number type without losing information.

For now, this is an internal part of the library, but its logic shows up in several places, such as
[statistics](summaryStatistics.md) and [reading JSON](read.md#read-from-json).

The following graph shows the hierarchy of number types in Kotlin DataFrame.

<inline-frame src="kdocs/org.jetbrains.kotlinx.dataframe.documentation.UnifyingNumbers.Graph.html" />

The graph is ordered top-down, from the most complex type to the simplest one.

For each number type in the graph, any number of that type can be expressed losslessly by
a number of a more complex type (any of its parents, directly or transitively).
This is because the more complex type has either a larger range or a higher precision (in terms of bits).
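
A quick way to test the lossless claim is a round trip: convert a value to the wider type and back, then compare. The plain-Kotlin sketch below does not use any Kotlin DataFrame API, and the value names are only illustrative. It shows that an `Int` survives a trip through `Long` or `Double` but not always through `Float`, which is why `Int` and `Float` only meet at `Double`, while every `Short` fits exactly in a `Float`.

```kotlin
// Plain Kotlin, no DataFrame involved: does a value survive a round trip
// through a wider type? If it does for every value, the conversion is lossless.

// 2^24 + 1: the smallest positive Int that a Float (24-bit significand) cannot hold exactly.
val tricky: Int = 16_777_217

val intViaLong: Boolean = tricky.toLong().toInt() == tricky     // true: larger range
val intViaDouble: Boolean = tricky.toDouble().toInt() == tricky // true: 53-bit significand
val intViaFloat: Boolean = tricky.toFloat().toInt() == tricky   // false: rounded to 16_777_216

// Exhaustive check over all 65,536 values: every Short fits exactly in a Float.
val everyShortViaFloat: Boolean =
    (Short.MIN_VALUE..Short.MAX_VALUE).all { it.toFloat().toInt() == it }
```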

Nullability is also taken into account, even though it is not displayed everywhere in the graph.
For example, `Int?` and `Float` are unified to `Double?`.

At the bottom of the graph is `Nothing?`, which can be interpreted as `null`.
Unifying `Nothing?` with a type `T` therefore gives `T?`.
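
The rules above can be illustrated with a small, self-contained sketch. It is not the library's implementation, which is internal: the `Num` enum, `NumType`, `ancestors`, `unifyBase`, `unify`, and the edges in `parents` are all hypothetical. The edges are assumptions chosen by bit width, cover only signed primitives, and let `Long` widen to `Double` the way the `PRIMITIVES_ONLY` variant does. Unifying two types means finding their lowest common ancestor in the graph and making the result nullable if either input is nullable; `Nothing?` is modeled as a type without a base.

```kotlin
// Hypothetical, simplified model: signed primitive number types only.
enum class Num { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE }

// Assumed "widens to" edges, chosen by bit width; not copied from the library.
// LONG -> DOUBLE is slightly lossy, mirroring the PRIMITIVES_ONLY variant.
val parents: Map<Num, List<Num>> = mapOf(
    Num.BYTE to listOf(Num.SHORT),
    Num.SHORT to listOf(Num.INT, Num.FLOAT),
    Num.INT to listOf(Num.LONG, Num.DOUBLE),
    Num.FLOAT to listOf(Num.DOUBLE),
    Num.LONG to listOf(Num.DOUBLE),
    Num.DOUBLE to emptyList(),
)

// A number type plus nullability; base == null models `Nothing?` (only nulls).
data class NumType(val base: Num?, val nullable: Boolean)

// The type itself and every type above it in the graph.
fun ancestors(t: Num): Set<Num> =
    setOf(t) + parents.getValue(t).flatMap { ancestors(it) }

// Lowest common ancestor: the common ancestor that all other common ancestors sit above.
fun unifyBase(a: Num, b: Num): Num {
    val common = ancestors(a) intersect ancestors(b)
    return common.first { candidate -> ancestors(candidate).containsAll(common) }
}

// Unify two types: lowest common ancestor of the bases, nullable if either input is.
fun unify(a: NumType, b: NumType): NumType {
    val base = when {
        a.base == null -> b.base // Nothing? is below everything
        b.base == null -> a.base
        else -> unifyBase(a.base, b.base)
    }
    return NumType(base, a.nullable || b.nullable)
}
```

For instance, `unify(NumType(Num.INT, true), NumType(Num.FLOAT, false))` yields `NumType(base=DOUBLE, nullable=true)`, the `Double?` case from the text. A fuller model would also need the unsigned types and, for the `DEFAULT` variant, `BigInteger` and `BigDecimal`.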

> Other parts of the library, such as [`readCsv`](read.md#column-type-inference-from-csv)
> or [`readExcel`](read.md#read-from-excel), may also "unify" numbers.
> However, because they rely on other libraries (like [Deephaven CSV](https://github.com/deephaven/deephaven-csv)),
> their behavior may differ slightly.

### Unified Number Type Options

There are variants of this graph that exclude some types, such as `BigDecimal` and `BigInteger`,
or that allow some slightly lossy conversions, like from `Long` to `Double`.

Which variant is used is determined by either `UnifiedNumberTypeOptions.PRIMITIVES_ONLY` or
`UnifiedNumberTypeOptions.DEFAULT`.

For `PRIMITIVES_ONLY`, used by [statistics](summaryStatistics.md), big numbers are excluded from the graph.
Additionally, `Double` is considered the most complex type,
meaning that `Long`/`ULong` and `Double` are unified to `Double`,
potentially losing a little precision: a `Double` has a 53-bit significand, so integers larger than `2^53` in magnitude may be rounded.
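
The precision loss is easy to reproduce in plain Kotlin (this is not library code, and the value names are only illustrative): every integer up to `2^53` in magnitude is exactly representable as a `Double`, so `2^53` survives a round trip through `Double`, while `2^53 + 1` comes back as `2^53`.

```kotlin
// Plain Kotlin: a Double has a 53-bit significand, so not every Long fits exactly.
val exact: Long = 9_007_199_254_740_992L // 2^53
val next: Long = exact + 1                // 2^53 + 1

val exactSurvives: Boolean = exact.toDouble().toLong() == exact // true
val nextSurvives: Boolean = next.toDouble().toLong() == next    // false: rounds to 2^53
```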

For `DEFAULT`, used by [`readJson`](read.md#read-from-json), big numbers can appear.
`BigDecimal` is considered the most complex type, meaning that `Long`/`ULong` and `Double` are unified
to `BigDecimal` instead.
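
By contrast, `java.math.BigDecimal` can hold any `Long` and any finite `Double` exactly, which is what makes it a lossless common type for both. A plain Kotlin/JVM sketch, not library code, with illustrative value names:

```kotlin
import java.math.BigDecimal

// Plain Kotlin/JVM: BigDecimal can represent any Long and any finite Double exactly.
val big: Long = 9_007_199_254_740_993L // 2^53 + 1, not exactly representable as a Double
val fraction: Double = 0.1             // stored as the nearest binary fraction

val longAsBig: BigDecimal = BigDecimal.valueOf(big)
val doubleAsBig: BigDecimal = BigDecimal(fraction) // exact binary value, not "0.1"

val longSurvives: Boolean = longAsBig.longValueExact() == big    // true
val doubleSurvives: Boolean = doubleAsBig.toDouble() == fraction // true
```

Note that `NaN` and the infinities have no `BigDecimal` counterpart: the `BigDecimal(Double)` constructor throws `NumberFormatException` for them.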