For `locale`, this means that the locale used by the parser is resolved as follows:

↪ The locale given as function argument directly, or in `parserOptions`, if it is not `null`, else

↪ The locale set by `DataFrame.parser.locale = ...`, if it is not `null`, else

↪ `Locale.getDefault()`, which is the system's default locale that can be changed with `Locale.setDefault()`.
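
The fallback chain above can be sketched in a few lines of plain Kotlin. This is only an illustration: `resolveLocale` and its parameter names are hypothetical and not part of the DataFrame API.

```kotlin
import java.util.Locale

// Hypothetical helper (not part of the DataFrame API) mirroring the fallback chain:
// explicit argument / parserOptions locale -> global parser locale -> JVM default.
fun resolveLocale(explicitLocale: Locale?, globalLocale: Locale?): Locale =
    explicitLocale ?: globalLocale ?: Locale.getDefault()

fun main() {
    println(resolveLocale(Locale.GERMANY, Locale.US)) // the explicit locale wins
    println(resolveLocale(null, Locale.US))           // falls back to the global setting
    println(resolveLocale(null, null))                // falls back to Locale.getDefault()
}
```

Each level is consulted only when the previous one is `null`, which is why setting the global locale has no effect on calls that pass their own.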

### Parsing Doubles

DataFrame has a new fast and powerful double parser, enabled by default.
It is based on [the FastDoubleParser library](https://github.com/wrandelshofer/FastDoubleParser), chosen for its
high performance and configurability
(in the future, we might extend this support to `Float`, `BigDecimal`, and `BigInteger` as well).

The parser is locale-aware; it uses the locale set by the
[(global)](#global-parser-options) [parser options](#parser-options) to parse doubles.
It also has a built-in fallback mechanism, meaning it can recognize characters from
all other locales (and some from [Wikipedia](https://en.wikipedia.org/wiki/Decimal_separator))
and parse them correctly, as long as they don't conflict with the current locale.

For example, if your locale uses ',' as the decimal separator, it will not recognize ',' as a thousands separator, but it will
recognize ''', ' ', '٬', '_', ' ', etc. as such.
The same holds for characters like "e", "inf", "×10^", "NaN", etc. (ignoring case).

This means you can safely parse `"123'456 789,012.345×10^6"` with a US locale but not `"1.234,5"`, where '.' and ',' conflict with their US meanings.
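
To make the "no conflict with the current locale" rule concrete, here is a simplified sketch in plain Kotlin. It is not the parser's actual implementation: `groupingCandidates` is a small illustrative subset and `usableGroupingSeparators` is a hypothetical helper that only drops the character the locale already uses as its decimal separator.

```kotlin
import java.text.DecimalFormatSymbols
import java.util.Locale

// Illustrative subset only; the real parser recognizes many more characters.
val groupingCandidates: Set<Char> = setOf(',', '.', '\'', ' ', '٬', '_')

// Hypothetical helper: candidates that do not clash with the locale's decimal separator.
fun usableGroupingSeparators(locale: Locale): Set<Char> {
    val decimalSeparator = DecimalFormatSymbols.getInstance(locale).decimalSeparator
    return groupingCandidates - decimalSeparator
}

fun main() {
    println(usableGroupingSeparators(Locale.US))      // '.' is excluded, ',' remains
    println(usableGroupingSeparators(Locale.GERMANY)) // ',' is excluded, '.' remains
}
```

Under this simplified rule, a US locale keeps ',' available for grouping while '.' is reserved for decimals, which matches why `"1.234,5"` is rejected there.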

Aside from this, DataFrame also explicitly recognizes "∞", "inf", "infinity", and "infty" as `Double.POSITIVE_INFINITY`
(and their negative counterparts as `Double.NEGATIVE_INFINITY`), and "nan", "na", and "n/a" as `Double.NaN`.
All forms of whitespace are treated equally.
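
The special spellings listed above could be handled along these lines. This is a sketch under assumptions, not DataFrame's own code: `parseSpecialDouble` is a hypothetical name, and matching case-insensitively after trimming is an assumption carried over from the previous paragraphs.

```kotlin
// Hypothetical helper, not DataFrame's implementation: maps the special spellings
// listed above to their Double values, or returns null for anything else.
fun parseSpecialDouble(text: String): Double? {
    val normalized = text.trim().lowercase()
    // Split off an optional leading sign.
    val negative = normalized.startsWith("-")
    val unsigned = if (negative || normalized.startsWith("+")) normalized.drop(1) else normalized
    return when (unsigned) {
        "∞", "inf", "infinity", "infty" ->
            if (negative) Double.NEGATIVE_INFINITY else Double.POSITIVE_INFINITY
        "nan", "na", "n/a" -> Double.NaN
        else -> null
    }
}

fun main() {
    println(parseSpecialDouble("Infinity"))
    println(parseSpecialDouble(" -∞ "))
    println(parseSpecialDouble("N/A"))
    println(parseSpecialDouble("42")) // not a special value, so null
}
```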

If `FastDoubleParser` fails to parse a `String` as `Double`, DataFrame will try
to parse it using the standard `NumberFormat.parse()` function as a last resort.
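
The "fast path first, `NumberFormat` as a last resort" idea can be sketched as follows. Note the assumptions: Kotlin's `toDoubleOrNull()` merely stands in for FastDoubleParser so that the example needs no extra dependency, `parseWithFallback` is a hypothetical name, and requiring the whole string to be consumed is a choice made for this sketch.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical helper: try a fast parser first, then fall back to NumberFormat.
fun parseWithFallback(text: String, locale: Locale): Double? {
    // Stand-in for the fast parser (assumption: the real one is FastDoubleParser).
    text.toDoubleOrNull()?.let { return it }

    // Last resort: locale-aware parsing with the standard library.
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(locale).parse(text, position)
    // Accept the result only if the entire input was consumed.
    return if (number != null && position.index == text.length) number.toDouble() else null
}

fun main() {
    println(parseWithFallback("1.5", Locale.US))          // handled by the fast path
    println(parseWithFallback("1.234,5", Locale.GERMANY)) // handled by the NumberFormat fallback
    println(parseWithFallback("12abc", Locale.US))        // rejected by both
}
```

Checking `position.index` matters because `NumberFormat.parse()` would otherwise accept a numeric prefix and silently ignore trailing text.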

If you experience any issues with the new parser, you can turn it off by setting
`useFastDoubleParser = false`, which makes DataFrame use the old `NumberFormat.parse()` function instead.

Please [report](https://github.com/Kotlin/dataframe/issues) any issues you encounter.