For `locale`, this means that the locale used by the parser is resolved in this order:

↪ The locale given as function argument directly, or in `parserOptions`, if it is not `null`, else

↪ The locale set by `DataFrame.parser.locale = ...`, if it is not `null`, else

↪ `Locale.getDefault()`, which is the system's default locale that can be changed with `Locale.setDefault()`.

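
The precedence above can be sketched in plain Kotlin. The function `resolveLocale` and its parameter names are hypothetical and only illustrate the order of the fallbacks; this is not DataFrame's internal code.

```kotlin
import java.util.Locale

// Hypothetical helper illustrating the precedence described above;
// it is not part of the DataFrame API.
fun resolveLocale(argumentLocale: Locale?, globalLocale: Locale?): Locale =
    argumentLocale              // 1. locale passed directly or via `parserOptions`
        ?: globalLocale         // 2. locale set through `DataFrame.parser.locale`
        ?: Locale.getDefault()  // 3. the system's default locale
```

A `null` at any level simply defers to the next one, so the system default is only consulted when neither the call site nor the global parser options specify a locale.
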
### Parsing Doubles

DataFrame has a new fast and powerful double parser enabled by default.
It is based on [the FastDoubleParser library](https://github.com/wrandelshofer/FastDoubleParser) for its
high performance and configurability
(in the future, we might expand this support to `Float`, `BigDecimal`, and `BigInteger` as well).

The parser is locale-aware; it will use the locale set by the
[(global)](#global-parser-options) [parser options](#parser-options) to parse the doubles.
It also has a fallback mechanism built in, meaning it can recognize characters from
all other locales (and some from [Wikipedia](https://en.wikipedia.org/wiki/Decimal_separator))
and parse them correctly as long as they don't conflict with the current locale.

For example, if your locale uses ',' as the decimal separator, it will not recognize ',' as a thousands separator, but it will
recognize ''', ' ', '٬', '_', ' ', etc. as such.
123
+
The same holds for characters like "e", "inf", "×10^", "NaN", etc. (ignoring case).

This means you can safely parse `"123'456 789,012.345×10^6"` with a US locale but not `"1.234,5"`.
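
As a rough illustration of this rule, the sketch below treats every known separator except the locale's decimal separator as a grouping character and rejects grouping characters that appear after the decimal separator. The function `parseLenient` and the separator set `fallbackSeparators` are assumptions made for this example; the real parser is considerably more thorough.

```kotlin
import java.text.DecimalFormatSymbols
import java.util.Locale

// Illustrative subset of separators; the real parser knows many more.
val fallbackSeparators = setOf(',', '.', '\'', ' ', '\u00A0', '٬', '_')

// Hypothetical helper, not DataFrame's implementation.
fun parseLenient(input: String, locale: Locale): Double? {
    val symbols = DecimalFormatSymbols.getInstance(locale)
    val decimal = symbols.decimalSeparator
    // Everything except the locale's decimal separator may group digits.
    val grouping = fallbackSeparators + symbols.groupingSeparator - decimal

    // Accept "×10^" as an alternative spelling of the exponent marker "e".
    val normalized = input.trim().lowercase().replace("×10^", "e")
    val mantissa = normalized.substringBefore('e')
    val exponent = normalized.substringAfter('e', missingDelimiterValue = "0")

    val sign = if (mantissa.startsWith('-')) "-" else ""
    val unsigned = mantissa.removePrefix("-").removePrefix("+")
    val integerPart = unsigned.substringBefore(decimal).filterNot { it in grouping }
    val fractionPart = unsigned.substringAfter(decimal, missingDelimiterValue = "")

    // Grouping characters are only valid before the decimal separator,
    // so "1.234,5" is rejected when '.' is the decimal separator.
    if (integerPart.isEmpty() || !integerPart.all { it.isDigit() }) return null
    if (!fractionPart.all { it.isDigit() }) return null

    return "${sign}${integerPart}.${fractionPart.ifEmpty { "0" }}e${exponent}".toDoubleOrNull()
}
```

With `Locale.US`, the first string from the text parses because ''', ' ', and ',' all act as grouping characters before the '.' decimal separator. `"1.234,5"` returns `null` with `Locale.US` because ',' appears after the decimal separator, while the same string parses as `1234.5` with `Locale.GERMANY`.
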

Aside from this, DataFrame also explicitly recognizes "∞", "inf", "infinity", and "infty" as `Double.POSITIVE_INFINITY`
(as well as their negative counterparts) and "nan", "na", and "n/a" as `Double.NaN`.
All forms of whitespace are treated equally.
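
The special values listed here could be matched along the following lines. `parseSpecialDouble` is a hypothetical name used only for this sketch, and the case-insensitive matching and sign handling are assumptions rather than a description of DataFrame's actual code.

```kotlin
// Hypothetical helper: maps the special tokens named above to Double values.
// Returns null when the input is not one of those tokens.
fun parseSpecialDouble(raw: String): Double? {
    val token = raw.trim().lowercase()
    val negative = token.startsWith('-')
    val body = token.removePrefix("-").removePrefix("+")
    return when (body) {
        "∞", "inf", "infinity", "infty" ->
            if (negative) Double.NEGATIVE_INFINITY else Double.POSITIVE_INFINITY
        "nan", "na", "n/a" -> Double.NaN
        else -> null
    }
}
```

Because `Double.NaN` is never equal to itself under `==` on primitives, callers should check the result with `isNaN()` rather than comparing against `Double.NaN`.
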

If `FastDoubleParser` fails to parse a `String` as `Double`, DataFrame will try
to parse it using the standard `NumberFormat.parse()` function as a last resort.
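
A minimal sketch of such a two-stage fallback, using only the JDK, might look like this. The function `parseWithFallback` and its `fastParse` parameter are hypothetical stand-ins for the fast parser, and requiring the whole string to be consumed is an assumption of this example.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical sketch: try a fast parser first, then fall back to NumberFormat.
fun parseWithFallback(
    input: String,
    locale: Locale,
    fastParse: (String) -> Double?, // stand-in for the fast parser
): Double? {
    // First attempt: the fast path.
    fastParse(input)?.let { return it }

    // Last resort: the locale-aware JDK parser.
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(locale).parse(input, position)
    // Only accept the result if the entire input was consumed.
    return if (number != null && position.index == input.length) number.toDouble() else null
}
```

For instance, passing `String::toDoubleOrNull` as `fastParse` lets `"12.5"` succeed on the fast path, while `"1.234,5"` with `Locale.GERMANY` is only recognized by the `NumberFormat` fallback.
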

If you experience any issues with the new parser, you can turn it off by setting
`useFastDoubleParser = false`, which makes DataFrame use the old `NumberFormat.parse()` function instead.

Please [report](https://github.com/Kotlin/dataframe/issues) any issues you encounter.