
Commit 0c415bc

revert removal of parse documentation by auto-commit
1 parent 55c3525 commit 0c415bc


1 file changed (39 additions, 0 deletions)


docs/StardustDocs/topics/parse.md

@@ -95,4 +95,43 @@ DataFrame.parser.locale = Locale.FRANCE
DataFrame.parser.addDateTimePattern("dd.MM.uuuu HH:mm:ss")
```

For `locale`, this means that the locale used by the parser is determined as follows:

↪ The locale given directly as a function argument, or in `parserOptions`, if it is not `null`; otherwise

    ↪ the locale set by `DataFrame.parser.locale = ...`, if it is not `null`; otherwise

        `Locale.getDefault()`, the system's default locale, which can be changed with `Locale.setDefault()`.
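The precedence above boils down to a chain of Elvis operators. This is a minimal sketch for illustration only, not DataFrame's implementation; `resolveLocale` and its parameter names are hypothetical.

```kotlin
import java.util.Locale

/**
 * Hypothetical helper mirroring the documented precedence:
 * 1. a locale passed as a function argument or in `parserOptions`,
 * 2. the global locale set via `DataFrame.parser.locale`,
 * 3. the JVM default locale.
 */
fun resolveLocale(explicitLocale: Locale?, globalLocale: Locale?): Locale =
    explicitLocale ?: globalLocale ?: Locale.getDefault()
```

Here `resolveLocale(null, Locale.FRANCE)` returns `Locale.FRANCE`; only when both arguments are `null` does the JVM default apply.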
### Parsing Doubles

DataFrame has a new, fast, and powerful double parser, which is enabled by default.
It is based on [the FastDoubleParser library](https://github.com/wrandelshofer/FastDoubleParser), chosen for its
high performance and configurability
(in the future, we might extend this support to `Float`, `BigDecimal`, and `BigInteger` as well).

The parser is locale-aware; it uses the locale set by the
[(global)](#global-parser-options) [parser options](#parser-options) to parse doubles.
It also has a built-in fallback mechanism: it recognizes characters from
all other locales (and some listed on [Wikipedia](https://en.wikipedia.org/wiki/Decimal_separator))
and parses them correctly, as long as they don't conflict with the current locale.

For example, if your locale uses ',' as the decimal separator, the parser will not recognize ',' as a thousands separator,
but it will recognize ''', ' ', '٬', '_', ' ', etc. as such.
The same applies to symbols such as "e", "inf", "×10^", "NaN", etc. (ignoring case).

This means you can safely parse `"123'456 789,012.345×10^6"` with a US locale, but not `"1.234,5"`:
in a US locale, '.' is the decimal separator and ',' the thousands separator, so this string cannot be read as 1234.5.
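To make these two examples concrete, here is a deliberately simplified sketch of locale-aware separator handling in plain Kotlin. It is not how DataFrame or FastDoubleParser work internally: the character set `knownGroupingChars`, the function names, and the rule that rejects thousands separators after the decimal separator are all assumptions made for illustration.

```kotlin
import java.text.DecimalFormatSymbols
import java.util.Locale

// Hypothetical set of thousands separators seen across locales: comma, dot, apostrophe,
// space, no-break space, narrow no-break space, underscore, Arabic thousands separator.
val knownGroupingChars: Set<Char> = setOf(',', '.', '\'', ' ', '\u00A0', '\u202F', '_', '\u066C')

// A character the locale uses as its decimal separator is never accepted as a thousands separator.
fun acceptedGroupingChars(locale: Locale): Set<Char> =
    knownGroupingChars - DecimalFormatSymbols.getInstance(locale).decimalSeparator

fun sketchParseDouble(text: String, locale: Locale): Double? {
    val decimal = DecimalFormatSymbols.getInstance(locale).decimalSeparator
    val grouping = acceptedGroupingChars(locale)
    // Treat the "×10^" notation like the exponent marker "e".
    val s = text.trim().replace("×10^", "e")
    val exponentIndex = s.indexOfFirst { it == 'e' || it == 'E' }
    val mantissaEnd = if (exponentIndex >= 0) exponentIndex else s.length
    val decimalIndex = s.indexOf(decimal)
    // Assumed rule: thousands separators may not appear after the decimal separator,
    // so "1.234,5" is rejected in a US locale instead of being misread as 1.2345.
    if (decimalIndex in 0 until mantissaEnd &&
        s.substring(decimalIndex + 1, mantissaEnd).any { it in grouping }
    ) return null
    // Drop thousands separators and normalize the decimal separator to '.'.
    val normalized = s.filterNot { it in grouping }.replace(decimal, '.')
    return normalized.toDoubleOrNull()
}
```

With `Locale.US`, the first example from the text yields `1.23456789012345E17`, while `"1.234,5"` yields `null`; with `Locale.GERMANY`, `"1.234,5"` yields `1234.5`.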

Aside from this, DataFrame also explicitly recognizes "∞", "inf", "infinity", and "infty" as `Double.POSITIVE_INFINITY`
(and their negative counterparts as `Double.NEGATIVE_INFINITY`), and "nan", "na", and "n/a" as `Double.NaN`.
All forms of whitespace are treated equally.
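A minimal sketch of recognizing these special values, assuming a case-insensitive match after whitespace normalization. The token sets come from the paragraph above; the function names are hypothetical, and this is not DataFrame's actual code.

```kotlin
// Token sets taken from the list above; matching ignores case.
val infinityTokens = setOf("∞", "inf", "infinity", "infty")
val nanTokens = setOf("nan", "na", "n/a")

// Maps every kind of whitespace (including no-break spaces) to a plain space, then trims.
fun normalizeWhitespace(text: String): String =
    text.map { if (it.isWhitespace()) ' ' else it }.joinToString("").trim()

// Returns the special value a string stands for, or null if it is not one of the tokens.
fun parseSpecialDouble(text: String): Double? {
    val s = normalizeWhitespace(text).lowercase()
    if (s in nanTokens) return Double.NaN
    val sign = s.firstOrNull()
    val body = if (sign == '-' || sign == '+') s.substring(1).trimStart() else s
    if (body in infinityTokens) {
        return if (sign == '-') Double.NEGATIVE_INFINITY else Double.POSITIVE_INFINITY
    }
    return null
}
```

For instance, `parseSpecialDouble("-∞")` returns `Double.NEGATIVE_INFINITY` and `parseSpecialDouble("N/A")` returns `NaN`; ordinary numbers return `null` and would be left to the regular number parser.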

If `FastDoubleParser` fails to parse a `String` as a `Double`, DataFrame tries
to parse it with the standard `NumberFormat.parse()` function as a last resort.
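This fallback order can be sketched as follows, assuming the fast parser is modelled as a function that returns `null` on failure. `parseWithFallback` and `fastParse` are hypothetical names, and the check that `NumberFormat` consumed the whole string is a choice of this sketch, not necessarily DataFrame's behavior.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

/**
 * Hypothetical two-stage parser: [fastParse] stands in for the primary (fast) parser,
 * and java.text.NumberFormat is used as a last resort.
 */
fun parseWithFallback(text: String, locale: Locale, fastParse: (String) -> Double?): Double? {
    fastParse(text)?.let { return it }
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(locale).parse(text, position)
    // NumberFormat stops at the first character it cannot use ("12abc" yields 12),
    // so this sketch only accepts the result if the whole string was consumed.
    return if (number != null && position.index == text.length) number.toDouble() else null
}
```

Passing `{ null }` as `fastParse` simulates a fast-parser failure: `parseWithFallback("1.234,5", Locale.GERMANY) { null }` then falls through to `NumberFormat` and returns `1234.5`.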

If you experience any issues with the new parser, you can turn it off by setting
`useFastDoubleParser = false`, which makes DataFrame use the old `NumberFormat.parse()` function instead.

Please [report](https://github.com/Kotlin/dataframe/issues) any issues you encounter.
<!---END-->
