Commit 08570f9

small clarification of parsing docs
1 parent 1d12cba commit 08570f9

1 file changed: +18 -11 lines changed

docs/StardustDocs/topics/parse.md

Lines changed: 18 additions & 11 deletions
@@ -49,21 +49,26 @@ df.parse { age and weight }

 DataFrame supports multiple parser options that can be used to customize the parsing behavior.
 These can be supplied to the `parse` function (or any other function that can implicitly parse `Strings`)
-as an argument:
+as an argument.
+
+For each option you don't supply (or supply `null`) DataFrame will take the value from the
+[Global Parser Options](#global-parser-options).

 Available parser options:
 * `locale: Locale` is used to [parse doubles](#parsing-doubles)
-* Default locale is `Locale.getDefault()`
+* Global default locale is `Locale.getDefault()`
 * `dateTimePattern: String` is used to parse date and time
+* Global default supports ISO (local) date-time
 * `dateTimeFormatter: DateTimeFormatter` is used to parse date and time
+* Is derived from `dateTimePattern` and/or `locale` if `null`
 * `nullStrings: List<String>` is used to treat particular strings as `null` value
-* Default null strings are **"null"** and **"NULL"**
+* Global default null strings are **"null"** and **"NULL"**
 * When [reading from CSV](read.md), we include even more defaults, like **""**, and **"NA"**.
 See the KDocs there for the exact details
 * `skipTypes: Set<KType>` types that should be skipped during parsing
-* Empty set by default; parsing can result in any supported type
+* Empty set by global default; parsing can result in any supported type
 * `useFastDoubleParser: Boolean` is used to enable or disable the [new fast double parser](#parsing-doubles)
-* Enabled by default
+* Enabled by global default

 <!---FUN parseWithOptions-->

@@ -76,11 +81,12 @@ df.parse(options = ParserOptions(locale = Locale.CHINA, dateTimeFormatter = Date

 ### Global Parser Options

-You can also set global parser options that will be used by default in [`read`](read.md), [`convert`](convert.md),
-and other `parse` operations.
-These can be seen as a global fallback for the `parserOptions` argument.
+As mentioned before, you can change the default global parser options that will be used by [`read`](read.md),
+[`convert`](convert.md), and other `parse` operations.
+Whenever you don't explicitly provide [parser options](#parser-options) to a function call,
+DataFrame will use these global options instead.

-For example, to change the locale to French and add a custom date-time pattern:
+For example, to change the locale to French and add a custom date-time pattern for all following DataFrame calls, do:

 <!---FUN globalParserOptions-->

@@ -89,7 +95,7 @@ DataFrame.parser.locale = Locale.FRANCE
 DataFrame.parser.addDateTimePattern("dd.MM.uuuu HH:mm:ss")
 ```

-This means that the locale being used by the parser is defined as:
+For `locale`, this means that the one being used by the parser is defined as:

 ↪ The locale given as function argument directly, or in `parserOptions`, if it is not `null`, else

@@ -104,7 +110,8 @@ It is based on [the FastDoubleParser library](https://github.com/wrandelshofer/F
 high performance and configurability
 (in the future, we might expand this support to `Float`, `BigDecimal`, and `BigInteger` as well).

-The parser is locale-aware; it will use the locale set by the [parser options](#parser-options) to parse the doubles.
+The parser is locale-aware; it will use the locale set by the
+[(global)](#global-parser-options) [parser options](#parser-options) to parse the doubles.
 It also has a fallback mechanism built in, meaning it can recognize characters from
 all other locales (and some from [Wikipedia](https://en.wikipedia.org/wiki/Decimal_separator))
 and parse them correctly as long as they don't conflict with the current locale.
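
The behaviour this commit documents (a per-call locale wins when it is not `null`, otherwise the global parser locale applies, and that locale then drives double parsing) can be sketched in plain Kotlin. This is only an illustration under stated assumptions, not DataFrame's implementation: `SketchParserOptions`, `SketchGlobalParser`, `resolveLocale`, and `parseDoubleOrNull` are hypothetical names, and `java.text.NumberFormat` stands in for the FastDoubleParser-based parser, so the cross-locale fallback mentioned above is not modelled.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical stand-in for per-call parser options; null means "not supplied".
data class SketchParserOptions(val locale: Locale? = null)

// Hypothetical stand-in for the global parser options.
// The global default locale is Locale.getDefault(), as the docs state.
object SketchGlobalParser {
    var locale: Locale = Locale.getDefault()
}

// A non-null per-call locale wins; otherwise fall back to the global one.
fun resolveLocale(options: SketchParserOptions?): Locale =
    options?.locale ?: SketchGlobalParser.locale

// Parses the whole string as a double using the resolved locale, or returns null.
// Assumption: NumberFormat replaces the real fast double parser for illustration.
fun parseDoubleOrNull(text: String, options: SketchParserOptions? = null): Double? {
    val format = NumberFormat.getInstance(resolveLocale(options))
    val position = ParsePosition(0)
    val number = format.parse(text, position) ?: return null
    // Reject partial matches such as "1,5abc".
    return if (position.index == text.length) number.toDouble() else null
}

fun main() {
    SketchGlobalParser.locale = Locale.FRANCE
    // No per-call options: the global French locale reads "," as the decimal separator.
    println(parseDoubleOrNull("1,5"))
    // A per-call locale overrides the global one for this call only.
    println(parseDoubleOrNull("1.5", SketchParserOptions(Locale.US)))
}
```

Keeping the fallback in a single `resolveLocale` function mirrors how the docs describe it: each call site only needs to know whether it supplied a value, not where the default comes from.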
