Commit fe44130

update csv read docs with ways to read locale specific numbers
1 parent 15545cf commit fe44130

File tree

2 files changed: +70 -6 lines changed
  • docs/StardustDocs/topics
  • tests/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api

docs/StardustDocs/topics/read.md

Lines changed: 46 additions & 6 deletions
@@ -11,7 +11,7 @@ DataFrame.read("input.csv")

 Input string can be a file path or URL.

-### Reading CSV
+## Reading CSV

 All these calls are valid:

@@ -92,7 +92,47 @@ G: *
 D: Int
 ```

-### Reading JSON
+### Dealing with locale-specific numbers
+
+Sometimes columns in your CSV can be interpreted differently depending on your system locale.
+
+<table>
+<tr><th>numbers</th></tr>
+<tr><td>12,123</td></tr>
+<tr><td>41,111</td></tr>
+</table>
+
+Here the comma can be either a decimal or a thousands separator, so the same text yields different values.
+You can deal with this in two ways:
+
+1) Provide the locale as a parser option
+
+<!---FUN readNumbersWithSpecificLocale-->
+
+```kotlin
+val df = DataFrame.readCSV(
+    file,
+    parserOptions = ParserOptions(locale = Locale.UK),
+)
+```
+
+<!---END-->
+
+2) Disable type inference for a specific column and convert it yourself
+
+<!---FUN readNumbersWithColType-->
+
+```kotlin
+val df = DataFrame.readCSV(
+    file,
+    colTypes = mapOf("colName" to ColType.String)
+)
+```
+
+<!---END-->
+
+
+## Reading JSON

 Basics for reading JSONs are the same: you can read from file or from remote URL.

@@ -160,7 +200,7 @@ D: Boolean?
 Column A has `String` type because all values are string literals, no implicit conversion is performed. Column C
 has `Number` type because it's the least common type for `Int` and `Double`.

-#### JSON Reading Options: Type Clash Tactic
+### JSON Reading Options: Type Clash Tactic

 By default, if a type clash occurs when reading JSON, a new column group is created consisting of: "value", "array", and
 any number of object properties:
@@ -211,7 +251,7 @@ For this case, you can set `typeClashTactic = JSON.TypeClashTactic.ANY_COLUMNS`

 This option is also possible to set in the Gradle- and KSP plugin by providing `jsonOptions`.

-#### JSON Reading Options: Key/Value Paths
+### JSON Reading Options: Key/Value Paths

 If you have some JSON looking like

@@ -302,7 +342,7 @@ Only the bracket notation of json path is supported, as well as just double quot

 For more examples, see the "examples/json" module.

-### Reading Excel
+## Reading Excel

 Add dependency:

@@ -347,7 +387,7 @@ df1["IDS"].type() shouldBe typeOf<String>()

 <!---END-->

-### Reading Apache Arrow formats
+## Reading Apache Arrow formats

 Add dependency:

tests/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Read.kt

Lines changed: 24 additions & 0 deletions
@@ -10,13 +10,15 @@ import org.jetbrains.kotlinx.dataframe.api.columnTypes
 import org.jetbrains.kotlinx.dataframe.api.convert
 import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
 import org.jetbrains.kotlinx.dataframe.api.with
+import org.jetbrains.kotlinx.dataframe.io.ColType
 import org.jetbrains.kotlinx.dataframe.io.readArrowFeather
 import org.jetbrains.kotlinx.dataframe.io.readCSV
 import org.jetbrains.kotlinx.dataframe.io.readJson
 import org.jetbrains.kotlinx.dataframe.testArrowFeather
 import org.jetbrains.kotlinx.dataframe.testCsv
 import org.jetbrains.kotlinx.dataframe.testJson
 import org.junit.Test
+import java.util.*
 import kotlin.reflect.typeOf

 class Read {
@@ -84,4 +86,26 @@ class Read {
         df.rowsCount() shouldBe 1
         df.columnsCount() shouldBe 4
     }
+
+    @Test
+    fun readNumbersWithSpecificLocale() {
+        val file = testCsv("numbers.csv")
+        // SampleStart
+        val df = DataFrame.readCSV(
+            file,
+            parserOptions = ParserOptions(locale = Locale.UK),
+        )
+        // SampleEnd
+    }
+
+    @Test
+    fun readNumbersWithColType() {
+        val file = testCsv("numbers.csv")
+        // SampleStart
+        val df = DataFrame.readCSV(
+            file,
+            colTypes = mapOf("colName" to ColType.String)
+        )
+        // SampleEnd
+    }
 }
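
The locale ambiguity that this commit documents can be reproduced without the DataFrame library. The sketch below is an illustration only: `parseLocalized` is a hypothetical helper, not part of the DataFrame API, and it relies solely on `java.text.NumberFormat` from the JDK. It parses the two sample values from the docs table under `Locale.UK` (the locale used in the commit) and under `Locale.GERMANY`, which is an assumption chosen here as an example of a locale where the comma is the decimal separator.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical helper (not part of the DataFrame API):
// parses a numeric string under an explicit locale and rejects trailing garbage.
fun parseLocalized(text: String, locale: Locale): Number {
    val position = ParsePosition(0)
    val parsed = NumberFormat.getInstance(locale).parse(text, position)
    require(parsed != null && position.index == text.length) {
        "Cannot parse '$text' as a number in locale $locale"
    }
    return parsed
}

fun main() {
    // Sample values from the table in the docs change.
    val raw = listOf("12,123", "41,111")

    // In the UK locale the comma is a thousands separator.
    println(raw.map { parseLocalized(it, Locale.UK) })      // [12123, 41111]

    // In the German locale the comma is a decimal separator.
    println(raw.map { parseLocalized(it, Locale.GERMANY) }) // [12.123, 41.111]
}
```

A helper along these lines could, in principle, be applied to a column that was read with `ColType.String` (the second option in the docs); how that conversion is wired into a DataFrame is deliberately left out of this sketch.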
