Commit 892a4c2

Merge pull request #189 from Kotlin/update-csv-read-docs
update csv read docs with ways to read locale specific numbers
2 parents cf7df72 + fe44130 commit 892a4c2

7 files changed: +79 -15 lines changed

docs/StardustDocs/topics/KPropertiesApi.md

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ data class Passenger(
 val lastName: String
 )

-val passengers = DataFrame.read("titanic.csv", delimiter = ';')
+val passengers = DataFrame.read("titanic.csv")
 .add(Passenger::lastName) { "name"<String>().split(",").last() }
 .dropNulls(Passenger::age)
 .filter {
@@ -39,7 +39,7 @@ data class Passenger(
 val name: String
 )

-val passengers = DataFrame.read("titanic.csv", delimiter = ';')
+val passengers = DataFrame.read("titanic.csv")
 .filter { it.get(Passenger::city).endsWith("NY") }
 .toListOf<Passenger>()
 ```

docs/StardustDocs/topics/apiLevels.md

Lines changed: 4 additions & 4 deletions
@@ -34,7 +34,7 @@ In the most of the code snippets in this documentation there's a tab selector th
 <!---FUN extensionProperties1-->

 ```kotlin
-val df = DataFrame.read("titanic.csv", delimiter = ';')
+val df = DataFrame.read("titanic.csv")
 ```

 <!---END-->
@@ -55,7 +55,7 @@ df.add("lastName") { name.split(",").last() }
 <!---FUN strings-->

 ```kotlin
-DataFrame.read("titanic.csv", delimiter = ';')
+DataFrame.read("titanic.csv")
 .add("lastName") { "name"<String>().split(",").last() }
 .dropNulls("age")
 .filter {
@@ -79,7 +79,7 @@ val age by column<Int?>()
 val name by column<String>()
 val lastName by column<String>()

-DataFrame.read("titanic.csv", delimiter = ';')
+DataFrame.read("titanic.csv")
 .add(lastName) { name().split(",").last() }
 .dropNulls { age }
 .filter { survived() && home().endsWith("NY") && age()!! in 10..20 }
@@ -100,7 +100,7 @@ data class Passenger(
 val lastName: String
 )

-val passengers = DataFrame.read("titanic.csv", delimiter = ';')
+val passengers = DataFrame.read("titanic.csv")
 .add(Passenger::lastName) { "name"<String>().split(",").last() }
 .dropNulls(Passenger::age)
 .filter {

docs/StardustDocs/topics/columnAccessorsApi.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Now columns can be accessed in a type-safe way using `invoke` operator:
 <!---FUN accessors2-->

 ```kotlin
-DataFrame.read("titanic.csv", delimiter = ';')
+DataFrame.read("titanic.csv")
 .add(lastName) { name().split(",").last() }
 .dropNulls { age }
 .filter { survived() && home().endsWith("NY") && age()!! in 10..20 }

docs/StardustDocs/topics/extensionPropertiesApi.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ When `DataFrame` is used within Jupyter Notebooks or Datalore with Kotlin Kernel
 <!---FUN extensionProperties1-->

 ```kotlin
-val df = DataFrame.read("titanic.csv", delimiter = ';')
+val df = DataFrame.read("titanic.csv")
 ```

 <!---END-->

docs/StardustDocs/topics/read.md

Lines changed: 46 additions & 6 deletions
@@ -11,7 +11,7 @@ DataFrame.read("input.csv")

 Input string can be a file path or URL.

-### Reading CSV
+## Reading CSV

 All these calls are valid:

@@ -92,7 +92,47 @@ G: *
 D: Int
 ```

-### Reading JSON
+### Dealing with locale specific numbers
+
+Sometimes columns in your CSV can be interpreted differently depending on your system locale.
+
+<table>
+<tr><th>numbers</th></tr>
+<tr><td>12,123</td></tr>
+<tr><td>41,111</td></tr>
+</table>
+
+Here comma can be decimal or thousands separator, thus different values.
+You can deal with it in two ways
+
+1) Provide locale as a parser option
+
+<!---FUN readNumbersWithSpecificLocale-->
+
+```kotlin
+val df = DataFrame.readCSV(
+file,
+parserOptions = ParserOptions(locale = Locale.UK),
+)
+```
+
+<!---END-->
+
+2) Disable type inference for specific column and convert it yourself
+
+<!---FUN readNumbersWithColType-->
+
+```kotlin
+val df = DataFrame.readCSV(
+file,
+colTypes = mapOf("colName" to ColType.String)
+)
+```
+
+<!---END-->
+
+
+## Reading JSON

 Basics for reading JSONs are the same: you can read from file or from remote URL.

@@ -160,7 +200,7 @@ D: Boolean?
 Column A has `String` type because all values are string literals, no implicit conversion is performed. Column C
 has `Number` type because it's the least common type for `Int` and `Double`.

-#### JSON Reading Options: Type Clash Tactic
+### JSON Reading Options: Type Clash Tactic

 By default, if a type clash occurs when reading JSON, a new column group is created consisting of: "value", "array", and
 any number of object properties:
@@ -211,7 +251,7 @@ For this case, you can set `typeClashTactic = JSON.TypeClashTactic.ANY_COLUMNS`

 This option is also possible to set in the Gradle- and KSP plugin by providing `jsonOptions`.

-#### JSON Reading Options: Key/Value Paths
+### JSON Reading Options: Key/Value Paths

 If you have some JSON looking like

@@ -302,7 +342,7 @@ Only the bracket notation of json path is supported, as well as just double quot

 For more examples, see the "examples/json" module.

-### Reading Excel
+## Reading Excel

 Add dependency:

@@ -347,7 +387,7 @@ df1["IDS"].type() shouldBe typeOf<String>()

 <!---END-->

-### Reading Apache Arrow formats
+## Reading Apache Arrow formats

 Add dependency:
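
Note on the locale section added to read.md: the ambiguity it describes can be reproduced with the JDK alone. The sketch below is an illustration, not part of this commit, and does not use the DataFrame API. It parses the two cell values from the example table with `java.text.NumberFormat` under two locales. `parseWithLocale` is a hypothetical helper name, and whether DataFrame's parser relies on the same JDK mechanism is an assumption the commit does not confirm.

```kotlin
import java.text.NumberFormat
import java.util.Locale

// Hypothetical helper: parse a numeric string using the separator
// conventions of the given locale (JDK only, no DataFrame involved).
fun parseWithLocale(text: String, locale: Locale): Double =
    NumberFormat.getInstance(locale).parse(text).toDouble()

fun main() {
    // Cell values from the example table in read.md.
    val raw = listOf("12,123", "41,111")

    // Locale.UK treats the comma as a grouping (thousands) separator.
    val uk = raw.map { parseWithLocale(it, Locale.UK) }

    // Locale.GERMANY treats the comma as the decimal separator.
    val de = raw.map { parseWithLocale(it, Locale.GERMANY) }

    println(uk) // [12123.0, 41111.0]
    println(de) // [12.123, 41.111]
}
```

A conversion along these lines is one way to "convert it yourself" after reading a column as `ColType.String` (option 2), since the locale is then chosen explicitly rather than taken from the system default.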

docs/StardustDocs/topics/stringApi.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ String column names are the easiest way to access data in DataFrame:
 <!---FUN strings-->

 ```kotlin
-DataFrame.read("titanic.csv", delimiter = ';')
+DataFrame.read("titanic.csv")
 .add("lastName") { "name"<String>().split(",").last() }
 .dropNulls("age")
 .filter {

tests/src/test/kotlin/org/jetbrains/kotlinx/dataframe/samples/api/Read.kt

Lines changed: 24 additions & 0 deletions
@@ -10,13 +10,15 @@ import org.jetbrains.kotlinx.dataframe.api.columnTypes
 import org.jetbrains.kotlinx.dataframe.api.convert
 import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
 import org.jetbrains.kotlinx.dataframe.api.with
+import org.jetbrains.kotlinx.dataframe.io.ColType
 import org.jetbrains.kotlinx.dataframe.io.readArrowFeather
 import org.jetbrains.kotlinx.dataframe.io.readCSV
 import org.jetbrains.kotlinx.dataframe.io.readJson
 import org.jetbrains.kotlinx.dataframe.testArrowFeather
 import org.jetbrains.kotlinx.dataframe.testCsv
 import org.jetbrains.kotlinx.dataframe.testJson
 import org.junit.Test
+import java.util.*
 import kotlin.reflect.typeOf

 class Read {
@@ -84,4 +86,26 @@ class Read {
 df.rowsCount() shouldBe 1
 df.columnsCount() shouldBe 4
 }
+
+@Test
+fun readNumbersWithSpecificLocale() {
+val file = testCsv("numbers.csv")
+// SampleStart
+val df = DataFrame.readCSV(
+file,
+parserOptions = ParserOptions(locale = Locale.UK),
+)
+// SampleEnd
+}
+
+@Test
+fun readNumbersWithColType() {
+val file = testCsv("numbers.csv")
+// SampleStart
+val df = DataFrame.readCSV(
+file,
+colTypes = mapOf("colName" to ColType.String)
+)
+// SampleEnd
+}
 }
