By nature, data frames are dynamic objects: column labels depend on the input source, and new columns can be added
or deleted while wrangling. Kotlin, in contrast, is a statically typed language, and all types are defined and verified
ahead of execution. That's why creating a flexible, handy, and, at the same time, safe API to a data frame is tricky.

In `Kotlin DataFrame`, we provide four different ways to access columns, and, while they are fundamentally different,
they look pretty similar in the data wrangling DSL.

## List of Access APIs

Here's a list of all APIs in order of increasing safety.

* [**String API**](stringApi.md) <br/>
Columns are accessed by a `String` representing their name. Both type-checking and name-checking are done at runtime.

* [**KProperties API**](KPropertiesApi.md) <br/>
Columns are accessed by the [`KProperty`](https://kotlinlang.org/docs/reflection.html#property-references) of some class.
The name and type of the column should match the name and type of the property, respectively.

* [**Extension Properties API**](extensionPropertiesApi.md) <br/>
Extension access properties are generated based on the dataframe schema. The names and types of the properties are
inferred from the names and types of the corresponding columns.

## Example

Here's an example of how the same operations can be performed via different Access APIs:

<note>
In most of the code snippets in this documentation, there's a tab selector that allows switching across Access APIs.
</note>

<tabs>

<tab title="String API">

<!---FUN strings-->

<!---END-->

</tab>

<tab title="Column Accessors API">

<!---FUN accessors3-->

<!---END-->

</tab>

<tab title = "KProperties API">

<!---FUN kproperties1-->

<!---END-->

</tab>

<tab title = "Extension Properties API">

<!---FUN extensionProperties1-->

```kotlin
val df = DataFrame.read("titanic.csv")
```

<!---END-->

<!---FUN extensionProperties2-->

```kotlin
df.add("lastName") { name.split(",").last() }
    .dropNulls { age }
    .filter { survived && home.endsWith("NY") && age in 10..20 }
```

<!---END-->

</tab>

</tabs>

# Comparing the APIs

The [String API](stringApi.md) is the simplest and the least safe of them all. Its main advantage is that it can be
used at any time, including when accessing new columns in chain calls. So we can write something like:

```kotlin
df.add("weight") { ... } // add a new column `weight`, calculated by some expression
    .sortBy("weight") // sort the dataframe rows by the values of `weight`
```

This way, we don't need to break the call chain to declare a column accessor or generate new properties.
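
To make this trade-off concrete, here is a minimal sketch in plain Kotlin (standard library only, not the Kotlin DataFrame API) that models a row as a `Map<String, Any?>`. The names `Row`, `getByName`, and `addWeightAndSort` are hypothetical, and the weight formula is a placeholder. It shows both sides of string-based access: a freshly added column can be referenced by name in the same chain, but a misspelled name or a wrong cast is only caught at runtime.

```kotlin
// Illustrative sketch only: this is NOT the Kotlin DataFrame API.
// A row is modeled as a plain map from column name to value (hypothetical `Row` alias).
typealias Row = Map<String, Any?>

// String-based lookup: the compiler accepts any name and any cast,
// so a typo or a wrong type is only detected when the code runs.
fun Row.getByName(name: String): Any? {
    require(containsKey(name)) { "Column not found: $name" }
    return get(name)
}

// Mirrors `df.add("weight") { ... }.sortBy("weight")`: the new column is referenced by name
// right away, without declaring anything first. `age * 2` is a made-up placeholder formula.
fun addWeightAndSort(rows: List<Row>): List<Row> =
    rows.map { row -> row + ("weight" to (row.getByName("age") as Int) * 2) }
        .sortedBy { row -> row.getByName("weight") as Int }
```

For example, `rows[0].getByName("agee")` compiles without complaint and only fails with an `IllegalArgumentException` when it runs.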

In contrast, generated [extension properties](extensionPropertiesApi.md) are the most convenient and the safest API.
Using this API, you can always be sure that you work with correct data and types. Its bottleneck, however, is the moment
of generation: to get new extension properties, you have to run a cell in a notebook, which could lead to unnecessary
variable declarations. Currently, we are working on a compiler plugin that generates these properties on the fly while
typing!
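
As a rough illustration of the idea (not the code Kotlin DataFrame actually generates), the sketch below assumes a hypothetical `TypedFrame<Schema>` class and a hypothetical `Passenger` marker interface. The extension properties `name` and `age` stand in for what a generator could emit after reading the data; once they exist, user code such as the hypothetical `averageAge` is checked by the compiler.

```kotlin
// Illustrative sketch only: NOT the code that Kotlin DataFrame actually generates.
// A hypothetical frame type that carries its schema as a type parameter.
class TypedFrame<Schema>(val columns: Map<String, List<Any?>>)

// Hypothetical marker type describing the (assumed) columns of titanic.csv.
interface Passenger

// The kind of declarations a generator could emit after inspecting the data:
// one property per column, named and typed after that column.
@Suppress("UNCHECKED_CAST")
val TypedFrame<Passenger>.name: List<String>
    get() = columns.getValue("name") as List<String>

@Suppress("UNCHECKED_CAST")
val TypedFrame<Passenger>.age: List<Int?>
    get() = columns.getValue("age") as List<Int?>

// User code: a misspelled property or a type mismatch is a compile-time error.
fun averageAge(frame: TypedFrame<Passenger>): Double =
    frame.age.filterNotNull().average()
```

A property for a column added later would not exist until these declarations are regenerated, which is the generation bottleneck described above.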

The [Column Accessors API](columnAccessorsApi.md) is a kind of trade-off between safety and the need to declare column
types ahead of execution. It was designed to make it easier to write code in an IDE without a notebook experience.
It provides type-safe access to columns but doesn't ensure that the columns really exist in a particular dataframe.
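
The sketch below illustrates this trade-off in plain Kotlin (standard library only). `ColumnKey`, `Row`, the `get` operator, and `isTeenFromNewYork` are hypothetical stand-ins, not the library's column accessors: each key is declared once, ahead of execution, and gives the compiler the value type, but nothing checks that a given row actually has that column.

```kotlin
// Illustrative sketch only: NOT the Kotlin DataFrame API.
// A typed column key carries a column name and a value type,
// but knows nothing about any particular row or dataframe.
class ColumnKey<T>(val name: String)

typealias Row = Map<String, Any?>

// Typed lookup: the compiler checks how the value is used,
// while the column's existence is only verified at runtime.
@Suppress("UNCHECKED_CAST")
operator fun <T> Row.get(key: ColumnKey<T>): T {
    require(containsKey(key.name)) { "Column not found: ${key.name}" }
    return get(key.name) as T
}

// Declared ahead of execution and reusable in any code.
val age = ColumnKey<Int?>("age")
val home = ColumnKey<String>("home")

fun isTeenFromNewYork(row: Row): Boolean {
    val a = row[age] ?: return false // `Int?` is known statically, so the null case must be handled
    return row[home].endsWith("NY") && a in 10..20
}
```

Calling `isTeenFromNewYork(mapOf("age" to 15))` compiles, yet fails at runtime because the `home` column is missing.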

The [KProperties API](KPropertiesApi.md) is useful when you already have classes declared in your application's
business logic with properties that correspond to the columns of a dataframe.
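
Here is a minimal sketch of the idea in plain Kotlin, assuming a hypothetical business class `Passenger` and a row modeled as a `Map<String, Any?>`. The `get` operator taking a `KProperty1` and the `toPassenger` function are illustrative helpers, not the Kotlin DataFrame API: a property reference such as `Passenger::age` supplies both the column name and its static type, so the same class describes the columns and receives the converted rows.

```kotlin
import kotlin.reflect.KProperty1

// An existing business-logic class whose properties mirror the dataframe columns (assumed).
data class Passenger(val name: String, val age: Int?)

// Illustrative sketch only, NOT the Kotlin DataFrame API: a row as a plain map.
typealias Row = Map<String, Any?>

// Hypothetical helper: the property reference supplies the column name (`property.name`)
// and the static type `T` of the value; the column's existence is still checked at runtime.
@Suppress("UNCHECKED_CAST")
operator fun <T> Row.get(property: KProperty1<*, T>): T {
    require(containsKey(property.name)) { "Column not found: ${property.name}" }
    return get(property.name) as T
}

// Converting a row back into the business class reuses the same property references.
fun Row.toPassenger(): Passenger = Passenger(this[Passenger::name], this[Passenger::age])
```

Only `KProperty.name` is used here, which works on property references without the full `kotlin-reflect` library.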

<table>
<tr>