Commit ac64553

Merge pull request #237 from Kotlin/access-api-docs
Cleaned up Access APIs docs
2 parents 332f5a8 + 31de83b commit ac64553

1 file changed

+69
-47
lines changed

docs/StardustDocs/topics/apiLevels.md

Lines changed: 69 additions & 47 deletions
@@ -2,55 +2,42 @@
22

33
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
44

5-
By nature data frames are dynamic objects, column labels depend on input source and also new columns could be added or deleted while wrangling. Kotlin in contrast is a statically typed language and all types are defined and verified ahead of execution. That's why creating flexible, handy and at the same time, safe API to a data frame is a tricky thing.
5+
By nature data frames are dynamic objects, column labels depend on the input source and also new columns could be added
6+
or deleted while wrangling. Kotlin, in contrast, is a statically typed language and all types are defined and verified
7+
ahead of execution. That's why creating a flexible, handy, and, at the same time, safe API to a data frame is tricky.
68

7-
In `Kotlin Dataframe` we provide four different ways to access data, and while they are essentially different, they look pretty similar in data wrangling DSL.
9+
In `Kotlin DataFrame` we provide four different ways to access columns, and, while they are essentially different, they
10+
look pretty similar in the data wrangling DSL.
811

912
## List of Access APIs
10-
Here's a list of all API's in the order of increasing their safeness.
13+
14+
Here's a list of all APIs in order of increasing safety.
1115

1216
* [**String API**](stringApi.md) <br/>
13-
Columns accessed by `string` representing their name. Type-checking is on runtime, name-checking is also on runtime.
17+
Columns are accessed by `string` representing their name. Type-checking is done at runtime, name-checking too.
1418

15-
* [**Column Accessors API**](columnAccessorsApi.md) <br />
16-
Every column has a descriptor, a variable that representing its name and type.
19+
* [**Column Accessors API**](columnAccessorsApi.md) <br/>
20+
Every column has a descriptor; a variable that represents its name and type.
1721

18-
* [**`KProperty` Accessors API**](KPropertiesApi.md) <br />
19-
Columns accessed by [`KProperty`](https://kotlinlang.org/docs/reflection.html#property-references) of some class. The name and type of column should match the name and type of property
22+
* [**KProperties API**](KPropertiesApi.md) <br/>
23+
Columns accessed by the [`KProperty`](https://kotlinlang.org/docs/reflection.html#property-references) of some class.
24+
The name and type of column should match the name and type of property, respectively.
2025

21-
* [**Extension properties API**](extensionPropertiesApi.md)
22-
Extension access properties are generating based on dataframe schema. Name and type of properties infers from name and type of corresponding columns.
26+
* [**Extension Properties API**](extensionPropertiesApi.md)
27+
Extension access properties are generated based on the dataframe schema. The name and type of properties are inferred
28+
from the name and type of the corresponding columns.
2329

2430
## Example
25-
Here's an example of how the same operations can be performed via different access APIs
31+
32+
Here's an example of how the same operations can be performed via different Access APIs:
2633

2734
<note>
28-
In the most of the code snippets in this documentation there's a tab selector that allows switching across access APIs
35+
In the most of the code snippets in this documentation there's a tab selector that allows switching across Access APIs.
2936
</note>
3037

3138
<tabs>
32-
<tab title = "Generated Properties">
33-
34-
<!---FUN extensionProperties1-->
35-
36-
```kotlin
37-
val df = DataFrame.read("titanic.csv")
38-
```
39-
40-
<!---END-->
41-
42-
<!---FUN extensionProperties2-->
43-
44-
```kotlin
45-
df.add("lastName") { name.split(",").last() }
46-
.dropNulls { age }
47-
.filter { survived && home.endsWith("NY") && age in 10..20 }
48-
```
49-
50-
<!---END-->
5139

52-
</tab>
53-
<tab title="Strings">
40+
<tab title="String API">
5441

5542
<!---FUN strings-->
5643

@@ -68,8 +55,9 @@ DataFrame.read("titanic.csv")
6855
<!---END-->
6956

7057
</tab>
71-
<tab title="Accessors">
72-
58+
59+
<tab title="Column Accessors API">
60+
7361
<!---FUN accessors3-->
7462

7563
```kotlin
@@ -88,7 +76,8 @@ DataFrame.read("titanic.csv")
8876
<!---END-->
8977

9078
</tab>
91-
<tab title = "KProperties">
79+
80+
<tab title = "KProperties API">
9281

9382
<!---FUN kproperties1-->
9483

@@ -114,21 +103,54 @@ val passengers = DataFrame.read("titanic.csv")
114103
<!---END-->
115104

116105
</tab>
106+
107+
<tab title = "Extension Properties API">
108+
109+
<!---FUN extensionProperties1-->
110+
111+
```kotlin
112+
val df = DataFrame.read("titanic.csv")
113+
```
114+
115+
<!---END-->
116+
117+
<!---FUN extensionProperties2-->
118+
119+
```kotlin
120+
df.add("lastName") { name.split(",").last() }
121+
.dropNulls { age }
122+
.filter { survived && home.endsWith("NY") && age in 10..20 }
123+
```
124+
125+
<!---END-->
126+
127+
</tab>
128+
117129
</tabs>
118130

119-
# Comparing of APIs
120-
[String API](stringApi.md) is the simplest one and the most unsafe of all. The main advantage of it is that it can be used at any time, including accessing new columns in chain calls. So we can write something like:
131+
# Comparing the APIs
132+
133+
The [String API](stringApi.md) is the simplest and unsafest of them all. The main advantage of it is that it can be
134+
used at any time, including when accessing new columns in chain calls. So we can write something like:
135+
121136
```kotlin
122137
df.add("weight") { ... } // add a new column `weight`, calculated by some expression
123-
.sortBy("weight") // sorting dataframe rows by its value
138+
.sortBy("weight") // sorting dataframe rows by its value
124139
```
125-
So we don't need to interrupt a method chain and declare a column accessor or generate new properties.
126140

127-
In contrast, generated [extension properties](extensionPropertiesApi.md) are the most convenient and safe API. Using it you can be always sure that you work with correct data and types. But its bottleneck — the moment of generation. To get new extension properties you have to run a cell in a notebook, which could lead to unnecessary variable declarations. Currently, we are working on compiler a plugin that generates these properties on the fly while user typing.
141+
We don't need to interrupt a function call chain and declare a column accessor or generate new properties.
142+
143+
In contrast, generated [extension properties](extensionPropertiesApi.md) are the most convenient and the safest API.
144+
Using it, you can always be sure that you work with correct data and types. But its bottleneck is the moment of generation.
145+
To get new extension properties you have to run a cell in a notebook, which could lead to unnecessary variable declarations.
146+
Currently, we are working on compiler a plugin that generates these properties on the fly while typing!
128147

129-
[Column Accessors API](columnAccessorsApi.md) is a kind of trade-off between safeness and ahead of the execution type declaration. It was designed to write code in IDE without notebook experience. It provides type-safe access to columns but doesn't ensure that the columns really exist in a particular dataframe.
148+
The [Column Accessors API](columnAccessorsApi.md) is a kind of trade-off between safety and needs to be written ahead of
149+
the execution type declaration. It was designed to better be able to write code in an IDE without a notebook experience.
150+
It provides type-safe access to columns but doesn't ensure that the columns really exist in a particular dataframe.
130151

131-
[`KProperty` based API](KPropertiesApi.md) is useful when you have already declared classed in application business logic with fields that correspond columns of dataframe.
152+
The [KProperties API](KPropertiesApi.md) is useful when you already have declared classed in your application business
153+
logic with fields that correspond columns of dataframe.
132154

133155
<table>
134156
<tr>
@@ -138,25 +160,25 @@ In contrast, generated [extension properties](extensionPropertiesApi.md) are the
138160
<td> Column existence checking </td>
139161
</tr>
140162
<tr>
141-
<td> Strings </td>
163+
<td> String API </td>
142164
<td> Runtime </td>
143165
<td> Runtime </td>
144166
<td> Runtime </td>
145167
</tr>
146168
<tr>
147-
<td> Column Accessors </td>
169+
<td> Column Accessors API </td>
148170
<td> Compile-time </td>
149171
<td> Compile-time </td>
150172
<td> Runtime </td>
151173
</tr>
152174
<tr>
153-
<td> `KProperty` Accessors </td>
175+
<td> KProperties API </td>
154176
<td> Compile-time </td>
155177
<td> Compile-time </td>
156178
<td> Runtime </td>
157179
</tr>
158180
<tr>
159-
<td> Extension Properties Accessors </td>
181+
<td> Extension Properties API </td>
160182
<td> Generation-time </td>
161183
<td> Generation-time </td>
162184
<td> Generation-time </td>
