Commit 71c3601

improve extensionPropertiesApi.md
1 parent 40ed8c0 commit 71c3601

File tree

2 files changed

+155
-15
lines changed

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
name,info
2+
Alice,"{""age"":23,""height"":175.5}"
3+
Bob,"{""age"":27,""height"":160.2}"
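Note on the `info` field above: it is a quoted CSV field whose doubled quotes (`""`) stand for literal quotes, so the decoded value is a JSON object. The sketch below illustrates only that decoding step in plain Kotlin. `splitCsvLine` is a hypothetical helper written for this note, not part of Kotlin DataFrame, and it handles a single line without embedded newlines.

```kotlin
// Hypothetical helper (not a Kotlin DataFrame API): splits one CSV line into
// fields, treating "" inside a quoted field as a literal quote character.
fun splitCsvLine(line: String): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            // A doubled quote inside a quoted field is one literal quote.
            inQuotes && c == '"' && i + 1 < line.length && line[i + 1] == '"' -> {
                current.append('"')
                i++ // skip the second quote of the pair
            }
            // A single quote opens or closes a quoted field.
            c == '"' -> inQuotes = !inQuotes
            // A comma outside quotes ends the current field.
            c == ',' && !inQuotes -> {
                fields.add(current.toString())
                current.setLength(0)
            }
            else -> current.append(c)
        }
        i++
    }
    fields.add(current.toString())
    return fields
}

fun main() {
    val row = "Alice,\"{\"\"age\"\":23,\"\"height\"\":175.5}\""
    println(splitCsvLine(row)) // prints [Alice, {"age":23,"height":175.5}]
}
```

The second field decodes to a JSON object with `age` and `height` keys, matching the nested columns that the documentation below describes for `info`.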
Lines changed: 152 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -1,24 +1,161 @@
11
[//]: # (title: Extension Properties API)
22

3-
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
4-
5-
Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md).
6-
They are generated based on a [dataframe schema](schemas.md),
3+
When working with a DataFrame, the most convenient and reliable way
4+
to access its columns, both in operations and when retrieving column values
5+
in row expressions, is through auto-generated extension properties.
6+
They are generated based on a [dataframe schema](schemas.md),
77
with the name and type of properties inferred from the name and type of the corresponding columns.
8+
They also work for all kinds of hierarchical dataframes.
9+
10+
> The behavior of data schema generation differs between the
11+
> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](gettingStartedKotlinNotebook.md).
12+
>
13+
> * In the **Kotlin Notebook**, a schema is generated *only after cell execution* for
14+
> `DataFrame` variables defined within that cell.
15+
> * With the **Compiler Plugin**, a new schema is generated *after every operation*,
16+
> although not all operations are supported yet.
17+
> Inferring the schema of a `DataFrame` read from a file or URL is also *not yet supported*.
18+
>
19+
> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences.
20+
{style="warning"}
21+
22+
## Example
23+
24+
Consider
25+
<resource src="example.csv"></resource>.
26+
This table consists of two columns: `name`, which is a `String` column, and `info`,
27+
which is a **column group** containing two nested value columns —
28+
`age` of type `Int`, and `height` of type `Double`.
29+
30+
<table>
31+
<thead>
32+
<tr>
33+
<th>name</th>
34+
<th colspan="2">info</th>
35+
</tr>
36+
<tr>
37+
<th></th>
38+
<th>age</th>
39+
<th>height</th>
40+
</tr>
41+
</thead>
42+
<tbody>
43+
<tr>
44+
<td>Alice</td>
45+
<td>23</td>
46+
<td>175.5</td>
47+
</tr>
48+
<tr>
49+
<td>Bob</td>
50+
<td>27</td>
51+
<td>160.2</td>
52+
</tr>
53+
</tbody>
54+
</table>
55+
56+
<tabs>
57+
<tab title="Kotlin Notebook">
58+
Read the `DataFrame` from the CSV file:
59+
60+
```kotlin
61+
val df = DataFrame.readCsv("example.csv")
62+
```
63+
64+
*After cell execution*, a data schema and extension properties for this `DataFrame` are generated,
65+
so you can use them to access columns
66+
and in operations inside the [Column Selector DSL](ColumnSelectors.md)
67+
and the [DataRow API](DataRow.md):
68+
69+
70+
```kotlin
71+
// Get nested column
72+
df.info.age
73+
// Sort by multiple columns
74+
df.sortBy { name and info.height }
75+
// Filter rows using a row condition.
76+
// These extensions express the exact value in the row
77+
// with the corresponding type:
78+
df.filter { name.startsWith("A") && info.age >= 16 }
79+
```
80+
81+
If you change the `DataFrame` schema by changing a column [name](rename.md)
82+
or [type](convert.md), or by [adding](add.md) a new column, you need to
83+
run a cell with the new `DataFrame` declaration first.
84+
For example, rename the `name` column to `firstName`:
85+
86+
```kotlin
87+
val dfRenamed = df.rename { name }.into("firstName")
88+
```
89+
90+
After running the cell with the code above, you can use `firstName` extensions in the following cells:
891

9-
Having these, it allows you to work with your dataframe like:
1092
```kotlin
11-
val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>()
12-
val nameColumn /* : DataColumn<String> */ = peopleDf.name
13-
val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age
93+
dfRenamed.firstName
94+
dfRenamed.rename { firstName }.into("name")
95+
dfRenamed.filter { firstName == "Nikita" }
1496
```
15-
and of course
97+
98+
See [](quickstart.md) for basic Extension Properties API examples in the Kotlin Notebook.
99+
100+
</tab>
101+
<tab title="Compiler Plugin">
102+
103+
For now, if you read a `DataFrame` from a file or URL, you need to define its schema manually.
104+
You can do this quickly with the [`generate..()` methods](DataSchema-Data-Classes-Generation.md).
105+
106+
Define schemas:
107+
```kotlin
108+
@DataSchema
109+
data class PersonInfo(
110+
    val age: Int,
111+
    val height: Double
112+
)
113+
114+
@DataSchema
115+
data class Person(
116+
    val info: PersonInfo,
117+
    val name: String
118+
)
119+
```
120+
121+
Read the `DataFrame` from the CSV file and specify the schema with `convertTo`:
122+
123+
```kotlin
124+
val df = DataFrame.readCsv("example.csv").convertTo<Person>()
125+
```
126+
127+
Extension properties for this `DataFrame` are generated automatically by the plugin,
128+
so you can use them to access columns
129+
and in operations inside the [Column Selector DSL](ColumnSelectors.md)
130+
and the [DataRow API](DataRow.md).
131+
132+
133+
```kotlin
134+
// Get nested column
135+
df.info.age
136+
// Sort by multiple columns
137+
df.sortBy { name and info.height }
138+
// Filter rows using a row condition.
139+
// These extensions express the exact value in the row
140+
// with the corresponding type:
141+
df.filter { name.startsWith("A") && info.age >= 16 }
142+
```
143+
144+
Moreover, new extension properties are generated on the fly after each schema change:
145+
changing a column [name](rename.md)
146+
or [type](convert.md), or [adding](add.md) a new column.
147+
For example, after renaming the `name` column to `firstName`, you can use the `firstName` extension
148+
in the following operations:
149+
16150
```kotlin
17-
peopleDf.add("lastName") { name.split(",").last() }
18-
.dropNulls { personData.age }
19-
.filter { survived && home.endsWith("NY") && personData.age in 10..20 }
151+
// Rename "name" column into "firstName"
152+
df.rename { name }.into("firstName")
153+
// Can use `firstName` extension in the row condition
154+
// right after renaming
155+
.filter { firstName == "Nikita" }
20156
```
21157

22-
To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md)
23-
or jump straight to [Data Schemas in Gradle projects](schemasGradle.md),
24-
or [Data Schemas in Jupyter notebooks](schemasJupyter.md).
158+
See the [Kotlin DataFrame Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-example)
159+
IDEA project for basic Extension Properties API examples.
160+
</tab>
161+
</tabs>
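
Note on the mechanism documented above: the idea that each column gets a property named and typed after it, and that a schema change yields a different set of properties, can be sketched in plain Kotlin. This is a conceptual illustration only. `MiniFrame`, `Person`, `RenamedPerson`, and `renameNameToFirstName` are hypothetical names invented for this note. They are not Kotlin DataFrame classes and not the code that the Compiler Plugin or Kotlin Notebook actually generates.

```kotlin
// Hypothetical stand-in, NOT a Kotlin DataFrame class: a frame is a map from
// column name to values, tagged with a phantom schema type T.
class MiniFrame<T>(val columns: Map<String, List<Any?>>)

// Marker interfaces that play the role of data schemas (assumed names).
interface Person
interface RenamedPerson

// One extension property per column, named and typed after that column.
@Suppress("UNCHECKED_CAST")
val MiniFrame<Person>.name: List<String>
    get() = columns.getValue("name") as List<String>

@Suppress("UNCHECKED_CAST")
val MiniFrame<RenamedPerson>.firstName: List<String>
    get() = columns.getValue("firstName") as List<String>

// Renaming changes the schema, so the result carries a different schema type
// and therefore exposes a different set of extension properties.
fun MiniFrame<Person>.renameNameToFirstName(): MiniFrame<RenamedPerson> =
    MiniFrame<RenamedPerson>(
        columns.mapKeys { (key, _) -> if (key == "name") "firstName" else key }
    )

fun main() {
    val df = MiniFrame<Person>(mapOf("name" to listOf("Alice", "Bob")))
    println(df.name) // prints [Alice, Bob]

    val renamed = df.renameNameToFirstName()
    println(renamed.firstName) // prints [Alice, Bob]
    // `renamed.name` would not compile: RenamedPerson has no such property.
}
```

Because the property is resolved against the schema type at compile time, a misspelled or outdated column name becomes a compilation error instead of a runtime lookup failure, which is the safety benefit the documentation attributes to extension properties.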
