|
1 | 1 | [//]: # (title: Extension Properties API)
|
2 | 2 |
|
3 |
| -<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels--> |
4 |
| - |
5 |
| -Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md). |
6 |
| -They are generated based on a [dataframe schema](schemas.md), |
| 3 | +When working with a `DataFrame`, the most convenient and reliable way |
| 4 | +to access its columns, both in operations and when retrieving column values |
| 5 | +in row expressions, is through auto-generated extension properties. |
| 6 | +They are generated based on a [dataframe schema](schemas.md), |
7 | 7 | with the name and type of properties inferred from the name and type of the corresponding columns.
|
| 8 | +This approach also works for all types of hierarchical dataframes. |
| 9 | + |
| 10 | +> The behavior of data schema generation differs between the |
| 11 | +> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](gettingStartedKotlinNotebook.md). |
| 12 | +> |
| 13 | +> * In the **Kotlin Notebook**, a schema is generated *only after cell execution* for |
| 14 | +> `DataFrame` variables defined within that cell. |
| 15 | +> * With the **Compiler Plugin**, a new schema is generated *after every operation*, |
| 16 | +> although not all operations are supported yet. |
| 17 | +> Retrieving the schema for a `DataFrame` read from a file or URL is also *not yet supported*. |
| 18 | +> |
| 19 | +> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences. |
| 20 | +{style="warning"} |
| 21 | + |
| 22 | +## Example |
| 23 | + |
| 24 | +Consider |
| 25 | +<resource src="example.csv"></resource>. |
| 26 | +This table consists of two columns: `name`, which is a `String` column, and `info`, |
| 27 | +which is a **column group** containing two nested value columns — |
| 28 | +`age` of type `Int`, and `height` of type `Double`. |
| 29 | + |
| 30 | +<table> |
| 31 | + <thead> |
| 32 | + <tr> |
| 33 | + <th>name</th> |
| 34 | + <th colspan="2">info</th> |
| 35 | + </tr> |
| 36 | + <tr> |
| 37 | + <th></th> |
| 38 | + <th>age</th> |
| 39 | + <th>height</th> |
| 40 | + </tr> |
| 41 | + </thead> |
| 42 | + <tbody> |
| 43 | + <tr> |
| 44 | + <td>Alice</td> |
| 45 | + <td>23</td> |
| 46 | + <td>175.5</td> |
| 47 | + </tr> |
| 48 | + <tr> |
| 49 | + <td>Bob</td> |
| 50 | + <td>27</td> |
| 51 | + <td>160.2</td> |
| 52 | + </tr> |
| 53 | + </tbody> |
| 54 | +</table> |
| 55 | + |
| 56 | +<tabs> |
| 57 | +<tab title="Kotlin Notebook"> |
| 58 | +Read the `DataFrame` from the CSV file: |
| 59 | + |
| 60 | +```kotlin |
| 61 | +val df = DataFrame.readCsv("example.csv") |
| 62 | +``` |
| 63 | + |
| 64 | +*After the cell is executed*, the data schema and extension properties for this `DataFrame` are generated, |
| 65 | +so you can use them to access columns directly |
| 66 | +and in operations inside the [Column Selector DSL](ColumnSelectors.md) |
| 67 | +and [DataRow API](DataRow.md): |
| 68 | + |
| 69 | + |
| 70 | +```kotlin |
| 71 | +// Get nested column |
| 72 | +df.info.age |
| 73 | +// Sort by multiple columns |
| 74 | +df.sortBy { name and info.height } |
| 75 | +// Filter rows using a row condition. |
| 76 | +// These extensions express the exact value in the row |
| 77 | +// with the corresponding type: |
| 78 | +df.filter { name.startsWith("A") && info.age >= 16 } |
| 79 | +``` |
| 80 | + |
| 81 | +If you change the `DataFrame` schema by changing a column's [name](rename.md) |
| 82 | +or [type](convert.md), or by [adding](add.md) a new column, you need to |
| 83 | +run a cell that declares the new `DataFrame` first. |
| 84 | +For example, rename the "name" column to "firstName": |
| 85 | + |
| 86 | +```kotlin |
| 87 | +val dfRenamed = df.rename { name }.into("firstName") |
| 88 | +``` |
| 89 | + |
| 90 | +After running the cell with the code above, you can use `firstName` extensions in the following cells: |
8 | 91 |
|
9 |
| -Having these, it allows you to work with your dataframe like: |
10 | 92 | ```kotlin
|
11 |
| -val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>() |
12 |
| -val nameColumn /* : DataColumn<String> */ = peopleDf.name |
13 |
| -val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age |
| 93 | +dfRenamed.firstName |
| 94 | +dfRenamed.rename { firstName }.into("name") |
| 95 | +dfRenamed.filter { firstName == "Nikita" } |
14 | 96 | ```
|
15 |
| -and of course |
| 97 | + |
| 98 | +See [](quickstart.md) for basic Extension Properties API examples in the Kotlin Notebook. |
| 99 | + |
| 100 | +</tab> |
| 101 | +<tab title="Compiler Plugin"> |
| 102 | + |
| 103 | +For now, if you read a `DataFrame` from a file or URL, you need to define its schema manually. |
| 104 | +You can do this quickly with the [`generate..()` methods](DataSchema-Data-Classes-Generation.md). |
| 105 | + |
| 106 | +Define schemas: |
| 107 | +```kotlin |
| 108 | +@DataSchema |
| 109 | +data class PersonInfo( |
| 110 | + val age: Int, |
| 111 | + val height: Double |
| 112 | +) |
| 113 | + |
| 114 | +@DataSchema |
| 115 | +data class Person( |
| 116 | + val info: PersonInfo, |
| 117 | + val name: String |
| 118 | +) |
| 119 | +``` |
| 120 | + |
| 121 | +Read the `DataFrame` from the CSV file and specify the schema with `convertTo`: |
| 122 | + |
| 123 | +```kotlin |
| 124 | +val df = DataFrame.readCsv("example.csv").convertTo<Person>() |
| 125 | +``` |
| 126 | + |
| 127 | +Extension properties for this `DataFrame` are generated automatically by the plugin, |
| 128 | +so you can use them to access columns directly |
| 129 | +and in operations inside the [Column Selector DSL](ColumnSelectors.md) |
| 130 | +and [DataRow API](DataRow.md). |
| 131 | + |
| 132 | + |
| 133 | +```kotlin |
| 134 | +// Get nested column |
| 135 | +df.info.age |
| 136 | +// Sort by multiple columns |
| 137 | +df.sortBy { name and info.height } |
| 138 | +// Filter rows using a row condition. |
| 139 | +// These extensions express the exact value in the row |
| 140 | +// with the corresponding type: |
| 141 | +df.filter { name.startsWith("A") && info.age >= 16 } |
| 142 | +``` |
| 143 | + |
| 144 | +Moreover, new extension properties are generated on the fly after each schema change, |
| 145 | +such as changing a column's [name](rename.md) |
| 146 | +or [type](convert.md), or [adding](add.md) a new column. |
| 147 | +For example, rename the "name" column to "firstName"; the `firstName` extension is then available |
| 148 | +in the following operations: |
| 149 | + |
16 | 150 | ```kotlin
|
17 |
| -peopleDf.add("lastName") { name.split(",").last() } |
18 |
| - .dropNulls { personData.age } |
19 |
| - .filter { survived && home.endsWith("NY") && personData.age in 10..20 } |
| 151 | +// Rename "name" column into "firstName" |
| 152 | +df.rename { name }.into("firstName") |
| 153 | + // Can use `firstName` extension in the row condition |
| 154 | + // right after renaming |
| 155 | + .filter { firstName == "Nikita" } |
20 | 156 | ```
|
21 | 157 |
|
22 |
| -To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md) |
23 |
| -or jump straight to [Data Schemas in Gradle projects](schemasGradle.md), |
24 |
| -or [Data Schemas in Jupyter notebooks](schemasJupyter.md). |
| 158 | +See the [Kotlin DataFrame Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-example) |
| 159 | +IDEA project for basic Extension Properties API examples. |
| 160 | +</tab> |
| 161 | +</tabs> |