|
1 |
| -[//]: # (title: Extension properties API) |
| 1 | +[//]: # (title: Extension Properties API) |
2 | 2 |
|
3 | 3 | <!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
|
4 | 4 |
|
5 |
| -When [`DataFrame`](DataFrame.md) is used within Jupyter/Kotlin Notebook or Datalore with the Kotlin Kernel, |
6 |
| -something special happens: |
7 |
| -After every cell execution, all new global variables of type DataFrame are analyzed and replaced |
8 |
| -with a typed [`DataFrame`](DataFrame.md) wrapper along with auto-generated extension properties for data access. |
9 |
| -For instance, say we run: |
10 |
| - |
11 |
| -<!---FUN extensionProperties1--> |
| 5 | +Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md). |
| 6 | +They are generated based on a [dataframe schema](schemas.md), |
| 7 | +with the name and type of properties inferred from the name and type of the corresponding columns. |
12 | 8 |
|
| 9 | +With these properties, you can work with your dataframe like this: |
13 | 10 | ```kotlin
|
14 |
| -val df /* : AnyFrame */ = DataFrame.read("titanic.csv") |
| 11 | +val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>() |
| 12 | +val nameColumn /* : DataColumn<String> */ = peopleDf.name |
| 13 | +val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age |
15 | 14 | ```
|
16 |
| - |
17 |
| -<!---END--> |
18 |
| - |
19 |
| - |
20 |
| -<dataFrame src="manual/extensionPropertiesApi1.html"/> |
21 |
| - |
22 |
| - |
23 |
| -In normal Kotlin code, we would now have a variable of type [`AnyFrame` (=`DataFrame<*>`)](DataFrame.md) that doesn't |
24 |
| -have any |
25 |
| -extension properties to access its columns. We would either have to define them manually or use the |
26 |
| -[`@DataSchema`](schemas.md) annotation to [generate them](schemasGradle.md#configuration). |
27 |
| - |
28 |
| -By contrast, after this cell is run in a notebook, the columns of the dataframe are used as a basis |
29 |
| -to generate a hidden `@DataSchema interface TypeX`, |
30 |
| -along with extension properties like `val DataFrame<TypeX>.age` etc. |
31 |
| -Next, the `df` variable is shadowed by a new version cast to `DataFrame<TypeX>`. |
32 |
| - |
33 |
| -As a result, now columns can be accessed directly on `df`! |
34 |
| - |
35 |
| -<!---FUN extensionProperties2--> |
36 |
| - |
| 15 | +and, of course, use them directly inside operations: |
37 | 16 | ```kotlin
|
38 |
| -df.add("lastName") { name.split(",").last() } |
39 |
| - .dropNulls { age } |
40 |
| - .filter { survived && home.endsWith("NY") && age in 10..20 } |
| 17 | +peopleDf.add("lastName") { name.split(",").last() } |
| 18 | + .dropNulls { personData.age } |
| 19 | + .filter { survived && home.endsWith("NY") && personData.age in 10..20 } |
41 | 20 | ```
|
42 | 21 |
|
43 |
| -<!---END--> |
44 |
| - |
45 |
| -The `titanic.csv` file could be found [here](https://github.com/Kotlin/dataframe/blob/master/data/titanic.csv). |
46 |
| - |
47 |
| -Extension properties can be generated in IntelliJ IDEA using |
48 |
| -the [Kotlin Dataframe Gradle plugin](schemasGradle.md#configuration). |
49 |
| - |
50 |
| -<warning> |
51 |
| -In notebooks generated properties won't appear and be updated until the cell has been executed. |
52 |
| -It often means that you have to introduce new variable frequently to sync extension properties with actual schema. |
53 |
| -</warning> |
| 22 | +To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md) |
| 23 | +or jump straight to [Data Schemas in Gradle projects](schemasGradle.md), |
| 24 | +or [Data Schemas in Jupyter notebooks](schemasJupyter.md). |
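
To make the idea of "name and type of properties inferred from the name and type of the corresponding columns" concrete, here is a minimal, library-free sketch of the mechanism. Everything in it (`Frame`, `Column`, `raw`, `samplePeople`) is a hypothetical stand-in written for illustration. It is not the Kotlin DataFrame API, and it is not the code the library actually generates. It only shows how an extension property on a schema-typed container can turn an untyped lookup by name into a typed accessor.

```kotlin
// Hypothetical stand-ins for illustration only; NOT the Kotlin DataFrame types.
class Column<T>(val name: String, val values: List<T>)

// S is a phantom type parameter: it only records which schema the frame follows.
class Frame<S>(private val columns: Map<String, List<Any?>>) {
    // Untyped access by column name, comparable to a string-based lookup.
    fun raw(name: String): List<Any?> =
        columns[name] ?: error("No column named '$name'")
}

// Marker interface playing the role of a data schema.
interface Person

// Hand-written analogues of generated extension properties:
// one property per column, named and typed after that column.
@Suppress("UNCHECKED_CAST")
val Frame<Person>.name: Column<String>
    get() = Column("name", raw("name") as List<String>)

@Suppress("UNCHECKED_CAST")
val Frame<Person>.age: Column<Int>
    get() = Column("age", raw("age") as List<Int>)

// Small sample frame (made-up data) that follows the Person schema.
fun samplePeople(): Frame<Person> = Frame(
    mapOf(
        "name" to listOf("Alice", "Bob"),
        "age" to listOf(30, 25),
    )
)

fun main() {
    val people = samplePeople()
    println(people.name.values)      // prints [Alice, Bob]
    println(people.age.values.sum()) // prints 55
}
```

Because the properties are declared on `Frame<Person>` only, a frame typed with a different schema does not expose `name` or `age`, so a mistyped column name becomes a compile-time error instead of a runtime lookup failure.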