improve extensionPropertiesApi.md

AndreiKingsley · AndreiKingsley · commit 71c360143a8e · 2025-06-11T18:25:59.000+04:00
diff --git a/docs/StardustDocs/resources/example.csv b/docs/StardustDocs/resources/example.csv
@@ -0,0 +1,3 @@
+name,info
+Alice,"{""age"":23,""height"":175.5}"
+Bob,"{""age"":27,""height"":160.2}"
diff --git a/docs/StardustDocs/topics/extensionPropertiesApi.md b/docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -1,24 +1,161 @@
 [//]: # (title: Extension Properties API)
 
-<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.ApiLevels-->
-
-Auto-generated extension properties are the safest and easiest way to access columns in a [`DataFrame`](DataFrame.md).
-They are generated based on a [dataframe schema](schemas.md), 
+When working with a DataFrame, the most convenient and reliable way 
+to access its columns — including for operations and retrieving column values 
+in row expressions — is through auto-generated extension properties.
+They are generated based on a [dataframe schema](schemas.md),
 with the name and type of properties inferred from the name and type of the corresponding columns.
+Extension properties also work for all types of hierarchical dataframes.
+
+> The behavior of data schema generation differs between the 
+> [Compiler Plugin](Compiler-Plugin.md) and [Kotlin Notebook](gettingStartedKotlinNotebook.md).
+>
+> * In the **Kotlin Notebook**, a schema is generated *only after cell execution* for 
+> `DataFrame` variables defined within that cell.
+> * With the **Compiler Plugin**, a new schema is generated *after every operation*
+> — but support for all operations is still in progress. 
+> Retrieving the schema for a `DataFrame` read from a file or URL is *not yet supported* either.
+>
+> This behavior may change in future releases. See the [example](#example) below that demonstrates these differences.
+{style="warning"}
+
+## Example
+
+Consider 
+<resource src="example.csv"></resource>.
+This table consists of two columns: `name`, which is a `String` column, and `info`, 
+which is a **column group** containing two nested value columns — 
+`age` of type `Int`, and `height` of type `Double`.
+
+<table>
+  <thead>
+    <tr>
+      <th>name</th>
+      <th colspan="2">info</th>
+    </tr>
+    <tr>
+      <th></th>
+      <th>age</th>
+      <th>height</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Alice</td>
+      <td>23</td>
+      <td>175.5</td>
+    </tr>
+    <tr>
+      <td>Bob</td>
+      <td>27</td>
+      <td>160.2</td>
+    </tr>
+  </tbody>
+</table>
+
+<tabs>
+<tab title="Kotlin Notebook">
+Read the `DataFrame` from the CSV file:
+
+```kotlin
+val df = DataFrame.readCsv("example.csv")
+```
+
+*After cell execution*, a data schema and extension properties are generated for this `DataFrame`,
+so you can use the extensions to access columns directly
+and in operations that use the [Column Selector DSL](ColumnSelectors.md)
+and the [DataRow API](DataRow.md):
+
+
+```kotlin
+// Get nested column
+df.info.age
+// Sort by multiple columns
+df.sortBy { name and info.height }
+// Filter rows using a row condition.
+// Inside the condition, each extension is the value
+// in the current row, with the corresponding type:
+df.filter { name.startsWith("A") && info.age >= 16 }
+```
+
+If you change the `DataFrame` schema by changing a column [name](rename.md)
+or [type](convert.md), or by [adding](add.md) a new column, you need to
+run a cell that declares the new `DataFrame` first.
+For example, rename the "name" column to "firstName":
+
+```kotlin
+val dfRenamed = df.rename { name }.into("firstName")
+```
+
+After running the cell with the code above, you can use the `firstName` extension in the following cells:
 
-Having these, it allows you to work with your dataframe like:
 ```kotlin
-val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>()
-val nameColumn /* : DataColumn<String> */ = peopleDf.name
-val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age
+dfRenamed.firstName
+dfRenamed.rename { firstName }.into("name")
+dfRenamed.filter { firstName == "Nikita" }
 ```
-and of course
+
+See [](quickstart.md) in the Kotlin Notebook with basic Extension Properties API examples.
+
+</tab>
+<tab title="Compiler Plugin">
+
+For now, if you read a `DataFrame` from a file or URL, you need to define its schema manually.
+You can do this quickly with the [`generate..()` methods](DataSchema-Data-Classes-Generation.md).
+
+Define schemas:
+```kotlin
+@DataSchema
+data class PersonInfo(
+    val age: Int,
+    val height: Double
+)
+
+@DataSchema
+data class Person(
+    val info: PersonInfo,
+    val name: String
+)
+```
+
+Read the `DataFrame` from the CSV file and specify the schema with `convertTo`:
+
+```kotlin
+val df = DataFrame.readCsv("example.csv").convertTo<Person>()
+```
+
+Extensions for this `DataFrame` are generated automatically by the plugin,
+so you can use them to access columns directly
+and in operations that use the [Column Selector DSL](ColumnSelectors.md)
+and the [DataRow API](DataRow.md).
+
+
+```kotlin
+// Get nested column
+df.info.age
+// Sort by multiple columns
+df.sortBy { name and info.height }
+// Filter rows using a row condition.
+// Inside the condition, each extension is the value
+// in the current row, with the corresponding type:
+df.filter { name.startsWith("A") && info.age >= 16 }
+```
+
+Moreover, new extensions are generated on the fly after each schema change:
+changing a column [name](rename.md)
+or [type](convert.md), or [adding](add.md) a new column.
+For example, rename the "name" column to "firstName"; the `firstName` extension
+is then available in the following operations:
+
 ```kotlin
-peopleDf.add("lastName") { name.split(",").last() }
-    .dropNulls { personData.age }
-    .filter { survived && home.endsWith("NY") && personData.age in 10..20 }
+// Rename "name" column into "firstName"
+df.rename { name }.into("firstName")
+    // Can use `firstName` extension in the row condition 
+    // right after renaming
+    .filter { firstName == "Nikita" }
 ```
 
-To find out how to use this API in your environment, check out [Working with Data Schemas](schemas.md)
-or jump straight to [Data Schemas in Gradle projects](schemasGradle.md), 
-or [Data Schemas in Jupyter notebooks](schemasJupyter.md).
+See the [Kotlin DataFrame Compiler Plugin Example](https://github.com/Kotlin/dataframe/tree/plugin_example/examples/kotlin-dataframe-plugin-example)
+IDEA project for basic Extension Properties API examples.
+</tab>
+</tabs>
