Commit db85e02

Extracted subchapters (#391)
* Splitting the page and add to the navigation bar
* Updated the pages
* Extracted one subchapter
1 parent 8a36774 commit db85e02

12 files changed, with 344 additions and 238 deletions

docs/StardustDocs/d.tree

Lines changed: 7 additions & 1 deletion
@@ -16,7 +16,13 @@
 </toc-element>
 <toc-element topic="hierarchical.md"/>
 <toc-element topic="schemas.md">
-<toc-element topic="gradle.md"/>
+<toc-element topic="schemasGradle.md"/>
+<toc-element topic="schemasJupyter.md"/>
+<toc-element topic="schemasInheritance.md"/>
+<toc-element topic="schemasCustom.md"/>
+<toc-element topic="schemasExternalJupyter.md"/>
+<toc-element topic="schemasImportOpenApiGradle.md"/>
+<toc-element topic="schemasImportOpenApiJupyter.md"/>
 </toc-element>
 </toc-element>
 <toc-element topic="installation.md"/>

docs/StardustDocs/topics/collectionsInterop.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ val df2 = df.add("c") { a + b }
 
 <tip>
 
-To enable extension properties generation you should use [dataframe plugin](gradle.md)
+To enable extension properties generation you should use [dataframe plugin](schemasGradle.md)
 for Gradle or [Kotlin jupyter kernel](installation.md)
 
 </tip>

docs/StardustDocs/topics/extensionPropertiesApi.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ In notebooks, extension properties are generated for [`DataSchema`](schemas.md)
 instance after REPL line execution.
 After that [`DataFrame`](DataFrame.md) variable is typed with its own [`DataSchema`](schemas.md), so only valid extension properties corresponding to actual columns in DataFrame will be allowed by the compiler and suggested by completion.
 
-Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](gradle.md#configuration).
+Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](schemasGradle.md#configuration).
 
 <warning>
 In notebooks generated properties won't appear and be updated until the cell has been executed. It often means that you have to introduce new variable frequently to sync extension properties with actual schema

docs/StardustDocs/topics/installation.md

Lines changed: 1 addition & 1 deletion
@@ -156,7 +156,7 @@ tasks.withType(org.jetbrains.kotlin.gradle.tasks.KotlinCompile).configureEach {
 </tabs>
 
 Note that it's better to use the same version for a library and plugin to avoid unpredictable errors.
-After plugin configuration you can try it out with [example](gradle.md#annotation-processing).
+After plugin configuration you can try it out with [example](schemasGradle.md#annotation-processing).
 
 ### Custom configuration
 
docs/StardustDocs/topics/schemas.md

Lines changed: 19 additions & 234 deletions
@@ -11,248 +11,33 @@ It ignores order of columns in [`DataFrame`](DataFrame.md), but tracks column hi
 
 In Jupyter environment compile-time [`DataFrame`](DataFrame.md) schema is synchronized with real-time data after every cell execution.
 
-In IDEA projects, you can use the [Gradle plugin](gradle.md#configuration) to extract schema from the dataset
+In IDEA projects, you can use the [Gradle plugin](schemasGradle.md#configuration) to extract schema from the dataset
 and generate extension properties.
 
-## DataSchema workflow in Jupyter
 
-After execution of cell
+## Popular use cases with Data Schemas
 
-<!---FUN createDfNullable-->
+Here's a list of the most popular use cases with Data Schemas.
 
-```kotlin
-val df = dataFrameOf("name", "age")(
-    "Alice", 15,
-    "Bob", null
-)
-```
+* [**Data Schemas in Gradle projects**](schemasGradle.md) <br/>
+If you are developing a server application and building it with Gradle.
 
-<!---END-->
+* [**DataSchema workflow in Jupyter**](schemasJupyter.md) <br/>
+If you prefer Notebooks.
 
-the following actions take place:
+* [**Schema inheritance**](schemasInheritance.md) <br/>
+It's worth knowing how to reuse Data Schemas generated earlier.
 
-1. Columns in `df` are analyzed to extract data schema
-2. Empty interface with [`DataSchema`](schema.md) annotation is generated:
+* [**Custom Data Schemas**](schemasCustom.md) <br/>
+Sometimes it is necessary to create your own schema.
 
-```kotlin
-@DataSchema
-interface DataFrameType
-```
+* [**Use external Data Schemas in Jupyter**](schemasExternalJupyter.md) <br/>
+Sometimes it is convenient to extract reusable code from Jupyter Notebook into the Kotlin JVM library.
+Schema interfaces should also be extracted if this code uses Custom Data Schemas.
 
-3. Extension properties for this [`DataSchema`](schema.md) are generated:
+* [**Import OpenAPI Schemas in Gradle project**](schemasImportOpenApiGradle.md) <br/>
+When you need to take data from the endpoint with OpenAPI Schema.
 
-```kotlin
-val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> @JvmName("DataFrameType_age") get() = this["age"] as DataColumn<Int?>
-val DataRow<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int?
-val ColumnsContainer<DataFrameType>.name: DataColumn<String> @JvmName("DataFrameType_name") get() = this["name"] as DataColumn<String>
-val DataRow<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String
-```
-
-Every column produces two extension properties:
-
-* Property for `ColumnsContainer<DataFrameType>` returns column
-* Property for `DataRow<DataFrameType>` returns cell value
-
-4. `df` variable is typed by schema interface:
-
-```kotlin
-val temp = df
-```
-
-```kotlin
-val df = temp.cast<DataFrameType>()
-```
-
-> _Note, that object instance after casting remains the same. See [cast](cast.md).
-
-To log all these additional code executions, use cell magic
-
-```
-%trackExecution -all
-```
-
-## Schema inheritance
-
-In order to reduce amount of generated code, previously generated [`DataSchema`](schema.md) interfaces are reused and only new
-properties are introduced
-
-Let's filter out all `null` values from `age` column and add one more column of type `Boolean`:
-
-```kotlin
-val filtered = df.filter { age != null }.add("isAdult") { age!! > 18 }
-```
-
-New schema interface for `filtered` variable will be derived from previously generated `DataFrameType`:
-
-```kotlin
-@DataSchema
-interface DataFrameType1 : DataFrameType
-```
-
-Extension properties for data access are generated only for new and overridden members of `DataFrameType1` interface:
-
-```kotlin
-val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int>
-val DataRow<DataFrameType1>.age: Int get() = this["age"] as Int
-val ColumnsContainer<DataFrameType1>.isAdult: DataColumn<Boolean> get() = this["isAdult"] as DataColumn<Boolean>
-val DataRow<DataFrameType1>.isAdult: Boolean get() = this["isAdult"] as Boolean
-```
-
-Then variable `filtered` is cast to new interface:
-
-```kotlin
-val temp = filtered
-```
-
-```kotlin
-val filtered = temp.cast<DataFrameType1>()
-```
-
-## Custom data schemas
-
-You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent [`DataFrame`](DataFrame.md) with
-specific set of columns:
-
-```kotlin
-@DataSchema
-interface Person {
-    val name: String
-    val age: Int
-}
-```
-
-After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be
-generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md):
-
-```kotlin
-fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
-fun DataFrame<Person>.adults() = filter { age > 18 }
-```
-
-In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema:
-
-<!---FUN extendedDf-->
-
-```kotlin
-val df = dataFrameOf("name", "age", "weight")(
-    "Merton, Alice", 15, 60.0,
-    "Marley, Bob", 20, 73.5
-)
-```
-
-<!---END-->
-
-Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it:
-
-```kotlin
-@DataSchema(isOpen = false)
-interface DataFrameType : Person
-
-val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
-val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
-```
-
-Although `df` has an additional column `weight`, previously defined functions for `DataFrame<Person>` will work for it:
-
-<!---FUN splitNameWorks-->
-
-```kotlin
-df.splitName()
-```
-
-<!---END-->
-
-```text
-firstName lastName age weight
-Merton Alice 15 60.000
-Marley Bob 20 73.500
-```
-
-<!---FUN adultsWorks-->
-
-```kotlin
-df.adults()
-```
-
-<!---END-->
-
-```text
-name age weight
-Marley, Bob 20 73.5
-```
-
-In JVM project you will have to [cast](cast.md) [`DataFrame`](DataFrame.md) explicitly to the target interface:
-
-```kotlin
-df.cast<Person>().splitName()
-```
-
-## Use external data schemas in Jupyter
-
-Sometimes it is convenient to extract reusable code from Jupyter notebook into Kotlin JVM library. If this code
-uses [Custom data schemas](#custom-data-schemas), schema interfaces should also be extracted. In order to enable support
-for them in Jupyter, you should register them in
-library [integration class](https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md) with `useSchema`
-function:
-
-```kotlin
-@DataSchema
-interface Person {
-    val name: String
-    val age: Int
-}
-
-fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 }
-
-@JupyterLibrary
-internal class Integration : JupyterIntegration() {
-
-    override fun Builder.onLoaded() {
-        onLoaded {
-            useSchema<Person>()
-        }
-    }
-}
-```
-
-After loading this library into Jupyter notebook, schema interfaces for all [`DataFrame`](DataFrame.md) variables that match `Person`
-schema will derive from `Person`
-
-<!---FUN createDf-->
-
-```kotlin
-val df = dataFrameOf("name", "age")(
-    "Alice", 15,
-    "Bob", 20
-)
-```
-
-<!---END-->
-
-Now `df` is assignable to `DataFrame<Person>` and `countAdults` is available:
-
-```kotlin
-df.countAdults()
-```
-
-## Import Data Schemas, e.g. from OpenAPI, in Jupyter
-
-Similar to [importing OpenAPI data schemas in Gradle projects](gradle.md#openapi-schemas), you can also
-do this in Jupyter notebooks. There is only a slight difference in notation:
-
-Import the schema using any path (`String`), `URL`, or `File`:
-
-```kotlin
-val PetStore = importDataSchema("https://petstore3.swagger.io/api/v3/openapi.json")
-```
-
-and then from next cell you run and onwards, you can call, for example:
-
-```kotlin
-val df = PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
-```
-
-So, very similar indeed!
-
-(Note: The type of `PetStore` will be generated as `PetStoreDataSchema`, but this doesn't affect the way you can use
-it.)
+* [**Import Data Schemas, e.g. from OpenAPI, in Jupyter**](schemasImportOpenApiJupyter.md) <br/>
+Similar to [importing OpenAPI Data Schemas in Gradle projects](schemasImportOpenApiGradle.md),
+you can also do this in Jupyter Notebooks.
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+[//]: # (title: Custom Data Schemas)
+
+<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->
+
+You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent [`DataFrame`](DataFrame.md) with
+specific set of columns:
+
+```kotlin
+@DataSchema
+interface Person {
+    val name: String
+    val age: Int
+}
+```
+
+After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be
+generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md):
+
+```kotlin
+fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
+fun DataFrame<Person>.adults() = filter { age > 18 }
+```
+
+In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema:
+
+<!---FUN extendedDf-->
+
+```kotlin
+val df = dataFrameOf("name", "age", "weight")(
+    "Merton, Alice", 15, 60.0,
+    "Marley, Bob", 20, 73.5
+)
+```
+
+<!---END-->
+
+Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it:
+
+```kotlin
+@DataSchema(isOpen = false)
+interface DataFrameType : Person
+
+val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
+val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
+```
+
+Although `df` has an additional column `weight`, previously defined functions for `DataFrame<Person>` will work for it:
+
+<!---FUN splitNameWorks-->
+
+```kotlin
+df.splitName()
+```
+
+<!---END-->
+
+```text
+firstName lastName age weight
+Merton Alice 15 60.000
+Marley Bob 20 73.500
+```
+
+<!---FUN adultsWorks-->
+
+```kotlin
+df.adults()
+```
+
+<!---END-->
+
+```text
+name age weight
+Marley, Bob 20 73.5
+```
+
+In JVM project you will have to [cast](cast.md) [`DataFrame`](DataFrame.md) explicitly to the target interface:
+
+```kotlin
+df.cast<Person>().splitName()
+```
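
Note on the extracted page: the explicit `cast<Person>()` in a JVM project may be easier to follow with a library-free sketch. The code below is not the Kotlin DataFrame implementation. `Row`, its `cast` function and `isAdult` are hypothetical stand-ins, written only to show how a schema marker type can gate extension properties at compile time while the underlying data stays untyped.

```kotlin
// Simplified stand-in, NOT the Kotlin DataFrame API: a row is a plain map,
// and the type parameter T is only a compile-time marker for the schema.
class Row<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]

    // Re-types the same instance under another schema marker.
    // No data is copied or validated; generics are erased at runtime.
    @Suppress("UNCHECKED_CAST")
    fun <R> cast(): Row<R> = this as Row<R>
}

// Schema marker mirroring the `Person` interface from the page above.
interface Person {
    val name: String
    val age: Int
}

// Hand-written counterparts of the generated extension properties.
val Row<Person>.name: String get() = this["name"] as String
val Row<Person>.age: Int get() = this["age"] as Int

// Hypothetical helper that is only visible on rows typed with Person.
fun Row<Person>.isAdult(): Boolean = age > 18

fun main() {
    // Untyped row with an extra column, similar to `df` with `weight`.
    val raw = Row<Any?>(mapOf("name" to "Marley, Bob", "age" to 20, "weight" to 73.5))

    // `raw.age` would not compile here: the row is not typed as Row<Person>.
    val person = raw.cast<Person>()

    println(person.name)       // prints "Marley, Bob"
    println(person.isAdult())  // prints "true"
    println(person === raw)    // prints "true": the cast keeps the same instance
}
```

In this sketch the cast only changes the static type, so a missing or mistyped column would surface as a runtime exception when the property is read. That is why the extra `weight` column is harmless here, while a missing `age` entry would not be.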
